Regular Expressions in Bash with =~
Bash's =~ operator matches ERE regular expressions and captures groups into BASH_REMATCH. Use it for validation, parsing, and conditional logic without spawning subshells.
Bash's [[ str =~ pattern ]] operator matches Extended Regular Expressions (ERE) without launching an external grep or sed — it's fast and captures groups into the BASH_REMATCH array. This makes it ideal for input validation, parsing structured strings, and conditional logic.
Basic =~ Matching
str="Hello, World 2026!"
# Check if string matches a pattern
if [[ "$str" =~ World ]]; then
  echo "Contains 'World'"
fi
# Match a number
if [[ "$str" =~ [0-9]+ ]]; then
  echo "Contains a number: ${BASH_REMATCH[0]}" # 2026
fi
# Anchor to start and end
if [[ "hello" =~ ^[a-z]+$ ]]; then
  echo "All lowercase"
fi
BASH_REMATCH: Capture Groups
date_str="2026-01-31"
# Capture groups with ()
if [[ "$date_str" =~ ^([0-9]{4})-([0-9]{2})-([0-9]{2})$ ]]; then
  echo "Full match: ${BASH_REMATCH[0]}" # 2026-01-31
  echo "Year: ${BASH_REMATCH[1]}" # 2026
  echo "Month: ${BASH_REMATCH[2]}" # 01
  echo "Day: ${BASH_REMATCH[3]}" # 31
else
  echo "Invalid date format"
fi
Store regex in a variable
Don't quote the regex in =~ — quotes make it a literal string match.
pattern='^[0-9]+$' # ✅ Single-quote the assignment so the regex is stored verbatim
[[ "$input" =~ $pattern ]] # ✅ Leave the variable unquoted inside [[ ]]
[[ "$input" =~ '^[0-9]+$' ]] # ❌ A quoted pattern is matched as a literal string, not as a regex
Storing the regex in a variable also makes complex patterns cleaner to read.
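To make the quoting rule concrete, the sketch below runs one pattern both unquoted and quoted, then builds a longer pattern from a named piece. The variable names (`unquoted_result`, `quoted_result`, `octet`, `ipv4`) and the sample strings are made up for illustration, and the behavior shown assumes Bash 3.2 or later with default compatibility settings.

```bash
#!/usr/bin/env bash
# Illustrative sketch: all names and sample strings here are hypothetical.

pattern='a.c'   # as a regex, "." matches any single character

# Unquoted variable: treated as a regex, so "." matches the "b" in "abc".
if [[ "abc" =~ $pattern ]]; then
  unquoted_result="match"
else
  unquoted_result="no match"
fi

# Quoted variable: treated as the literal text "a.c", which "abc" does not contain.
if [[ "abc" =~ "$pattern" ]]; then
  quoted_result="match"
else
  quoted_result="no match"
fi

echo "unquoted: $unquoted_result"
echo "quoted:   $quoted_result"

# Building a longer pattern from a named piece keeps it readable.
octet='[0-9]{1,3}'
ipv4="^${octet}\.${octet}\.${octet}\.${octet}\$"
if [[ "10.0.0.1" =~ $ipv4 ]]; then
  echo "looks like an IPv4 address"
fi
```

Inside the double-quoted assignment to `ipv4`, `\.` is kept as a backslash plus dot (a literal dot in the regex), while `\$` becomes a plain `$` that anchors the end of the string.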
ERE Quick Reference
Common ERE elements
| Pattern | Matches |
|---|---|
| `.` | Any single character |
| `*` | Zero or more of preceding |
| `+` | One or more of preceding |
| `?` | Zero or one of preceding |
| `^str` | Anchored to start |
| `str$` | Anchored to end |
| `[abc]` | a, b, or c |
| `[^abc]` | Not a, b, or c |
| `[a-z]` | a through z |
| `(pat)` | Capture group |
| `pat1\|pat2` | Either pat1 or pat2 (alternation) |
| `{n,m}` | Between n and m repetitions |
| `[[:alpha:]]` | POSIX: any letter |
| `[[:digit:]]` | POSIX: any digit |
| `[[:space:]]` | POSIX: any whitespace |
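As a rough illustration of how several of these elements combine, the sketch below parses a version tag using anchors, alternation, a POSIX class, an escaped dot, and an optional group. The function name `parse_version` and the tag formats (`v1.2.3`, `release-10.4`) are assumptions invented for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: parse_version and the tag formats are invented for illustration.
parse_version() {
  local tag="$1"
  # ^(v|release-)        alternation in a group, anchored to the start
  # ([[:digit:]]+)       one or more digits (major, then minor)
  # \.                   a literal dot
  # (\.([[:digit:]]+))?  optional ".patch" suffix; the inner group holds the digits
  local pattern='^(v|release-)([[:digit:]]+)\.([[:digit:]]+)(\.([[:digit:]]+))?$'
  if [[ "$tag" =~ $pattern ]]; then
    # An optional group that did not take part in the match is left empty,
    # so fall back to 0 for a missing patch number.
    echo "major=${BASH_REMATCH[2]} minor=${BASH_REMATCH[3]} patch=${BASH_REMATCH[5]:-0}"
  else
    return 1
  fi
}

parse_version "v1.2.3"                             # major=1 minor=2 patch=3
parse_version "release-10.4"                       # major=10 minor=4 patch=0
parse_version "1.2.3" || echo "not a version tag"  # no v/release- prefix
```

Note that groups are numbered by their opening parenthesis, so the wrapper around the optional suffix is group 4 and the patch digits are group 5.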
Practical Validation Examples
#!/usr/bin/env bash
function validate_email {
  local email="$1"
  # \. matches a literal dot before the top-level domain
  local pattern='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
  [[ "$email" =~ $pattern ]]
}
function validate_ip {
  local ip="$1" octet
  # Dots are escaped so they match only a literal "."
  local pattern='^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$'
  if [[ "$ip" =~ $pattern ]]; then
    for octet in "${BASH_REMATCH[@]:1}"; do # skip [0] (full match)
      # 10# forces base 10 so an octet such as "08" is not parsed as octal
      (( 10#$octet >= 0 && 10#$octet <= 255 )) || return 1
    done
    return 0
  fi
  return 1
}
function is_integer {
  [[ "$1" =~ ^-?[0-9]+$ ]]
}
# Test them
validate_email "alice@example.com" && echo "Valid email"
validate_ip "192.168.1.1" && echo "Valid IP"
is_integer "-42" && echo "Is integer"
After `[[ "foo123bar" =~ ([0-9]+) ]]`, what are `${BASH_REMATCH[0]}` and `${BASH_REMATCH[1]}`?
Write a function parse_connection_string that:
- Takes a connection string like postgres://alice:secret@db.example.com:5432/mydb
- Uses =~ with capture groups to extract: user, password, host, port, database
- Prints each component on its own line
- Returns 1 if the string doesn't match the expected format