UTILYARD

Regex Tutorial

A practical introduction to regular expressions — from basic matching to groups, flags, and real-world patterns.

What is a regular expression?

A regular expression (regex, or regexp) is a sequence of characters that defines a search pattern. Given a pattern and a string, a regex engine can find matches, validate format, split text, or drive search-and-replace operations — all without writing a loop. They are supported in virtually every programming language and appear constantly in tasks like form validation, log parsing, and code search.

The concept originates in formal language theory — Stephen Kleene described regular languages in the 1950s, and Ken Thompson later implemented them in the Unix editor ed. From there they spread to grep, sed, awk, Perl, and eventually every modern language. The syntax has been refined over decades, but the core ideas have stayed the same.

Basic matching

The simplest regex is plain text. The pattern cat matches the literal characters "c", "a", "t" in sequence. It would find a match inside "concatenate" because that string contains "cat" as a substring. Matching is case-sensitive by default, so it would not match "Cat" or "CAT".

In JavaScript, you write a regex literal between forward slashes. The three most common methods are:

const str = 'The cat sat on the mat'

// Test for a match — returns true or false
/cat/.test(str)           // true

// Find the first match — returns a match object or null
str.match(/cat/)          // ['cat', index: 4, ...]

// Replace all matches (with the g flag)
str.replace(/at/g, 'ot')  // 'The cot sot on the mot'

The dot wildcard

The . (dot) is the most basic special character. It matches any single character except a newline. The pattern /c.t/ matches "cat", "cut", "c9t", "c@t" — anything that starts with "c", has one character in the middle, and ends with "t".

To match a literal dot (for example in a domain name like "example.com"), you must escape it with a backslash, writing \. instead of a bare dot. Without the backslash, the dot is a wildcard and would match any character in that position.

/c.t/.test('cat')   // true
/c.t/.test('cut')   // true
/c.t/.test('c9t')   // true
/c.t/.test('ct')    // false — needs exactly one middle char

// Matching a literal dot (the dot is escaped)
/3\.14/.test('3.14')   // true
/3\.14/.test('3x14')   // false

Character classes [ ]

Square brackets define a character class: a set of characters where the pattern matches any one member of that set. Unlike the dot, a character class gives you precise control over which characters are allowed.

// Match any vowel
/[aeiou]/.test('hello')     // true (matches 'e')

// Range syntax — any lowercase letter
/[a-z]/.test('m')           // true

// Combine ranges
/[a-zA-Z0-9]/.test('Z')    // true

// Negation — ^ inside brackets means "NOT"
/[^aeiou]/.test('b')        // true — 'b' is not a vowel
/[^aeiou]/.test('a')        // false — 'a' is a vowel

// Match only hex digits
/[0-9a-fA-F]/

Note that most special characters lose their special meaning inside square brackets. A dot inside [.] matches a literal dot, not any character. The exceptions are ^ (negation at start), - (range), ] (close bracket), and \ (escape).
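The rules in the note above can be checked directly. This is a small sketch; the variable names are illustrative and not part of any API.

```javascript
// Inside a class the dot is literal: this matches only '.', not any character
const literalDot = /[.]/
const dotInText = literalDot.test('a.b')      // true
const noDotInText = literalDot.test('axb')    // false

// + * ? ( ) are also literal inside brackets
const operators = /[+*?()]/
const hasOperator = operators.test('2+2')     // true

// A hyphen placed first or last is literal; between two characters it forms a range
const signChars = /[+-]/
const hasSign = signChars.test('-5')          // true

// ] and \ still need a backslash to be matched literally
const bracketOrBackslash = /[\]\\]/
const hasBracket = bracketOrBackslash.test('a]b')      // true
const hasBackslash = bracketOrBackslash.test('a\\b')   // true
```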

Shorthand character classes

Common character classes appear so often that most regex flavors provide shorthand notation. These can be used both on their own and inside square brackets.

Shorthand   Meaning              Equivalent class
\d          Digit                [0-9]
\D          Non-digit            [^0-9]
\w          Word character       [a-zA-Z0-9_]
\W          Non-word character   [^a-zA-Z0-9_]
\s          Whitespace           [ \t\n\r\f\v] (plus Unicode space characters in JavaScript)
\S          Non-whitespace       [^ \t\n\r\f\v]

// Match a digit
/\d/.test('abc3')     // true

// Match one or more word characters
/\w+/.test('hello')   // true

// Combine shorthand inside brackets
/[\d\s]/.test(' ')    // true (matches a whitespace character or a digit)
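The negated shorthands (\D, \W, \S) are often the quickest way to strip or split on unwanted characters. A brief sketch, using made-up input strings:

```javascript
// Hypothetical input for illustration
const rawPhone = '(555) 123-4567'

// \D matches any non-digit, so replacing it with '' keeps only the digits
const digitsOnly = rawPhone.replace(/\D/g, '')
// '5551234567'

// \s+ matches a run of whitespace; splitting on it yields the words
const words = 'one  two\tthree'.split(/\s+/)
// ['one', 'two', 'three']

// \W matches anything that is not a letter, digit, or underscore
const snakeHasNonWord = /\W/.test('snake_case')   // false
const kebabHasNonWord = /\W/.test('kebab-case')   // true (the hyphen)
```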

Quantifiers

A quantifier controls how many times the preceding character, class, or group must appear. Without a quantifier, each element in a pattern matches exactly once.

// * — 0 or more
/go*gle/.test('ggle')    // true (zero o's)
/go*gle/.test('google')  // true (two o's)

// + — 1 or more (requires at least one)
/go+gle/.test('ggle')    // false (needs at least one o)
/go+gle/.test('google')  // true

// ? — 0 or 1 (makes a character optional)
/colou?r/.test('color')   // true
/colou?r/.test('colour')  // true

// {n} — exactly n times
/\d{4}/.test('2024')     // true
/\d{4}/.test('202')      // false

// {n,m} — between n and m times (inclusive)
/\d{2,4}/.test('123')    // true

// {n,} — n or more times
/\d{3,}/.test('12345')   // true

Anchors

Anchors don't match characters — they match positions within the string. This lets you assert that a pattern appears at the start, end, or at a word boundary, rather than anywhere in the string.

// ^ — must start at the beginning of the string
/^cat/.test('cat sat')     // true — starts with "cat"
/^cat/.test('the cat')     // false — "cat" is not at start

// $ — must reach the end of the string
/cat$/.test('the cat')     // true — ends with "cat"
/cat$/.test('cat sat')     // false — "cat" is not at end

// ^...$ — full string match
/^cat$/.test('cat')            // true — exactly "cat"
/^cat$/.test('concatenate')    // false — extra characters

// \b marks a word boundary
// Matches the position between a word character and a non-word character
// (or the start or end of the string)
/\bcat\b/.test('the cat sat')  // true
/\bcat\b/.test('concatenate')  // false ("cat" is inside a word)

Word boundaries (\b) are especially useful when you want to match whole words. Without them, searching for "cat" in "concatenate" returns a match because the substring exists — even though the word "cat" does not appear as a standalone word.
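The difference matters most in search-and-replace. A minimal sketch with an invented sentence:

```javascript
const sentence = 'concatenate the cat category'

// Without \b, every occurrence of the substring is replaced
const loose = sentence.replace(/cat/g, 'dog')
// 'condogenate the dog dogegory'

// With \b on both sides, only the standalone word is replaced
const strict = sentence.replace(/\bcat\b/g, 'dog')
// 'concatenate the dog category'
```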

Groups and capturing

Parentheses serve two purposes: grouping characters so quantifiers apply to the whole group, and capturing the matched text so you can reference it later. A non-capturing group (?:...) groups without capturing — useful when you need grouping for alternation or quantifiers but don't need the captured value.

// Alternation — match "cat" or "dog"
/(cat|dog)/.test('I have a dog')    // true

// Quantifier on a group — match "ha" repeated
/(ha)+/.test('hahaha')             // true

// Capturing groups — extract the match
const m = '2024-05-27'.match(/(\d{4})-(\d{2})-(\d{2})/)
// m[0] = '2024-05-27'  (full match)
// m[1] = '2024'        (first group)
// m[2] = '05'          (second group)
// m[3] = '27'          (third group)

// Use captured groups in replace with $1, $2, ...
'2024-05-27'.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1')
// → '05/27/2024'

// Non-capturing group — groups without polluting the match array
/(?:https?:\/\/)?([\w-]+)/.exec('https://example.com')
// Captures 'example' without capturing 'https://'
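In practice, captured groups are often unpacked with array destructuring, and matchAll collects the groups of every match when the g flag is set. The sketch below assumes a hypothetical parseDate helper; it is one way to structure this, not a standard function.

```javascript
const datePattern = /(\d{4})-(\d{2})-(\d{2})/

// Hypothetical helper: returns the parts of a date string, or null if there is no match
function parseDate(input) {
  const match = input.match(datePattern)
  if (match === null) return null
  // match[0] is the full match, so the groups start at index 1
  const [, year, month, day] = match
  return { year, month, day }
}

const parsed = parseDate('2024-05-27')   // { year: '2024', month: '05', day: '27' }
const missing = parseDate('not a date')  // null

// With the g flag, matchAll yields one match array (including groups) per match
const log = 'start 2024-05-27, end 2024-06-02'
const months = [...log.matchAll(/(\d{4})-(\d{2})-(\d{2})/g)].map(match => match[2])
// ['05', '06']
```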

Flags

Flags modify how the entire pattern behaves. In JavaScript they appear after the closing slash: /pattern/flags. Multiple flags can be combined in any order.

Flag   Effect
g      Global: find all matches, not just the first
i      Case-insensitive: treats uppercase and lowercase as equal
m      Multiline: ^ and $ match the start and end of each line, not just the whole string
s      dotAll: makes . match newlines in addition to other characters
d      Indices: includes start/end indices for each capture group in the match result

// g returns all matches
'cat and cat'.match(/cat/g)      // ['cat', 'cat']
'cat and cat'.match(/cat/)       // ['cat'] (first only)

// i — case-insensitive
/hello/i.test('Hello World')     // true

// m — multiline anchors
const text = 'line1\nline2\nline3'
text.match(/^\w+/gm)             // ['line1', 'line2', 'line3']

// s — dotAll (dot matches newlines)
/start.end/s.test('start\nend')  // true
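The d flag and the "any order" rule are not covered by the examples above, so here is a short sketch of both. It assumes a runtime with ES2022 support (for example Node.js 16 or later), since older engines reject the d flag.

```javascript
// d: each capture group gets [start, end) offsets in match.indices
const dm = /(\d{4})-(\d{2})/d.exec('on 2024-05-27')
const matchStart = dm.index      // 3
const fullSpan = dm.indices[0]   // [3, 10]
const yearSpan = dm.indices[1]   // [3, 7]
const monthSpan = dm.indices[2]  // [8, 10]

// Flag order does not matter: 'ig' and 'gi' build the same regex
const all = 'Cat cat CAT'.match(/cat/ig)
// ['Cat', 'cat', 'CAT']
const normalized = new RegExp('cat', 'ig').flags
// 'gi' (the flags property reports them in a fixed order)
```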

Quick reference

Pattern    Meaning                            Example match
.          Any character except newline       "a", "1", "@"
\d         Digit [0-9]                        "0" through "9"
\D         Non-digit                          "a", " ", "@"
\w         Word character [a-zA-Z0-9_]        "a"–"z", "A"–"Z", "0"–"9", "_"
\W         Non-word character                 "@", " ", "-"
\s         Whitespace (space, tab, newline)   " ", "\t", "\n"
\S         Non-whitespace                     "a", "1", "@"
^          Start of string
$          End of string
\b         Word boundary
*          0 or more of preceding
+          1 or more of preceding
?          0 or 1 of preceding (optional)
{n}        Exactly n of preceding
{n,m}      Between n and m of preceding
[abc]      Any one of: a, b, c                "a", "b", or "c"
[^abc]     Any character NOT a, b, or c       "d", "1", " "
[a-z]      Any character in range a–z         "a", "m", "z"
(abc)      Capturing group
(?:abc)    Non-capturing group
a|b        a or b (alternation)               "a" or "b"

Practical examples

These are patterns you'll encounter (and reach for) regularly. Each is a reasonable baseline — not guaranteed to cover every edge case for every standard, but robust enough for most real-world validation and extraction tasks.

// Email address (basic)
// Requires: non-whitespace/@ chars, @, domain, dot, TLD
/^[^\s@]+@[^\s@]+\.[^\s@]+$/

// US phone number
// Matches: (555) 123-4567 | 555-123-4567 | 555.123.4567
/^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/

// URL (http or https)
/^https?:\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?$/

// Hex color
// Matches: #fff | #a1b2c3
/^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/

// IPv4 address (format check, not range check)
// Use additional validation to verify each octet is 0–255
/^(\d{1,3}\.){3}\d{1,3}$/

// ISO date (YYYY-MM-DD)
/^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/

// Slug (URL-safe identifier)
/^[a-z0-9]+(?:-[a-z0-9]+)*$/
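The IPv4 pattern above only checks the shape of the string. One possible way to add the octet range check it mentions is sketched below; isValidIPv4 is a hypothetical helper name, and this version still accepts leading zeros such as "01".

```javascript
const ipv4Format = /^(\d{1,3}\.){3}\d{1,3}$/

// Hypothetical helper: format check first, then a numeric check on each octet
function isValidIPv4(input) {
  if (!ipv4Format.test(input)) return false
  return input.split('.').every(octet => Number(octet) <= 255)
}

const ok = isValidIPv4('192.168.0.1')       // true
const outOfRange = isValidIPv4('256.1.1.1') // false (passes the regex, fails the range check)
const tooShort = isValidIPv4('1.2.3')       // false (fails the regex)
```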

Frequently asked questions

What's the difference between greedy and lazy matching?
By default, quantifiers are greedy — they match as much as possible. Adding ? after a quantifier makes it lazy: it matches as little as possible. For example, /<.+>/ on "<a>text</a>" matches the entire string from the first < to the last >, but /<.+?>/ matches just "<a>". Lazy quantifiers are essential when matching delimited content like HTML tags or quoted strings.
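The same comparison, written out as a runnable sketch (the quoted-string input is an invented example):

```javascript
const html = '<a>text</a>'

// Greedy: .+ runs to the last > in the string
const greedy = html.match(/<.+>/)[0]
// '<a>text</a>'

// Lazy: .+? stops at the first > it can
const lazy = html.match(/<.+?>/)[0]
// '<a>'

// With the g flag, the lazy version finds each tag separately
const tags = html.match(/<.+?>/g)
// ['<a>', '</a>']

// The same applies to quoted strings
const quoted = 'say "hi" and "bye"'
const greedyQuote = quoted.match(/".*"/)[0]
// '"hi" and "bye"'
const lazyQuotes = quoted.match(/".*?"/g)
// ['"hi"', '"bye"']
```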
Why does my regex work in one language but not another?
Regex syntax is mostly standard but has dialect differences. JavaScript didn't support lookbehind assertions until ES2018, Python's re module handles verbose mode and named groups slightly differently, and PCRE (used by PHP and many others) has features that only recently landed in JavaScript. When porting a pattern, always test it in the target language and runtime.
How do I match a literal special character?
Escape it with a backslash. To match a literal dot, use \. (without the backslash, . is a wildcard). To match a literal parenthesis, use \(. The characters that need escaping are: . * + ? ^ $ { } [ ] | ( ) \ and, inside a JavaScript regex literal, the forward slash /, because it delimits the pattern.
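When the text to match comes from user input, escaping by hand is not practical. A common approach is a small helper that backslash-escapes every special character before building the regex. The escapeRegExp name below is an assumption for illustration, not a built-in.

```javascript
// Hypothetical helper: prefix every regex metacharacter with a backslash
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

const userInput = '1+1=2 (approx.)'

// Escaped: the input is matched character for character
const safePattern = new RegExp(escapeRegExp(userInput))
const foundEscaped = safePattern.test('so 1+1=2 (approx.) then')   // true

// Unescaped: + and ( ) are treated as regex syntax, so the literal text is not found
const foundRaw = new RegExp(userInput).test(userInput)             // false
```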
Is regex the right tool for parsing HTML?
Generally no. HTML is not a regular language — it allows arbitrary nesting, optional closing tags, and context-sensitive rules that make regex patterns fragile and hard to maintain. Use a proper HTML parser (such as DOMParser in browsers or cheerio in Node.js) for any real HTML processing. Regex is fine for simple pattern matching on specific attribute values or known template output, but not for general HTML parsing.