Regular Expressions (by Example)
A regular expression is a pattern for finding text. Different tools support different regex features, so the same pattern may need a different flag or regex engine depending on where it runs.
Ripgrep examples in this note use Rust’s
regex crate by default. That engine is designed for predictable search time
and deliberately leaves out features such as lookaround and backreferences.[1]
For ripgrep-specific flags, output formats, and search behavior, see ripgrep.
Basics
Pattern
The regex itself. In rg 'TODO|FIXME', the pattern is TODO|FIXME.
Haystack
The text being searched. With ripgrep, the haystack is usually each searched line in each searched file.
Literal
Text that matches itself. TODO matches the four characters TODO.
Metacharacter
A character with special meaning in a regex, such as |, (, ), [, ], *, +, ?,
^, $, and the dot (.).
Escape one with \ when you need the literal character.
Character class
A set of characters, exactly one of which must match at that position. [0-9]
matches one ASCII digit. [[:blank:]] matches a tab or space in Rust regex syntax.[2]
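A minimal sketch of both classes, assuming a POSIX shell and using grep -E as a stand-in for ripgrep (a different engine, but bracket expressions and [[:blank:]] behave the same way for this pattern). The sample lines are made up for illustration.

```shell
# Hypothetical sample: four lines, the second uses a tab after "id".
sample=$(printf 'id 42\nid\t7\nid-9\nidx\n')

# [[:blank:]] matches one space or tab; [0-9] matches one ASCII digit.
# -c prints the number of matching lines instead of the lines themselves.
count=$(printf '%s\n' "$sample" | grep -cE 'id[[:blank:]][0-9]')
echo "$count"
```

Only the space-separated and tab-separated lines count as matches; id-9 and idx do not, because neither has a blank followed by a digit after id.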
Quantifier
A suffix that controls repetition. + means one or more, * means zero or
more, and ? means zero or one.[3]
Anchor
A position in the line rather than a character. ^ means the start of a line,
and $ means the end of a line.[3]
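The quantifiers and anchors described here can be tried on a small sample. This is a sketch that assumes a POSIX shell and uses grep -E in place of ripgrep; the sample words are invented for the example.

```shell
# Hypothetical sample: one candidate per line.
sample=$(printf '%s\n' 'color' 'colour' 'colouur' 'colr' 'my colour')

# ? allows zero or one "u" between "colo" and "r".
optional=$(printf '%s\n' "$sample" | grep -E '^colou?r$')

# + requires one or more "u".
one_or_more=$(printf '%s\n' "$sample" | grep -E '^colou+r$')

# * allows zero or more "u".
zero_or_more=$(printf '%s\n' "$sample" | grep -E '^colou*r$')

# Without ^, the match may start anywhere in the line, so "my colour" counts too.
unanchored=$(printf '%s\n' "$sample" | grep -cE 'colou?r$')

printf '%s\n' "$optional" "--" "$one_or_more" "--" "$zero_or_more" "--" "$unanchored"
```

The line colr never matches because the quantifier applies only to the u directly before it, not to the o.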
Group
Parentheses group part of a pattern. (TODO|FIXME) lets the alternation apply
to the whole set of words. Use (?:...) when you only need grouping and do not
need to capture the matched text.[4]
Alternation
x|y means “match x or y”. Alternation prefers the leftmost branch that
can make the overall regex match.[5]
Lookaround
Lookaround checks text near the current match position without consuming that text. That means the lookaround can make the match succeed or fail, but the looked-at text is not part of the returned match.
PCRE2 supports four common lookaround forms:[6]
| Form | Name | Meaning |
|---|---|---|
| (?=...) | Positive lookahead | The next text must match ... |
| (?!...) | Negative lookahead | The next text must not match ... |
| (?<=...) | Positive lookbehind | The previous text must match ... |
| (?<!...) | Negative lookbehind | The previous text must not match ... |
For example:
    grep -P 'TODO(?=:)' notes.txt
This matches TODO only when the next character is :. It matches the
TODO in TODO: fix parser, but the colon is not part of the match.
Another example:
    grep -P '^\s*//(?![!/])' example.zig
The negative lookahead (?![!/]) checks the character after //. It allows
a plain // comment, but rejects //! and /// documentation comments.
Use lookaround when you need context without including that context in the match. If the context may be part of the match, a plain pattern is usually simpler.
For example, TODO: is simpler than TODO(?=:) when matching the colon is
acceptable.
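The difference between the lookahead and the plain literal shows up in the matched text. The sketch below assumes GNU grep built with PCRE support (needed for -P) and uses -o to print only the matched part; the two sample lines are hypothetical.

```shell
# Hypothetical sample: one line with a colon after TODO, one without.
sample=$(printf '%s\n' 'TODO: fix parser' 'TODO later')

# The lookahead requires ":" after TODO but does not consume it.
with_lookahead=$(printf '%s\n' "$sample" | grep -oP 'TODO(?=:)')

# The plain literal consumes the colon, so it is part of the match.
with_literal=$(printf '%s\n' "$sample" | grep -oP 'TODO:')

echo "lookahead: $with_lookahead"
echo "literal:   $with_literal"
```

Both patterns select the same line; only the extent of the match differs, which matters for tools that replace or highlight the matched text.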
Lookaround is not part of every regex engine. In GNU grep, -P enables
Perl-compatible regex syntax when PCRE support is available.[7] In
ripgrep, --pcre2 or -P enables the PCRE2 engine.
Backreferences
A backreference matches the same text that an earlier capture group matched.
For example:
    grep -P '\b([A-Za-z]+)\s+\1\b' notes.txt
Breakdown
- \b starts at a word boundary.
- ([A-Za-z]+) captures one ASCII word.
- \s+ matches one or more whitespace characters.
- \1 matches the exact text captured by the first group.
- The final \b ends at a word boundary.
This finds repeated words such as “the the” and “parser parser”.
It does not match “the them”: \1 can match the first three letters of them,
but the final \b then fails because the next character, m, continues the word.
Named groups can make backreferences clearer:
    grep -P '\b(?<word>[A-Za-z]+)\s+\k<word>\b' notes.txt
Here, (?<word>[A-Za-z]+) captures a word into a group named word, and \k<word>
matches that same captured text.
PCRE2 supports numeric and named backreference forms, including
\1, \g{1}, and \k<name>.[8]
Backreferences are useful for duplicate detection and “same thing appears again” checks. They are not part of every regex engine; ripgrep’s default Rust regex engine does not support them.[1]
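As a rough check of the two backreference forms, the sketch below runs both patterns over the same made-up lines. It assumes GNU grep with PCRE support for -P; the sample text is hypothetical.

```shell
# Hypothetical sample: two lines with a repeated word, one without.
sample=$(printf '%s\n' 'the the parser' 'the them' 'parser parser ok')

# Numbered backreference: \1 repeats whatever group 1 captured.
numbered=$(printf '%s\n' "$sample" | grep -P '\b([A-Za-z]+)\s+\1\b')

# Named backreference: \k<word> repeats whatever the group named "word" captured.
named=$(printf '%s\n' "$sample" | grep -P '\b(?<word>[A-Za-z]+)\s+\k<word>\b')

printf '%s\n' "$numbered"
printf '%s\n' "$named"
```

Both forms select the same lines. The line “the them” is skipped because the trailing \b cannot match between the repeated “the” and the following m.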
Finding TODO Comments
    rg --no-messages --vimgrep -H --column --line-number \
      --color never \
      -e '(TODO|FIXME|BUG|HACK|XXX)' .
The regex:
    (TODO|FIXME|BUG|HACK|XXX)
Breakdown
- (...) groups the alternatives so they behave as one unit.
- TODO|FIXME|BUG|HACK|XXX matches any one of those literal words.
- Each word is a literal. The | characters are metacharacters that separate alternatives.
This is intentionally broad. It will match TODO anywhere on a line, including
inside code, strings, prose, and comments, and inside longer words such as
DEBUG. That breadth can be useful for a first pass because the pattern makes
very few assumptions.
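To see how broad the pattern is, here is a small sketch that pipes made-up lines through grep -E instead of ripgrep. That substitution is an assumption for portability; the pattern uses only grouping and alternation, which both tools treat the same way here.

```shell
# Hypothetical sample: a real TODO comment, an identifier that merely
# contains "BUG", and a line with no tag at all.
sample=$(printf '%s\n' '// TODO: refactor' 'const DEBUG = true;' 'print("done")')

# -n prefixes each matching line with its line number.
hits=$(printf '%s\n' "$sample" | grep -nE '(TODO|FIXME|BUG|HACK|XXX)')
printf '%s\n' "$hits"
```

The second line is reported because BUG occurs inside DEBUG, which is the kind of false positive a first pass has to tolerate.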
Command
- -e provides the search pattern. This is useful when the pattern starts with - or when a command has several patterns.
- --vimgrep prints matches in a Vim-friendly shape: file, line, column, and matched line.
- -H, --line-number, and --column force file names, line numbers, and columns in the output.
- --no-messages suppresses error messages about files that cannot be opened or read.
- --color never keeps output plain for scripts or editor integrations.
Finding TODO Comments After Comment Prefixes
    TAGS='BUG|HACK|FIXME|TODO|XXX|\[ \]|\[x\]'
    PREFIX='//|#|<!--|;|/\*|^|^[[:blank:]]*(-|[0-9]+\.)'

    rg --no-messages --vimgrep -H --column --line-number --color never \
      --max-columns=1000 --no-config \
      -e "(${PREFIX})[[:space:]]*(${TAGS})" \
      -g '!**/.git/**' \
      -g '!**/node_modules/**' \
      -g '!**/target/**' \
      -g '!**/.build/**' \
      .
The expanded regex:
    (//|#|<!--|;|/\*|^|^[[:blank:]]*(-|[0-9]+\.))[[:space:]]*(BUG|HACK|FIXME|TODO|XXX|\[ \]|\[x\])
TAGS
    BUG|HACK|FIXME|TODO|XXX|\[ \]|\[x\]
Breakdown
- BUG, HACK, FIXME, TODO, and XXX are literal tag words.
- | means any one of those tags may match.
- \[ and \] match literal square brackets. Without the backslashes, [ and ] would start and end a character class.
- \[ \] matches the unchecked Markdown task marker [ ].
- \[x\] matches the checked Markdown task marker [x].
PREFIX
    //|#|<!--|;|/\*|^|^[[:blank:]]*(-|[0-9]+\.)
Breakdown
- // matches slash-style comments in languages like JavaScript, Rust, Go, Zig, C, and C++.
- # matches hash-style comments in shells, Python, Ruby, YAML, and many config files.
- <!-- matches the start of an HTML comment, including HTML comments inside Markdown.
- ; matches semicolon comments used in formats such as Lisp and INI-like files.
- /\* matches the start of a block comment, /*. The * is escaped because a bare * means “repeat the previous thing zero or more times”.
- ^ matches the beginning of the line. In this pattern, it allows bare tags at the start of a line, not just tags after comment syntax.
- ^[[:blank:]]*(-|[0-9]+\.) matches Markdown list markers at the start of a line:
  - ^[[:blank:]]* means start of line, then zero or more spaces or tabs.
  - The - alternative matches a bullet marker.
  - [0-9]+\. matches a numbered marker like 1. or 23. Here [0-9]+ means one or more ASCII digits, and \. matches a literal dot.
The Middle
    [[:space:]]*
[[:space:]] is an ASCII whitespace class in Rust regex syntax. It includes
space, tab, newline, vertical tab, form feed, and carriage return.[2]
The * allows zero or more whitespace characters between the prefix and tag.
For normal ripgrep line searches, this effectively means “allow optional spacing before the tag.” Newline is part of the class definition, but ripgrep searches line by line unless multiline mode is enabled.
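The prefix, optional whitespace, and tag pieces can be exercised together on a few invented lines. This sketch assumes a POSIX shell and substitutes grep -E for ripgrep. The pattern relies only on alternation, anchors, and POSIX classes, which grep -E also supports, but the two engines are not identical in general.

```shell
# Same variables as in the ripgrep command in this note.
TAGS='BUG|HACK|FIXME|TODO|XXX|\[ \]|\[x\]'
PREFIX='//|#|<!--|;|/\*|^|^[[:blank:]]*(-|[0-9]+\.)'

# Hypothetical sample: four tagged lines and two that should be ignored.
sample=$(printf '%s\n' \
  '// TODO: tidy' \
  'x = 1  # FIXME later' \
  '- [ ] write docs' \
  '  3. HACK around it' \
  'const DEBUG = true;' \
  'see TODO in prose')

# Keep only lines where a tag directly follows a prefix (with optional spacing).
matches=$(printf '%s\n' "$sample" | grep -E "(${PREFIX})[[:space:]]*(${TAGS})")
count=$(printf '%s\n' "$matches" | grep -c '')
printf '%s\n' "$matches"
echo "$count"
```

The last two sample lines are skipped: BUG inside DEBUG and TODO in the middle of prose have no comment prefix, line start, or list marker immediately before them.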
Why The Whole Pattern Is Grouped
    (${PREFIX})[[:space:]]*(${TAGS})
The first group isolates the prefix alternatives. The second group isolates the
tag alternatives. The groups are needed because alternation binds more loosely
than concatenation: a|bc means a or bc, not (a|b)c. Without them, [[:space:]]*
would attach only to the last prefix alternative and the first tag, and every
other prefix or tag would match on its own.
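The precedence rule is easy to confirm on three short lines. The sketch assumes a POSIX shell with grep -E; the -x flag requires the pattern to match the whole line, which makes the difference visible.

```shell
# Hypothetical sample: three short lines.
sample=$(printf '%s\n' 'a' 'bc' 'ac')

# Ungrouped: the alternatives are "a" and "bc".
loose=$(printf '%s\n' "$sample" | grep -xE 'a|bc')

# Grouped: "a" or "b", followed by "c".
grouped=$(printf '%s\n' "$sample" | grep -xE '(a|b)c')

printf 'ungrouped:\n%s\n' "$loose"
printf 'grouped:\n%s\n' "$grouped"
```

The ungrouped pattern accepts a and bc, while the grouped one accepts bc and ac.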
Finding Plain Zig Comments
    rg -n --pcre2 '^\s*//(?![!/])' -g '*.zig'
The regex:
    ^\s*//(?![!/])
Breakdown:
- ^ anchors the match at the start of the line.
- \s* matches zero or more whitespace characters.
- // matches the literal comment prefix.
- (?![!/]) is a negative lookahead. It says: at this position, the next character must not match [!/].
- [!/] is a character class matching either ! or /.
This matches ordinary Zig line comments like:
    // regular comment
It does not match documentation comments like:
    //! module doc comment
    /// declaration doc comment
The --pcre2 flag is required because ripgrep’s default regex engine does not
support lookaround. PCRE2 supports negative lookahead with (?!...).[6]
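The same regex can be checked without any Zig files by piping sample lines through grep -P. This is a sketch that assumes GNU grep with PCRE support; the comment lines are invented for the example.

```shell
# Hypothetical sample: plain comments, doc comments, and a trailing comment.
sample=$(printf '%s\n' \
  '// regular comment' \
  '//! module doc comment' \
  '/// declaration doc comment' \
  '    // indented comment' \
  'const x = 1; // trailing comment')

# Keep lines that start with optional whitespace and "//",
# unless the next character is "!" or "/".
plain=$(printf '%s\n' "$sample" | grep -P '^\s*//(?![!/])')
printf '%s\n' "$plain"
```

Only the regular and indented comments are kept. The trailing comment is skipped because of the ^ anchor, not because of the lookahead.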
Practical Rules
Start with the simplest regex flavor your tool supports. Reach for PCRE-style features like lookahead, lookbehind, and backreferences when they make the pattern clearer or when the simpler engine cannot express the check.
In ripgrep, the default engine handles many everyday patterns. Use -P or
--pcre2 when you need PCRE2 features.[9]
Prefer obvious literal pieces. TODO|FIXME is easier to maintain than a clever
pattern that saves a few characters.
Escape punctuation when you mean the punctuation itself. \. means a literal
dot, while . means any non-newline character in the default engine.[3]
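A quick way to see the difference, sketched with grep -E on two made-up version strings (grep -E is an assumption here, used in place of ripgrep):

```shell
# Hypothetical sample: one real version string and one lookalike.
sample=$(printf '%s\n' 'v1.2' 'v1x2')

# Unescaped dot: matches any character, so both lines count.
any_char=$(printf '%s\n' "$sample" | grep -cE 'v1.2')

# Escaped dot: matches only a literal ".", so only v1.2 counts.
literal_dot=$(printf '%s\n' "$sample" | grep -cE 'v1\.2')

echo "unescaped: $any_char, escaped: $literal_dot"
```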
Use anchors to encode position. TODO can match anywhere, while ^TODO only
matches at the beginning of a line.
Use character classes for “one of these characters” and alternation for “one of
these words or phrases.” [0-9] chooses one digit. TODO|FIXME chooses one
word.
Use non-capturing groups, (?:...), when you need grouping for precedence but
do not need to extract the group later.
Footnotes
[1] regex crate documentation. https://docs.rs/regex/latest/regex/. Accessed 7 June 2026.
[2] regex crate character class reference. https://docs.rs/regex/latest/regex/#character-classes. Accessed 7 June 2026.
[3] regex crate syntax reference. https://docs.rs/regex/latest/regex/#syntax. Accessed 7 June 2026.
[4] regex crate grouping and flags reference. https://docs.rs/regex/latest/regex/#grouping-and-flags. Accessed 7 June 2026.
[5] regex crate composites reference. https://docs.rs/regex/latest/regex/#composites. Accessed 7 June 2026.
[6] PCRE2 syntax reference, “Lookahead and Lookbehind Assertions.” https://pcre2project.github.io/pcre2/doc/pcre2syntax/#SEC21. Accessed 7 June 2026.
[7] GNU Grep manual, “Regular Expressions.” https://www.gnu.org/s/grep/manual/html_node/Regular-Expressions.html. Accessed 7 June 2026.
[8] PCRE2 syntax reference, “Backreferences.” https://pcre2project.github.io/pcre2/doc/pcre2syntax/#SEC23. Accessed 7 June 2026.
[9] BurntSushi. ripgrep README. https://github.com/BurntSushi/ripgrep. Accessed 7 June 2026.