6.22. Regex Cheatsheet

Also known as: "Regular Expressions", "Regular Expr", "regexp", "regex" or "re"

6.22.1. Syntax

a - exact
a|b - alternative
[abc] - enumerated character class
[a-z] - range character class
. - any character except a newline (changes meaning with re.DOTALL)
^ - start of line (changes meaning with re.MULTILINE)
$ - end of line (changes meaning with re.MULTILINE)
\A - start of text (doesn't change meaning with re.MULTILINE)
\Z - end of text (doesn't change meaning with re.MULTILINE)
[^] - negation
\d - digit (alias to [0-9])
\D - anything but digit (alias to [^0-9])
\s - whitespace (space, tab, newline, non-breaking space)
\S - anything but whitespace
\b - word boundary
\B - anything but word boundary
\w - any unicode alphabet character (lower or upper, also with diacritics (i.e. ąćęłńóśżź...), numbers and underscores
\W - anything but any unicode alphabet character (i.e. whitespace, dots, comas, dashes)
{n} - exactly n repetitions, exact
{,n} - maximum n repetitions, greedy (prefer longest)
{n,} - minimum n repetitions, greedy (prefer longest)
{n,m} - minimum n repetitions, maximum m times, greedy (prefer longest)
* - minimum 0 repetitions, no maximum, greedy (prefer longest), alias to {0,}
+ - minimum 1 repetitions, no maximum, greedy (prefer longest), alias to {1,}
? - minimum 0 repetitions, maximum 1 repetitions, greedy (prefer longest), alias to {0,1}
{,n}? - maximum n repetitions, lazy (prefer shorter)
{n,}? - minimum n repetitions, lazy (prefer shorter)
{n,m}? - minimum n repetitions, maximum m times, lazy (prefer shorter)
*? - minimum 0 repetitions, no maximum, lazy (prefer shorter), alias to {0,}?
+? - minimum 1 repetitions, no maximum, lazy (prefer shorter), alias to {1,}?
?? - minimum 0 repetitions, maximum 1 repetition, lazy (prefer shorter), alias to {0,1}?
() - matches whatever regular expression is inside the parentheses, and indicates the start and end of a group
(...) - unnamed group (positional)
(?P<mygroup>...) - named group mygroup
(?:...) - non-capturing group
(?#...) - comment
(?P=name) - backreferencing by group name
\g<number> - backreferencing by group number
\g<name> - backreferencing by group name

6.22.2. Python

re.findall(pattern, text) - all occurrences of a pattern, results as a list[str]
re.finditer(pattern, text) - all occurrences of a pattern, results as an Iterator[re.Match]
re.search(pattern, text) - check if pattern is in the text (stops after first match), results as re.Match | None
re.match(pattern, text) - checks if text matches pattern (validation, ie. email, ssn, tax id, phone), results as re.Match | None
re.split(pattern, text) - split text by a pattern, results as a list[str]
re.sub(pattern, replace, text) - replaces occurrences of a pattern in text with other text, results as a str
re.compile(pattern) - prepare pattern for further use for example in a loop, results as a re.Pattern

6.22.3. Flags

re.ASCII - perform ASCII-only matching instead of full Unicode matching
re.IGNORECASE - case-insensitive search
re.MULTILINE - match can start in one line, and end in another
re.DOTALL - dot (.) matches also newline characters
re.UNICODE - turns on unicode character support for \w
re.VERBOSE - ignores spaces (except \s) and allows for comments in in re.compile()
re.DEBUG - display debugging information during pattern compilation