6.22. Regex Cheatsheet
Also known as: "Regular Expressions", "Regular Expr", "regexp", "regex" or "re"
6.22.1. Syntax
a - exact match
a|b - alternative
[abc] - enumerated character class
[a-z] - range character class
. - any character except a newline (changes meaning with re.DOTALL)
^ - start of text; with re.MULTILINE, also the start of each line
$ - end of text (or just before a trailing newline); with re.MULTILINE, also the end of each line
\A - start of text (doesn't change meaning with re.MULTILINE)
\Z - end of text (doesn't change meaning with re.MULTILINE)
[^abc] - negated character class (any character except the listed ones)
\d - digit (alias to [0-9] with re.ASCII; otherwise any Unicode decimal digit)
\D - anything but a digit (alias to [^0-9] with re.ASCII)
\s - whitespace (space, tab, newline, non-breaking space)
\S - anything but whitespace
\b - word boundary
\B - anything but a word boundary
\w - any Unicode word character: letters (lower or upper case, also with diacritics, e.g. ąćęłńóśżź...), digits and underscores
\W - anything but a Unicode word character (e.g. whitespace, dots, commas, dashes)
{n} - exactly n repetitions
{,n} - maximum n repetitions, greedy (prefer longest)
{n,} - minimum n repetitions, greedy (prefer longest)
{n,m} - minimum n repetitions, maximum m times, greedy (prefer longest)
* - minimum 0 repetitions, no maximum, greedy (prefer longest), alias to {0,}
+ - minimum 1 repetition, no maximum, greedy (prefer longest), alias to {1,}
? - minimum 0 repetitions, maximum 1 repetition, greedy (prefer longest), alias to {0,1}
{,n}? - maximum n repetitions, lazy (prefer shorter)
{n,}? - minimum n repetitions, lazy (prefer shorter)
{n,m}? - minimum n repetitions, maximum m times, lazy (prefer shorter)
*? - minimum 0 repetitions, no maximum, lazy (prefer shorter), alias to {0,}?
+? - minimum 1 repetition, no maximum, lazy (prefer shorter), alias to {1,}?
?? - minimum 0 repetitions, maximum 1 repetition, lazy (prefer shorter), alias to {0,1}?
() - matches whatever regular expression is inside the parentheses, and indicates the start and end of a group
(...) - unnamed group (positional)
(?P<mygroup>...) - named group mygroup
(?:...) - non-capturing group
(?#...) - comment
(?P=name) - backreference by group name (inside a pattern)
\g<number> - backreference by group number (in re.sub() replacement strings; inside a pattern use \number)
\g<name> - backreference by group name (in re.sub() replacement strings)
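
The difference between greedy and lazy quantifiers, and between the two backreference forms, is easier to see on a concrete string. The following is a minimal sketch; the sample text and the variable names are made up for illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* takes as much as possible, so the match runs to the last '>'
greedy = re.findall(r"<.*>", html)

# Lazy: .*? takes as little as possible, so each tag is matched separately
lazy = re.findall(r"<.*?>", html)

# Named groups can be read back from the match object by name
date = re.search(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})",
    "Launched on 1969-07-16",
)

# (?P=name) is used inside a pattern: the closing tag must repeat
# whatever the group 'tag' captured in the opening tag
paired = re.findall(r"<(?P<tag>\w+)>.*?</(?P=tag)>", html)

# \g<name> is used inside a replacement string passed to re.sub()
swapped = re.sub(
    r"(?P<first>\w+) (?P<last>\w+)",
    r"\g<last>, \g<first>",
    "Mark Watney",
)

print(greedy)
print(lazy)
print(date["year"], date["month"], date["day"])
print(paired)
print(swapped)
```

With a single capturing group in the pattern, re.findall() returns the captured group rather than the whole match, which is why `paired` holds tag names only.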
6.22.2. Python
re.findall(pattern, text) - all occurrences of a pattern, results as a list[str]
re.finditer(pattern, text) - all occurrences of a pattern, results as an Iterator[re.Match]
re.search(pattern, text) - checks if pattern occurs anywhere in the text (stops after first match), results as re.Match | None
re.match(pattern, text) - checks if the beginning of text matches pattern (validation, e.g. email, ssn, tax id, phone; use re.fullmatch() to require the whole text to match), results as re.Match | None
re.split(pattern, text) - splits text by a pattern, results as a list[str]
re.sub(pattern, replace, text) - replaces occurrences of a pattern in text with other text, results as a str
re.compile(pattern) - prepares a pattern for repeated use, for example in a loop, results as a re.Pattern
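
A short sketch of the functions above applied to one sample string. The text, the simplified email pattern and the variable names are assumptions chosen for illustration; the pattern is not a complete email validator.

```python
import re

text = "Email mark@nasa.gov or melissa@nasa.gov before 2035-11-07."

# re.compile: build the pattern once and reuse it
email = re.compile(r"\w+@\w+\.\w+")

# findall: all matches as strings
found = email.findall(text)

# finditer: match objects, which also carry positions
starts = [m.start() for m in email.finditer(text)]

# search: first match anywhere in the text, or None
first_date = re.search(r"\d{4}-\d{2}-\d{2}", text)

# match anchors only at the beginning; fullmatch requires the whole text
prefix_ok = re.match(r"\d{4}", "2035-11-07")
whole_ok = re.fullmatch(r"\d{4}", "2035-11-07")

# split: cut the text wherever the pattern matches
parts = re.split(r"\s*[,;]\s*", "a, b;c ,d")

# sub: replace every match
masked = re.sub(r"\d", "#", "PIN 1234")

print(found)
print(first_date.group())
print(prefix_ok is not None, whole_ok is None)
print(parts)
print(masked)
```

For validation, re.fullmatch() is usually the safer choice, because re.match() accepts any text that merely starts with a valid value.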
6.22.3. Flags
re.ASCII - perform ASCII-only matching instead of full Unicode matching
re.IGNORECASE - case-insensitive search
re.MULTILINE - ^ and $ also match at the start and end of each line, not only at the start and end of the whole text
re.DOTALL - dot (.) also matches newline characters
re.UNICODE - turns on Unicode character support for \w (already the default for str patterns in Python 3)
re.VERBOSE - ignores whitespace in the pattern (except when escaped, written as \s, or placed inside a character class) and allows # comments in the pattern
re.DEBUG - display debugging information during pattern compilation
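
The effect of each flag is easiest to see by running the same pattern with and without it. This is a minimal sketch; the sample strings and the `phone` pattern are assumptions made for illustration.

```python
import re

text = "first line\nSecond line\nthird LINE"

# Without flags, ^ matches only at the very start of the text
plain = re.findall(r"^\w+", text)

# re.MULTILINE: ^ also matches at the start of every line
multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)

# Without re.DOTALL the dot stops at a newline, so this finds nothing
no_dotall = re.search(r"first.*third", text)

# re.DOTALL: the dot crosses newlines
dotall = re.search(r"first.*third", text, flags=re.DOTALL)

# re.IGNORECASE: 'line' also matches 'LINE'
ignorecase = re.findall(r"line", text, flags=re.IGNORECASE)

# re.ASCII: \w no longer matches letters with diacritics
unicode_words = re.findall(r"\w+", "zażółć 123")
ascii_words = re.findall(r"\w+", "zażółć 123", flags=re.ASCII)

# re.VERBOSE: whitespace in the pattern is ignored and comments are allowed
phone = re.compile(r"""
    (?P<area>\d{3})     # three-digit prefix
    [-\s]?              # optional separator
    (?P<number>\d{4})   # four-digit number
""", flags=re.VERBOSE)
call = phone.search("call 555-1234")

print(plain, multiline)
print(no_dotall, dotall is not None)
print(ignorecase)
print(unicode_words, ascii_words)
print(call["area"], call["number"])
```

Flags can be combined with the | operator, for example `flags=re.MULTILINE | re.IGNORECASE`.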