7.27. Regex Recap
7.27.1. Literals
Also known as "Literal Characters"
Occurrence of that character in the string
Syntax:
a - exact
a|b - alternative
Example:
1 - number 1 anywhere in the string
1|2|3 - numbers 1, 2 or 3 anywhere in the string
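A minimal sketch of the two literal patterns above, run through `re.findall()`. The sample sentence is made up for illustration only.

```python
import re

# Hypothetical sample text, not taken from the original material.
text = "Order 12 shipped on day 3"

# A literal matches itself wherever it occurs in the string.
ones = re.findall(r"1", text)        # ['1']

# Alternation matches any one of the listed literals.
small = re.findall(r"1|2|3", text)   # ['1', '2', '3']
```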
7.27.2. Classes
Also known as "Character Classes"
One out of several characters
Syntax:
[abc] - enumeration
[a-z] - range
Examples:
[12345] - numbers 1, 2, 3, 4 or 5 anywhere in a string
[0-9] - numbers from 0 to 9 anywhere in a string
[a-z] - lowercase letters from a to z anywhere in a string
[A-Z] - uppercase letters from A to Z anywhere in a string
[a-zA-Z0-9] - uppercase and lowercase letters (from a to z) and digits (from 0 to 9) anywhere in a string
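The class examples above can be tried out as follows. This is only a sketch; the sample text is an assumption chosen so that each class matches something.

```python
import re

# Hypothetical sample text.
text = "Room 7B, floor 3"

digits = re.findall(r"[0-9]", text)        # ['7', '3']
upper = re.findall(r"[A-Z]", text)         # ['R', 'B']

# Ranges can be combined inside one class: letters and digits,
# but not the comma or the spaces.
alnum = "".join(re.findall(r"[a-zA-Z0-9]", text))   # 'Room7Bfloor3'
```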
7.27.3. Metacharacters
Special characters
\ - backslash
^ - caret
$ - dollar sign
. - period or dot
| - vertical bar or pipe symbol
? - question mark
* - asterisk or star
+ - plus sign
( - opening parenthesis
) - closing parenthesis
[ - opening square bracket
] - closing square bracket
{ - opening curly brace
} - closing curly brace
Example:
. - any character anywhere in a string; by default does not match a newline (this changes with re.DOTALL)
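A short sketch of how the dot behaves with and without `re.DOTALL`, and of how a metacharacter can be matched literally by escaping it. The sample strings are illustrative assumptions.

```python
import re

text = "a\nb"

# By default the dot skips newline characters.
default_dot = re.findall(r".", text)                   # ['a', 'b']

# With re.DOTALL the dot matches the newline as well.
dotall_dot = re.findall(r".", text, flags=re.DOTALL)   # ['a', '\n', 'b']

# A backslash turns a metacharacter into a literal character.
literal_dot = re.findall(r"\.", "3.14")                # ['.']

# re.escape() escapes the metacharacters of a whole string,
# so the escaped pattern matches the original text literally.
price = "1+1=2 (approx.)"
matches_itself = re.fullmatch(re.escape(price), price) is not None
```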
7.27.4. Anchors
Match a position before, after, or between characters
Syntax:
^ - start of string; with re.MULTILINE also the start of every line
$ - end of string (or just before a trailing newline); with re.MULTILINE also the end of every line
\A - start of string (doesn't change meaning with re.MULTILINE)
\Z - end of string (doesn't change meaning with re.MULTILINE)
Examples:
^[0-9] - digit at the line start
[0-9]$ - digit at the line end
\A[0-9] - digit at the string start
[0-9]\Z - digit at the string end
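The difference between line anchors and string anchors only shows up on multi-line input. The sketch below uses a made-up three-line string to illustrate it.

```python
import re

# Hypothetical three-line input.
text = "1 apple\n2 pears\nthree plums 3"

# Without re.MULTILINE, ^ matches only at the start of the string.
start_default = re.findall(r"^[0-9]", text)                        # ['1']

# With re.MULTILINE, ^ also matches at the start of every line.
start_multiline = re.findall(r"^[0-9]", text, flags=re.MULTILINE)  # ['1', '2']

# \A and \Z ignore re.MULTILINE and always refer to the whole string.
start_string = re.findall(r"\A[0-9]", text, flags=re.MULTILINE)    # ['1']
end_string = re.findall(r"[0-9]\Z", text, flags=re.MULTILINE)      # ['3']
```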
7.27.5. Negation
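Negation inverts a character class: it matches any character that is not listed
Syntax:
[^...] - negation (the caret must be the first character inside the brackets)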
Examples:
[0-9] - digit anywhere in a string
[^0-9] - anything but a digit anywhere in a string
^[0-9] - digit at the beginning of a line
^[^0-9] - not-a-digit at the beginning of a line
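The caret has two roles: outside the brackets it is an anchor, inside (in first position) it negates the class. A small sketch with assumed sample strings:

```python
import re

text = "a1-b2"

# Inside the brackets the caret negates the class.
non_digits = re.findall(r"[^0-9]", text)              # ['a', '-', 'b']

# Outside the brackets the caret anchors, so this reads
# "a non-digit at the beginning".
starts_with_non_digit = re.search(r"^[^0-9]", text)   # matches 'a'
starts_with_digit = re.search(r"^[^0-9]", "1abc")     # None
```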
7.27.6. Shorthands
Shorthand Character Classes
Syntax:
\d - digit anywhere in a string; same as [0-9] with re.ASCII (for str patterns it also matches other Unicode decimal digits)
\D - anything but a digit anywhere in a string; same as [^0-9] with re.ASCII
\s - whitespace character (space, tab, newline and, for str patterns, Unicode whitespace such as the non-breaking space); same as [ \t\n\r\f\v] with re.ASCII
\S - anything but a whitespace
\b - word boundary: a zero-width position between a word character (\w) and a non-word character (\W) or the start/end of the string
\B - any position that is not a word boundary
\w - word character: Unicode letters (lowercase, uppercase, national characters, e.g. ąćęłńóśżźĄĆĘŁŃÓŚŻŹ...), digits and the underscore; same as [a-zA-Z0-9_] with re.ASCII
\W - anything but a word character (e.g. whitespace, dots, commas, dashes, brackets)
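The shorthands can be compared side by side. In this sketch the sample strings are assumptions; the Polish word is included only to show the effect of `re.ASCII` on `\w`.

```python
import re

# Hypothetical sample text with national characters.
text = "Łódź 2024, room_7"

numbers = re.findall(r"\d+", text)                       # ['2024', '7']

# For str patterns \w is Unicode-aware by default.
words = re.findall(r"\w+", text)                         # ['Łódź', '2024', 'room_7']

# With re.ASCII only [a-zA-Z0-9_] counts as a word character.
words_ascii = re.findall(r"\w+", text, flags=re.ASCII)   # ['d', '2024', 'room_7']

# \s+ collapses any run of whitespace when splitting.
parts = re.split(r"\s+", "a \t b\nc")                    # ['a', 'b', 'c']

# \b keeps 'cat' from matching inside longer words.
whole_word = re.findall(r"\bcat\b", "cat concat cats")   # ['cat']
```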
7.27.7. Quantifiers
Repetition
How many occurrences of preceding token
Exact - exactly the given number of times
Greedy - prefer the longest match (default); usually suits numbers
Lazy - prefer the shortest match; usually suits strings
Exact:
{n}- exactly n repetitions
Greedy:
{,n} - maximum n repetitions, prefer longer (greedy)
{n,} - minimum n repetitions, prefer longer (greedy)
{n,m} - minimum n repetitions, maximum m repetitions, prefer longer (greedy)
* - minimum 0 repetitions, no maximum, prefer longer (alias to {0,}) (greedy)
+ - minimum 1 repetition, no maximum, prefer longer (alias to {1,}) (greedy)
? - minimum 0 repetitions, maximum 1 repetition, prefer longer (alias to {0,1}) (greedy)
Lazy:
{,n}? - maximum n repetitions, prefer shorter
{n,}? - minimum n repetitions, prefer shorter
{n,m}? - minimum n repetitions, maximum m repetitions, prefer shorter
*? - minimum 0 repetitions, no maximum, prefer shorter (alias to {0,}?)
+? - minimum 1 repetition, no maximum, prefer shorter (alias to {1,}?)
?? - minimum 0 repetitions, maximum 1 repetition, prefer shorter (alias to {0,1}?)
Examples:
\d{4} - digit exactly 4 times (exact)
\d{2,4} - digit from 2 to 4 times (greedy, prefer longest)
\d{2,} - digit from 2 to infinity times (greedy, prefer longest)
\d{,4} - digit from 0 to 4 times (greedy, prefer longest)
\d{1,} - at least one digit (greedy, prefer longest)
\d+ - at least one digit, alias to \d{1,} (greedy, prefer longest)
\d{0,} - zero or more digits (greedy, prefer longest)
\d* - zero or more digits, alias to \d{0,} (greedy, prefer longest)
\d{0,1} - optional digit (greedy, prefer longest)
\d? - optional digit, alias to \d{0,1} (greedy, prefer longest)
\d{2,4}? - digit from 2 to 4 times (lazy, prefer shortest)
\d{2,}? - digit from 2 to infinity times (lazy, prefer shortest)
\d{,4}? - digit from 0 to 4 times (lazy, prefer shortest)
\d{1,}? - at least one digit (lazy, prefer shortest)
\d+? - at least one digit, alias to \d{1,}? (lazy, prefer shortest)
\d{0,}? - zero or more digits (lazy, prefer shortest)
\d*? - zero or more digits, alias to \d{0,}? (lazy, prefer shortest)
\d{0,1}? - optional digit (lazy, prefer shortest)
\d?? - optional digit, alias to \d{0,1}? (lazy, prefer shortest)
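The practical difference between greedy and lazy quantifiers is easiest to see on the same input. A minimal sketch, with sample strings chosen purely for illustration:

```python
import re

# Exact: four digits in a row.
year = re.findall(r"\d{4}", "2024-11-05")          # ['2024']

# Greedy takes as many digits as allowed, lazy as few as allowed.
greedy = re.search(r"\d{2,4}", "123456").group()   # '1234'
lazy = re.search(r"\d{2,4}?", "123456").group()    # '12'

# On text, greedy .+ runs to the last '>' while lazy .+? stops at the first.
html = "<b>bold</b> and <i>italic</i>"
greedy_tags = re.findall(r"<.+>", html)    # ['<b>bold</b> and <i>italic</i>']
lazy_tags = re.findall(r"<.+?>", html)     # ['<b>', '</b>', '<i>', '</i>']
```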
7.27.8. Groups
Capture parts of a match for later use
Can be named or positional
Syntax:
(...) - unnamed group (positional)
(?P<mygroup>...) - named group (with name: mygroup)
(?:...) - non-capturing group
(?#...) - comment
Examples:
(\d{1,2}) - group with 1 or 2 digits (unnamed group)
(?P<year>\d{4}) - 4 digits in a group named "year" (named group)
(?P<month>\w+) - one or more word characters in a group named "month" (named group)
(?P<day>\d{1,2}) - 1 or 2 digits in a group named "day" (named group)
Nov (\d{1,2}) - string "Nov " followed by 1 or 2 digits (unnamed group)
Nov \d{2}(st|nd|th|rd) - string "Nov " followed by exactly 2 digits and one of: "st", "nd", "th" or "rd" - capture the ordinal
Nov \d{2}(?:st|nd|th|rd) - string "Nov " followed by exactly 2 digits and one of: "st", "nd", "th" or "rd" - do not capture the ordinal
Nov \d{2}st(?#ordinal) - string "Nov " followed by exactly 2 digits and "st", with the comment "ordinal"
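The group variants above can be combined in one pattern. The following is a sketch only: the date sentence and the combined pattern are assumptions built from the individual examples, not a pattern given in the original.

```python
import re

# Hypothetical input sentence.
text = "Launched on Nov 21st, 2024"

# Named groups for the parts we want, a non-capturing group for the ordinal.
pattern = r"(?P<month>\w+) (?P<day>\d{1,2})(?:st|nd|rd|th)?, (?P<year>\d{4})"
m = re.search(pattern, text)

by_name = m.group("year")     # '2024'
by_position = m.group(2)      # '21' (named groups are also numbered)
as_dict = m.groupdict()       # {'month': 'Nov', 'day': '21', 'year': '2024'}

# A capturing group keeps the ordinal, a non-capturing one discards it.
captured = re.search(r"Nov \d{2}(st|nd|th|rd)", text).groups()      # ('st',)
discarded = re.search(r"Nov \d{2}(?:st|nd|th|rd)", text).groups()   # ()
```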
7.27.9. Backreference
Match the same string as previously matched by a capturing group
Syntax:
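\g<number> - backreferencing by group number (in the replacement string of re.sub())
\g<name> - backreferencing by group name (in the replacement string of re.sub())
(?P=name) - backreferencing by group name (inside the pattern)
\number - backreferencing by group number (inside the pattern)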
Examples:
\g<2> \g<1> \g<3>
\g<day> \g<month> \g<year>
<(?P<tagname>[a-z]+)>(.*)</(?P=tagname)>
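A sketch of both kinds of backreference: `\g<...>` in a `re.sub()` replacement string, and `(?P=name)` inside a pattern. The date string, the search patterns and the HTML snippets are illustrative assumptions.

```python
import re

date = "21 Nov 2024"

# Reorder positional groups in the replacement string.
swapped = re.sub(r"(\d{1,2}) (\w+) (\d{4})", r"\g<2> \g<1> \g<3>", date)
# 'Nov 21 2024'

# The same idea with named groups.
named = re.sub(
    r"(?P<day>\d{1,2}) (?P<month>\w+) (?P<year>\d{4})",
    r"\g<year> \g<month> \g<day>",
    date,
)
# '2024 Nov 21'

# Inside a pattern, (?P=tagname) must repeat what the group matched,
# so the closing tag has to agree with the opening one.
tag = re.compile(r"<(?P<tagname>[a-z]+)>(.*)</(?P=tagname)>")
content = tag.search("<p>hello</p>").group(2)   # 'hello'
mismatch = tag.search("<p>hello</div>")         # None
```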
7.27.10. Flags
re.ASCII - perform ASCII-only matching for \w, \W, \b, \B, \d, \D, \s and \S instead of full Unicode matching
re.IGNORECASE - case-insensitive search
re.LOCALE - makes \w, \W, \b, \B and case-insensitive matching dependent on the current locale (bytes patterns only, discouraged)
re.MULTILINE - ^ and $ match at the start and end of every line, not only of the whole string
re.DOTALL - dot (.) also matches newline characters
re.UNICODE - turns on Unicode character support for \w; this is the default for str patterns in Python 3, so the flag is redundant
re.VERBOSE - ignores whitespace in the pattern (except when escaped or inside a character class) and allows # comments
re.DEBUG - display debugging information about the compiled pattern
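Flags are passed through the `flags` argument and can be combined with `|`. A brief sketch with assumed sample strings:

```python
import re

# Case-insensitive search.
any_case = re.findall(r"python", "Python PYTHON python", flags=re.IGNORECASE)
# ['Python', 'PYTHON', 'python']

# Flags are combined with the bitwise OR operator.
items = re.findall(r"^item", "Item 1\nitem 2", flags=re.IGNORECASE | re.MULTILINE)
# ['Item', 'item']

# re.VERBOSE lets a pattern be laid out over several lines with comments.
date = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -                  # literal separator
    (?P<month>\d{2})   # two-digit month
    """,
    flags=re.VERBOSE,
)
parsed = date.search("2024-11").groupdict()   # {'year': '2024', 'month': '11'}
```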
7.27.11. Python
re.findall() - all matches at once, returns list[str] (a list of tuples when the pattern has more than one group)
re.finditer() - all matches one at a time, returns Iterator[re.Match]
re.search() - whether string contains the pattern (stops after the first match), returns re.Match | None
re.match() - whether string matches the pattern at its beginning (validation, e.g. email, SSN, tax id, phone; use re.fullmatch() to require the whole string to match), returns re.Match | None
re.split() - splits string by pattern, returns list[str]
re.sub() - replaces pattern matches in string (the replacement can refer to groups, works best with named groups), returns str
re.compile() - prepares pattern for further use (match against it), returns re.Pattern
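The functions above can be compared on a single input. This is a minimal sketch; the phone-like numbers and the pattern are made up for illustration.

```python
import re

# Hypothetical sample text and pattern.
text = "call 555-1234 or 555-9876"
pattern = re.compile(r"\d{3}-\d{4}")   # compiled once, reused below

all_matches = pattern.findall(text)                     # ['555-1234', '555-9876']
positions = [m.start() for m in pattern.finditer(text)] # [5, 17]
first = pattern.search(text).group()                    # '555-1234'

# match() only looks at the beginning of the string.
at_start = pattern.match(text)                          # None

# fullmatch() requires the whole string to fit the pattern.
is_valid = pattern.fullmatch("555-1234") is not None    # True

pieces = re.split(r"\s+or\s+", text)       # ['call 555-1234', '555-9876']
masked = pattern.sub("XXX-XXXX", text)     # 'call XXX-XXXX or XXX-XXXX'
```

Compiling the pattern first is optional here; the module-level functions accept the same pattern string directly.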