7.23. Regex Flavors
In other programming languages
PCRE - Perl Compatible Regular Expressions
7.23.1. SetUp
>>> import re
7.23.2. Literals
In Python we use a raw string: r'...'
In JavaScript we use a regex literal: /pattern/flags
Python:
'hello' # unicode string literal
"hello" # unicode string literal
r'hello' # raw-string literal
r"hello" # raw-string literal
JavaScript:
'hello' // string literal
"hello" // string literal
`hello` // template literal
/hello/ // regular expression
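A minimal sketch of why the raw string matters in Python. The sample text and variable names are illustrative only. In a normal string literal '\b' is the backspace character, so the word-boundary token never reaches the regex engine.

```python
import re

text = 'say hello to everyone'

# Normal string: '\b' is converted to the backspace character (U+0008)
# before the pattern is compiled, so nothing in the text matches.
plain = re.findall('\bhello\b', text)

# Raw string: the backslash survives and the regex engine sees \b,
# which means "word boundary".
raw = re.findall(r'\bhello\b', text)

print(plain)  # []
print(raw)    # ['hello']
```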
7.23.3. Compile
Python:
result = re.compile(r'[a-z]')
JavaScript:
result = new RegExp("[a-z]")
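A short sketch of what can be done with the compiled object in Python. The input string and variable names are made up for illustration.

```python
import re

lowercase = re.compile(r'[a-z]')

# A compiled pattern offers the same operations as the module-level
# functions (findall, search, match, sub, ...), without re-passing the pattern.
letters = lowercase.findall('Hello 42')
first = lowercase.search('Hello 42')

print(letters)        # ['e', 'l', 'l', 'o']
print(first.group())  # e
print(first.start())  # 1
```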
7.23.4. Flags
In Python we pass flags with the flags argument and combine them with |
In JavaScript we append flag letters after the closing slash: /pattern/flags
Python:
re.compile(r'[a-z]+', flags=re.I)
re.compile(r'[a-z]+', flags=re.I|re.M)
re.compile(r'[a-z]+', flags=re.IGNORECASE)
re.compile(r'[a-z]+', flags=re.IGNORECASE|re.MULTILINE)
JavaScript:
/[a-z]+/i
/[a-z]+/im
JavaScript:
new RegExp("[a-z]", "i")
new RegExp("[a-z]", "im")
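A small sketch of how the flags change the result in Python, assuming a three-line sample string chosen for illustration.

```python
import re

text = 'alice\nBob\ncarol'

# No flags: ^ and $ anchor only at the string boundaries, so a whole-line
# pattern cannot match inside a multi-line string.
no_flags = re.findall(r'^[a-z]+$', text)

# re.M (MULTILINE): ^ and $ also match at every line boundary.
multiline = re.findall(r'^[a-z]+$', text, flags=re.M)

# re.I | re.M: additionally ignore letter case.
both = re.findall(r'^[a-z]+$', text, flags=re.I | re.M)

print(no_flags)   # []
print(multiline)  # ['alice', 'carol']
print(both)       # ['alice', 'Bob', 'carol']
```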
7.23.5. Named Groups
In Python we use (?P<name>...)
In JavaScript we use (?<name>...)
Python:
r'(?P<mygroup>[a-z]+)'
JavaScript:
/(?<mygroup>[a-z]+)/
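A brief sketch of reading named groups back from a match in Python. The group names year and month and the sample text are hypothetical.

```python
import re

pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})')
match = pattern.search('Released 2024-05')

# A named group can be read by name, by position, or all at once.
year = match.group('year')
month = match.group(2)
groups = match.groupdict()

print(year)    # 2024
print(month)   # 05
print(groups)  # {'year': '2024', 'month': '05'}
```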
7.23.6. Range
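[a-Z] == [a-zA-Z]
[a-9] == [a-zA-Z0-9]
May be accepted by some tools (for example locale-aware grep), but not by Python
Python compares range endpoints by code point, and 'a' (97) comes after 'Z' (90)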
Python:
r'[a-z]' # ok
r'[A-Z]' # ok
r'[A-z]' # compiles, but also matches [ \ ] ^ _ ` (code points 91-96)
r'[a-Z]' # re.PatternError: bad character range a-Z at position 1
JavaScript:
/[a-Z]/ // SyntaxError: Invalid regular expression: /[a-Z]/: Range out of order in character class
Perl:
/[a-Z]/ # Invalid [] range "a-Z" in regex
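A minimal sketch of both pitfalls in Python: the invalid range raises an exception at compile time, while [A-z] compiles but silently matches extra characters. The sample string is illustrative.

```python
import re

# An out-of-order range is rejected when the pattern is compiled.
# re.error is kept as an alias of re.PatternError (the name since Python 3.13).
try:
    re.compile(r'[a-Z]')
except re.error as exc:
    message = str(exc)

# [A-z] spans code points 65-122, which includes '[' (91) and '_' (95).
too_wide = re.findall(r'[A-z]', 'a_Z[9')
explicit = re.findall(r'[a-zA-Z]', 'a_Z[9')

print(message)   # bad character range a-Z at position 1
print(too_wide)  # ['a', '_', 'Z', '[']
print(explicit)  # ['a', 'Z']
```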
7.23.7. Group Backreference
\g<name> - Python (in re.sub replacement strings)
\g<1> - Python (in re.sub replacement strings)
\1
$1 - grep, egrep, Jetbrains IDE
Python:
r'(?P<word>[a-z]+)\s+(?P=word)'
r'([a-z]+)\s+\1'
JavaScript:
/(?<word>[a-z]+)\s+\k<word>/
/([a-z]+)\s+\1/
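A short sketch of backreferences at work in Python: (?P=word) and \1 inside the pattern, \g<word> and \1 inside the replacement string. The sample sentence is made up, and \b anchors are added here so that only whole repeated words match.

```python
import re

text = 'it was was a very very long day'

# Inside the pattern: (?P=word) must match the same text as group "word".
doubled = re.findall(r'\b(?P<word>[a-z]+)\s+(?P=word)\b', text)

# Inside the replacement: \g<word> inserts what the named group captured.
fixed_named = re.sub(r'\b(?P<word>[a-z]+)\s+(?P=word)\b', r'\g<word>', text)

# The numbered form works the same way with \1.
fixed_numbered = re.sub(r'\b([a-z]+)\s+\1\b', r'\1', text)

print(doubled)         # ['was', 'very']
print(fixed_named)     # it was a very long day
print(fixed_numbered)  # it was a very long day
```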
7.23.8. Named Ranges
[:alpha:] - Alphabetic character [a-zA-Z]
[:alnum:] - Alphabetic and numeric character [a-zA-Z0-9]
[:blank:] - Space or tab
[:cntrl:] - Control character
[:digit:] - Digit
[:graph:] - Non-blank character (excludes spaces, control characters, and similar)
[:lower:] - Lowercase alphabetical character
[:print:] - Like [:graph:], but includes the space character
[:punct:] - Punctuation character
[:space:] - Whitespace character ([:blank:], newline, carriage return, etc.)
[:upper:] - Uppercase alphabetical character
[:xdigit:] - Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F)
[:word:] - A character in one of the following Unicode general categories: Letter, Mark, Number, Connector_Punctuation
[:ascii:] - A character in the ASCII character set
In Python these named ranges do not work. The pattern [:alpha:] is
interpreted literally as a character set matching any one of the
characters: ':', 'a', 'l', 'p' or 'h'.
>>> string = 'Hello Alice'
>>>
>>> re.findall(r'[A-Z]', string)
['H', 'A']
>>>
>>> re.findall(r'[:upper:]', string)
['e', 'e']
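One possible workaround is to spell the class out explicitly. The sketch below uses a hypothetical lookup table, POSIX_ASCII, which is not part of the re module and covers only a few ASCII classes.

```python
import re

# Hypothetical helper table: POSIX class name -> ASCII-only set body.
POSIX_ASCII = {
    'alpha': 'a-zA-Z',
    'alnum': 'a-zA-Z0-9',
    'digit': '0-9',
    'lower': 'a-z',
    'upper': 'A-Z',
    'xdigit': '0-9a-fA-F',
    'blank': r' \t',
}

text = 'Hello Alice'

# Build an ordinary character set from the table entry.
upper = re.findall(f"[{POSIX_ASCII['upper']}]", text)
lower_runs = re.findall(f"[{POSIX_ASCII['lower']}]+", text)

print(upper)       # ['H', 'A']
print(lower_runs)  # ['ello', 'lice']
```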