6.12. Regex Syntax Flavors
In other programming languages
PCRE - Perl Compatible Regular Expressions
6.12.1. SetUp
>>> import re
6.12.2. Enclosing
In Python we use a raw string (r'...')
In JavaScript we use /pattern/flags or new RegExp(pattern, flags)
Python:
r'[a-z]+'
JavaScript:
/[a-z]+/
JavaScript:
new RegExp("[a-z]+")
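The reason for the raw string can be shown with a short sketch. The sample TEXT and the variable names below are illustrative assumptions, not part of the original chapter. A raw string leaves backslashes untouched, so r'\d+' is the same pattern as the escaped '\\d+'.

```python
import re

# Sample input, chosen only for illustration
TEXT = 'Hello World 2024'

# A raw string keeps backslashes as-is, so both spellings give the same pattern
raw_pattern = r'\d+'
escaped_pattern = '\\d+'
same_pattern = raw_pattern == escaped_pattern

# The pattern from this section: one or more lowercase ASCII letters
words = re.findall(r'[a-z]+', TEXT)
digits = re.findall(raw_pattern, TEXT)

print(same_pattern)  # True
print(words)         # ['ello', 'orld']
print(digits)        # ['2024']
```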
6.12.3. Flags
In Python we pass flags with the flags argument (re.I is the short form of re.IGNORECASE)
In JavaScript we use /pattern/flags or new RegExp(pattern, flags)
Python:
re.findall(r'[a-z]+', TEXT, flags=re.I)
re.findall(r'[a-z]+', TEXT, flags=re.IGNORECASE)
JavaScript:
/[a-z]+/i
JavaScript:
new RegExp("[a-z]+", 'i')
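A minimal sketch of how the flag changes the result, assuming a sample TEXT that is not part of the original chapter. It also shows the inline form (?i), which switches on case-insensitive matching from inside the pattern itself.

```python
import re

# Sample input, chosen only for illustration
TEXT = 'Hello World'

# Without a flag only lowercase letters match
case_sensitive = re.findall(r'[a-z]+', TEXT)

# With re.IGNORECASE (alias: re.I) uppercase letters match too
with_flag = re.findall(r'[a-z]+', TEXT, flags=re.IGNORECASE)

# The same flag written inline, at the start of the pattern
with_inline = re.findall(r'(?i)[a-z]+', TEXT)

print(case_sensitive)             # ['ello', 'orld']
print(with_flag)                  # ['Hello', 'World']
print(with_inline)                # ['Hello', 'World']
print(re.I is re.IGNORECASE)      # True
```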
6.12.4. Range
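[a-Z] is sometimes written as a shorthand for [a-zA-Z]
[a-9] is sometimes written as a shorthand for [a-zA-Z0-9]
Some other tools may accept these shorthands, but Python does not, and neither does JavaScript. In code point order 'a' (97) comes after 'Z' (90) and '9' (57), so both ranges are out of order.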
Python:
r'[a-Z]' # re.PatternError: bad character range a-Z at position 1
JavaScript:
/[a-Z]/ // SyntaxError: Invalid regular expression: /[a-Z]/: Range out of order in character class
Perl:
/[a-Z]/
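A minimal sketch of handling the invalid range in Python. It catches re.error, which is assumed here for portability: the re.PatternError name quoted above exists only in newer Python versions, where re.error remains an alias. The sample input is illustrative.

```python
import re

# Compiling the out-of-order range fails before any matching happens
try:
    re.compile(r'[a-Z]')
    range_error = None
except re.error as err:
    range_error = str(err)

# The portable spelling lists both ranges explicitly
letters = re.findall(r'[a-zA-Z]+', 'Hello World 2024')

print(range_error)
print(letters)  # ['Hello', 'World']
```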
6.12.5. Group Backreference
$1 - grep, egrep, Jetbrains IDE
\1 or \g<1> - Python
\g<name> - Python
In JavaScript named groups are written without the P: (?<name>...) instead of Python's (?P<name>...):
Python:
r'(?P<name>\d+)'
JavaScript:
/(?<name>\d+)/
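The Python backreference forms can be sketched with re.sub. The date string, the pattern and the variable names are illustrative assumptions. The last line uses (?P=name), which is how Python refers back to a named group inside the pattern itself.

```python
import re

# Sample input and pattern, chosen only for illustration
TEXT = '2024-12-31'
PATTERN = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'

# Three equivalent ways to reference groups in the replacement string
by_number = re.sub(PATTERN, r'\3/\2/\1', TEXT)
by_g_number = re.sub(PATTERN, r'\g<3>/\g<2>/\g<1>', TEXT)
by_name = re.sub(PATTERN, r'\g<day>/\g<month>/\g<year>', TEXT)

# Inside the pattern, a named group is referenced with (?P=name)
repeated = re.search(r'(?P<word>\w+) (?P=word)', 'it is is fine').group()

print(by_number)    # 31/12/2024
print(by_g_number)  # 31/12/2024
print(by_name)      # 31/12/2024
print(repeated)     # is is
```

The \g<1> form is useful when a digit follows the reference, because \g<1>0 is unambiguous while \10 would be read as group 10.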
6.12.6. Named Ranges
[:alpha:] - Alphabetic character [a-zA-Z]
[:alnum:] - Alphabetic and numeric character [a-zA-Z0-9]
[:blank:] - Space or tab
[:cntrl:] - Control character
[:digit:] - Digit
[:graph:] - Non-blank character (excludes spaces, control characters, and similar)
[:lower:] - Lowercase alphabetical character
[:print:] - Like [:graph:], but includes the space character
[:punct:] - Punctuation character
[:space:] - Whitespace character ([:blank:], newline, carriage return, etc.)
[:upper:] - Uppercase alphabetical character
[:xdigit:] - Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F)
[:word:] - A character in one of the following Unicode general categories: Letter, Mark, Number, Connector_Punctuation
[:ascii:] - A character in the ASCII character set
In Python these named ranges do not work. The string [:alpha:] is interpreted literally, as an ordinary character set that matches any one of the characters ':', 'a', 'l', 'p' or 'h'.
>>> TEXT = 'hello world'
>>>
>>> re.findall(r'[:alpha:]', TEXT)
['h', 'l', 'l', 'l']
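One possible workaround is to spell the ranges out explicitly. The POSIX_TO_PYTHON table below is a hypothetical helper, not part of the re module, and it covers only ASCII stand-ins for a few of the named ranges listed above.

```python
import re

# Hypothetical helper table: ASCII-only stand-ins for some POSIX named ranges
POSIX_TO_PYTHON = {
    'alpha': r'[a-zA-Z]',
    'alnum': r'[a-zA-Z0-9]',
    'digit': r'[0-9]',
    'lower': r'[a-z]',
    'upper': r'[A-Z]',
    'blank': r'[ \t]',
    'xdigit': r'[0-9a-fA-F]',
}

TEXT = 'hello world'

# The POSIX spelling is read as a plain set of the characters : a l p h
literal = re.findall(r'[:alpha:]', TEXT)

# The explicit range matches every ASCII letter
letters = re.findall(POSIX_TO_PYTHON['alpha'], TEXT)

print(literal)           # ['h', 'l', 'l', 'l']
print(''.join(letters))  # helloworld
```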