6.19. Regex RE Substitute

  • re.sub()

  • Replace each substring matched by a pattern with replacement text

6.19.1. Setup

>>> import re

6.19.2. Problem

>>> email = 'mwatney@nasa.gov'
>>>
>>> email.replace('@nasa.gov', '@esa.int')
'mwatney@esa.int'

What if there are multiple top-level domains (TLDs)?

>>> email = 'mwatney@nasa.gov'
>>> email = 'mwatney@nasa.com'
>>> email = 'mwatney@nasa.us'
>>> email = 'mwatney@nasa.pl'

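The string method str.replace() matches only a literal substring, so a single call cannot cover every TLD. Each variant ('@nasa.gov', '@nasa.com', ...) would need its own call.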
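A minimal sketch of the limitation, assuming the four addresses above are collected into a list (the names emails and replaced are illustrative, not part of the original example):

```python
# Illustrative data: the four addresses from the example above.
emails = [
    'mwatney@nasa.gov',
    'mwatney@nasa.com',
    'mwatney@nasa.us',
    'mwatney@nasa.pl',
]

# str.replace() substitutes only the exact substring '@nasa.gov';
# addresses with any other TLD are returned unchanged.
replaced = [email.replace('@nasa.gov', '@esa.int') for email in emails]

print(replaced)
# ['mwatney@esa.int', 'mwatney@nasa.com', 'mwatney@nasa.us', 'mwatney@nasa.pl']
```

Only the first address is rewritten; the remaining three still point to the old domain.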

6.19.3. Solution

>>> email = 'mwatney@nasa.gov'
>>>
>>> pattern = r'^(?P<username>[a-z]+)@nasa\.[a-z]+$'
>>> replace = r'\g<username>@esa.int'
>>>
>>> re.sub(pattern, replace, email)
'mwatney@esa.int'
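The same pattern can be checked against every TLD from the problem statement. The sketch below assumes the addresses are held in a list (emails, result, and unchanged are illustrative names). It escapes the dot as \. so that it matches a literal dot rather than any character.

```python
import re

# Illustrative data: the four addresses from the problem statement.
emails = [
    'mwatney@nasa.gov',
    'mwatney@nasa.com',
    'mwatney@nasa.us',
    'mwatney@nasa.pl',
]

# The named group captures the username; [a-z]+ after the escaped dot
# accepts any lowercase TLD.
pattern = r'^(?P<username>[a-z]+)@nasa\.[a-z]+$'
replace = r'\g<username>@esa.int'

result = [re.sub(pattern, replace, email) for email in emails]
print(result)
# ['mwatney@esa.int', 'mwatney@esa.int', 'mwatney@esa.int', 'mwatney@esa.int']

# A string that does not match the pattern is returned unchanged.
unchanged = re.sub(pattern, replace, 'mwatney@nasaxgov')
print(unchanged)
# mwatney@nasaxgov
```

With an unescaped dot, the last string would also match, because . accepts any character, and it would be rewritten to 'mwatney@esa.int'.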

6.19.4. Use Case - 1

Replace a three-letter word surrounded by whitespace (here: And) with an ampersand, ignoring case:

>>> import re
>>>
>>>
>>> text = 'Baked Beans And Spam'
>>> pattern = r'\s[a-z]{3}\s'
>>> replace = ' & '
>>>
>>> re.sub(pattern, replace, text, flags=re.IGNORECASE)
'Baked Beans & Spam'
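By default re.sub() replaces every non-overlapping match. The optional count argument limits the number of replacements. A short sketch, assuming a sample text with two matches (the text and variable names are made up for illustration):

```python
import re

# Illustrative text containing two three-letter words between spaces.
text = 'Eggs And Bacon And Spam'
pattern = r'\s[a-z]{3}\s'
replace = ' & '

# Default (count=0): replace all matches.
all_replaced = re.sub(pattern, replace, text, flags=re.IGNORECASE)
print(all_replaced)
# Eggs & Bacon & Spam

# count=1: replace only the first match, scanning left to right.
first_only = re.sub(pattern, replace, text, count=1, flags=re.IGNORECASE)
print(first_only)
# Eggs & Bacon And Spam
```

Passing count and flags as keyword arguments avoids confusing one for the other, since both are integers.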