Introduction
Over the past week I have been working on [fast.ai's SolveIt course](https://solveit.fast.ai). This course is full of amazing stuff, but when I saw `regex.search(...).captures()` I just had to share it and make sure I don't forget this cool feature! We will look at a simple example that illustrates the power of `.captures`.
The regex library
The [regex](https://github.com/mrabarnett/mrab-regex) library is a third-party package (install it with `pip install regex`); it is not part of the Python standard library. Its API is compatible with [re](https://docs.python.org/3/library/re.html), and it adds extra functionality and more thorough Unicode support. Somehow this library is still kind of an open secret. Many people are unaware of it and end up writing unnecessarily complex `re` code.
Let's look at a simple example: parsing the ID and the ints in `"Item 1: 4 3 2 1"`. We construct a regex pattern that matches the number after `Item` and the space-separated integers that follow. `search` returns a `Match` object (or `None` if nothing matches). We can then use `.captures` to get the individual parts of the match. `.captures` is specific to the `regex` library: it returns every occurrence of a capture group as a list, whereas `re` keeps only the last one.
from regex import search
# 0. Construct regex to parse Item ID and list of ints
# '\d+' matches a number of one or more digits
# ' +' matches one or more spaces
# 'Item +(\d+):' matches the word Item followed by a number and ending with a colon
# ' +(\d+ *)+' matches one or more numbers, each followed by zero or more spaces
rx = r'Item +(\d+): +(\d+ *)+'
# 1. Search data for the pattern
data = "Item 1: 4 3 2 1"
m = search(rx, data)
# 2. Get desired contents
# Full match (same as .captures(0))
m.captures() # ['Item 1: 4 3 2 1']
# Get ID
m.captures(1) # ['1']
# Get list of ints
# m.captures(2) is ['4 ', '3 ', '2 ', '1']; int() ignores the trailing spaces
[int(i) for i in m.captures(2)] # [4, 3, 2, 1]
# Get multiple groups as a tuple
idx, ints = m.captures(1, 2)
idx # ['1']
[int(i) for i in ints] # [4, 3, 2, 1]
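For contrast, here is a minimal sketch of the same parse using only the standard-library `re` module. It shows the "last one only" behavior described above, plus one possible workaround. The second pattern and the names `last_only`, `m2`, and `item_id` are my own illustration and not something from the course.

```python
import re

data = "Item 1: 4 3 2 1"
rx = r'Item +(\d+): +(\d+ *)+'

m = re.search(rx, data)

# With `re`, a repeated group keeps only its final iteration
last_only = m.group(2)  # '1'

# One possible workaround: capture the whole run of numbers as a single
# group (the inner group is non-capturing), then split it in a second step
m2 = re.search(r'Item +(\d+): +((?:\d+ *)+)', data)
item_id = int(m2.group(1))                         # 1
ints = [int(tok) for tok in m2.group(2).split()]   # [4, 3, 2, 1]
```

It works, but it needs a second pattern and an extra splitting step that `.captures` makes unnecessary.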
Hope this introduction lets you appreciate the power of the `regex` library! Note that this is a third-party module, but it is mentioned in the [Python Docs on re](https://docs.python.org/3/library/re.html) (yellow "See also" box). To learn more about other features of `regex`, check out [this cool tutorial](https://learnbyexample.github.io/py_regular_expressions/regex-module.html). Also check out [regex101.com](https://regex101.com) for a great browser-based tool to test regex patterns.