10. Regular Expression Matching (Hard)
Problem
Implement regular expression matching with support for:
., matches any single character.*, matches zero or more of the preceding element.
The match must cover the entire input string (not partial).
Example
s = "aa",p = "a"→falses = "aa",p = "a*"→trues = "ab",p = ".*"→trues = "mississippi",p = "mis*is*p*."→false
LeetCode 10 · Link · Hard
Approach 1: Recursive
Match character-by-character, special-casing *.
def is_match(s, p): def match(i, j): # L1: define recursive helper if j == len(p): # L2: O(1) base case check return i == len(s) first = i < len(s) and (p[j] == '.' or p[j] == s[i]) # L3: O(1) single-char match check if j + 1 < len(p) and p[j + 1] == '*': # L4: O(1) check for '*' # zero copies, or one more copy return match(i, j + 2) or (first and match(i + 1, j)) # L5: two recursive calls return first and match(i + 1, j + 1) # L6: one recursive call return match(0, 0)Where the time goes, line by line
Variables: m = len(s), n = len(p).
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1-L4 (setup + checks) | O(1) | per call | O(1) each |
| L5 (two recursive branches on ’*‘) | O(1) work + 2 calls | up to O(2^(m+n)) calls total | O(2^(m+n)) ← dominates |
| L6 (one recursive branch) | O(1) work + 1 call | per call | O(2^(m+n)) same tree |
The exponential blowup comes from the * branch at L5: each * pair can spawn two calls. Without memoization the same (i, j) pair is recomputed many times.
Complexity
- Time: O(2^(m + n)) worst case due to overlapping subproblems (driven by L5 double-branching).
- Space: O(m + n) recursion depth.
Approach 2: Top-down memoized (canonical)
Cache by (i, j).
from functools import lru_cache
def is_match(s, p): @lru_cache(maxsize=None) # L1: O(1) cache decorator setup def match(i, j): if j == len(p): # L2: O(1) base case return i == len(s) first = i < len(s) and (p[j] == '.' or p[j] == s[i]) # L3: O(1) char match check if j + 1 < len(p) and p[j + 1] == '*': # L4: O(1) '*' check return match(i, j + 2) or (first and match(i + 1, j)) # L5: O(1) with cache return first and match(i + 1, j + 1) # L6: O(1) with cache return match(0, 0)Where the time goes, line by line
Variables: m = len(s), n = len(p).
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1 (lru_cache) | O(1) | 1 | O(1) |
| L2-L4 (checks) | O(1) | once per unique (i,j) | O(m · n) total |
| L5, L6 (recursive calls) | O(1) per call (cache hit after first) | at most m · n unique states | O(m · n) ← dominates |
With memoization, each unique (i, j) pair is computed exactly once. There are (m+1) * (n+1) such pairs, each doing O(1) work.
Complexity
- Time: O(m · n), driven by L5/L6 over all unique (i,j) states.
- Space: O(m · n) for the memo table.
Approach 3: Bottom-up 2-D DP
dp[i][j] = does s[:i] match p[:j]?
def is_match(s, p): m, n = len(s), len(p) # L1: O(1) dp = [[False] * (n + 1) for _ in range(m + 1)] # L2: O(m*n) table init dp[0][0] = True # L3: O(1) base case
# Empty string vs. patterns like "a*", "a*b*" for j in range(1, n + 1): # L4: O(n) init loop if p[j - 1] == '*': dp[0][j] = dp[0][j - 2]
for i in range(1, m + 1): # L5: outer loop O(m) for j in range(1, n + 1): # L6: inner loop O(n) if p[j - 1] == '*': # L7: O(1) check # zero occurrences dp[i][j] = dp[i][j - 2] # L8: O(1) # one or more: previous char matches current s if p[j - 2] == '.' or p[j - 2] == s[i - 1]: # L9: O(1) dp[i][j] = dp[i][j] or dp[i - 1][j] # L10: O(1) else: if p[j - 1] == '.' or p[j - 1] == s[i - 1]: # L11: O(1) dp[i][j] = dp[i - 1][j - 1] # L12: O(1) return dp[m][n]Where the time goes, line by line
Variables: m = len(s), n = len(p).
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1-L3 (init) | O(1) or O(m·n) | 1 | O(m · n) |
| L4 (base-case loop) | O(1) | n | O(n) |
| L5+L6 (double loop) | O(1) body | m · n | O(m · n) ← dominates |
| L7-L12 (table fills) | O(1) | once each | included above |
Every cell is filled in O(1) with a constant number of table lookups. The double loop at L5/L6 visits all m * n cells exactly once.
Complexity
- Time: O(m · n), driven by L5/L6 (the full double loop over all DP cells).
- Space: O(m · n) for the DP table.
Summary
| Approach | Time | Space |
|---|---|---|
| Naive recursion | O(2^(m + n)) | O(m + n) |
| Top-down memoized | O(m · n) | O(m · n) |
| Bottom-up 2-D DP | O(m · n) | O(m · n) |
This problem rewards memorization, the recurrence is fiddly and the edge cases around * at the start are easy to bungle. Interview-standard answer is memoized recursion for clarity.
Test cases
# Quick smoke tests, paste into a REPL or save as test_010.py and run.# Uses the canonical implementation (Approach 2: top-down memoized).
from functools import lru_cache
def is_match(s, p): @lru_cache(maxsize=None) def match(i, j): if j == len(p): return i == len(s) first = i < len(s) and (p[j] == '.' or p[j] == s[i]) if j + 1 < len(p) and p[j + 1] == '*': return match(i, j + 2) or (first and match(i + 1, j)) return first and match(i + 1, j + 1) return match(0, 0)
def _run_tests(): # problem statement examples assert is_match("aa", "a") == False assert is_match("aa", "a*") == True assert is_match("ab", ".*") == True assert is_match("mississippi", "mis*is*p*.") == False # edge: empty string vs empty pattern assert is_match("", "") == True # edge: empty string vs "a*" (zero occurrences) assert is_match("", "a*") == True print("all tests pass")
if __name__ == "__main__": _run_tests()Related data structures
- Strings, regex over prefix lengths