Skip to content

10. Regular Expression Matching (Hard)

Problem

Implement regular expression matching with support for:

  • ., matches any single character.
  • *, matches zero or more of the preceding element.

The match must cover the entire input string (not partial).

Example

  • s = "aa", p = "a"false
  • s = "aa", p = "a*"true
  • s = "ab", p = ".*"true
  • s = "mississippi", p = "mis*is*p*."false

LeetCode 10 · Link · Hard

Approach 1: Recursive

Match character-by-character, special-casing *.

def is_match(s, p):
def match(i, j): # L1: define recursive helper
if j == len(p): # L2: O(1) base case check
return i == len(s)
first = i < len(s) and (p[j] == '.' or p[j] == s[i]) # L3: O(1) single-char match check
if j + 1 < len(p) and p[j + 1] == '*': # L4: O(1) check for '*'
# zero copies, or one more copy
return match(i, j + 2) or (first and match(i + 1, j)) # L5: two recursive calls
return first and match(i + 1, j + 1) # L6: one recursive call
return match(0, 0)

Where the time goes, line by line

Variables: m = len(s), n = len(p).

LinePer-call costTimes executedContribution
L1-L4 (setup + checks)O(1)per callO(1) each
L5 (two recursive branches on ’*‘)O(1) work + 2 callsup to O(2^(m+n)) calls totalO(2^(m+n)) ← dominates
L6 (one recursive branch)O(1) work + 1 callper callO(2^(m+n)) same tree

The exponential blowup comes from the * branch at L5: each * pair can spawn two calls. Without memoization the same (i, j) pair is recomputed many times.

Complexity

  • Time: O(2^(m + n)) worst case due to overlapping subproblems (driven by L5 double-branching).
  • Space: O(m + n) recursion depth.

Approach 2: Top-down memoized (canonical)

Cache by (i, j).

from functools import lru_cache
def is_match(s, p):
@lru_cache(maxsize=None) # L1: O(1) cache decorator setup
def match(i, j):
if j == len(p): # L2: O(1) base case
return i == len(s)
first = i < len(s) and (p[j] == '.' or p[j] == s[i]) # L3: O(1) char match check
if j + 1 < len(p) and p[j + 1] == '*': # L4: O(1) '*' check
return match(i, j + 2) or (first and match(i + 1, j)) # L5: O(1) with cache
return first and match(i + 1, j + 1) # L6: O(1) with cache
return match(0, 0)

Where the time goes, line by line

Variables: m = len(s), n = len(p).

LinePer-call costTimes executedContribution
L1 (lru_cache)O(1)1O(1)
L2-L4 (checks)O(1)once per unique (i,j)O(m · n) total
L5, L6 (recursive calls)O(1) per call (cache hit after first)at most m · n unique statesO(m · n) ← dominates

With memoization, each unique (i, j) pair is computed exactly once. There are (m+1) * (n+1) such pairs, each doing O(1) work.

Complexity

  • Time: O(m · n), driven by L5/L6 over all unique (i,j) states.
  • Space: O(m · n) for the memo table.

Approach 3: Bottom-up 2-D DP

dp[i][j] = does s[:i] match p[:j]?

def is_match(s, p):
m, n = len(s), len(p) # L1: O(1)
dp = [[False] * (n + 1) for _ in range(m + 1)] # L2: O(m*n) table init
dp[0][0] = True # L3: O(1) base case
# Empty string vs. patterns like "a*", "a*b*"
for j in range(1, n + 1): # L4: O(n) init loop
if p[j - 1] == '*':
dp[0][j] = dp[0][j - 2]
for i in range(1, m + 1): # L5: outer loop O(m)
for j in range(1, n + 1): # L6: inner loop O(n)
if p[j - 1] == '*': # L7: O(1) check
# zero occurrences
dp[i][j] = dp[i][j - 2] # L8: O(1)
# one or more: previous char matches current s
if p[j - 2] == '.' or p[j - 2] == s[i - 1]: # L9: O(1)
dp[i][j] = dp[i][j] or dp[i - 1][j] # L10: O(1)
else:
if p[j - 1] == '.' or p[j - 1] == s[i - 1]: # L11: O(1)
dp[i][j] = dp[i - 1][j - 1] # L12: O(1)
return dp[m][n]

Where the time goes, line by line

Variables: m = len(s), n = len(p).

LinePer-call costTimes executedContribution
L1-L3 (init)O(1) or O(m·n)1O(m · n)
L4 (base-case loop)O(1)nO(n)
L5+L6 (double loop)O(1) bodym · nO(m · n) ← dominates
L7-L12 (table fills)O(1)once eachincluded above

Every cell is filled in O(1) with a constant number of table lookups. The double loop at L5/L6 visits all m * n cells exactly once.

Complexity

  • Time: O(m · n), driven by L5/L6 (the full double loop over all DP cells).
  • Space: O(m · n) for the DP table.

Summary

ApproachTimeSpace
Naive recursionO(2^(m + n))O(m + n)
Top-down memoizedO(m · n)O(m · n)
Bottom-up 2-D DPO(m · n)O(m · n)

This problem rewards memorization, the recurrence is fiddly and the edge cases around * at the start are easy to bungle. Interview-standard answer is memoized recursion for clarity.

Test cases

# Quick smoke tests, paste into a REPL or save as test_010.py and run.
# Uses the canonical implementation (Approach 2: top-down memoized).
from functools import lru_cache
def is_match(s, p):
@lru_cache(maxsize=None)
def match(i, j):
if j == len(p):
return i == len(s)
first = i < len(s) and (p[j] == '.' or p[j] == s[i])
if j + 1 < len(p) and p[j + 1] == '*':
return match(i, j + 2) or (first and match(i + 1, j))
return first and match(i + 1, j + 1)
return match(0, 0)
def _run_tests():
# problem statement examples
assert is_match("aa", "a") == False
assert is_match("aa", "a*") == True
assert is_match("ab", ".*") == True
assert is_match("mississippi", "mis*is*p*.") == False
# edge: empty string vs empty pattern
assert is_match("", "") == True
# edge: empty string vs "a*" (zero occurrences)
assert is_match("", "a*") == True
print("all tests pass")
if __name__ == "__main__":
_run_tests()
  • Strings, regex over prefix lengths