10. Regular Expression Matching (Hard)

Problem

Implement regular expression matching with support for:

., matches any single character.
*, matches zero or more of the preceding element.

The match must cover the entire input string (not partial).

Example

s = "aa", p = "a" → false
s = "aa", p = "a*" → true
s = "ab", p = ".*" → true
s = "mississippi", p = "mis*is*p*." → false

LeetCode 10 · Link · Hard

Approach 1: Recursive

Match character-by-character, special-casing *.

def is_match(s, p):
    def match(i, j):                                           # L1: define recursive helper
        if j == len(p):                                        # L2: O(1) base case check
            return i == len(s)
        first = i < len(s) and (p[j] == '.' or p[j] == s[i]) # L3: O(1) single-char match check
        if j + 1 < len(p) and p[j + 1] == '*':               # L4: O(1) check for '*'
            # zero copies, or one more copy
            return match(i, j + 2) or (first and match(i + 1, j))  # L5: two recursive calls
        return first and match(i + 1, j + 1)                  # L6: one recursive call
    return match(0, 0)

Where the time goes, line by line

Variables: m = len(s), n = len(p).

Line	Per-call cost	Times executed	Contribution
L1-L4 (setup + checks)	O(1)	per call	O(1) each
*L5 (two recursive branches on ’‘)**	O(1) work + 2 calls	up to O(2^(m+n)) calls total	O(2^(m+n)) ← dominates
L6 (one recursive branch)	O(1) work + 1 call	per call	O(2^(m+n)) same tree

The exponential blowup comes from the * branch at L5: each * pair can spawn two calls. Without memoization the same (i, j) pair is recomputed many times.

Complexity

Time: O(2^(m + n)) worst case due to overlapping subproblems (driven by L5 double-branching).
Space: O(m + n) recursion depth.

Approach 2: Top-down memoized (canonical)

Cache by (i, j).

from functools import lru_cache

def is_match(s, p):
    @lru_cache(maxsize=None)                                   # L1: O(1) cache decorator setup
    def match(i, j):
        if j == len(p):                                        # L2: O(1) base case
            return i == len(s)
        first = i < len(s) and (p[j] == '.' or p[j] == s[i]) # L3: O(1) char match check
        if j + 1 < len(p) and p[j + 1] == '*':               # L4: O(1) '*' check
            return match(i, j + 2) or (first and match(i + 1, j))  # L5: O(1) with cache
        return first and match(i + 1, j + 1)                  # L6: O(1) with cache
    return match(0, 0)

Where the time goes, line by line

Variables: m = len(s), n = len(p).

Line	Per-call cost	Times executed	Contribution
L1 (lru_cache)	O(1)	1	O(1)
L2-L4 (checks)	O(1)	once per unique (i,j)	O(m · n) total
L5, L6 (recursive calls)	O(1) per call (cache hit after first)	at most m · n unique states	O(m · n) ← dominates

With memoization, each unique (i, j) pair is computed exactly once. There are (m+1) * (n+1) such pairs, each doing O(1) work.

Complexity

Time: O(m · n), driven by L5/L6 over all unique (i,j) states.
Space: O(m · n) for the memo table.

Approach 3: Bottom-up 2-D DP

dp[i][j] = does s[:i] match p[:j]?

def is_match(s, p):
    m, n = len(s), len(p)                                  # L1: O(1)
    dp = [[False] * (n + 1) for _ in range(m + 1)]        # L2: O(m*n) table init
    dp[0][0] = True                                        # L3: O(1) base case

    # Empty string vs. patterns like "a*", "a*b*"
    for j in range(1, n + 1):                             # L4: O(n) init loop
        if p[j - 1] == '*':
            dp[0][j] = dp[0][j - 2]

    for i in range(1, m + 1):                             # L5: outer loop O(m)
        for j in range(1, n + 1):                         # L6: inner loop O(n)
            if p[j - 1] == '*':                           # L7: O(1) check
                # zero occurrences
                dp[i][j] = dp[i][j - 2]                  # L8: O(1)
                # one or more: previous char matches current s
                if p[j - 2] == '.' or p[j - 2] == s[i - 1]:  # L9: O(1)
                    dp[i][j] = dp[i][j] or dp[i - 1][j]      # L10: O(1)
            else:
                if p[j - 1] == '.' or p[j - 1] == s[i - 1]:  # L11: O(1)
                    dp[i][j] = dp[i - 1][j - 1]               # L12: O(1)
    return dp[m][n]

Where the time goes, line by line

Variables: m = len(s), n = len(p).

Line	Per-call cost	Times executed	Contribution
L1-L3 (init)	O(1) or O(m·n)	1	O(m · n)
L4 (base-case loop)	O(1)	n	O(n)
L5+L6 (double loop)	O(1) body	m · n	O(m · n) ← dominates
L7-L12 (table fills)	O(1)	once each	included above

Every cell is filled in O(1) with a constant number of table lookups. The double loop at L5/L6 visits all m * n cells exactly once.

Complexity

Time: O(m · n), driven by L5/L6 (the full double loop over all DP cells).
Space: O(m · n) for the DP table.

Summary

Approach	Time	Space
Naive recursion	O(2^(m + n))	O(m + n)
Top-down memoized	O(m · n)	O(m · n)
Bottom-up 2-D DP	O(m · n)	O(m · n)

This problem rewards memorization, the recurrence is fiddly and the edge cases around * at the start are easy to bungle. Interview-standard answer is memoized recursion for clarity.

Test cases

# Quick smoke tests, paste into a REPL or save as test_010.py and run.
# Uses the canonical implementation (Approach 2: top-down memoized).

from functools import lru_cache

def is_match(s, p):
    @lru_cache(maxsize=None)
    def match(i, j):
        if j == len(p):
            return i == len(s)
        first = i < len(s) and (p[j] == '.' or p[j] == s[i])
        if j + 1 < len(p) and p[j + 1] == '*':
            return match(i, j + 2) or (first and match(i + 1, j))
        return first and match(i + 1, j + 1)
    return match(0, 0)

def _run_tests():
    # problem statement examples
    assert is_match("aa", "a") == False
    assert is_match("aa", "a*") == True
    assert is_match("ab", ".*") == True
    assert is_match("mississippi", "mis*is*p*.") == False
    # edge: empty string vs empty pattern
    assert is_match("", "") == True
    # edge: empty string vs "a*" (zero occurrences)
    assert is_match("", "a*") == True
    print("all tests pass")

if __name__ == "__main__":
    _run_tests()

Strings, regex over prefix lengths

10. Regular Expression Matching (Hard)

Problem

Approach 1: Recursive

Approach 2: Top-down memoized (canonical)

Approach 3: Bottom-up 2-D DP

Summary

Test cases

Related data structures