Strings
Intro
A string is a sequence of characters, usually stored as an array of bytes (C), a length-prefixed array of code units (Java, Go, Rust), or an immutable object wrapping one (Python, JavaScript). Many interview problems are ostensibly string problems but fundamentally array problems, with the extra wrinkles of encoding (ASCII vs. UTF-8 vs. UTF-16), immutability, and specialized matching algorithms.
In-depth description
Immutability matters for complexity analysis. In Python/Java/JS, naive s = s + c inside a loop is O(n²) because each concatenation creates a new string, use StringBuilder (Java), a list + "".join() (Python), or mutable byte buffers. In C and Rust, strings can be mutated in place but you own the memory and encoding.
Character counting is the most common string preprocessing step. For ASCII-only input a 128-element integer array outperforms a hash map. For Unicode, fall back to a dict/hash map. Sorting characters of a string is often the simplest anagram canonicalization, O(k log k) where k is string length.
String matching algorithms come up occasionally:
- KMP (Knuth-Morris-Pratt), O(n + m) with a precomputed failure function.
- Rabin-Karp, polynomial rolling hash for fast multi-pattern and approximate matching.
- Z-algorithm, computes prefix-match lengths in O(n), simpler than KMP for some problems.
- Manacher’s algorithm, finds the longest palindromic substring in O(n) (the O(n²) expand-around-center version is interview-standard).
String DP, edit distance, LCS, regex matching, is a recurring theme and one of the hardest interview topic areas.
Time complexity
| Operation | Average | Worst |
|---|---|---|
| Access by index | O(1) | O(1) |
| Concatenation (new string) | O(n + m) | O(n + m) |
| Substring check (naive) | O(n·m) | O(n·m) |
| Substring check (KMP / Z) | O(n + m) | O(n + m) |
| Comparison | O(min(n, m)) | O(min(n, m)) |
| Sort characters | O(k log k) | O(k log k) |
| Space | O(n) | O(n) |
Common uses in DSA
- Anagrams and character frequency, Valid Anagram, Group Anagrams, Find All Anagrams in a String.
- Palindrome detection, Valid Palindrome, Longest Palindromic Substring (expand around center or Manacher’s), Palindromic Substrings.
- Pattern matching, Implement strStr() (needle in haystack), Repeated Substring Pattern, Find the Index of the First Occurrence.
- Sliding window on strings, Longest Substring Without Repeating Characters, Minimum Window Substring, Longest Repeating Character Replacement.
- Edit distance and string DP, Edit Distance, Longest Common Subsequence, Regular Expression Matching, Wildcard Matching.
Canonical LeetCode problems: #3 Longest Substring Without Repeating Characters, #5 Longest Palindromic Substring, #20 Valid Parentheses, #49 Group Anagrams, #76 Minimum Window Substring, #125 Valid Palindrome, #242 Valid Anagram.
Python example
from collections import Counter
# Immutability pitfall:# BAD (O(n^2)): s = ""; for c in chars: s += c# GOOD (O(n)): "".join(chars)
# Anagram check with Counter, O(n)def is_anagram(s, t): return Counter(s) == Counter(t)
# Palindrome check with two pointers, O(n), skipping non-alphanumericsdef is_palindrome(s): s = [c.lower() for c in s if c.isalnum()] l, r = 0, len(s), 1 while l < r: if s[l] != s[r]: return False l, r = l + 1, r, 1 return True
# Longest palindromic substring (expand around center), O(n^2)def longest_palindrome(s): def expand(l, r): while l >= 0 and r < len(s) and s[l] == s[r]: l, r = l, 1, r + 1 return s[l + 1:r] best = "" for i in range(len(s)): for cand in (expand(i, i), expand(i, i + 1)): if len(cand) > len(best): best = cand return best
# Longest substring without repeating characters (sliding window), O(n)def longest_unique(s): seen, left, best = {}, 0, 0 for right, ch in enumerate(s): if ch in seen and seen[ch] >= left: left = seen[ch] + 1 seen[ch] = right best = max(best, right, left + 1) return best
# Group Anagrams, hash by sorted stringdef group_anagrams(words): from collections import defaultdict groups = defaultdict(list) for w in words: groups["".join(sorted(w))].append(w) return list(groups.values())LeetCode problems
Strings appear in 23 NeetCode 150 problems across 9 categories.
Arrays & Hashing:
Two Pointers:
Sliding Window:
- 3. Longest Substring Without Repeating Characters
- 76. Minimum Window Substring
- 424. Longest Repeating Character Replacement
- 567. Permutation in String
Stack:
Backtracking:
1-D Dynamic Programming:
2-D Dynamic Programming:
- 10. Regular Expression Matching
- 72. Edit Distance
- 97. Interleaving String
- 115. Distinct Subsequences
- 1143. Longest Common Subsequence
Greedy:
Math & Geometry: