Hash Tables
Intro
A hash table (also: hash map, dictionary, associative array) stores key-value pairs with average-case O(1) insert, delete, and lookup. It’s the single most impactful data structure for interview problem-solving, the difference between a brute-force O(n²) solution and an optimal O(n) one is almost always a hash table.
In-depth description
A hash function maps keys of arbitrary type to integers in some bucket range. Each bucket points to zero or more entries; collisions happen when two keys hash to the same bucket. The two standard collision-handling approaches:
- Separate chaining, each bucket holds a linked list (or small array/tree) of entries. Simple, degrades gracefully.
- Open addressing, on collision, probe to another bucket (linear, quadratic, or double hashing). Better cache locality; requires tombstones for deletion.
The load factor (entries / buckets) determines collision likelihood. Most implementations resize and rehash when the load factor exceeds a threshold (~0.75 for Java HashMap, ~0.5 for open addressing). Resizing is amortized O(1) per insert.
Worst case is O(n), if every key hashes to the same bucket, lookup degrades to linear scan. For adversarial inputs, this is a real DoS risk (hash-flooding); hence randomized hash seeds in modern languages. Java 8 even upgrades chains to balanced trees at a threshold, giving O(log n) worst case.
A hash set is a hash table that stores only keys. Use it for “have I seen this?” questions. Critical interview insight: complementary-value lookups (for every x, check if target, x exists) turn pair-finding problems into linear-time scans.
Time complexity
| Operation | Average | Worst |
|---|---|---|
| Insert | O(1) | O(n) |
| Delete | O(1) | O(n) |
| Lookup | O(1) | O(n) |
| Iteration | O(n) | O(n) |
| Space | O(n) | O(n) |
(With Java 8+ treeified buckets: worst case is O(log n) for lookup.)
Common uses in DSA
- O(1) membership / frequency lookup, Two Sum, Contains Duplicate, First Unique Character, Intersection of Two Arrays.
- Complement / pair-finding, Two Sum variants, 4Sum II (two-sum on pair sums), Pairs of Songs With Total Durations Divisible by 60.
- Frequency counting, Top K Frequent Elements (with a heap), Valid Anagram, Ransom Note, Majority Element.
- Prefix sum with hash, Subarray Sum Equals K, Continuous Subarray Sum, Contiguous Array (count of 0s = 1s).
- Deduplication and grouping, Group Anagrams (key by sorted string or char-count tuple), Longest Consecutive Sequence, Longest Substring Without Repeating Characters.
Canonical LeetCode problems: #1 Two Sum, #49 Group Anagrams, #128 Longest Consecutive Sequence, #146 LRU Cache, #217 Contains Duplicate, #347 Top K Frequent Elements, #560 Subarray Sum Equals K.
Python example
from collections import defaultdict, Counter
# Basic dictd = {"a": 1, "b": 2}d["a"] # 1d.get("c", 0) # 0 (default for missing key)"a" in d # True, O(1) membership
# defaultdict for group-by / accumulationgroups = defaultdict(list)for word in ["eat", "tea", "tan", "ate", "nat", "bat"]: key = "".join(sorted(word)) groups[key].append(word)
# Counter for frequency countingc = Counter("mississippi")c.most_common(2) # [('i', 4), ('s', 4)]
# Two Sum: complement lookup in one pass, O(n)def two_sum(nums, target): seen = {} for i, x in enumerate(nums): if target, x in seen: return [seen[target, x], i] seen[x] = i return [-1, -1]
# Subarray Sum Equals K (prefix sum + hash), O(n)def subarray_sum(nums, k): count, total = 0, 0 prefix_counts = {0: 1} # empty prefix has sum 0 once for x in nums: total += x count += prefix_counts.get(total, k, 0) prefix_counts[total] = prefix_counts.get(total, 0) + 1 return count
# Longest Consecutive Sequence, O(n) via set lookupsdef longest_consecutive(nums): s = set(nums) best = 0 for x in s: if x, 1 in s: # only start from run-start candidates continue cur = x length = 1 while cur + 1 in s: cur += 1 length += 1 best = max(best, length) return bestLeetCode problems
Hash tables appear in 34 NeetCode 150 problems across 14 categories.
Arrays & Hashing:
- 1. Two Sum
- 36. Valid Sudoku
- 49. Group Anagrams
- 128. Longest Consecutive Sequence
- 217. Contains Duplicate
- 242. Valid Anagram
- 347. Top K Frequent Elements
Two Pointers:
- 15. 3Sum, hash-set variant for inner two-sum
Sliding Window:
- 3. Longest Substring Without Repeating Characters
- 76. Minimum Window Substring
- 424. Longest Repeating Character Replacement
- 567. Permutation in String
Stack:
Binary Search:
Linked List:
- 138. Copy List with Random Pointer
- 146. LRU Cache, hash map + doubly linked list
Trees:
Heap / Priority Queue:
Backtracking:
- 17. Letter Combinations of a Phone Number
- 51. N-Queens, columns/diagonals as sets
- 79. Word Search, Counter pruning
- 90. Subsets II
Graphs:
- 127. Word Ladder, pattern-bucket adjacency
- 133. Clone Graph
- 207. Course Schedule, adjacency list
- 269. Alien Dictionary
- 329. Longest Increasing Path in a Matrix, memoization cache
- 417. Pacific Atlantic Water Flow, reachability sets
1-D Dynamic Programming:
Greedy:
Math & Geometry:
- 202. Happy Number, seen-set variant
- 2013. Detect Squares