287. Find the Duplicate Number (Medium)
Problem
Given an array nums of n + 1 integers where each value is in the range [1, n], there is exactly one number that appears two or more times. Find that number.
Constraints:
- You must not modify the array.
- You must use only constant extra space.
- The runtime must be less than O(n²).
Example
nums = [1,3,4,2,2]→2nums = [3,1,3,4,2]→3nums = [3,3,3,3,3]→3
LeetCode 287 · Link · Medium
Approach 1: Brute force, sort
Sort the array; the duplicate sits next to one of its copies.
def find_duplicate(nums): nums_sorted = sorted(nums) # L1: O(n log n) sort for i in range(1, len(nums_sorted)): if nums_sorted[i] == nums_sorted[i - 1]: # L2: O(1) compare adjacent return nums_sorted[i] return -1Where the time goes, line by line
Variables: n = length of nums (n + 1 integers in range [1, n]).
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1 (sort) | O(n log n) | 1 | O(n log n) ← dominates |
| L2 (scan adjacent) | O(1) | up to n | O(n) |
Sorting is the only superlinear step. The scan at L2 is O(n) but dominated by L1.
Complexity
- Time: O(n log n), driven by L1.
- Space: O(n) (Python sort copies).
Violates “don’t modify” in spirit (we’re using a sorted copy, but the problem usually permits non-destructive). Fails the O(1) space constraint.
Approach 2: Hash set
Walk once, checking a set.
def find_duplicate(nums): seen = set() for x in nums: if x in seen: # L1: O(1) set lookup return x seen.add(x) # L2: O(1) set insert return -1Where the time goes, line by line
Variables: n = length of nums (n + 1 integers in range [1, n]).
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1 (set lookup) | O(1) | up to n + 1 | O(n) ← dominates |
| L2 (set insert) | O(1) | up to n | O(n) |
We return as soon as the duplicate is found, so at most n + 1 iterations.
Complexity
- Time: O(n), driven by L1/L2.
- Space: O(n).
Fails the O(1) space constraint.
Approach 3: Floyd’s cycle detection on array-as-linked-list (optimal)
Treat i → nums[i] as a linked-list transition. With values in [1, n] and an array of length n + 1, a duplicate value creates a cycle in this implicit linked list. The entry of the cycle is the duplicate.
Standard Floyd’s: phase 1 finds a point inside the cycle; phase 2 resets one pointer to the start and walks both by one, they meet at the cycle entrance.
def find_duplicate(nums): # Phase 1: find meeting point inside the cycle slow = nums[0] fast = nums[0] while True: slow = nums[slow] # L1: O(1) slow advances 1 step fast = nums[nums[fast]] # L2: O(1) fast advances 2 steps if slow == fast: break # Phase 2: find the entrance of the cycle slow = nums[0] while slow != fast: slow = nums[slow] # L3: O(1) slow advances 1 step fast = nums[fast] # L4: O(1) fast advances 1 step return slowWhere the time goes, line by line
Variables: n = length of nums minus 1 (values in range [1, n], array length n + 1).
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1-L2 (phase 1) | O(1) | up to n + cycle length | O(n) ← dominates phase 1 |
| L3-L4 (phase 2) | O(1) | up to n | O(n) ← dominates phase 2 |
Phase 1 detects the cycle in O(n + c) steps where c is the cycle length. Phase 2 walks at most n steps to reach the cycle entrance. Total: O(n).
Complexity
- Time: O(n), driven by L1-L2 (phase 1) and L3-L4 (phase 2).
- Space: O(1).
Why array-as-linked-list works
nums[i] is in [1, n], so it’s always a valid next index. Starting from index 0, you can’t revisit 0 (since no nums[i] == 0), so any cycle you encounter must begin at the duplicate value, every other value has exactly one predecessor.
Test cases
# Quick smoke tests, paste into a REPL or save as test_287.py and run.# Uses Floyd's cycle detection approach (Approach 3).
def find_duplicate(nums): slow = nums[0] fast = nums[0] while True: slow = nums[slow] fast = nums[nums[fast]] if slow == fast: break slow = nums[0] while slow != fast: slow = nums[slow] fast = nums[fast] return slow
def _run_tests(): # Examples from problem statement assert find_duplicate([1,3,4,2,2]) == 2 assert find_duplicate([3,1,3,4,2]) == 3 # All same value assert find_duplicate([3,3,3,3,3]) == 3 # Duplicate at boundary assert find_duplicate([1,1]) == 1 assert find_duplicate([2,2,2,1]) == 2 print("all tests pass")
if __name__ == "__main__": _run_tests()Summary
| Approach | Time | Space | Meets constraints? |
|---|---|---|---|
| Sort | O(n log n) | O(n) | ✗ |
| Hash set | O(n) | O(n) | ✗ |
| Floyd’s on array-as-list | O(n) | O(1) | ✓ |
This is a classic “looks like an array problem, solved with a linked-list algorithm” puzzle. Recognizing the array-as-linked-list framing is the key.
Related data structures
- Arrays, input; indexed as implicit linked list
- Linked Lists, Floyd’s tortoise and hare