287. Find the Duplicate Number (Medium)

Problem

Given an array nums of n + 1 integers where each value is in the range [1, n], there is exactly one number that appears two or more times. Find that number.

Constraints:

You must not modify the array.
You must use only constant extra space.
The runtime must be less than O(n²).

Example

nums = [1,3,4,2,2] → 2
nums = [3,1,3,4,2] → 3
nums = [3,3,3,3,3] → 3

LeetCode 287 · Link · Medium

Approach 1: Brute force, sort

Sort the array; the duplicate sits next to one of its copies.

def find_duplicate(nums):
    nums_sorted = sorted(nums)                    # L1: O(n log n) sort
    for i in range(1, len(nums_sorted)):
        if nums_sorted[i] == nums_sorted[i - 1]:  # L2: O(1) compare adjacent
            return nums_sorted[i]
    return -1

Where the time goes, line by line

Variables: n = length of nums (n + 1 integers in range [1, n]).

Line	Per-call cost	Times executed	Contribution
L1 (sort)	O(n log n)	1	O(n log n) ← dominates
L2 (scan adjacent)	O(1)	up to n	O(n)

Sorting is the only superlinear step. The scan at L2 is O(n) but dominated by L1.

Complexity

Time: O(n log n), driven by L1.
Space: O(n) (Python sort copies).

Violates “don’t modify” in spirit (we’re using a sorted copy, but the problem usually permits non-destructive). Fails the O(1) space constraint.

Approach 2: Hash set

Walk once, checking a set.

def find_duplicate(nums):
    seen = set()
    for x in nums:
        if x in seen:          # L1: O(1) set lookup
            return x
        seen.add(x)            # L2: O(1) set insert
    return -1

Where the time goes, line by line

Variables: n = length of nums (n + 1 integers in range [1, n]).

Line	Per-call cost	Times executed	Contribution
L1 (set lookup)	O(1)	up to n + 1	O(n) ← dominates
L2 (set insert)	O(1)	up to n	O(n)

We return as soon as the duplicate is found, so at most n + 1 iterations.

Complexity

Time: O(n), driven by L1/L2.
Space: O(n).

Fails the O(1) space constraint.

Approach 3: Floyd’s cycle detection on array-as-linked-list (optimal)

Treat i → nums[i] as a linked-list transition. With values in [1, n] and an array of length n + 1, a duplicate value creates a cycle in this implicit linked list. The entry of the cycle is the duplicate.

Standard Floyd’s: phase 1 finds a point inside the cycle; phase 2 resets one pointer to the start and walks both by one, they meet at the cycle entrance.

def find_duplicate(nums):
    # Phase 1: find meeting point inside the cycle
    slow = nums[0]
    fast = nums[0]
    while True:
        slow = nums[slow]          # L1: O(1) slow advances 1 step
        fast = nums[nums[fast]]    # L2: O(1) fast advances 2 steps
        if slow == fast:
            break
    # Phase 2: find the entrance of the cycle
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]          # L3: O(1) slow advances 1 step
        fast = nums[fast]          # L4: O(1) fast advances 1 step
    return slow

Where the time goes, line by line

Variables: n = length of nums minus 1 (values in range [1, n], array length n + 1).

Line	Per-call cost	Times executed	Contribution
L1-L2 (phase 1)	O(1)	up to n + cycle length	O(n) ← dominates phase 1
L3-L4 (phase 2)	O(1)	up to n	O(n) ← dominates phase 2

Phase 1 detects the cycle in O(n + c) steps where c is the cycle length. Phase 2 walks at most n steps to reach the cycle entrance. Total: O(n).

Complexity

Time: O(n), driven by L1-L2 (phase 1) and L3-L4 (phase 2).
Space: O(1).

Why array-as-linked-list works

nums[i] is in [1, n], so it’s always a valid next index. Starting from index 0, you can’t revisit 0 (since no nums[i] == 0), so any cycle you encounter must begin at the duplicate value, every other value has exactly one predecessor.

Test cases

# Quick smoke tests, paste into a REPL or save as test_287.py and run.
# Uses Floyd's cycle detection approach (Approach 3).

def find_duplicate(nums):
    slow = nums[0]
    fast = nums[0]
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    return slow

def _run_tests():
    # Examples from problem statement
    assert find_duplicate([1,3,4,2,2]) == 2
    assert find_duplicate([3,1,3,4,2]) == 3
    # All same value
    assert find_duplicate([3,3,3,3,3]) == 3
    # Duplicate at boundary
    assert find_duplicate([1,1]) == 1
    assert find_duplicate([2,2,2,1]) == 2
    print("all tests pass")

if __name__ == "__main__":
    _run_tests()

Summary

Approach	Time	Space	Meets constraints?
Sort	O(n log n)	O(n)	✗
Hash set	O(n)	O(n)	✗
Floyd’s on array-as-list	O(n)	O(1)	✓

This is a classic “looks like an array problem, solved with a linked-list algorithm” puzzle. Recognizing the array-as-linked-list framing is the key.

Arrays, input; indexed as implicit linked list
Linked Lists, Floyd’s tortoise and hare

287. Find the Duplicate Number (Medium)

Problem

Approach 1: Brute force, sort

Approach 2: Hash set

Approach 3: Floyd’s cycle detection on array-as-linked-list (optimal)

Why array-as-linked-list works

Test cases

Summary

Related data structures