271. Encode and Decode Strings (Medium)
Problem
Design an algorithm to encode a list of strings into a single string, and a second algorithm to decode the single string back to the original list. The strings can contain any valid ASCII characters, including delimiters and digits.
Example
- Input: `["hello","world","foo","bar"]`
- `encoded = encode(["hello","world","foo","bar"])`
- `decoded = decode(encoded)` → `["hello","world","foo","bar"]`
LeetCode 271 (premium) · Medium
Approach 1: Brute force, single-char delimiter + escape
Pick a rare character as a delimiter, and escape any occurrences of it (and of the escape character itself) in the source.

```python
ESC, DELIM = "\\", "\x1f"

def encode(strs: list[str]) -> str:
    # L1: escape ESC and DELIM, then join on DELIM
    return DELIM.join(s.replace(ESC, ESC * 2).replace(DELIM, ESC + "u") for s in strs)

def decode(s: str) -> list[str]:
    # L2: split on DELIM, then unescape with a left-to-right scan
    # (chained str.replace is ambiguous here: an original "\u" would be misread)
    def unescape(p: str) -> str:
        out, i = [], 0
        while i < len(p):
            if p[i] == ESC:  # "\\" -> ESC, "\u" -> DELIM
                out.append(ESC if p[i + 1] == ESC else DELIM)
                i += 2
            else:
                out.append(p[i])
                i += 1
        return "".join(out)
    return [unescape(p) for p in s.split(DELIM)]
```

Where the time goes, line by line
Variables: N = total number of characters across all strings, m = number of strings.
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1 (encode: escape + join) | O(N) | 1 | O(N) ← dominates |
| L2 (decode: split + unescape) | O(N) | 1 | O(N) ← dominates |
Both encode and decode scan every character a constant number of times.
Complexity
- Time: O(N) where N is the total number of characters (both encode and decode are linear in output size).
- Space: O(N).
Fragile: if the input can contain ANY ASCII (or Unicode), picking a “rare” delimiter is a footgun. This is how you get a “works in tests, breaks in prod” bug.
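To see the failure mode concretely, here is a sketch of this idea with the escaping omitted: join on a common delimiter and split it back. Any input that happens to contain the delimiter is silently corrupted.

```python
def naive_encode(strs: list[str]) -> str:
    # join on ',' with no escaping -- looks fine on friendly inputs
    return ",".join(strs)

def naive_decode(s: str) -> list[str]:
    return s.split(",")

strs = ["a,b", "c"]  # the delimiter appears inside a string
round_trip = naive_decode(naive_encode(strs))
assert round_trip == ["a", "b", "c"]  # silently wrong: 3 strings, not 2
assert round_trip != strs
```

No exception is raised; the data is simply wrong, which is exactly why the escaping (or a different scheme entirely) is mandatory.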
Approach 2: JSON-serialize
Offload escaping to a proven serializer.
```python
import json

def encode(strs: list[str]) -> str:
    return json.dumps(strs)  # L1: O(N) JSON encode

def decode(s: str) -> list[str]:
    return json.loads(s)  # L2: O(N) JSON decode
```

Where the time goes, line by line
Variables: N = total number of characters across all strings.
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1 (json.dumps) | O(N) | 1 | O(N) ← dominates |
| L2 (json.loads) | O(N) | 1 | O(N) ← dominates |
JSON serialization and deserialization are linear in the total character count.
Complexity
- Time: O(N).
- Space: O(N).
Correct and robust. Not typically accepted on LeetCode, because the problem wants you to design the scheme yourself, but worth knowing for real-world code: it is the right answer unless there is a reason to roll your own.
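A quick round-trip check of the JSON approach on inputs that defeat naive delimiter schemes (quotes, backslashes, control characters, empty strings, embedded commas):

```python
import json

nasty = ['say "hi"', 'back\\slash', 'ctrl\x1f\x00chars', '', '1,2,3']
encoded = json.dumps(nasty)   # escaping is handled by the serializer
assert json.loads(encoded) == nasty
```

Every special character is escaped by `json.dumps` and restored by `json.loads`, so there is no input a caller can construct that breaks the round trip.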
Approach 3: Length-prefix encoding (optimal, self-delimiting)
Prefix each string with its length and a fixed delimiter (e.g., #). The length tells the decoder exactly how many characters to take next, so no escaping is needed.
```python
def encode(strs: list[str]) -> str:
    # L1: O(N) one pass
    return "".join(f"{len(s)}#{s}" for s in strs)

def decode(s: str) -> list[str]:
    result, i = [], 0                           # L2: O(1) init
    while i < len(s):                           # L3: loop, advances by len(each string)+header
        j = s.index('#', i)                     # L4: O(len_digits) scan for '#'
        length = int(s[i:j])                    # L5: O(len_digits) parse int
        result.append(s[j + 1:j + 1 + length])  # L6: O(length) slice
        i = j + 1 + length                      # L7: O(1) advance
    return result
```

Where the time goes, line by line
Variables: N = total number of characters across all strings, m = number of strings.
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1 (encode: format + join) | O(N) | 1 | O(N) ← dominates |
| L2 (init) | O(1) | 1 | O(1) |
| L3 (loop) | O(1) | m iterations | O(m) |
| L4 (index '#') | O(d) (d = header digits) | m | O(m·d) |
| L5 (parse length) | O(d) | m | O(m·d) |
| L6 (slice) | O(length) | m | O(N) total ← dominates decode |
| L7 (advance) | O(1) | m | O(m) |
Each character in the original strings is visited exactly once during the decode slice (L6). The loop overhead is O(m) for headers.
Complexity
- Time: O(N). Each character is visited a constant number of times.
- Space: O(N).
This works for any character content, including the # delimiter, because the length prefix makes the scheme self-delimiting. The canonical interview answer.
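A short trace makes the self-delimiting property concrete, using the tricky inputs from the test cases below (where '#' and digits appear inside the strings):

```python
def encode(strs: list[str]) -> str:
    return "".join(f"{len(s)}#{s}" for s in strs)

# The decoder reads the header "5#", takes exactly 5 chars ("5#abc"),
# then reads "3#" and takes exactly 3 chars ("def") -- the inner '#'
# is never mistaken for a header.
assert encode(["5#abc", "def"]) == "5#5#abc3#def"
assert encode(["", "a"]) == "0#1#a"  # empty strings work too
```

Because the decoder only ever looks for '#' at a header position, payload characters are never inspected for meaning.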
Summary
| Approach | Time | Space | Notes |
|---|---|---|---|
| Delimiter + escape | O(N) | O(N) | Fragile; easy to corrupt |
| JSON | O(N) | O(N) | Production-correct; not always accepted |
| Length prefix | O(N) | O(N) | Robust and self-delimiting |
Length-prefix encoding is the pattern behind many real-world formats: Pascal strings, netstrings, Protocol Buffers' length-delimited fields, HTTP chunked transfer encoding.
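As an illustration, here is a netstring-style variant of the same scheme (framing per the netstrings format: length, ':', payload, trailing ','; note this sketch counts characters, whereas real netstrings count bytes):

```python
def to_netstrings(strs: list[str]) -> str:
    # Same length-prefix idea, with ':' and a trailing ',' as framing.
    return "".join(f"{len(s)}:{s}," for s in strs)

def from_netstrings(s: str) -> list[str]:
    result, i = [], 0
    while i < len(s):
        j = s.index(":", i)
        length = int(s[i:j])
        result.append(s[j + 1:j + 1 + length])
        assert s[j + 1 + length] == ","  # framing check the '#' scheme lacks
        i = j + 2 + length
    return result

items = ["hello", "a,b:c", ""]
assert from_netstrings(to_netstrings(items)) == items
```

The trailing comma buys cheap corruption detection: a truncated or garbled stream fails the framing check instead of being silently mis-split.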
Test cases
```python
# Quick smoke tests: paste into a REPL, or save as test_encode_decode.py and run.
# Uses the canonical implementation (Approach 3: length-prefix encoding).

def encode(strs: list[str]) -> str:
    return "".join(f"{len(s)}#{s}" for s in strs)

def decode(s: str) -> list[str]:
    result, i = [], 0
    while i < len(s):
        j = s.index('#', i)
        length = int(s[i:j])
        result.append(s[j + 1:j + 1 + length])
        i = j + 1 + length
    return result

def _run_tests():
    cases = [
        ["hello", "world", "foo", "bar"],
        [""],
        ["a"],
        [],
        ["hello#world", "foo#bar"],  # '#' inside strings
        ["5#abc", "def"],            # digits + '#' inside strings
    ]
    for strs in cases:
        assert decode(encode(strs)) == strs, f"Failed on: {strs}"
    print("all tests pass")

if __name__ == "__main__":
    _run_tests()
```