271. Encode and Decode Strings (Medium)
Problem
Design an algorithm to encode a list of strings into a single string, and a second algorithm to decode the single string back to the original list. The strings can contain any valid ASCII characters, including delimiters and digits.
Example
- Input: `["hello","world","foo","bar"]`
- `encoded = encode(["hello","world","foo","bar"])`
- `decoded = decode(encoded)` → `["hello","world","foo","bar"]`
LeetCode 271 (premium) · Medium
Approach 1: Brute force, single-char delimiter + escape
Pick a rare character as a delimiter, and escape any occurrences of it (and of the escape character itself) in the source.

```python
ESC, DELIM = "\\", "\x1f"

def encode(strs: list[str]) -> str:
    # L1: escape ESC and DELIM, then join on DELIM
    return DELIM.join(s.replace(ESC, ESC * 2).replace(DELIM, ESC + "u") for s in strs)

def decode(s: str) -> list[str]:
    # L2: split on DELIM, then unescape with a left-to-right scan
    # (chained str.replace is ambiguous here: an original "\u" would be misread)
    def unescape(p: str) -> str:
        out, i = [], 0
        while i < len(p):
            if p[i] == ESC:  # "\\" -> ESC, "\u" -> DELIM
                out.append(ESC if p[i + 1] == ESC else DELIM)
                i += 2
            else:
                out.append(p[i])
                i += 1
        return "".join(out)
    return [unescape(p) for p in s.split(DELIM)]
```

Where the time goes, line by line
Variables: N = total number of characters across all strings, m = number of strings.
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1 (encode: escape + join) | O(N) | 1 | O(N) ← dominates |
| L2 (decode: split + unescape) | O(N) | 1 | O(N) ← dominates |
Both encode and decode scan every character a constant number of times.
Complexity
- Time: O(N) where N is the total number of characters (both encode and decode are linear in output size).
- Space: O(N).
Fragile: if the input can contain ANY ASCII (or Unicode), picking a “rare” delimiter is a footgun. This is how you get a “works in tests, breaks in prod” bug.
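To see the failure mode concretely, here is a sketch of this idea with the escaping omitted: join on a common delimiter and split it back. Any input that happens to contain the delimiter is silently corrupted.

```python
def naive_encode(strs: list[str]) -> str:
    # join on ',' with no escaping -- looks fine on friendly inputs
    return ",".join(strs)

def naive_decode(s: str) -> list[str]:
    return s.split(",")

strs = ["a,b", "c"]  # the delimiter appears inside a string
round_trip = naive_decode(naive_encode(strs))
assert round_trip == ["a", "b", "c"]  # silently wrong: 3 strings, not 2
assert round_trip != strs
```

No exception is raised; the data is simply wrong, which is exactly why the escaping (or a different scheme entirely) is mandatory.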
Approach 2: JSON-serialize
Offload escaping to a proven serializer.
```python
import json

def encode(strs: list[str]) -> str:
    return json.dumps(strs)  # L1: O(N) JSON encode

def decode(s: str) -> list[str]:
    return json.loads(s)  # L2: O(N) JSON decode
```

Where the time goes, line by line
Variables: N = total number of characters across all strings.
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1 (json.dumps) | O(N) | 1 | O(N) ← dominates |
| L2 (json.loads) | O(N) | 1 | O(N) ← dominates |
JSON serialization and deserialization are linear in the total character count.
Complexity
- Time: O(N).
- Space: O(N).
Correct and robust. Not typically accepted on LeetCode, because the problem wants you to design the scheme yourself, but worth knowing for real-world code: it is the right answer unless there is a reason to roll your own.
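A quick round-trip check of the JSON approach on inputs that defeat naive delimiter schemes (quotes, backslashes, control characters, empty strings, embedded commas):

```python
import json

nasty = ['say "hi"', 'back\\slash', 'ctrl\x1f\x00chars', '', '1,2,3']
encoded = json.dumps(nasty)   # escaping is handled by the serializer
assert json.loads(encoded) == nasty
```

Every special character is escaped by `json.dumps` and restored by `json.loads`, so there is no input a caller can construct that breaks the round trip.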
Approach 3: Length-prefix encoding (optimal, self-delimiting)
Prefix each string with its length and a fixed delimiter (e.g., #). The length tells the decoder exactly how many characters to take next, so no escaping is needed.
```python
def encode(strs: list[str]) -> str:
    # L1: O(N) one pass
    return "".join(f"{len(s)}#{s}" for s in strs)

def decode(s: str) -> list[str]:
    result, i = [], 0                           # L2: O(1) init
    while i < len(s):                           # L3: loop, advances by len(each string)+header
        j = s.index('#', i)                     # L4: O(len_digits) scan for '#'
        length = int(s[i:j])                    # L5: O(len_digits) parse int
        result.append(s[j + 1:j + 1 + length])  # L6: O(length) slice
        i = j + 1 + length                      # L7: O(1) advance
    return result
```

Where the time goes, line by line
Variables: N = total number of characters across all strings, m = number of strings.
| Line | Per-call cost | Times executed | Contribution |
|---|---|---|---|
| L1 (encode: format + join) | O(N) | 1 | O(N) ← dominates |
| L2 (init) | O(1) | 1 | O(1) |
| L3 (loop) | O(1) | m iterations | O(m) |
| L4 (index '#') | O(d) (d = header digits) | m | O(m·d) |
| L5 (parse length) | O(d) | m | O(m·d) |
| L6 (slice) | O(length) | m | O(N) total ← dominates decode |
| L7 (advance) | O(1) | m | O(m) |
Each character in the original strings is visited exactly once during the decode slice (L6). The loop overhead is O(m) for headers.
Complexity
- Time: O(N). Each character is visited a constant number of times.
- Space: O(N).
This works for any character content, including the # delimiter, because the length prefix makes the scheme self-delimiting. The canonical interview answer.
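A short trace makes the self-delimiting property concrete, using the tricky inputs from the test cases below (where '#' and digits appear inside the strings):

```python
def encode(strs: list[str]) -> str:
    return "".join(f"{len(s)}#{s}" for s in strs)

# The decoder reads the header "5#", takes exactly 5 chars ("5#abc"),
# then reads "3#" and takes exactly 3 chars ("def") -- the inner '#'
# is never mistaken for a header.
assert encode(["5#abc", "def"]) == "5#5#abc3#def"
assert encode(["", "a"]) == "0#1#a"  # empty strings work too
```

Because the decoder only ever looks for '#' at a header position, payload characters are never inspected for meaning.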
Summary
| Approach | Time | Space | Notes |
|---|---|---|---|
| Delimiter + escape | O(N) | O(N) | Fragile; easy to corrupt |
| JSON | O(N) | O(N) | Production-correct; not always accepted |
| Length prefix | O(N) | O(N) | Robust and self-delimiting |
Length-prefix encoding is the pattern behind many real-world formats: Pascal strings, netstrings, Protocol Buffers' length-delimited fields, HTTP chunked transfer encoding.
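As an illustration, here is a netstring-style variant of the same scheme (framing per the netstrings format: length, ':', payload, trailing ','; note this sketch counts characters, whereas real netstrings count bytes):

```python
def to_netstrings(strs: list[str]) -> str:
    # Same length-prefix idea, with ':' and a trailing ',' as framing.
    return "".join(f"{len(s)}:{s}," for s in strs)

def from_netstrings(s: str) -> list[str]:
    result, i = [], 0
    while i < len(s):
        j = s.index(":", i)
        length = int(s[i:j])
        result.append(s[j + 1:j + 1 + length])
        assert s[j + 1 + length] == ","  # framing check the '#' scheme lacks
        i = j + 2 + length
    return result

items = ["hello", "a,b:c", ""]
assert from_netstrings(to_netstrings(items)) == items
```

The trailing comma buys cheap corruption detection: a truncated or garbled stream fails the framing check instead of being silently mis-split.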
Test cases
```python
# Quick smoke tests: paste into a REPL, or save as test_encode_decode.py and run.
# Uses the canonical implementation (Approach 3: length-prefix encoding).

def encode(strs: list[str]) -> str:
    return "".join(f"{len(s)}#{s}" for s in strs)

def decode(s: str) -> list[str]:
    result, i = [], 0
    while i < len(s):
        j = s.index('#', i)
        length = int(s[i:j])
        result.append(s[j + 1:j + 1 + length])
        i = j + 1 + length
    return result

def _run_tests():
    cases = [
        ["hello", "world", "foo", "bar"],
        [""],
        ["a"],
        [],
        ["hello#world", "foo#bar"],  # '#' inside strings
        ["5#abc", "def"],            # digits + '#' inside strings
    ]
    for strs in cases:
        assert decode(encode(strs)) == strs, f"Failed on: {strs}"
    print("all tests pass")

if __name__ == "__main__":
    _run_tests()
```