
271. Encode and Decode Strings (Medium)

Problem

Design an algorithm to encode a list of strings into a single string, and a second algorithm to decode the single string back to the original list. The strings can contain any valid ASCII characters including delimiters and digits.

Example

  • Input: ["hello","world","foo","bar"]
  • encoded = encode(["hello","world","foo","bar"])
  • decoded = decode(encoded)
  • Output: decoded == ["hello","world","foo","bar"]

LeetCode 271 (premium; a free equivalent exists as LintCode 659) · Medium

Approach 1: Brute force, single-char delimiter + escape

Pick a rare character as a delimiter, escape any occurrences in the source.

def encode(strs: list[str]) -> str:
    # L1: escape '\' and the delimiter, then join on the raw delimiter
    return "\x1f".join(s.replace("\\", "\\\\").replace("\x1f", "\\u") for s in strs)

def decode(s: str) -> list[str]:
    def unescape(p: str) -> str:  # undo '\u' -> '\x1f' and '\\' -> '\'
        it = iter(p)
        return "".join("\x1f" if c == "\\" and next(it) == "u" else c for c in it)
    return [unescape(p) for p in s.split("\x1f")]  # L2: split + unescape

Note that the unescape must parse left to right: a pair of naive str.replace calls cannot distinguish a literal "\u" (encoded as "\\u") from an escaped delimiter ("\u").

Where the time goes, line by line

Variables: N = total number of characters across all strings, m = number of strings.

Line                           Per-call cost   Times executed   Contribution
L1 (encode: escape + join)     O(N)            1                O(N) ← dominates
L2 (decode: split + unescape)  O(N)            1                O(N) ← dominates

Both encode and decode scan every character a constant number of times.

Complexity

  • Time: O(N) where N is the total number of characters (both encode and decode are linear in output size).
  • Space: O(N).

Fragile: if the input can contain ANY ASCII (or Unicode), picking a “rare” delimiter is a footgun. This is how you get a “works in tests, breaks in prod” bug.
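To make the failure mode concrete, here is a minimal sketch of the no-escaping version of this idea (the names naive_encode/naive_decode are made up for illustration): the moment the delimiter shows up inside a payload string, the round trip silently returns the wrong list.

```python
# Hypothetical naive scheme: join on a comma with no escaping at all.
def naive_encode(strs: list[str]) -> str:
    return ",".join(strs)

def naive_decode(s: str) -> list[str]:
    return s.split(",")

original = ["a,b", "c"]                       # delimiter appears inside a string
round_trip = naive_decode(naive_encode(original))
assert round_trip == ["a", "b", "c"]          # corrupted: 3 strings came back, not 2
assert round_trip != original
```

Escaping (as above) fixes this, but only if the unescape logic is exactly inverse to the escape logic, which is easy to get subtly wrong.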

Approach 2: JSON-serialize

Offload escaping to a proven serializer.

import json

def encode(strs: list[str]) -> str:
    return json.dumps(strs)  # L1: O(N) JSON encode

def decode(s: str) -> list[str]:
    return json.loads(s)  # L2: O(N) JSON decode

Where the time goes, line by line

Variables: N = total number of characters across all strings.

Line              Per-call cost   Times executed   Contribution
L1 (json.dumps)   O(N)            1                O(N) ← dominates
L2 (json.loads)   O(N)            1                O(N) ← dominates

JSON serialization and deserialization are linear in the total character count.

Complexity

  • Time: O(N).
  • Space: O(N).

Correct and robust. Not typically accepted on LeetCode, since the problem wants you to design the scheme yourself, but worth knowing for real-world code: it is the right answer unless there is a concrete reason to roll your own.
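A quick check of the robustness claim: JSON handles quotes, delimiters, empty strings, and non-ASCII characters without any custom escaping on our side.

```python
import json

# Inputs chosen to break naive schemes: embedded quotes and brackets,
# a comma delimiter, a non-ASCII character, and the empty string.
tricky = ['["quoted"]', "comma,delim", "sn\u2603wman", ""]

encoded = json.dumps(tricky)          # a single plain string
assert json.loads(encoded) == tricky  # lossless round trip
```

The cost is a slightly larger encoded string (quotes, brackets, escape sequences), which rarely matters in practice.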

Approach 3: Length-prefix encoding (optimal, self-delimiting)

Prefix each string with its length followed by a fixed delimiter (e.g., #). The length tells the decoder exactly how many characters to take next, so no escaping is needed.

def encode(strs: list[str]) -> str:
    return "".join(f"{len(s)}#{s}" for s in strs)  # L1: O(N) one pass

def decode(s: str) -> list[str]:
    result, i = [], 0                           # L2: O(1) init
    while i < len(s):                           # L3: loop, advances past each header + payload
        j = s.index('#', i)                     # L4: O(d) scan for '#'
        length = int(s[i:j])                    # L5: O(d) parse int
        result.append(s[j + 1:j + 1 + length])  # L6: O(length) slice
        i = j + 1 + length                      # L7: O(1) advance
    return result

Where the time goes, line by line

Variables: N = total number of characters across all strings, m = number of strings, d = digits in a length header.

Line                        Per-call cost   Times executed   Contribution
L1 (encode: format + join)  O(N)            1                O(N) ← dominates
L2 (init)                   O(1)            1                O(1)
L3 (loop test)              O(1)            m iterations     O(m)
L4 (index '#')              O(d)            m                O(m·d)
L5 (parse int)              O(d)            m                O(m·d)
L6 (slice)                  O(length)       m                O(N) total ← dominates decode
L7 (advance)                O(1)            m                O(m)

Each character in the original strings is visited exactly once during the decode slice (L6). The loop overhead is O(m) for headers.

Complexity

  • Time: O(N). Each character is visited a constant number of times.
  • Space: O(N).

This works for any character content, including the # delimiter, because the length prefix makes the scheme self-delimiting. The canonical interview answer.
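A worked trace makes the self-delimiting property visible (the encode/decode above are reproduced so this snippet runs standalone). The decoder only searches for '#' inside the length headers, never inside payload bytes, so '#' and digits in the data are harmless.

```python
def encode(strs: list[str]) -> str:
    return "".join(f"{len(s)}#{s}" for s in strs)

def decode(s: str) -> list[str]:
    result, i = [], 0
    while i < len(s):
        j = s.index('#', i)
        length = int(s[i:j])
        result.append(s[j + 1:j + 1 + length])
        i = j + 1 + length
    return result

strs = ["hello#world", "", "5#abc"]   # '#' and digits inside the payloads
enc = encode(strs)
# Headers: "11#", "0#", "5#"; payloads follow each header verbatim.
assert enc == "11#hello#world0#5#5#abc"
assert decode(enc) == strs
```

Reading the encoded string left to right: "11#" says take 11 characters ("hello#world"), "0#" says take none, "5#" says take 5 ("5#abc").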

Summary

Approach             Time   Space   Notes
Delimiter + escape   O(N)   O(N)    Fragile; easy to corrupt
JSON                 O(N)   O(N)    Production-correct; not always accepted
Length prefix        O(N)   O(N)    Robust and self-delimiting

Length-prefix encoding is the pattern behind many real-world formats: Pascal strings, netstrings, Protocol Buffers' length-delimited fields, HTTP chunked transfer encoding.
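The same idea carries over to binary data. Here is a minimal sketch using a fixed-width 4-byte big-endian length prefix; it mirrors the netstring/Protobuf style but is not any specific wire format, and the function names are invented for illustration.

```python
import struct

def encode_bytes(chunks: list[bytes]) -> bytes:
    # 4-byte big-endian length header before each chunk.
    return b"".join(struct.pack(">I", len(c)) + c for c in chunks)

def decode_bytes(buf: bytes) -> list[bytes]:
    out, i = [], 0
    while i < len(buf):
        (n,) = struct.unpack_from(">I", buf, i)  # read the length header
        out.append(buf[i + 4 : i + 4 + n])       # take exactly n payload bytes
        i += 4 + n
    return out

data = [b"\x00binary\x00", b"", b"#5#"]  # NUL bytes and '#' are fine
assert decode_bytes(encode_bytes(data)) == data
```

Fixed-width headers trade a little space (4 bytes even for tiny chunks) for simpler parsing: no scan for a delimiter at all.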

Test cases

# Quick smoke tests, paste into a REPL or save as test_encode_decode.py and run.
# Uses the canonical implementation (Approach 3: length-prefix encoding).

def encode(strs: list[str]) -> str:
    return "".join(f"{len(s)}#{s}" for s in strs)

def decode(s: str) -> list[str]:
    result, i = [], 0
    while i < len(s):
        j = s.index('#', i)
        length = int(s[i:j])
        result.append(s[j + 1:j + 1 + length])
        i = j + 1 + length
    return result

def _run_tests():
    cases = [
        ["hello", "world", "foo", "bar"],
        [""],
        ["a"],
        [],
        ["hello#world", "foo#bar"],  # '#' inside strings
        ["5#abc", "def"],            # digits + '#' inside strings
    ]
    for strs in cases:
        assert decode(encode(strs)) == strs, f"Failed on: {strs}"
    print("all tests pass")

if __name__ == "__main__":
    _run_tests()
Related topics

  • Strings: input/output; immutability-aware concatenation
  • Arrays: the list container being serialized