What I learned from Google's Vehicle Routing Problem docs

Context

Spent an afternoon reading through Google OR-Tools’ routing docs and some surrounding papers. Writing up the shape of the problem and a few things that surprised me. Full structured notes are in my Vehicle Routing topic; this post is the narrative version.

The one-line summary

VRP is TSP with multiple vehicles and constraints. You have a depot, a fleet, customers to visit, and a cost matrix. Find the set of routes that minimizes total distance (or the longest single route), subject to constraints like capacity, time windows, and pickup/delivery pairs.

If TSP is “one salesperson, visit every city once and come home,” VRP is “five trucks leave the warehouse, each does its own loop, every customer served exactly once.” TSP is NP-hard, so VRP is too.

What actually surprised me

1. The “dimension” abstraction is really good design

OR-Tools could have bolted on capacity, time windows, breaks, and fuel as separate modules. Instead everything that accumulates along a route is a dimension:

Capacity dimension → CVRP (demand adds up, vehicle has a max)
Time dimension → VRPTW (travel + service accumulates, each node has a time window on its cumul var)
Fuel dimension → refueling variant
Driver hours dimension → break regulations

One abstraction unifies what textbook treatments split into 5–6 unrelated formulations. Reading CVRP and VRPTW source side-by-side, the structure is almost identical, swap the callback, rename the dimension.

2. Clarke-Wright savings (1964) is still in the toolbox

One of the first things I noticed: OR-Tools’ SAVINGS first-solution strategy is the Clarke-Wright savings algorithm, published in 1964. It’s almost human-readable:

Start with one route per customer: DEPOT → i → DEPOT.
For every pair (i, j), compute savings = c(DEPOT, i) + c(DEPOT, j) − c(i, j). That’s how much cost you save by not bouncing back to the depot between them.
Sort pairs by savings descending.
Merge greedily, respecting capacity.

It doesn’t give the best answer; it gives a good starting point that the local-search phase then improves. But 60 years later, it’s still the default initial solution for a huge swath of problems. This is a pattern, some heuristics are shaped like the problem and don’t go obsolete.

3. The phase-1 / phase-2 split is worth understanding before touching the API

Every metaheuristic VRP solver does the same two phases:

Phase 1, build a feasible solution (PATH_CHEAPEST_ARC, SAVINGS, PARALLEL_CHEAPEST_INSERTION, …).
Phase 2, improve via local search + metaheuristic (2-opt, Or-opt, Lin-Kernighan; Guided Local Search, Tabu, Simulated Annealing).

When solutions fail, the failure usually traces to one of these. “No feasible solution found” on a tight VRPTW = phase 1 couldn’t even get started; try PARALLEL_CHEAPEST_INSERTION. “Solutions are correct but bad” = phase 2 needs more time budget.

4. Time windows are qualitatively harder than capacity

Capacity is a per-route invariant: sum the demands on a route, check against Q. Time windows are ordering constraints that interact with sequencing. Two customers whose windows don’t overlap force a specific order. Moves that are fine under capacity (2-opt, swap, relocate) routinely break feasibility under time windows.

Practical consequence: on tight VRPTW, you often can’t use PATH_CHEAPEST_ARC as the first-solution strategy, it greedily extends routes and paints itself into corners that violate later windows. PARALLEL_CHEAPEST_INSERTION considers all routes simultaneously and is much more likely to find something feasible.

5. There’s a standard benchmark from 1987 that people still use

Solomon’s 1987 VRPTW benchmark, 56 instances, 100 customers each, categorized as R (random), C (clustered), RC (mixed), is the industry-standard test suite. Modern papers still report results on it. Similarly, CVRPLIB’s X-class (2014) is the current CVRP standard. These benchmarks are how you answer “is my VRP solver any good.”

If I were going to ship a production VRP solver…

Rough decision tree from what I’ve read:

~100 customers, few constraints, need fast feasibility → OR-Tools with PARALLEL_CHEAPEST_INSERTION + GUIDED_LOCAL_SEARCH + 30–60 s time limit
~1000 customers, tight time windows → OR-Tools with longer time limit, or commercial LNS (Hexaly, Jsprit at scale)
Must have provable optimality → VRPSolver (research tool) or Gurobi with explicit MILP formulation
Streaming / online updates → insertion heuristic on arrival + periodic batch re-optimize

For learning: OR-Tools is the right first tool. It’s free, Python-accessible, the docs are solid, and it runs the canonical metaheuristics well enough to beat a lot of production systems.

What’s in the topic notes

If you want to go deeper:

VRP hub, concept, modeling primitives, when VRP is (and isn’t) the right frame
CVRP, capacity constraint, full OR-Tools code sketch
VRPTW, time windows, Solomon benchmarks, why it’s harder than it looks
Pickup and Delivery, pairing constraints, ride-sharing applications
Solution approaches, all the metaheuristic machinery, OR-Tools API, when to upgrade

References I used

OR-Tools Routing, the starting point
Wikipedia VRP, variant taxonomy
Dantzig & Ramser 1959, Truck Dispatching Problem, the founding paper
Clarke & Wright 1964, savings algorithm
Solomon 1987, VRPTW benchmark
CVRPLIB, CVRP benchmark instances