Dynamic Time Warping
Rewritten and re-illustrated from my 2021 algorithm-class project. This article walks through Dynamic Time Warping (DTW) — a metric for measuring similarity between two numeric sequences. We start from Edit Distance over strings and trace how the same dynamic-programming skeleton extends to time series in a single thread.
In one line — instead of comparing two time series point-by-point at the same index, DTW asks how similar their shape is. Just as Edit Distance aligns two strings via add / delete / substitute, DTW aligns two sequences by warping the time axis.
1. What is DTW?
DTW is a metric for measuring the similarity between two continuous numeric series (patterns). The simplest comparison between two time series \(A, B\) is the Euclidean Distance (ED), which sums the squared differences point by point:
But ED is a strict 1:1 matching — it only compares same-index points, so it fails to capture overall shape similarity. If two series share the same pattern but with a time shift or local stretching, ED reports a large distance even though their shapes are essentially identical.
"DTW offers greater flexibility in measuring similarity (or distance) between a given pair of patterns S and T. This is similar in spirit to the edit distance and the Levenshtein distance." — The Dynamic Time Warping Algorithm, Eiji Mizutani
DTW therefore aligns the time axis non-linearly to ask whether the shapes themselves are similar. The same idea extends naturally to comparing probability distributions and other sequence-like objects.
2. Edit Algorithm (Text) — the starting point
DTW extends Levenshtein distance (= Edit Distance) from strings to numeric sequences. It helps to anchor the intuition in the string version first.
1) Basic operations
How do we quantify the similarity between two strings? Count the minimum number of operations required to transform one into the other. Three basic operations are allowed:
- Add — insert a character
- Delete — remove a character
- Substitute — replace a character with another
2) Example
Compare two pairs of strings. Red dashed lines mark substitutes, green solid lines mark characters kept in place.
Pair 1 needs \(A\to D,\; B\to E,\; C\to F\) — three substitutes — so \(\text{edit distance} = 3\). Pair 2 needs only \(C\to D\), so \(\text{edit distance} = 1\).
Takeaway. The smaller the edit distance (= the fewer changes required), the more similar the two strings.
3. Edit Distance Matrix
To compute edit distance systematically — i.e. to turn it into an algorithm — we use a matrix. Let's work out the edit distance between \(abcdefg\) and \(ayced\).
Given two strings, we look at every contiguous prefix. Character order matters (we are after sequence similarity, not bag-of-characters similarity), so prefixes are taken in order — never as arbitrary subsets.
Including the empty prefix \(\{\}\), the prefixes of \(abcdefg\) are:
Similarly for \(ayced\):
Place these along the row and column axes; each cell stores the edit distance of the corresponding prefix pair.
| {} | a | b | c | d | e | f | g | |
|---|---|---|---|---|---|---|---|---|
| {} | ||||||||
| a | ||||||||
| y | ||||||||
| c | ||||||||
| e | ||||||||
| d |
[Edit distance matrix] — stores the edit distance for every pair of contiguous prefixes of the two strings.
4. Step 1) Boundary condition
We start at the boundary — the row and column compared against the empty prefix \(\{\}\). The empty prefix transforms into any other prefix via "all adds" or "all deletes", so its edit distance equals the other prefix's length.
- \(\{\} \to \{\}\): distance = 0 (already equal)
- \(\{\} \to \{a\}\): distance = 1 (add \(a\))
- \(\{\} \to \{a, b\}\): distance = 2 (add \(a, b\))
- In general, \(\{\} \to \{x_1, \ldots, x_k\}\) has distance \(k\).
So the first row and first column are filled as follows:
| {} | a | b | c | d | e | f | g | |
|---|---|---|---|---|---|---|---|---|
| {} | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| a | 1 | |||||||
| y | 2 | |||||||
| c | 3 | |||||||
| e | 4 | |||||||
| d | 5 |
After the boundary condition is set. The first row equals the prefix lengths of \(abcdefg\); the first column equals the prefix lengths of \(ayced\).
5. Step 2) Fill in rows — first row
Now we fill the interior. Start with Row 1 (\(\{a\}\)). Compare \(\{a\}\) against each prefix of \(abcdefg\):
- \(\{a\} \leftrightarrow \{a\}\): identical → 0
- \(\{a\} \leftrightarrow \{a,b\}\): delete \(b\) → 1
- \(\{a\} \leftrightarrow \{a,b,c\}\): delete \(b, c\) → 2
- \(\{a\} \leftrightarrow \{a,b,c,d\}\): delete \(b, c, d\) → 3
- \(\{a\} \leftrightarrow \{a,b,c,d,e\}\): delete \(b, c, d, e\) → 4
- … continuing in the same way to the end
The first row ends up as:
| {} | a | b | c | d | e | f | g | |
|---|---|---|---|---|---|---|---|---|
| {} | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| a | 1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
Row 1 (\(\{a\}\)) filled. Notice how the distance dips when matching characters meet — this becomes the heart of the recurrence.
Rows 2 through 5 follow the same rule. Each cell still means the minimum number of operations to turn the row prefix into the column prefix. For example, filling Row 2 (\(\{a, y\}\)):
- \(\{a,y\} \leftrightarrow \{a\}\): add \(y\) → 1
- \(\{a,y\} \leftrightarrow \{a,b\}\): substitute \(b \to y\) → 1
- \(\{a,y\} \leftrightarrow \{a,b,c\}\): substitute \(b \to y\), delete \(c\) → 2
- \(\{a,y\} \leftrightarrow \{a,b,c,d\}\): substitute \(b \to y\), delete \(c, d\) → 3
- … resulting in \(2,\, 1,\, 1,\, 2,\, 3,\, 4,\, 5,\, 6\)
Rows 3, 4, 5 are filled identically. Once every row is in, the matrix is complete.
6. Step 3) Complete the matrix
| {} | a | b | c | d | e | f | g | |
|---|---|---|---|---|---|---|---|---|
| {} | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| a | 1 | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| y | 2 | 1 | 1 | 2 | 3 | 4 | 5 | 6 |
| c | 3 | 2 | 2 | 1 | 2 | 3 | 4 | 5 |
| e | 4 | 3 | 3 | 2 | 2 | 2 | 3 | 4 |
| d | 5 | 4 | 4 | 3 | 2 | 3 | 3 | 4 |
The completed edit distance matrix. The bottom-right cell holds the final answer.
Final answer — \(D(7, 5) = 4\). The edit distance between \(abcdefg\) and \(ayced\) is 4: at minimum, four add / delete / substitute operations are needed to transform one into the other.
A natural question — why fill every cell, when only the bottom-right one is the answer we want?
A computer doesn't reason intuitively; it needs a structural rule it can apply mechanically. That rule is the recurrence relation, and to determine any single cell via the recurrence its neighbors must already be filled. Hence the full matrix.
7. Generalization — the meaning of \(D(i, j)\)
Denote an arbitrary cell as \(D(i, j)\). Its meaning is:
\(D(i, j)\) = the minimum edit distance between the first \(i\) characters of TextA and the first \(j\) characters of TextB.
Filling the matrix top-to-bottom, left-to-right: by the time we reach cell \(D(i, j)\), the three neighbors \(D(i-1, j-1),\; D(i-1, j),\; D(i, j-1)\) are already known. From those three together with the new characters \(A[i],\, B[j]\), we can determine \(D(i, j)\).
8. Deriving the recurrence
Case 1) \(A[i] = B[j]\) — last characters match
Example: \(\,ayc\) ↔ \(abc\). The last character \(c\) matches. If we already know the cost \(D(i-1, j-1)\) of aligning \(\,ay\) ↔ \(ab\), the matching last character costs nothing extra.
Case 2) \(A[i] \ne B[j]\) — last characters differ
Example: \(\,abcd\) ↔ \(\,ayce\). The last characters \(d, e\) differ. The cell can be reached by exactly one of three single-step operations:
- Substitute — \(D(i-1, j-1) + 1\). Replace \(A[i]\) with \(B[j]\) (e.g. \(d \to e\)).
- Delete — \(D(i-1, j) + 1\). Drop the last character \(A[i]\) on the TextA side.
- Add — \(D(i, j-1) + 1\). Insert \(B[j]\) on the TextA side.
The shortest of the three is the minimum edit distance.
The recurrence, in one box.
From these two cases, every cell beyond the boundary is determined, and \(D(|A|, |B|)\) is the edit distance of the full strings. This same recurrence skeleton carries directly over to numeric time series — which is the heart of DTW, covered next.
Work in progress — up next: Step 4) extending the recurrence to numeric time series (DTW proper), warping paths, distance metrics (\(L_1, L_2\)), and a worked example on real series.