What is a Suffix Array?
A suffix array is a sorted array of all suffixes of a string, stored as their starting indices. For the string “banana”, the six suffixes are “banana”, “anana”, “nana”, “ana”, “na”, and “a”. Sorting them lexicographically gives the suffix array [5, 3, 1, 0, 4, 2].
Once built, a suffix array enables O(m log n) substring search via binary search, where m is the pattern length and n is the string length. This makes it one of the most space-efficient data structures for full-text search.
Operations & Time Complexity
| Operation | Time | Description |
|---|---|---|
| Build (naive) | O(n² log n) | Sort all suffixes using standard comparison sort. |
| Build (optimized) | O(n log² n) | Rank-based sorting with doubling prefix lengths. |
| Build (SA-IS) | O(n) | Linear-time construction using induced sorting. |
| Pattern search | O(m log n) | Binary search on the sorted suffix array. |
| LCP array | O(n) | Kasai's algorithm after suffix array is built. |
| Longest repeated substring | O(n) | Max value in the LCP array. |
Suffix Array vs Suffix Tree vs Trie
| Feature | Suffix Array | Suffix Tree | Trie |
|---|---|---|---|
| Space | O(n) | O(n) (large constant) | O(total chars) |
| Build time | O(n) to O(n log n) | O(n) | O(total chars) |
| Pattern search | O(m log n) | O(m) | O(m) |
| Cache performance | Excellent | Poor | Moderate |
| Use case | Full-text search | Complex string ops | Prefix matching |
Suffix Arrays in Coding Interviews
- •Longest repeated substring — Build suffix array + LCP array. The answer is the maximum LCP value.
- •Number of distinct substrings — n(n+1)/2 minus the sum of all LCP values.
- •Pattern matching — Binary search on the suffix array. O(m log n) per query.
- •Longest common substring of two strings — Concatenate with a separator, build suffix array + LCP, find max LCP across the boundary.
- •Lexicographically smallest rotation — Concatenate the string with itself, build suffix array, find the smallest suffix of length n.
Frequently Asked Questions
What is a suffix array?add
A suffix array is a sorted array of all suffixes of a string, represented by their starting indices. For 'banana', the suffixes are 'banana', 'anana', 'nana', 'ana', 'na', 'a'. Sorted lexicographically: 'a'(5), 'ana'(3), 'anana'(1), 'banana'(0), 'na'(4), 'nana'(2). The suffix array is [5, 3, 1, 0, 4, 2].
What is a suffix array used for?add
Suffix arrays enable fast substring search via binary search in O(m log n) time where m is the pattern length and n is the string length. They are used in text search engines, bioinformatics (DNA sequence matching), data compression (BWT), and finding longest repeated substrings. They are a space-efficient alternative to suffix trees.
How do you build a suffix array efficiently?add
The naive approach sorts all suffixes in O(n^2 log n) time. The DC3/SA-IS algorithm builds suffix arrays in O(n) time. For interviews, the O(n log^2 n) approach using rank pairs and radix sort is the most practical to implement. The key insight is to sort by progressively longer prefixes, doubling the comparison length each pass.
What is the difference between a suffix array and a suffix tree?add
Both represent all suffixes of a string. A suffix tree is a compressed trie of all suffixes, using O(n) space but with a large constant factor. A suffix array is just a sorted array of indices, using much less memory. Combined with an LCP (longest common prefix) array, a suffix array can solve most problems that suffix trees solve, with better cache performance.