Fitness Landscape Navigation in GHC's ApplicativeDo: From O(n³) to Practical Optimization

When GHC's -foptimal-applicativedo flag was disabled by default, the reason was blunt: a 1,000-statement do block took 55 seconds to compile. The algorithm behind optimal scheduling is cubic, and in compiler development, cubic complexity is often a death sentence. Yet the problem itself—finding the best way to group independent statements while respecting dependencies—turns out to share deep structural similarities with RNA folding prediction in computational biology. This connection reveals not just a faster algorithm, but a framework for understanding how optimization landscapes can be navigated when the search space is shaped by specific constraints.

The Fitness Landscape of Tree Structures

At its core, the ApplicativeDo optimization is a search problem over binary tree structures. Given a sequence of statements with dependencies between them, the compiler must build a tree where each node is either sequential (Seq) or parallel (Par). Parallel nodes can only join subtrees with no cross-dependencies. The cost of a tree—what we're trying to minimize—is the number of execution rounds, where parallel branches execute concurrently but sequential branches add up.

This search space forms a fitness landscape where each valid tree is a point, and the "fitness" is the negative cost (we want lower costs). The naive algorithm explores this landscape exhaustively: for every span of statements from i to j, it tries every possible split point k, computing C[i,k] + C[k+1,j] for sequential combinations. This is exactly the dynamic programming approach used in RNA secondary structure prediction, where the goal is to find the folding with minimum free energy.

The landscape has structure that makes it tractable. The cost function is monotonically increasing: adding statements to a block can only increase its cost or leave it unchanged. This creates a gradient-like property where local information can guide global decisions. More importantly, the non-crossing constraint—dependencies cannot be reordered—collapses what would be an exponential search space into a polynomial one. This is the same constraint that makes RNA folding tractable: if base i pairs with base j, any base between them must pair within that interval.

Selection Pressure and Local Optima

The key insight from Ian Duncan's analysis is that the fitness landscape has a remarkable property: for "tangled" spans where the extreme cuts (peeling off the first or last statement) produce equal costs, those extreme cuts are provably optimal unless they tie. This is not immediately obvious. One might expect that the optimal split could lie anywhere in the span, requiring a linear search over all possibilities. But the monotonicity of the cost function creates selection pressure toward the boundaries.

Consider a span from i to j with cost c. If peeling off the first statement gives cost 1 + C[i+1,j] and peeling off the last gives C[i,j-1] + 1, any interior split at k produces C[i,k] + C[k+1,j]. By monotonicity, both C[i,k] and C[k+1,j] are at least as large as their boundary-peeled counterparts. The interior split sums two larger values, which cannot beat the boundary options unless those options themselves are tied. This means local optima at the extremes are often global optima, reducing the search from linear to constant time per span.

The selection pressure becomes even more pronounced when we introduce the longest-chain bound. The longest dependency chain through a span provides a lower bound on the cost. If this bound equals c + 1 (the upper bound in a tie situation), the cost is determined without any search at all. This is particularly effective for chain-like dependencies where each statement depends on the previous one—exactly the pattern that causes the worst-case cubic behavior. The bound acts as a fitness landscape constraint that collapses the search space to a single point.

Convergence Properties and Practical Trade-offs

The optimized algorithm combines three strategies, each effective in different regions of the fitness landscape:

Parallelism extraction: When a span can be split in parallel (no cross-dependencies), do so immediately. This is the "easy" region of the landscape where the structure is obvious.
Extreme-cut shortcut: For tangled spans, check only the first and last cuts. If they differ, the cheaper is optimal. This handles the majority of real-world code.
Longest-chain bound: When extreme cuts tie, check if the longest dependency chain equals the upper bound. If so, the cost is known without searching for the actual split point.

The empirical results are striking. On chain-like inputs, the optimized cut checks drop from 166,650 to 4,950 for 100 statements—a 33x reduction. On realistic code, the improvement is roughly an order of magnitude. The algorithm converges to near-quadratic behavior for practical inputs while maintaining optimality.

However, the landscape has pathological regions. The "hub" pattern—where every statement feeds into a single final sink—defeats the longest-chain bound. Each span ties at cost 2, but the longest chain is also 2, so the bound never closes and the optimizer must scan all possibilities. Dense dependency blobs with long spanning arrows similarly resist optimization. These are the local minima that prevent the algorithm from achieving true quadratic worst-case complexity.

Theoretical Limits and Biological Machinery

The connection to biology runs deeper than metaphor. Both problems use (min,+) operations—tropical matrix multiplication—over triangular tables. In 2016, Bringmann et al. showed that RNA folding could be solved in Õ(n^2.8244) time using fast bounded-difference min-plus product, breaking the cubic barrier. This suggests the O(n³) complexity is not fundamental.

Yet the theoretical improvement comes with caveats. The sub-cubic algorithm relies on fast rectangular matrix multiplication with enormous constant factors. For the do blocks humans actually write—rarely exceeding a few dozen statements—the crossover point where n^2.8244 beats a well-tuned n³ is never reached. The biological machinery is a proof of possibility, not a practical recipe.

For GHC, the practical solution remains the combination of extreme-cut shortcuts and longest-chain bounds. These are local optimizations that exploit the fitness landscape's structure without requiring global restructuring. They represent an evolutionary approach to search: rather than exploring the entire landscape, identify the regions where local information determines global optima and focus computation there.

Implications for Compiler Optimization

The ApplicativeDo case illustrates a broader principle for compiler optimization: when the search space has structure, exploit it. The fitness landscape framework—thinking in terms of local optima, selection pressure, and convergence properties—provides intuition for when exhaustive search is necessary and when shortcuts suffice.

For practitioners, the takeaway is twofold. First, algorithmic complexity is not just about big-O notation; it's about the shape of the input distribution. The worst-case O(n³) is rare in practice because real code tends to have structure—parallelism, locality, limited dependency depth—that the optimized algorithm exploits. Second, cross-domain insights can reveal hidden structure. The RNA folding connection didn't just provide a faster algorithm; it provided a language for understanding why the algorithm works.

The -foptimal-applicativedo flag may remain off by default, but the techniques developed to make it feasible—extreme-cut shortcuts, longest-chain bounds, and the recognition that local optima often equal global optima—are applicable whenever compilers must search over structured spaces. The fitness landscape, once understood, becomes navigable.

Sources:

Ian Duncan, "Stealing from Biologists to Compile Haskell Faster" (2026)
Bringmann et al., "Truly Sub-cubic Algorithms for Language Edit Distance and RNA Folding via Fast Bounded-Difference Min-Plus Product" (FOCS 2016)
Marlow et al., "Desugaring Haskell's do-notation Into Applicative Operations" (Microsoft Research, 2016)

compilers

内容声明：本文无广告投放、无付费植入。

如有事实性问题，欢迎发送勘误至 i@hotdrydog.com。