1University of Basel    2University of Helsinki    3Rice University
TL;DR

We propose a neural PDE solver built entirely from learned coordinate warps: no Fourier layers, no attention, (almost) no convolutions. Our compact 17M-parameter model consistently outperforms Fourier, convolutional, and attention-based baselines across all 16 tested benchmarks from The Well and PDEBench. Scaled to 150M parameters, it outperforms Poseidon-L (628M parameters, pretrained) on compressible Euler, training from scratch with 4× fewer parameters.


Visualizing Flowers: To predict the next state of a PDE governed by conservation laws, we must look "upstream" in the flow and move the conserved quantity from there to the point of interest. We propose a model that explicitly performs this action. The figure shows learned warp directions for the first layer, aggregated into 6 k-means clusters for visualization. Each arrow points from a grid location to where the model looks to fetch a value.
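The "look upstream" picture is exactly the classic semi-Lagrangian view of advection: for u_t + a u_x = 0, the solution at (x, t+Δt) equals the old solution at the departure point x − aΔt. As a self-contained illustration of the principle (not the paper's code), here is a one-dimensional sketch on a periodic grid:

```python
import numpy as np

def semi_lagrangian_step(u, a, dt, dx):
    """One semi-Lagrangian step for u_t + a u_x = 0 on a periodic grid.

    For each grid point x_i we look *upstream* to the departure point
    x_i - a*dt and interpolate the old solution there -- the same
    "fetch from a source coordinate" operation a warp head performs.
    """
    n = u.shape[0]
    x = np.arange(n) * dx
    src = (x - a * dt) % (n * dx)        # upstream departure points
    # periodic linear interpolation at the departure points
    idx = src / dx
    i0 = np.floor(idx).astype(int) % n
    i1 = (i0 + 1) % n
    w = idx - np.floor(idx)
    return (1 - w) * u[i0] + w * u[i1]
```

When a·Δt is exactly one grid cell, the step reduces to shifting the field by one index, which makes the scheme easy to sanity-check.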

#Abstract

We introduce Flowers, a neural architecture for learning PDE solution operators built entirely from multihead warps. Aside from pointwise channel mixing and a multiscale scaffold, Flowers use no Fourier multipliers, no dot-product attention, and no convolutional mixing. Each head predicts a displacement field and warps the mixed input features. Motivated by physics and computational efficiency, displacements are predicted pointwise, without any spatial aggregation, and nonlocality enters only through sparse sampling at source coordinates, one per head. Stacking warps in multiscale residual blocks yields Flowers, which implement adaptive, global interactions at linear cost.

We theoretically motivate this design through three complementary lenses: flow maps for conservation laws, waves in inhomogeneous media, and a kinetic-theoretic continuum limit. Flowers achieve excellent performance on a broad suite of 2D and 3D time-dependent PDE benchmarks, particularly flows and waves. A compact 17M-parameter model consistently outperforms Fourier, convolutional, and attention-based baselines of similar size, while a 150M-parameter variant improves over recent transformer-based foundation models with many more parameters, far more data, and far more training compute.

#Results

One-step prediction comparison on viscoelastic instability: ground truth vs FNO, CNUnet, scOT, Flower

Viscoelastic instability: one-step prediction of the conformation tensor entry Czz. Top row: ground truth and model predictions. Bottom row: last input frame and pointwise errors.

We benchmark on 16 datasets drawn from The Well and PDEBench, covering fluid dynamics, wave propagation, magnetohydrodynamics, 3D flows, and more. Models are compared at roughly equal parameter count (15–20M): FNO, a convolutional U-Net (CNUnet), an attention-based model (scOT), and Flower. Flower achieves the best next-step prediction on every dataset and the best 1:20 rollout on 15 of 16.

| Collection | Dataset | FNO 1-step | CNUnet 1-step | scOT 1-step | Flower 1-step | FNO 1:20 | CNUnet 1:20 | scOT 1:20 | Flower 1:20 |
|---|---|---|---|---|---|---|---|---|---|
| The Well | acoustic_scattering_maze | 0.1454 | 0.0129 | 0.0361 | 0.0064 | 0.4197 | 0.0874 | 0.1996 | 0.0489 |
| | active_matter | 0.1749 | 0.0650 | 0.1050 | 0.0249 | 3.2862 | 1.7781 | 4.1055 | 1.3905 |
| | gray_scott_reaction_diffusion | 0.0372 | 0.0188 | 0.0673 | 0.0102 | 0.9125 | 0.3059 | 0.4169 | 0.2074 |
| | MHD_64 | 0.3403 | 0.2062 | — | 0.1165 | 1.3007 | 0.9534 | — | 0.7580 |
| | planetswe | 0.0070 | 0.0027 | 0.0041 | 0.0007 | 0.1316 | 0.0624 | 0.0518 | 0.0187 |
| | post_neutron_star_merger | 0.4452 | 0.3391 | — | 0.3269 | 0.5980 | 0.6529 | — | 0.6223 |
| | rayleigh_benard | 0.2104 | 0.2171 | 0.1863 | 0.0807 | 39.038 | 12.507 | 5.6486 | 2.1661 |
| | rayleigh_taylor_instability | 0.1714 | 0.1351 | — | 0.0491 | 3.0057 | 5.3894 | — | 0.5862 |
| | shear_flow | 0.0769 | 0.0594 | 0.1093 | 0.0463 | 1.1245 | 0.7632 | 0.8930 | 0.2246 |
| | supernova_explosion | 0.4326 | 0.4316 | — | 0.2888 | 1.0162 | 2.2913 | — | 0.8113 |
| | turbulence_gravity_cooling | 0.2720 | 0.2113 | — | 0.1700 | 3.2583 | 2.0510 | — | 1.2636 |
| | turbulent_radiative_layer_2D | 0.3250 | 0.2559 | 0.3555 | 0.1930 | 1.2328 | 0.7051 | 0.8299 | 0.5491 |
| | turbulent_radiative_layer_3D | 0.3261 | 0.3322 | — | 0.2073 | 0.9203 | 0.7139 | — | 0.6840 |
| | viscoelastic_instability | 0.1914 | 0.1623 | 0.2017 | 0.0624 | 0.4284 | 0.3592 | 0.4890 | 0.3465 |
| PDEBench | Diffusion-Reaction | 0.0191 | 0.0033 | 0.0150 | 0.0015 | 0.7563 | 0.0279 | 0.0949 | 0.0241 |
| | Shallow Water | 0.0019 | 0.0044 | 0.0187 | 0.0010 | 0.2499 | 0.1427 | 0.1164 | 0.0076 |

Columns 3–6 report next-step VRMSE ↓, columns 7–10 report 1:20 rollout VRMSE ↓; — indicates no result reported.

Table 1. VRMSE for unconditioned 4→1 next-step prediction and 1:20 autoregressive rollout. Bold = best, green = second best. Results use a corrected version of the viscoelastic_instability dataset; see the linked GitHub issue for details.
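For reference, the VRMSE metric reported in Table 1 is, as we read The Well's convention, the RMSE normalized by the standard deviation of the target (please consult the benchmark code for the exact reduction and epsilon). A minimal sketch:

```python
import numpy as np

def vrmse(pred, target, eps=1e-8):
    """Variance-scaled RMSE: RMSE divided by the standard deviation of
    the target, so a value near 1 roughly means "no better than
    predicting the target's mean"."""
    mse = np.mean((pred - target) ** 2)
    var = np.mean((target - target.mean()) ** 2)
    return float(np.sqrt(mse / (var + eps)))
```

This normalization makes scores comparable across fields with very different magnitudes, which is why values above 1 (as in some long rollouts) signal predictions worse than the mean field.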

Shear flow: autoregressive rollout. Comparison of autoregressive rollout model predictions on the shear flow dataset (4→1 unconditioned setting), showing the tracer field.

Acoustic scattering in a maze: autoregressive rollout. Waves propagate through a complex maze geometry with hard reflecting walls. The difficulty lies in learning how pressure waves scatter, reflect, and diffract around obstacles.
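The rollouts shown above follow a sliding-window protocol: in the 4→1 setting the model maps the last four frames to the next one, and each prediction is fed back into the history. A sketch with a hypothetical `model` callable (the interface is our assumption, not the authors' API):

```python
import numpy as np

def rollout(model, history, n_steps):
    """Autoregressive 4->1 rollout: the model maps the last 4 frames to
    the next one; predictions are fed back into the sliding window."""
    frames = list(history)                  # list of (H, W) arrays, len >= 4
    preds = []
    for _ in range(n_steps):
        nxt = model(np.stack(frames[-4:]))  # (4, H, W) -> (H, W)
        preds.append(nxt)
        frames.append(nxt)
    return np.stack(preds)                  # (n_steps, H, W)
```

Because errors compound through this feedback loop, the 1:20 rollout numbers in Table 1 are a much harder test of stability than one-step prediction.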

3D Rayleigh-Taylor instability: one-step prediction comparison

3D Rayleigh–Taylor instability: 10-step autoregressive prediction of density. Because displacements are predicted pointwise, Flower scales naturally to 3D with cost that is linear in the number of grid points. Rendered using vape4d.

#Interpretability

Without any explicit supervision on flow direction, the warp field aligns with the underlying fluid velocity: the model learns to look upstream along characteristics, exactly as conservation law theory would suggest.
Learned displacement field (arrows) overlaid on shear flow tracer

Learned displacement fields on shear flow. The black arrows show the displacement field predicted by the first head of the first downsampling block, overlaid on the tracer field.
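The teaser visualization groups first-layer warp vectors into 6 k-means clusters. The authors' exact tooling is not specified, so here is an illustrative, self-contained NumPy k-means that performs the same kind of aggregation:

```python
import numpy as np

def kmeans(vectors, k, n_iter=50, seed=0):
    """Tiny k-means over 2D displacement vectors.

    Returns (centroids, labels); centroids are the representative warp
    directions one would draw as aggregated arrows."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each vector to its nearest centroid
        d = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned vectors
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = vectors[labels == j].mean(axis=0)
    return centroids, labels
```

Running this on the per-pixel displacement field of a head and plotting one arrow per cluster centroid reproduces the style of the aggregated visualization.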

#Throughput

Accuracy improvements would be of limited use if they came at significant compute cost. Flower matches the throughput of FNO and scOT across resolutions, while substantially outperforming CNUnet, especially in 3D, where pointwise displacement prediction gives Flower a natural efficiency advantage.

Model throughput (samples/sec) vs. resolution for 2D and 3D problems

Training throughput (samples/sec) across resolutions for 2D and 3D problems, measured during the benchmark runs of Table 1. Error bars indicate standard deviation across datasets. Flower scales to high resolution and 3D at linear cost in grid points.

#Scaling

We scale Flower from 17M to 156M parameters on the compressible Euler equations dataset and compare against Poseidon-L, a 628M-parameter foundation model pretrained on diverse fluid-dynamics datasets. We chose Poseidon-L for comparison because it is the best-performing foundation model on this dataset, according to Table 1 of Walrus. All Flower variants are trained from scratch. Performance improves smoothly with model size.

| Model | Parameters | 1-Step ↓ | 1:20 Rollout ↓ |
|---|---|---|---|
| Flower-Tiny | 17.3M | 0.0160 | 0.1114 |
| Flower-Small | 69.3M | 0.0124 | 0.0850 |
| Flower-Medium | 155.8M | 0.0108 | 0.0739 |
| Poseidon-L (pretrained) | 628.6M | 0.0194 | 0.1114 |

Compressible Euler equations (VRMSE ↓). Poseidon-L is a pretrained foundation model; all Flower variants are trained from scratch on this single dataset. Even Flower-Tiny matches Poseidon-L on the 1:20 rollout.

Compressible Euler (periodic BC): extended autoregressive rollout. Flower-Medium (156M parameters) outperforms Poseidon-L (628M parameters, pretrained) with 4× fewer parameters and no cross-dataset pretraining.

#Ablations

We ablate key components of Flower on the viscoelastic instability dataset. The central finding: warping is non-negotiable. Removing it entirely leaves a U-Net backbone without any adaptive spatial interaction, causing a dramatic collapse. A single warp head recovers much of the one-step gain (consistent with the theory), but multiple heads are critical for stable long-horizon rollouts.

| Model Variant | 1-Step ↓ | 1:20 Rollout ↓ |
|---|---|---|
| Without warping | 0.4974 | 0.6911 |
| Single-head warping | 0.1241 | 0.6719 |
| Without coordinate encoding | 0.0638 | 0.4065 |
| Flow predicted by a single conv (no MLP) | 0.0834 | 0.3998 |
| Without identity projection | 0.0712 | 0.3816 |
| Without GroupNorm | 0.0622 | 0.3760 |
| Full model | 0.0624 | 0.3465 |
| vs. CNUnet (baseline) | 0.1623 | 0.3592 |

Ablation on the viscoelastic instability dataset (VRMSE ↓), sorted by 1:20 rollout loss. Bold = best, green = second best. CNUnet included for reference.

Ablation study: viscoelastic instability rollout. Visual comparison across model variants. Without warping or with only a single head, predictions quickly diverge over extended rollouts; the full multi-head model remains accurate.

#BibTeX

The BibTeX entry will be added as soon as the preprint appears on arXiv!