1University of Basel    2University of Helsinki    3Rice University
TL;DR

We propose a neural PDE solver built entirely from learned coordinate warps: no Fourier layers, no attention, (almost) no convolutions. Our compact 17M-parameter model consistently outperforms Fourier, convolutional, and attention-based baselines across all 16 tested benchmarks from The Well and PDEBench. Scaled to 150M parameters, it outperforms Poseidon-L (628M parameters, pretrained) on compressible Euler, training from scratch with 4× fewer parameters.


Visualizing Flowers: To predict the next state of a PDE governed by conservation laws, we must look "upstream" in the flow and move the conserved quantity from there to the point of interest. We propose a model that explicitly performs this action. The figure shows learned warp directions for the first layer, aggregated into 6 k-means clusters for visualization. Each arrow points from a grid location to where the model looks to fetch a value.
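The "look upstream" picture is exactly the classic semi-Lagrangian view of advection: for u_t + a u_x = 0, the solution at (x, t+Δt) equals the old solution at the departure point x − aΔt. As a self-contained illustration of the principle (not the paper's code), here is a one-dimensional sketch on a periodic grid:

```python
import numpy as np

def semi_lagrangian_step(u, a, dt, dx):
    """One semi-Lagrangian step for u_t + a u_x = 0 on a periodic grid.

    For each grid point x_i we look *upstream* to the departure point
    x_i - a*dt and interpolate the old solution there -- the same
    "fetch from a source coordinate" operation a warp head performs.
    """
    n = u.shape[0]
    x = np.arange(n) * dx
    src = (x - a * dt) % (n * dx)        # upstream departure points
    # periodic linear interpolation at the departure points
    idx = src / dx
    i0 = np.floor(idx).astype(int) % n
    i1 = (i0 + 1) % n
    w = idx - np.floor(idx)
    return (1 - w) * u[i0] + w * u[i1]
```

When a·Δt is exactly one grid cell, the step reduces to shifting the field by one index, which makes the scheme easy to sanity-check.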

#Abstract

We introduce Flowers, a neural architecture for learning PDE solution operators built entirely from multihead warps. Aside from pointwise channel mixing and a multiscale scaffold, Flowers use no Fourier multipliers, no dot-product attention, and no convolutional mixing. Each head predicts a displacement field and warps the mixed input features. Motivated by physics and computational efficiency, displacements are predicted pointwise, without any spatial aggregation, and nonlocality enters only through sparse sampling at source coordinates, one per head. Stacking warps in multiscale residual blocks yields Flowers, which implement adaptive, global interactions at linear cost.

We theoretically motivate this design through three complementary lenses: flow maps for conservation laws, waves in inhomogeneous media, and a kinetic-theoretic continuum limit. Flowers achieve excellent performance on a broad suite of 2D and 3D time-dependent PDE benchmarks, particularly flows and waves. A compact 17M-parameter model consistently outperforms Fourier, convolutional, and attention-based baselines of similar size, while a 150M-parameter variant improves over recent transformer-based foundation models with many more parameters, far more data, and far more training compute.

#Results

One-step prediction comparison on viscoelastic instability: ground truth vs FNO, CNUnet, scOT, Flower

Viscoelastic instability: one-step prediction of the conformation tensor entry Czz. Top row: ground truth and model predictions. Bottom row: last input frame and pointwise errors.

We benchmark on 16 datasets drawn from The Well and PDEBench, covering fluid dynamics, wave propagation, magnetohydrodynamics, 3D flows, and more. Models are compared at roughly equal parameter count (15–20M): FNO, a convolutional U-Net (CNUnet), an attention-based model (scOT), and Flower. Flower achieves the best next-step prediction on every dataset and the best 1:20 rollout on 15 of 16.

| Collection | Dataset | FNO 1-step | CNUnet 1-step | scOT 1-step | Flower 1-step | FNO 1:20 | CNUnet 1:20 | scOT 1:20 | Flower 1:20 |
|---|---|---|---|---|---|---|---|---|---|
| The Well | acoustic_scattering_maze | 0.1454 | 0.0129 | 0.0361 | 0.0064 | 0.4197 | 0.0874 | 0.1996 | 0.0489 |
| | active_matter | 0.1749 | 0.0650 | 0.1050 | 0.0249 | 3.2862 | 1.7781 | 4.1055 | 1.3905 |
| | gray_scott_reaction_diffusion | 0.0372 | 0.0188 | 0.0673 | 0.0102 | 0.9125 | 0.3059 | 0.4169 | 0.2074 |
| | MHD_64 | 0.3403 | 0.2062 | — | 0.1165 | 1.3007 | 0.9534 | — | 0.7580 |
| | planetswe | 0.0070 | 0.0027 | 0.0041 | 0.0007 | 0.1316 | 0.0624 | 0.0518 | 0.0187 |
| | post_neutron_star_merger | 0.4452 | 0.3391 | — | 0.3269 | 0.5980 | 0.6529 | — | 0.6223 |
| | rayleigh_benard | 0.2104 | 0.2171 | 0.1863 | 0.0807 | 39.038 | 12.507 | 5.6486 | 2.1661 |
| | rayleigh_taylor_instability | 0.1714 | 0.1351 | — | 0.0491 | 3.0057 | 5.3894 | — | 0.5862 |
| | shear_flow | 0.0769 | 0.0594 | 0.1093 | 0.0463 | 1.1245 | 0.7632 | 0.8930 | 0.2246 |
| | supernova_explosion | 0.4326 | 0.4316 | — | 0.2888 | 1.0162 | 2.2913 | — | 0.8113 |
| | turbulence_gravity_cooling | 0.2720 | 0.2113 | — | 0.1700 | 3.2583 | 2.0510 | — | 1.2636 |
| | turbulent_radiative_layer_2D | 0.3250 | 0.2559 | 0.3555 | 0.1930 | 1.2328 | 0.7051 | 0.8299 | 0.5491 |
| | turbulent_radiative_layer_3D | 0.3261 | 0.3322 | — | 0.2073 | 0.9203 | 0.7139 | — | 0.6840 |
| | viscoelastic_instability | 0.1914 | 0.1623 | 0.2017 | 0.0624 | 0.4284 | 0.3592 | 0.4890 | 0.3465 |
| PDEBench | Diffusion-Reaction | 0.0191 | 0.0033 | 0.0150 | 0.0015 | 0.7563 | 0.0279 | 0.0949 | 0.0241 |
| | Shallow Water | 0.0019 | 0.0044 | 0.0187 | 0.0010 | 0.2499 | 0.1427 | 0.1164 | 0.0076 |

Columns 3–6 report next-step VRMSE ↓, columns 7–10 report 1:20 rollout VRMSE ↓; — indicates no result reported.

Table 1. VRMSE for unconditioned 4→1 next-step prediction and 1:20 autoregressive rollout. Bold = best, green = second best. Results use a corrected version of the viscoelastic_instability dataset; see the linked GitHub issue for details.
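For reference, the VRMSE metric reported in Table 1 is, as we read The Well's convention, the RMSE normalized by the standard deviation of the target (please consult the benchmark code for the exact reduction and epsilon). A minimal sketch:

```python
import numpy as np

def vrmse(pred, target, eps=1e-8):
    """Variance-scaled RMSE: RMSE divided by the standard deviation of
    the target, so a value near 1 roughly means "no better than
    predicting the target's mean"."""
    mse = np.mean((pred - target) ** 2)
    var = np.mean((target - target.mean()) ** 2)
    return float(np.sqrt(mse / (var + eps)))
```

This normalization makes scores comparable across fields with very different magnitudes, which is why values above 1 (as in some long rollouts) signal predictions worse than the mean field.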

Shear flow: autoregressive rollout. Comparison of autoregressive rollout model predictions on the shear flow dataset (4→1 unconditioned setting), showing the tracer field.

Acoustic scattering in a maze: autoregressive rollout. Waves propagate through a complex maze geometry with hard reflecting walls. The difficulty lies in learning how pressure waves scatter, reflect, and diffract around obstacles.
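The rollouts shown above follow a sliding-window protocol: in the 4→1 setting the model maps the last four frames to the next one, and each prediction is fed back into the history. A sketch with a hypothetical `model` callable (the interface is our assumption, not the authors' API):

```python
import numpy as np

def rollout(model, history, n_steps):
    """Autoregressive 4->1 rollout: the model maps the last 4 frames to
    the next one; predictions are fed back into the sliding window."""
    frames = list(history)                  # list of (H, W) arrays, len >= 4
    preds = []
    for _ in range(n_steps):
        nxt = model(np.stack(frames[-4:]))  # (4, H, W) -> (H, W)
        preds.append(nxt)
        frames.append(nxt)
    return np.stack(preds)                  # (n_steps, H, W)
```

Because errors compound through this feedback loop, the 1:20 rollout numbers in Table 1 are a much harder test of stability than one-step prediction.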

3D Rayleigh-Taylor instability: one-step prediction comparison

3D Rayleigh–Taylor instability: 10-step autoregressive prediction of density. Because displacements are predicted pointwise, Flower scales naturally to 3D with cost that is linear in the number of grid points. Rendered using vape4d.

#Interpretability

Without any explicit supervision on flow direction, the warp field aligns with the underlying fluid velocity: the model learns to look upstream along characteristics, exactly as conservation law theory would suggest.
Learned displacement field (arrows) overlaid on shear flow tracer

Learned displacement fields on shear flow. The black arrows show the displacement field predicted by the first head of the first downsampling block, overlaid on the tracer field.
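The teaser visualization groups first-layer warp vectors into 6 k-means clusters. The authors' exact tooling is not specified, so here is an illustrative, self-contained NumPy k-means that performs the same kind of aggregation:

```python
import numpy as np

def kmeans(vectors, k, n_iter=50, seed=0):
    """Tiny k-means over 2D displacement vectors.

    Returns (centroids, labels); centroids are the representative warp
    directions one would draw as aggregated arrows."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each vector to its nearest centroid
        d = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned vectors
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = vectors[labels == j].mean(axis=0)
    return centroids, labels
```

Running this on the per-pixel displacement field of a head and plotting one arrow per cluster centroid reproduces the style of the aggregated visualization.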

#Throughput

Accuracy improvements would be of limited use if they came at significant compute cost. Flower matches the throughput of FNO and scOT across resolutions, while substantially outperforming CNUnet, especially in 3D, where pointwise displacement prediction gives Flower a natural efficiency advantage.

Model throughput (samples/sec) vs. resolution for 2D and 3D problems

Training throughput (samples/sec) across resolutions for 2D and 3D problems, measured during the benchmark runs of Table 1. Error bars indicate standard deviation across datasets. Flower scales to high resolution and 3D at linear cost in grid points.

#Scaling

We scale Flower from 17M to 156M parameters on the compressible Euler equations dataset and compare against Poseidon-L, a 628M-parameter foundation model pretrained on diverse fluid-dynamics datasets. We chose Poseidon-L for comparison because it is the best-performing foundation model on this dataset, according to Table 1 of Walrus. All Flower variants are trained from scratch. Performance improves smoothly with model size.

| Model | Parameters | 1-Step ↓ | 1:20 Rollout ↓ |
|---|---|---|---|
| Flower-Tiny | 17.3M | 0.0160 | 0.1114 |
| Flower-Small | 69.3M | 0.0124 | 0.0850 |
| Flower-Medium | 155.8M | 0.0108 | 0.0739 |
| Poseidon-L (pretrained) | 628.6M | 0.0194 | 0.1114 |

Compressible Euler equations (VRMSE ↓). Poseidon-L is a pretrained foundation model; all Flower variants are trained from scratch on this single dataset. Even Flower-Tiny matches Poseidon-L on the 1:20 rollout.

Compressible Euler (periodic BC): extended autoregressive rollout. Flower-Medium (156M parameters) outperforms Poseidon-L (628M parameters, pretrained) with 4× fewer parameters and no cross-dataset pretraining.

#Ablations

We ablate key components of Flower on the viscoelastic instability dataset. The central finding: warping is non-negotiable. Removing it entirely leaves a U-Net backbone without any adaptive spatial interaction, causing a dramatic collapse. A single warp head recovers much of the one-step gain (consistent with the theory), but multiple heads are critical for stable long-horizon rollouts.

| Model Variant | 1-Step ↓ | 1:20 Rollout ↓ |
|---|---|---|
| Without warping | 0.4974 | 0.6911 |
| Single-head warping | 0.1241 | 0.6719 |
| Without coordinate encoding | 0.0638 | 0.4065 |
| Flow predicted by a single conv (no MLP) | 0.0834 | 0.3998 |
| Without identity projection | 0.0712 | 0.3816 |
| Without GroupNorm | 0.0622 | 0.3760 |
| Full model | 0.0624 | 0.3465 |
| vs. CNUnet (baseline) | 0.1623 | 0.3592 |

Ablation on the viscoelastic instability dataset (VRMSE ↓), sorted by 1:20 rollout loss. Bold = best, green = second best. CNUnet included for reference.

Ablation study: viscoelastic instability rollout. Visual comparison across model variants. Without warping or with only a single head, predictions quickly diverge over extended rollouts; the full multi-head model remains accurate.

#BibTeX

The BibTeX entry will be added as soon as the preprint appears on arXiv!