Physics-Guided Motion Loss for Video Generation Model

Bowen Xue1, Giuseppe Claudio Guarnera2, Shuang Zhao3, Zahra Montazeri1
1University of Manchester, 2University of York, 3University of California, Irvine

Figure: Pipeline for the physics-guided motion loss. A compact frequency-domain loss regularizes generated motion without changing the video model architecture.

Abstract

Current video diffusion models generate visually compelling content but often violate basic laws of physics, producing subtle artifacts like rubber-sheet deformations and inconsistent object motion.

We introduce a frequency-domain physics prior that improves motion plausibility without modifying model architectures. Our method decomposes common rigid motions (translation, rotation, scaling) into lightweight spectral losses, requiring only 2.7% of frequency coefficients while preserving more than 97% of spectral energy.
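To make the energy-compaction idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds a toy clip of a Gaussian blob translating at constant velocity, takes its 3D (time, height, width) FFT, and measures how much spectral energy survives when only the largest-magnitude fraction of coefficients is kept. The function names `make_translating_blob` and `retained_energy`, the clip size, and the blob parameters are all hypothetical choices made for illustration. The 2.7% and 97% figures quoted above are the authors' own measurements and are not reproduced by this toy example.

```python
import numpy as np


def make_translating_blob(frames=16, size=32, velocity=(2, 0), sigma=3.0):
    """Toy clip (hypothetical): a Gaussian blob moving at constant velocity.

    velocity is (vx, vy) in pixels per frame; the blob wraps around the
    image borders so the clip is periodic in space.
    """
    ys, xs = np.mgrid[0:size, 0:size]
    clip = np.empty((frames, size, size), dtype=np.float64)
    for t in range(frames):
        cx = (size / 4 + velocity[0] * t) % size
        cy = (size / 2 + velocity[1] * t) % size
        # Wrap-around distance to the blob centre along each axis.
        dx = np.abs(xs - cx)
        dx = np.minimum(dx, size - dx)
        dy = np.abs(ys - cy)
        dy = np.minimum(dy, size - dy)
        clip[t] = np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))
    return clip


def retained_energy(clip, keep_fraction):
    """Fraction of spectral energy held by the largest coefficients.

    Keeps the top `keep_fraction` of 3D FFT coefficients (by power) and
    returns their share of the total power.
    """
    power = np.abs(np.fft.fftn(clip)) ** 2
    flat = power.ravel()
    k = max(1, int(round(keep_fraction * flat.size)))
    # After np.partition, the k largest values occupy the last k slots.
    top = np.partition(flat, flat.size - k)[flat.size - k:]
    return top.sum() / flat.sum()


if __name__ == "__main__":
    clip = make_translating_blob()
    for fraction in (0.005, 0.027, 0.10):
        share = retained_energy(clip, fraction)
        print(f"keep {fraction:.1%} of coefficients -> {share:.4f} of energy")
```

For a pure translation the spatiotemporal spectrum is confined to a plane through the origin whose tilt is set by the velocity, which is why a small set of coefficients can carry most of the energy in this toy case. Real generated video is far less clean, so the sketch should be read only as an illustration of the principle.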

Applied to video diffusion backbones such as Open-Sora, MVDIT, and Hunyuan, the loss improves motion accuracy, temporal consistency, and text-video alignment while maintaining visual quality.

Results

Representative frame strips are shown below. On prompts describing simple projected motions, the proposed loss produces smoother, more coherent motion than the baseline.

Translation

A ball moving from right to left.

Ours: translation result.
Baseline: translation result.

Curved Motion

A freight train arcs through a canyon.

Ours: freight train sequence.
Baseline: freight train sequence.