Rolling Forcing:
Autoregressive Long Video Diffusion in Real Time

Nanyang Technological University
ARC Lab, Tencent PCG

Abstract

Streaming video generation is a fundamental component of interactive world models and neural game engines, yet existing methods suffer from severe error accumulation that degrades quality over long horizons.

We present Rolling Forcing, a novel autoregressive video diffusion technique that enables real-time generation of multi-minute videos with minimal error accumulation. Our approach introduces three key innovations:

  • Joint Denoising: Simultaneously denoising multiple frames with progressive noise levels to relax strict causality.
  • Attention Sink: Preserving key-value states of initial frames as a global context anchor to enhance long-term consistency.
  • Efficient Training: Distillation that operates on non-overlapping windows and mitigates exposure bias from self-generated histories.
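
The first two ideas above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: `SinkKVCache`, `rolling_denoise`, and all parameter names are hypothetical, integer step counts stand in for noise levels, frame indices stand in for key-value states, and the actual denoising network is omitted.

```python
from collections import deque


class SinkKVCache:
    """Hypothetical cache: keeps the first `sink_size` entries for good,
    plus a rolling window of the most recent `recent_size` entries."""

    def __init__(self, sink_size, recent_size):
        self.sink_size = sink_size
        self.sink = []                           # never evicted: global anchor
        self.recent = deque(maxlen=recent_size)  # oldest entry drops automatically

    def append(self, entry):
        if len(self.sink) < self.sink_size:
            self.sink.append(entry)
        else:
            self.recent.append(entry)

    def context(self):
        return self.sink + list(self.recent)


def rolling_denoise(num_frames, window_size, cache):
    """Simulate joint denoising; returns frame indices in emission order."""
    # Slot i starts with i + 1 remaining steps: progressive noise levels,
    # nearly clean at the front of the window and noisiest at the back.
    window = deque([f, f + 1] for f in range(min(window_size, num_frames)))
    next_frame = len(window)
    emitted = []
    while window:
        # One joint pass: every frame in the window moves one step closer
        # to clean. A real model would denoise here, attending to
        # cache.context() (assumption: the network call is left out).
        for slot in window:
            slot[1] -= 1
        frame, remaining = window.popleft()
        assert remaining == 0  # the oldest frame is now fully denoised
        emitted.append(frame)
        cache.append(frame)
        if next_frame < num_frames:
            # A new frame enters at the back with the maximum step count.
            window.append([next_frame, window_size])
            next_frame += 1
    return emitted


cache = SinkKVCache(sink_size=2, recent_size=3)
order = rolling_denoise(num_frames=10, window_size=4, cache=cache)
print(order)            # frames leave the window one per pass, in order
print(cache.context())  # initial frames stay; only recent ones roll over
```

In this toy setup each joint pass emits exactly one finished frame, which is what makes a rolling window suitable for streaming, while the cache still holds the first two frames after ten frames have been generated.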

2-Minute Video Results

Interactive Video Streaming

Method Comparisons

Methods compared: MAGI-1, SkyReels-V2, CausVid, Self Forcing, Rolling Forcing