Training-free Latent Inter-Frame Pruning with Attention Recovery

Supplementary Material

Comparisons to Existing Real-Time (Low Latency) V2V Models

Our method significantly increases the throughput of the base model (Self-Forcing [1]) for real-time video editing while maintaining the visual quality and temporal consistency of edited videos.

  • Three corgi puppies sharing a meal together on a kitchen floor.

    Input Video | LIPAR (Ours) - 33.8% Pruned | Self-Forcing [1]
    StreamV2V [2] | StreamDiffusion [3] | ControlVideo [4]
  • Three majestic lions huddled together feasting on a meal.

    Input Video | LIPAR (Ours) - 33.8% Pruned | Self-Forcing [1]
    StreamV2V [2] | StreamDiffusion [3] | ControlVideo [4]
  • A beautiful blonde woman with blue eyes wearing is performing the moonwalk. Simple dark background.

    Input Video | LIPAR (Ours) - 21.3% Pruned | Self-Forcing [1]
    StreamV2V [2] | StreamDiffusion [3] | ControlVideo [4]
  • Two cute, fluffy penguins wearing winter scarves waddling across a frozen ice path in Antarctica.

    Input Video | LIPAR (Ours) - 52.9% Pruned | Self-Forcing [1]
    StreamV2V [2] | StreamDiffusion [3] | ControlVideo [4]
  • An old man with a white beard is holding and interacting with a mysterious rock that has a small tree growing on it. Natural lighting, domestic interior background.

    Input Video | LIPAR (Ours) - 19.1% Pruned | Self-Forcing [1]
    StreamV2V [2] | StreamDiffusion [3] | ControlVideo [4]
  • A woman wearing a black leather jacket riding a motorcycle while stretching her arms out joyfully. Realistic cinematic style, wind blowing through hair, blurred asphalt road beneath to imply speed.

    Input Video | LIPAR (Ours) - 20.7% Pruned | Self-Forcing [1]
    StreamV2V [2] | StreamDiffusion [3] | ControlVideo [4]
  • A young anime protagonist riding a sleek bike with arms outstretched.

    Input Video | LIPAR (Ours) - 20.7% Pruned | Self-Forcing [1]
    StreamV2V [2] | StreamDiffusion [3] | ControlVideo [4]
  • Anime style animation of a frog dancing and performing acrobatic side somersaults. Vibrant cel-shaded colors.

    Input Video | LIPAR (Ours) - 16.8% Pruned | Self-Forcing [1]
    StreamV2V [2] | StreamDiffusion [3] | ControlVideo [4]
  • A majestic lion is turning its head to look around in a field.

    Input Video | LIPAR (Ours) - 11.9% Pruned | Self-Forcing [1]
    StreamV2V [2] | StreamDiffusion [3] | ControlVideo [4]
Comparisons with Other Training-free Pruning Methods

We compare LIPAR against representative training-free pruning methods, including ToMe [5], importance-based pruning [6], and IDM [7], all implemented on the Self-Forcing model, as shown below.

Effectiveness of Attention Recovery

Direct pruning leads to visual artifacts, and applying M-degree approximation alone produces noisy patterns. In contrast, full Attention Recovery, which combines M-degree approximation with noise-aware duplication, mitigates these artifacts and restores visual quality.

    Direct Pruning | M-Degree Approx. | M-Degree Approx. + Noise-aware Duplication

Visualizations with Time-to-Move Integration

We further integrate LIPAR into the Time-to-Move (TTM) model [8] to demonstrate its generalizability.

     

References