
Create Engaging Videos with Mochi 1

Mochi 1, an open-source text-to-video diffusion model, has been released by Genmo. It has 10 billion parameters and is built on the novel Asymmetric Diffusion Transformer (AsymmDiT) architecture, which is also flexible for fine-tuning. The model generates output with high fidelity and strong prompt adherence.

The model is released under the Apache 2.0 license, which means it can be used for research, educational, and commercial purposes.

Currently, it requires at least 4 H100 GPUs, which is far beyond what most individuals can run. However, Genmo is inviting the community to release quantized models so that it becomes accessible to users with lower-end hardware.

It can also be run in ComfyUI, where it consumes about 20 GB of VRAM at the VAE (Variational Autoencoder) decoding stage.
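Before loading the workflow, you may want to confirm that your GPU has enough memory for that stage. The sketch below is a POSIX shell function (the name vram_ok and the 20480 MiB threshold are our own choices, reading "20 GB" as 20 × 1024 MiB) that compares total VRAM, read through nvidia-smi's --query-gpu=memory.total option, against that figure. It checks total rather than free memory, so other applications using the GPU can still push you over the limit.

```shell
#!/bin/sh
# Hypothetical helper: check whether the first GPU has the ~20 GB of VRAM
# that the VAE decoding stage is reported to need.
# Assumption: "20 GB" is read as 20 * 1024 MiB, a slightly conservative choice.
MOCHI_VAE_MIB=20480

# vram_ok [TOTAL_MIB]
# Returns 0 if TOTAL_MIB >= MOCHI_VAE_MIB, non-zero otherwise
# (2 if nvidia-smi is needed but not available).
# Without an argument, total memory of the first GPU is read from nvidia-smi.
# Note: this compares *total* VRAM, not free VRAM.
vram_ok() {
    total_mib="$1"
    if [ -z "$total_mib" ]; then
        command -v nvidia-smi >/dev/null 2>&1 || return 2
        total_mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
    fi
    [ "$total_mib" -ge "$MOCHI_VAE_MIB" ]
}
```

For example, vram_ok 24576 (a 24 GB card) succeeds, vram_ok 16384 (a 16 GB card) fails, and plain vram_ok queries the installed GPU.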

Installation

1. Install ComfyUI on your machine.

2. Navigate to the “ComfyUI/custom_nodes” folder and open a command prompt there (for example, by typing “cmd” in the folder's address bar). Then clone the repository with the following command:

git clone https://github.com/kijai/ComfyUI-MochiWrapper.git
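If you prefer a POSIX shell (Linux, macOS, WSL, or Git Bash) to the Windows command prompt, step 2 can be scripted roughly as follows. This is a minimal sketch: the function name install_mochi_wrapper and passing the ComfyUI folder as an argument are our own conventions, and it simply skips the clone when the wrapper folder already exists.

```shell
#!/bin/sh
# Hypothetical helper for installation step 2.
# Assumes a POSIX shell and that git is installed.
REPO_URL="https://github.com/kijai/ComfyUI-MochiWrapper.git"

# install_mochi_wrapper COMFYUI_DIR
# Clones the wrapper into COMFYUI_DIR/custom_nodes unless it is already there.
install_mochi_wrapper() {
    nodes_dir="$1/custom_nodes"
    if [ ! -d "$nodes_dir" ]; then
        echo "custom_nodes folder not found: $nodes_dir" >&2
        return 1
    fi
    if [ -d "$nodes_dir/ComfyUI-MochiWrapper" ]; then
        echo "ComfyUI-MochiWrapper is already installed"
        return 0
    fi
    # Run the clone in a subshell so the caller's working directory is unchanged.
    (cd "$nodes_dir" && git clone "$REPO_URL")
}
```

For example, install_mochi_wrapper "$HOME/ComfyUI" clones into $HOME/ComfyUI/custom_nodes, assuming ComfyUI is installed in that folder.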

All the required models are downloaded automatically from Kijai’s Hugging Face repository the first time you run the workflow.

If you are interested in working with the raw model, you can access it directly from Genmo’s Hugging Face repository.

Keep in mind that the model weights are quite large, so be patient while they download. You can track the progress in real time in the terminal where ComfyUI is running.

All the models are saved to the “ComfyUI/models/diffusion_models/mochi” folder, and the VAE to the “ComfyUI/models/vae/mochi” folder.
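Once the first run has finished downloading, you can confirm that the files landed where expected. A minimal sketch, assuming a POSIX shell: check_mochi_models is a hypothetical helper name, and it only checks that each folder exists and is non-empty, not that the weights are complete or valid.

```shell
#!/bin/sh
# check_mochi_models COMFYUI_DIR
# Verifies that the Mochi model folder and the Mochi VAE folder exist and
# are non-empty. Prints "ok" on success; otherwise prints the first missing
# or empty path and returns 1.
check_mochi_models() {
    for dir in "$1/models/diffusion_models/mochi" "$1/models/vae/mochi"; do
        if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir")" ]; then
            echo "missing or empty: $dir"
            return 1
        fi
    done
    echo "ok"
}
```

For example, check_mochi_models "$HOME/ComfyUI" prints ok once both folders contain files.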

Workflow

1. You can find the workflow in the “ComfyUI-MochiWrapper/examples” folder.

2. Drag and drop it into ComfyUI.

3. Enter a detailed positive prompt for better results.
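To locate the example workflows from steps 1 and 2 without browsing folders, a small shell helper can list them. This sketch assumes the examples are stored as .json files (the usual format for ComfyUI workflows) and that the wrapper sits in ComfyUI/custom_nodes; list_mochi_workflows is a hypothetical name.

```shell
#!/bin/sh
# list_mochi_workflows COMFYUI_DIR
# Prints the .json workflow files in the wrapper's examples folder, one per line.
# Assumptions: the wrapper was cloned into COMFYUI_DIR/custom_nodes, and the
# example workflows are stored as .json files.
list_mochi_workflows() {
    examples="$1/custom_nodes/ComfyUI-MochiWrapper/examples"
    if [ ! -d "$examples" ]; then
        echo "examples folder not found: $examples" >&2
        return 1
    fi
    # The glob expands in sorted order; the -f test skips the literal
    # pattern that remains when nothing matches.
    for f in "$examples"/*.json; do
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
}
```

Any file it prints can then be dragged onto the ComfyUI canvas as in step 2.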

We do not have a stack of H100s, but we tested the raw model on an RTX 4090. The video consistency was really impressive compared to CogVideoX, but the model is massive and eats up a lot of VRAM.
With torch compile, cuBLAS ops, and GGUF Q8 enabled, resource usage drops considerably, and a 200-step generation took about 40 minutes. We hope there will be better quantization support in the future.
Image-to-video support is also planned.
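Total render time is easier to compare across settings when converted to time per step: 40 minutes for 200 steps is 40 × 60 / 200 = 12 seconds per step. The helpers below are hypothetical and assume time scales roughly linearly with the step count. The measured 40 minutes likely also includes model loading and VAE decoding, so treat the results as rough estimates only.

```shell
#!/bin/sh
# Hypothetical timing helpers using integer shell arithmetic
# (results are truncated to whole numbers).
# Assumption: generation time scales roughly linearly with the step count.

# seconds_per_step MINUTES STEPS -> average seconds spent per step
seconds_per_step() {
    echo $(( $1 * 60 / $2 ))
}

# estimate_minutes SECONDS_PER_STEP STEPS -> rough total time in minutes
estimate_minutes() {
    echo $(( $1 * $2 / 60 ))
}
```

For the run above, seconds_per_step 40 200 prints 12, and estimate_minutes 12 100 prints 20, i.e. roughly 20 minutes for a 100-step run with the same settings, if the linear assumption holds.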
Published by admage