115 changes: 115 additions & 0 deletions README_SETUP.md
@@ -0,0 +1,115 @@
# Open-Sora 2.0 Complete Setup Guide

## 🎯 Current Status
✅ **Environment Setup Complete**
- Virtual environment: `sora-env`
- Dependencies installed with CPU compatibility
- OpenSora package installed
- Models directory created

## 🚨 Python 3.12 Compatibility Issue
The current setup has a known issue with Python 3.12: the pinned PyTorch release's Dynamo compiler does not fully support it. For full functionality, use Python 3.10 or 3.11.
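Before installing, it is worth checking the interpreter version up front. A minimal sketch (the `python_version_ok` helper is this guide's own, not part of Open-Sora):

```python
import sys

def python_version_ok(major=sys.version_info.major, minor=sys.version_info.minor):
    """Return True if the interpreter is in the range this setup is
    known to work with (Python 3.10 or 3.11)."""
    return (3, 10) <= (major, minor) <= (3, 11)

if __name__ == "__main__":
    if not python_version_ok():
        # TorchDynamo issues are likely outside the supported range
        print(f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
              "may hit PyTorch Dynamo issues; prefer 3.10/3.11.")
```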

## 📁 Directory Structure
```
~/Open-Sora-All/
├── sora-env/ # Virtual environment
├── models/ # Model files (download required)
├── opensora/ # Source code
├── scripts/ # Inference scripts
├── configs/ # Configuration files
└── requirements_fixed.txt # Fixed dependencies
```
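A quick way to sanity-check that the layout above is in place is to scan for missing entries. A small sketch (the helper name and `EXPECTED` list are this guide's, derived from the tree above):

```python
from pathlib import Path

# Top-level entries expected under ~/Open-Sora-All per the tree above
EXPECTED = ["sora-env", "models", "opensora", "scripts", "configs"]

def missing_entries(root, expected=EXPECTED):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]
```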

## 🔽 Model Download Requirements

### Core Models (~50GB total)
1. **Open-Sora 2.0 Main Model** (~23.8GB)
- File: `Open_Sora_v2.safetensors`
- Location: `models/opensora/`

2. **Flux Text-to-Image Model** (~23.8GB)
- File: `flux1-dev.safetensors`
- Location: `models/flux/`

3. **Video Autoencoder Models** (~2.3GB)
- HunyuanVideo VAE: `hunyuan_vae.safetensors`
- Location: `models/vae/`

4. **Text Encoders**
- T5-XXL model files
- CLIP model files
- Location: `models/text_encoders/`

### Audio/Voice Models (for full video+audio generation)
5. **Audio Synthesis Models**
- Text-to-speech models
- Audio synchronization models
- Location: `models/audio/`
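Since the downloads are large, it helps to check which model files are already in place before re-fetching anything. A minimal sketch using the core-model paths listed above (the `MODEL_FILES` mapping follows this guide's layout, not an official manifest):

```python
from pathlib import Path

# Expected file locations for the core models listed above
MODEL_FILES = {
    "Open-Sora 2.0": Path("models/opensora/Open_Sora_v2.safetensors"),
    "Flux text-to-image": Path("models/flux/flux1-dev.safetensors"),
    "HunyuanVideo VAE": Path("models/vae/hunyuan_vae.safetensors"),
}

def downloaded_models(root="."):
    """Return the names of models whose files already exist under `root`."""
    root = Path(root)
    return [name for name, rel in MODEL_FILES.items() if (root / rel).exists()]
```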

## 🚀 Usage Instructions

### 1. Activate Environment
```bash
cd ~/Open-Sora-All
source sora-env/bin/activate
```

### 2. Download Models
```bash
python setup_models.py # Shows structure and requirements
```

### 3. Basic Video Generation (once models are downloaded)
```bash
# Alternative method due to Python 3.12 compatibility
python -c "
import opensora
from opensora.models import create_model
# Custom inference code here
"
```

### 4. Video + Audio Generation
For full text-to-video with synchronized audio:
```bash
# Example command (adjust based on actual API)
python generate_video_with_audio.py \
--prompt "A sunrise over calm ocean with gentle waves" \
--voice "calm narrator voice" \
--duration 5 \
--resolution 1024x576 \
--output sunrise_with_voice.mp4
```

## 🔧 Configuration Options

### Video Settings
- **Resolution**: 256px, 768px, 1024px options
- **Duration**: 2s to 15s
- **Aspect Ratios**: Any ratio supported
- **Frame Rate**: 24fps default

### Audio Settings
- **Voice Types**: Multiple TTS voices available
- **Audio Quality**: High-fidelity synthesis
- **Synchronization**: Automatic lip-sync for characters
- **Background Music**: Optional ambient audio

## 📖 Next Steps

1. **Download Models**: Follow Open-Sora documentation for model downloads
2. **Python Version**: Consider using Python 3.10/3.11 for full compatibility
3. **GPU Setup**: For faster generation, configure CUDA if available
4. **Custom Configs**: Modify config files in `configs/` for specific needs

## 🔗 Resources
- [Open-Sora GitHub](https://github.com/hpcaitech/Open-Sora)
- [Model Downloads](https://huggingface.co/hpcai-tech)
- [Documentation](https://hpcaitech.github.io/Open-Sora/)

## 🆘 Troubleshooting
- **Import Errors**: Ensure virtual environment is activated
- **Model Not Found**: Check model file paths and downloads
- **Memory Issues**: Use smaller resolutions or shorter durations
- **Python 3.12 Issues**: Consider downgrading to Python 3.10/3.11
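The checklist above can be gathered in one pass. A hypothetical diagnostic sketch (the `diagnose` helper is this guide's own, not part of Open-Sora):

```python
import importlib.util
import os
import sys

def diagnose():
    """Run quick environment checks matching the troubleshooting list above."""
    return {
        # An active venv exports VIRTUAL_ENV
        "venv_active": bool(os.environ.get("VIRTUAL_ENV")),
        # Python 3.10/3.11 avoids the Dynamo issue
        "python_3_10_or_3_11": (sys.version_info.major, sys.version_info.minor) in [(3, 10), (3, 11)],
        # Importability without actually importing (cheap and side-effect free)
        "opensora_importable": importlib.util.find_spec("opensora") is not None,
        "torch_importable": importlib.util.find_spec("torch") is not None,
    }

if __name__ == "__main__":
    for check, ok in diagnose().items():
        print(f"{'✅' if ok else '❌'} {check}")
```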
6 changes: 3 additions & 3 deletions requirements.txt
@@ -1,11 +1,11 @@
torch==2.4.0
torchvision==0.19.0
torch==2.2.2
torchvision==0.17.2
colossalai>=0.4.4
mmengine>=0.10.3
ftfy>=6.2.0 # for t5
accelerate>=0.29.2 # for t5
av==13.1.0 # for video loading
liger-kernel==0.5.2
#liger-kernel==0.5.2
pandas>=2.0.3
pandarallel>=1.6.5
openai>=1.52.2
16 changes: 16 additions & 0 deletions requirements_fixed.txt
@@ -0,0 +1,16 @@
numpy==1.26.4
torch==2.2.2
torchvision==0.17.2
colossalai>=0.4.4
mmengine>=0.10.3
ftfy>=6.2.0
accelerate>=0.29.2
av==13.1.0
pandas>=2.0.3
pandarallel>=1.6.5
openai>=1.52.2
wandb>=0.17.0
tensorboard>=2.14.0
pre-commit>=3.5.0
omegaconf>=2.3.0
pyarrow
83 changes: 83 additions & 0 deletions setup_models.py
@@ -0,0 +1,83 @@
#!/usr/bin/env python3
"""
Open-Sora 2.0 Model Download Script
Downloads all required models for video+audio generation
"""

import requests
from pathlib import Path
from tqdm import tqdm


def download_file(url, filepath):
    """Download a file with a progress bar."""
    response = requests.get(url, stream=True, timeout=30)
    response.raise_for_status()
    total_size = int(response.headers.get('content-length', 0))

    with open(filepath, 'wb') as file, tqdm(
        desc=filepath.name,
        total=total_size,
        unit='B',
        unit_scale=True,
        unit_divisor=1024,
    ) as pbar:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                file.write(chunk)
                pbar.update(len(chunk))


def setup_model_directories():
    """Create the model directory structure."""
    models_dir = Path("models")
    subdirs = ["opensora", "flux", "vae", "text_encoders", "audio"]
    for subdir in subdirs:
        (models_dir / subdir).mkdir(parents=True, exist_ok=True)
    return models_dir


def main():
    print("🚀 Setting up Open-Sora 2.0 Models")
    print("=" * 50)

    models_dir = setup_model_directories()

    # Model URLs (these would be the actual URLs from Open-Sora documentation)
    models = {
        "Open-Sora 2.0 Main Model": {
            "url": "https://huggingface.co/hpcai-tech/Open-Sora-2.0/resolve/main/Open_Sora_v2.safetensors",
            "path": models_dir / "opensora" / "Open_Sora_v2.safetensors",
            "size": "~23.8GB",
        },
        "Flux Text-to-Image": {
            "url": "https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors",
            "path": models_dir / "flux" / "flux1-dev.safetensors",
            "size": "~23.8GB",
        },
        "HunyuanVideo VAE": {
            "url": "https://huggingface.co/tencent/HunyuanVideo/resolve/main/hunyuan_video_vae_bf16.safetensors",
            "path": models_dir / "vae" / "hunyuan_vae.safetensors",
            "size": "~2.3GB",
        },
    }

    print("📋 Models to download:")
    for name, info in models.items():
        print(f"  • {name}: {info['size']}")

    print("\n💾 Total download size: ~50GB")
    print(f"📁 Models will be saved to: {models_dir.absolute()}")

    # Note: Actual downloading would require valid URLs and proper authentication
    print("\n⚠️  Note: This script shows the structure.")
    print("📖 Please refer to Open-Sora documentation for actual model URLs and download instructions.")
    print("🔗 Visit: https://github.com/hpcaitech/Open-Sora")


if __name__ == "__main__":
    main()
40 changes: 40 additions & 0 deletions setup_open_sora.sh
@@ -0,0 +1,40 @@
#!/bin/bash
set -e

# Automated setup script for Open-Sora in a single folder

# Create the main folder and navigate into it
mkdir -p ~/Open-Sora-All
cd ~/Open-Sora-All

# Clone the Open-Sora repository directly into the current folder
# (git refuses to clone into a non-empty directory)
git clone https://github.com/hpcaitech/Open-Sora .

# Create a virtual environment inside the folder and activate it
python3 -m venv sora-env
source sora-env/bin/activate

# Edit requirements.txt for CPU-only compatibility.
# Note: `sed -i ''` is the macOS/BSD form; on GNU sed use `sed -i` with no ''.
# The `==` anchors matter: a bare `^torch` would also match the torchvision line.

# Pin torch to torch==2.4.0
sed -i '' 's/^torch==.*/torch==2.4.0/' requirements.txt

# Pin torchvision to torchvision==0.19.0
sed -i '' 's/^torchvision==.*/torchvision==0.19.0/' requirements.txt

# Comment out GPU-only packages: triton and liger-kernel
sed -i '' '/^triton/s/^/#/' requirements.txt
sed -i '' '/^liger-kernel/s/^/#/' requirements.txt

# Upgrade pip and install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Create models folder
mkdir -p models

# Note: Download model files as per Open-Sora README instructions
echo "Setup complete. Please download required model files into ~/Open-Sora-All/models/ as per the Open-Sora README instructions."
echo "To run video generation, navigate to ~/Open-Sora-All, activate the venv with 'source sora-env/bin/activate', and run your commands."