Grok Imagine 1.5 Preview
Image-to-video generation with synced audio and expressive motion.
Input
Image (required): JPG, PNG, or WEBP; ≤10.0 MB; 1 image per generation; width and height ≥300 px; aspect ratio between 1:2.5 and 2.5:1
Settings: Prompt, Duration (s), Resolution, Aspect Ratio
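The image limits listed above can be checked before uploading. The following is a minimal sketch, not part of any official SDK: the function name `check_image` is hypothetical, "10.0MB" is read as 10 × 1024 × 1024 bytes, and `jpeg` is accepted as an alias for JPG. Both of those readings are assumptions.

```python
from typing import List

# Limits taken from the upload requirements on this page.
ALLOWED_FORMATS = {"jpg", "jpeg", "png", "webp"}  # "jpeg" alias is an assumption
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10.0MB" interpreted as 10 MiB
MIN_SIDE_PX = 300


def check_image(fmt: str, size_bytes: int, width: int, height: int) -> List[str]:
    """Return the list of violated limits; an empty list means the image passes."""
    problems = []
    if fmt.lower().lstrip(".") not in ALLOWED_FORMATS:
        problems.append("format must be JPG, PNG, or WEBP")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 10 MB")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        problems.append("width and height must be at least 300 px")
    # Aspect ratio between 1:2.5 and 2.5:1, written with integers only:
    #   width/height >= 1/2.5  <=>  5*width >= 2*height
    #   width/height <= 2.5    <=>  2*width <= 5*height
    if 5 * width < 2 * height or 2 * width > 5 * height:
        problems.append("aspect ratio must be between 1:2.5 and 2.5:1")
    return problems
```

For example, `check_image("png", 2_000_000, 1280, 720)` returns an empty list, while a GIF of the same size is rejected on format alone.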
| Model & Modality | Credits / Gen | Our Price (USD) | Official Price (USD) | Discount |
|---|---|---|---|---|
| grok-imagine-video-1.5-preview, i2v, 480p video | 14.5 per second | $0.0647 | $0.08 | -19% |
| grok-imagine, i2v, 720p video | 25 per second | $0.1116 | $0.14 | -20% |
| grok-imagine, i2v, input image | 2 per image | $0.0089 | $0.01 | -11% |
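To estimate what a single generation costs, the listed rates can be combined as in the sketch below. It assumes the per-second video charge and the per-image input charge are simply added together, which the pricing table does not state explicitly. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Per-second rates copied from the pricing table: (credits, USD).
RATES = {
    "480p": (Decimal("14.5"), Decimal("0.0647")),
    "720p": (Decimal("25"), Decimal("0.1116")),
}
# Per-input-image charge from the same table: (credits, USD).
INPUT_IMAGE = (Decimal("2"), Decimal("0.0089"))


def estimate_cost(resolution: str, seconds: int, images: int = 1):
    """Return (credits, usd) for one generation.

    Assumption: video seconds and input images are billed additively.
    """
    if resolution not in RATES:
        raise ValueError(f"no listed rate for {resolution!r}")
    if seconds <= 0 or images < 0:
        raise ValueError("seconds must be positive and images non-negative")
    credits_per_s, usd_per_s = RATES[resolution]
    credits = credits_per_s * seconds + INPUT_IMAGE[0] * images
    usd = usd_per_s * seconds + INPUT_IMAGE[1] * images
    return credits, usd
```

Under that assumption, a 6-second 480p clip from one image comes to 89 credits, or $0.3971 at the listed price.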
Animate still images into short videos with synchronized audio using xAI’s Grok Imagine Video 1.5 preview model.
Prompt:
“A massive rocket launching from a modern space center, engines igniting with intense flames and smoke, powerful liftoff, cinematic camera angle, dramatic lighting, realistic physics, clear blue sky, ultra detailed, high energy, 4K quality.”
Bring Any Image to Life, With Sound
Turn static images into dynamic videos while preserving subject identity, composition, and visual style.
Create synchronized dialogue, sound effects, ambient audio, and background music in a single generation.
Extend videos seamlessly from the last frame while maintaining motion, lighting, and scene continuity.
Maintain character appearance, visual style, and scene aesthetics across multiple video generations.
Edit and refine videos using natural language instructions without complex workflows.
Generate high-quality videos with realistic motion, smooth camera movements, and fast rendering speeds.
Grok Imagine Video 1.5 turns a static image into a dynamic video with realistic motion, natural interactions, and automatically generated sound. Upload a portrait, product photo, or illustration, and it transforms into a cinematic video with synced background music, sound effects, and ambient audio that match the visuals.
Grok Imagine Video 1.5 supports simultaneous generation of video and audio in a single pass, enabling true audio-visual co-generation. The system automatically produces context-aware sound, including synchronized action effects (e.g., blade swishes, footsteps), ambient audio (e.g., room tone, spatial reverb), background music, and dialogue, with natural lip-sync alignment. With only one image and a prompt, it can generate a cinema-grade video with fully integrated sound, eliminating the need for external post-production audio tools.
The model can expand a single image into a fully animated scene with improved motion consistency, physical realism, and fine-grained detail. It naturally reproduces complex phenomena such as fluid dynamics, rising steam, and translucent materials like glass, while preserving the original visual style. It also closely follows prompt instructions and supports natural-language-based camera control for more flexible scene direction.
Grok Imagine provides a fully integrated pipeline covering text-to-image, image editing, image-to-video, video generation, and clip extension, with Agent Mode enabling iterative creative refinement. This unified workflow is well-suited for short-form content, concept videos, and rapid prototyping, allowing users to efficiently transform ideas into production-ready video outputs within a single platform.
Grok Imagine Video 1.5 Preview recently claimed the #1 spot on the Image-to-Video Arena (720p) leaderboard, achieving a score of 1473 and surpassing Seedance 2.0's 1467. With a 52-point Elo improvement over its predecessor, Grok Imagine Video 1.5 ranks among the top-performing image-to-video models currently available on Crun.
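To put the score gaps in perspective, the standard Elo expected-score formula converts a rating difference into an expected head-to-head preference rate. The sketch below assumes the leaderboard follows the conventional Elo scale (base 10, divisor 400), which the source does not confirm. The function name is illustrative.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score of A against B.

    Assumption: the leaderboard uses the conventional 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Scores quoted above: 1473 vs. 1467, and a 52-point gain over the predecessor.
vs_seedance = expected_win_rate(1473, 1467)
vs_predecessor = expected_win_rate(1473, 1473 - 52)
```

Under these assumptions, the 6-point lead over Seedance 2.0 corresponds to an expected preference rate of roughly 51%, and the 52-point gain over the predecessor to roughly 57%.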
| Model | Grok Imagine Video 1.5 | Seedance 2.0 |
|---|---|---|
| Resolution | 720p | 1080p |
| Video Length | 15 s | 15 s |
| Frame Rate | 24 fps | 24 fps |
| Audio-Visual Generation | Supported | Supported |
| Reference Video | Not supported | Not supported |
| Text-to-Video | Not supported | Supported |
| Motion Quality | Medium | High |
| Scene Complexity | Simple scenes | Multi-scene capable |
| Character Consistency | Basic | Strong |
| Generation Speed | Fast | Medium |
| Control Level | Low–Medium | High (multi-modal control system) |