Grok Imagine 1.5 Preview

Image-to-video generation with synced audio and expressive motion.

Input

Upload Image (required): JPG / PNG / WEBP, ≤10.0 MB, 1 image, width and height ≥300 px, aspect ratio between 1:2.5 and 2.5:1
Prompt: up to 5000 characters
Duration: in seconds
Resolution: 480p, 720p
Aspect Ratio: auto, 16:9, 9:16, 1:1, 3:2, 2:3
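
The input limits listed above can be checked on the client before anything is uploaded. The following is a minimal sketch in Python, not part of any official SDK: the function name `validate_input` and all of its parameters are hypothetical, and the caller is assumed to already know the file's size and pixel dimensions. It also assumes that 1 MB means 1,000,000 bytes (the stricter reading), that `.jpeg` counts as JPG, and that the 15-second Max Duration shown on this page is the upper bound for the duration field.

```python
from pathlib import PurePath

# Limits copied from the page; every name here is illustrative only.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}  # assumes .jpeg == JPG
MAX_BYTES = 10 * 1000 * 1000   # assumes 1 MB = 1,000,000 bytes
MIN_SIDE_PX = 300
MAX_PROMPT_CHARS = 5000
MAX_DURATION_S = 15            # from the "Max Duration" figure on the page
RESOLUTIONS = {"480p", "720p"}
ASPECT_RATIOS = {"auto", "16:9", "9:16", "1:1", "3:2", "2:3"}


def validate_input(filename, size_bytes, width, height, prompt,
                   duration_s, resolution, aspect_ratio="auto"):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if PurePath(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        errors.append("image must be JPG, PNG, or WEBP")
    if size_bytes > MAX_BYTES:
        errors.append("image must be at most 10 MB")
    if width < MIN_SIDE_PX or height < MIN_SIDE_PX:
        errors.append("image width and height must be at least 300 px")
    # Aspect ratio must lie between 1:2.5 and 2.5:1. Cross-multiplying
    # (2.5 == 5/2) keeps the comparison in exact integer arithmetic.
    if 5 * width < 2 * height or 2 * width > 5 * height:
        errors.append("image aspect ratio must be between 1:2.5 and 2.5:1")
    if len(prompt) > MAX_PROMPT_CHARS:
        errors.append("prompt must be at most 5000 characters")
    if not 0 < duration_s <= MAX_DURATION_S:
        errors.append("duration must be between 0 and 15 seconds")
    if resolution not in RESOLUTIONS:
        errors.append("resolution must be 480p or 720p")
    if aspect_ratio not in ASPECT_RATIOS:
        errors.append("unsupported output aspect ratio")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a form show every problem at once. The server may apply further checks that this page does not describe.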

Model & Modality	Credits / Gen	Our Price (USD)	Official Price (USD)	Discount
grok-imagine-video-1.5-preview, i2v, 480p	1.6 per second	$0.0071	$0.08	-91%
grok-imagine-video-1.5-preview, i2v, 720p	3 per second	$0.0134	$0.14	-90%
grok-imagine-video-1.5-preview, i2v, input image	2 per image	$0.0089	$0.01	-11%
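
To see how these rates combine for a single generation, here is a rough estimator in Python. It is a sketch under two assumptions the table does not state outright: that the USD prices are per unit (per second of video, per input image), and that the input-image fee is added on top of the per-second charge. The names `RATES`, `IMAGE_FEE`, and `estimate_cost` are hypothetical.

```python
# Rates copied from the pricing table; the structure and names are illustrative.
RATES = {
    "480p": {"credits_per_s": 1.6, "usd_per_s": 0.0071},
    "720p": {"credits_per_s": 3.0, "usd_per_s": 0.0134},
}
IMAGE_FEE = {"credits": 2.0, "usd": 0.0089}  # charged per input image


def estimate_cost(resolution, duration_s, images=1):
    """Return (credits, usd) for one image-to-video generation.

    Assumes: total = per-second rate * duration + per-image fee * images.
    """
    rate = RATES[resolution]  # raises KeyError for an unlisted resolution
    credits = rate["credits_per_s"] * duration_s + IMAGE_FEE["credits"] * images
    usd = rate["usd_per_s"] * duration_s + IMAGE_FEE["usd"] * images
    return round(credits, 2), round(usd, 4)
```

Under those assumptions, `estimate_cost("720p", 10)` returns `(32.0, 0.1429)`: 30 credits for ten seconds of video plus 2 credits for the input image.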

Native Multimodal Audio

Grok Imagine Video 1.5 API

Name: Grok Imagine 1.5 API
Brand: Crun

Animate still images into short videos with synchronized audio using xAI’s Grok Imagine Video 1.5 preview model.

Max Duration: 15s
Frame rate: 24 fps
Resolution: 720p

Prompt:

A massive rocket launching from a modern space center, engines igniting with intense flames and smoke, powerful liftoff, cinematic camera angle, dramatic lighting, realistic physics, clear blue sky, ultra detailed, high energy, 4K quality.

Grok Imagine 1.5 API Core Features

Bring Any Image to Life, With Sound

Image-to-Video Generation

Turn static images into dynamic videos while preserving subject identity, composition, and visual style.

Native Audio Generation

Create synchronized dialogue, sound effects, ambient audio, and background music in a single generation.

Video Extension

Extend videos seamlessly from the last frame while maintaining motion, lighting, and scene continuity.

Reference Consistency

Maintain character appearance, visual style, and scene aesthetics across multiple video generations.

Prompt-Based Video Editing

Edit and refine videos using natural language instructions without complex workflows.

Fast Cinematic Rendering

Generate high-quality videos with realistic motion, smooth camera movements, and fast rendering speeds.

What You Can Build with Grok Imagine Video 1.5

Grok Imagine Video 1.5 turns a static image into a dynamic video with realistic motion, natural interactions, and automatically generated sound. Upload a portrait, product photo, or illustration, and the model turns it into a cinematic video with synced background music, sound effects, and ambient audio that match the visuals.

Unified Audio-Visual Generation Capability

Grok Imagine Video 1.5 generates video and audio together in a single pass rather than adding sound afterward. The audio is context-aware: action effects synchronized to the motion (e.g., blade swishes, footsteps), ambient audio (e.g., room tone, spatial reverb), background music, and dialogue with natural lip-sync. From one image and a prompt, it produces a video with fully integrated sound, without requiring external post-production audio tools.

Realistic Motion, Physics Simulation, and Detail Fidelity

The model expands a single image into a fully animated scene with consistent motion, physically plausible behavior, and fine-grained detail. It reproduces complex phenomena such as fluid dynamics, rising steam, and translucent materials like glass while preserving the original visual style. It also follows prompt instructions closely and accepts camera directions written in natural language, which gives more flexible control over the scene.

End-to-End Creative Workflow

Grok Imagine provides an integrated pipeline covering text-to-image, image editing, image-to-video, video generation, and clip extension, with Agent Mode for iterative creative refinement. This unified workflow suits short-form content, concept videos, and rapid prototyping, letting users take an idea to a finished video within a single platform.