/usr/blog/~ sh -c index.run  tty0
$ cat /etc/motd

Everything you need to ship the fastest video in the arena with Grok Imagine v1.0.

> 10 guides covering Grok Imagine v1.0 end to end.

> 720p renders in about seventeen seconds, native audio in every clip, and a fifteen-second cap per generation.

> Real numbers, real code, real pipelines.

╔════════════════════════════╦════════════════════════════╦═══════════════════════════════╗
║  10                        ║  6                         ║  17s                          ║
║  Published posts           ║  Topic categories          ║  Prompt to finished 720p clip ║
╚════════════════════════════╩════════════════════════════╩═══════════════════════════════╝
[ SCENE 00 / OVERVIEW ]
$ info xai/grok-imagine-video/text-to-video

> Grok Imagine v1.0 at a glance

> Grok Imagine v1.0 is xAI's generative video stack. The xAI API went live on January 28, 2026, the fal endpoints turned on January 29, and the public announcement followed on February 1 and 2. Its calling card is speed. A typical 720p clip finishes about seventeen seconds after you press submit, roughly two to four times faster than Kling 3.0 Pro, Seedance 2.0, Veo 3.1, and HappyHorse on matched workloads. If you run video in production and latency is what breaks your pipeline, this is the model that unblocks it.

> Speed is only interesting if quality holds, and v1.0 holds. Arena Elo puts Grok Imagine at rank five on text-to-video with 1232 and rank three on image-to-video with 1325. DesignArena's Q1 2026 tally ranks it number one across text-to-video, image-to-video, and editing. The ceiling is 720p at 24fps with native audio baked into every clip, capped at fifteen seconds per generation. You will not get 1080p here and you will not get minute-long shots, but what you do get is dialogue, foley, ambient sound, and lip sync rendered in the same pass as the pixels. That single-pass audio is why teams move faster on Grok even before you count the render-time advantage.

> The fal surface is five endpoints that slot cleanly into existing pipelines. `xai/grok-imagine-video/text-to-video` is the default entry point. `xai/grok-imagine-video/image-to-video` takes a still and animates it, billing the per-second video rate plus a small per-input-image surcharge. `xai/grok-imagine-video/edit-video` accepts a clip and a prompt and returns a re-timed, re-framed, or re-scored edit at a reduced 854x480 resolution. `xai/grok-imagine-image` and `xai/grok-imagine-image/edit` cover stills and still edits if you want to scaffold a storyboard before you animate.

> Pricing holds to round, predictable numbers: five cents per second at 480p, seven cents at 720p, two cents per image generation or edit. A sixty-clip social push of six-second 480p clips costs $18, a storyboarded spot fits under five dollars, and a personal iteration costs about as much as a coffee. For teams that were previously rate-limited by render queues on slower models, Grok Imagine changes the shape of the workflow.
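The five endpoints map onto what a job starts from: nothing, a still, or a clip. A minimal routing sketch under that reading. The `Job` shape and the `pickEndpoint` helper are hypothetical; only the endpoint IDs come from this page.

```typescript
// The five Grok Imagine endpoint IDs on fal, as listed above.
type GrokEndpoint =
  | "xai/grok-imagine-video/text-to-video"
  | "xai/grok-imagine-video/image-to-video"
  | "xai/grok-imagine-video/edit-video"
  | "xai/grok-imagine-image"
  | "xai/grok-imagine-image/edit";

// Hypothetical job description: what you want back and what you start from.
interface Job {
  output: "video" | "image";
  sourceVideoUrl?: string; // an existing clip to edit
  sourceImageUrl?: string; // an existing still to animate or edit
}

// Route a job to the endpoint that matches its inputs.
function pickEndpoint(job: Job): GrokEndpoint {
  if (job.output === "image") {
    // Stills: edit when a source image exists, otherwise generate from text.
    return job.sourceImageUrl ? "xai/grok-imagine-image/edit" : "xai/grok-imagine-image";
  }
  if (job.sourceVideoUrl) return "xai/grok-imagine-video/edit-video";
  if (job.sourceImageUrl) return "xai/grok-imagine-video/image-to-video";
  return "xai/grok-imagine-video/text-to-video";
}

console.log(pickEndpoint({ output: "video", sourceImageUrl: "https://example.com/still.png" }));
// xai/grok-imagine-video/image-to-video
```

Keeping the endpoint IDs in a union type means a typo in an ID fails at compile time rather than as a 404 at render time.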

[ WHO_ITS_FOR ]
  • Product teams who need fast iteration loops on marketing and social video
  • Agencies shipping lots of short dialogue clips with native audio
  • Storyboard artists chaining still generation, still edits, and image-to-video in one session
  • Developers benchmarking latency-first video backends
  • Content teams moving off slower peers to recover queue time
[ WHEN_TO_PICK ]
  • You need 720p video with native audio in under thirty seconds of wall-clock time
  • Your pipeline loops on prompt iteration and every saved minute compounds
  • You want text, image, and edit flows behind one API without stitching vendors together
  • The fifteen-second clip ceiling fits your format (social, ads, short dialogue)
  • You are cost-sensitive and want five to seven cents per second, no tier math
# infra
Running Grok Imagine through fal.ai means one API key for every xai/grok-imagine-* endpoint plus the 600-plus other models you already call, serverless queues that absorb bursts, and webhook delivery so your worker never has to poll.
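With the `@fal-ai/client` SDK, `fal.queue.submit(endpoint, { input, webhookUrl })` queues a job, and fal POSTs the result to your URL when the job finishes. Below is a minimal parser for that POST body. It assumes the `{ request_id, status, payload, error }` envelope that fal documents for webhooks (verify the field names against the current docs) and a `payload.video.url` output field like the one in the integration example. Webhook signature verification is deliberately left out.

```typescript
// Assumed envelope for fal webhook deliveries; confirm field names in fal's webhook docs.
interface FalWebhookBody {
  request_id: string;
  status: "OK" | "ERROR";
  payload?: { video?: { url?: string } } | null;
  error?: string;
}

type WebhookOutcome =
  | { ok: true; requestId: string; videoUrl: string }
  | { ok: false; requestId: string; reason: string };

// Parse the raw POST body into something a worker can act on without polling.
function parseGrokWebhook(rawBody: string): WebhookOutcome {
  const body = JSON.parse(rawBody) as FalWebhookBody;
  const videoUrl = body.payload?.video?.url;
  if (body.status === "OK" && videoUrl) {
    return { ok: true, requestId: body.request_id, videoUrl };
  }
  return {
    ok: false,
    requestId: body.request_id,
    reason: body.error ?? "finished without a video URL",
  };
}
```

Keying downstream work on `requestId` lets the worker match each delivery to the job it submitted, even when bursts finish out of order.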
[ SCENE 01 / INTEGRATION ]
$ cat example.mts

> call Grok Imagine v1.0 in about 20 lines

// typescript · xai/grok-imagine-video/text-to-video
import { fal } from "@fal-ai/client";

fal.config({ credentials: process.env.FAL_KEY });

const result = await fal.subscribe(
  "xai/grok-imagine-video/text-to-video",
  {
    input: {
      prompt:
        "Dusk on a Brooklyn rooftop. A woman in a denim jacket turns to camera and says: 'I told you the pitch would land.' Wind against the mic, distant traffic hum.",
      duration: 6,
      resolution: "720p",
      aspect_ratio: "16:9",
    },
    logs: true,
    // Stream render logs while the job is running.
    onQueueUpdate: (update) => {
      if (update.status === "IN_PROGRESS") {
        update.logs?.forEach((log) => console.log(log.message));
      }
    },
  },
);

// subscribe() resolves once the clip is finished; the video URL lives on result.data.
console.log(result.data.video.url);
$ npx tsx example.mts
// https://v3.fal.media/files/grok-imagine/...
[ SCENE 03 / PRICING ]
$ cat /etc/pricing.tsv

> what Grok Imagine v1.0 costs on fal.ai

+--------------------------------------+----------------+--------------------------+------------+
| ENDPOINT                             | RATE           | EXAMPLE                  | COST       |
+--------------------------------------+----------------+--------------------------+------------+
| xai/grok-imagine-video/text-to-video | $0.05 per sec… | 6s 480p 16:9 with nativ… | $0.30      |
| xai/grok-imagine-video/text-to-video | $0.07 per sec… | 10s 720p 16:9 with nati… | $0.70      |
| xai/grok-imagine-video/image-to-vid… | $0.07 + $0.00… | 8s 720p from one still … | $0.562     |
| xai/grok-imagine-video/edit-video    | $0.06 to $0.0… | 10s edit pass at 854x480 | $0.60 to … |
| xai/grok-imagine-image               | $0.02 per ima… | 1 still at 1024x576      | $0.02      |
| xai/grok-imagine-image/edit          | $0.02 per ima… | 1 edit pass on a provid… | $0.02      |
+--------------------------------------+----------------+--------------------------+------------+

# Prices verified against fal.ai/pricing on April 19, 2026. Edit-video runs at 854x480 because the pipeline downscales inputs before re-rendering.

$ open https://fal.ai/pricing
[ SCENE 05 / COMPARISON ]
$ compare xai/grok-imagine-video/text-to-video --field=all

> Grok Imagine v1.0 vs the field

+--------------------+-----------+---------+---------------+--------+--------------------------------+------------------------------+
| MODEL              | RES       | DUR     | PRICE         | ELO    | ENDPOINT                       | BEST FOR                     |
+--------------------+-----------+---------+---------------+--------+--------------------------------+------------------------------+
| * Grok Imagine v1… | 720p      | 15s     | $0.07/s       | 1232 … | xai/grok-imagine-video/text-t… | Fastest finish in the arena… |
|   Kling 3.0 Pro    | 1080p     | 10s     | $0.09/s       | 1298   | fal-ai/kling-video/v3/pro/tex… | Clean motion at 1080p, stro… |
|   Seedance 2.0     | 1080p     | 12s     | $0.08/s       | 1270   | fal-ai/seedance-2.0/text-to-v… | Production-grade fidelity w… |
|   Veo 3.1          | 1080p     | 8s      | $0.12/s       | 1315   | fal-ai/veo/v3.1/text-to-video  | Cinematic ceiling with nati… |
|   HappyHorse 1.0   | 1080p     | 10s     | $0.10/s       | 1260   | fal-ai/happyhorse-1.0/text-to… | Stylized motion and painter… |
|   Sora 2 Pro       | 1080p     | 20s     | $0.15/s       | 1340   | -                              | Long-form coherence and nar… |
+--------------------+-----------+---------+---------------+--------+--------------------------------+------------------------------+

# * primary model

Grok Imagine wins when wall-clock speed matters more than raw resolution. Pick Veo 3.1 or Sora 2 Pro if you need a 1080p cinematic finish, Seedance 2.0 or Kling 3.0 Pro for balanced fidelity, and Grok Imagine when iteration speed compounds across a project.
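The same trade-off can be encoded as data. This sketch copies the table's resolution, duration, and price columns. It leaves out Elo because one cell is truncated, and it keeps only the models that cover a shot in a single generation. Sorting the survivors by price is a simplification of ours, not a ranking from this page. Note also that Sora 2 Pro has no fal endpoint listed.

```typescript
interface ModelSpec {
  name: string;
  maxHeight: 720 | 1080; // vertical resolution ceiling
  maxSeconds: number;    // per-generation duration cap
  usdPerSecond: number;
}

// Values copied from the comparison table above.
const MODELS: ModelSpec[] = [
  { name: "Grok Imagine v1.0", maxHeight: 720, maxSeconds: 15, usdPerSecond: 0.07 },
  { name: "Kling 3.0 Pro", maxHeight: 1080, maxSeconds: 10, usdPerSecond: 0.09 },
  { name: "Seedance 2.0", maxHeight: 1080, maxSeconds: 12, usdPerSecond: 0.08 },
  { name: "Veo 3.1", maxHeight: 1080, maxSeconds: 8, usdPerSecond: 0.12 },
  { name: "HappyHorse 1.0", maxHeight: 1080, maxSeconds: 10, usdPerSecond: 0.1 },
  { name: "Sora 2 Pro", maxHeight: 1080, maxSeconds: 20, usdPerSecond: 0.15 },
];

// Models that can render the shot in one generation, cheapest first.
function candidates(needHeight: 720 | 1080, seconds: number): ModelSpec[] {
  return MODELS.filter((m) => m.maxHeight >= needHeight && m.maxSeconds >= seconds).sort(
    (a, b) => a.usdPerSecond - b.usdPerSecond,
  );
}

console.log(candidates(1080, 12).map((m) => m.name)); // [ 'Seedance 2.0', 'Sora 2 Pro' ]
```

A 720p, ten-second shot puts Grok Imagine first. A shot longer than fifteen seconds drops it from the list, which is where the Extend from Frame pattern in the FAQ comes in.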

[ By the numbers ]

> The numbers.

# What this publication is and isn't, in numbers.

[01]
10
Published posts

> Each one is dated, second-person, and opinionated.

[02]
6
Topic categories

> Filter by the constraint you care about.

[03]
58
min
Total reading time

> Combined reading time of every post in the archive.

[04]
0
Em-dashes tolerated

> Not a single U+2014 survives our ship check.

[05]
1
Featured picks

> Editor-selected cover stories.

[06]
100%
Posts illustrated

> Custom covers on every featured post.

$ wc -l /var/stats/* | tail -n 1
[ SCENE 07 / FAQ ]
$ man faq

> frequently asked

Q>How fast is Grok Imagine compared to other video models?

A>A typical call to `xai/grok-imagine-video/text-to-video` returns a finished 720p clip about seventeen seconds after submit. That is two to four times faster than Kling 3.0 Pro, Seedance 2.0, Veo 3.1, and HappyHorse on matched workloads. The speed comes from xAI's inference path, which skips the multi-pass refinement most competitors run. If your pipeline loops on prompt iteration, Grok Imagine saves real minutes per session. If you push a sixty-clip social batch through the queue one clip at a time, you finish all sixty in roughly the time a peer model needs for fifteen to thirty.
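The batch claim is plain arithmetic. This sketch assumes three things: every clip takes the same time, jobs run one after another unless you raise `concurrency`, and a peer model is 2x to 4x slower per clip.

```typescript
// Wall-clock seconds for `clips` equal-length jobs with `concurrency` running at once.
function batchWallClock(clips: number, secondsPerClip: number, concurrency = 1): number {
  return Math.ceil(clips / concurrency) * secondsPerClip;
}

// How many clips finish inside a fixed time budget.
function clipsWithin(budgetSeconds: number, secondsPerClip: number, concurrency = 1): number {
  return Math.floor(budgetSeconds / secondsPerClip) * concurrency;
}

const grokBatch = batchWallClock(60, 17);  // 1020 s, about 17 minutes
const peerAt4x = clipsWithin(grokBatch, 68); // 15 clips
const peerAt2x = clipsWithin(grokBatch, 34); // 30 clips
console.log(grokBatch, peerAt4x, peerAt2x);  // 1020 15 30
```

Real queues add variance per clip, so treat these as best-case bounds rather than predictions.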

Q>How does the pricing math actually work?

A>For `xai/grok-imagine-video/text-to-video` you pay five cents per second at 480p and seven cents per second at 720p. A six-second 720p clip costs forty-two cents, a ten-second 720p clip costs seventy cents, and the fifteen-second ceiling at 720p tops out at $1.05. `xai/grok-imagine-video/image-to-video` bills the per-second video rate plus two-tenths of a cent ($0.002) per input image, so an eight-second 720p clip from one still costs $0.562. `xai/grok-imagine-video/edit-video` runs six to eight cents per second at 854x480. Stills through `xai/grok-imagine-image` and `xai/grok-imagine-image/edit` are two cents each. No tier math, no resolution multiplier, no subscription gate.
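The same math as a small calculator. This sketch works in tenths of a cent so the results stay exact. The rate constants are copied from this answer. It assumes the 480p rate also applies to image-to-video, and it leaves out edit-video because that rate is quoted as a range.

```typescript
type Resolution = "480p" | "720p";

// Rates from this page, in tenths of a cent ($0.001 units).
const VIDEO_RATE_PER_SECOND: Record<Resolution, number> = { "480p": 50, "720p": 70 };
const INPUT_IMAGE_SURCHARGE = 2; // $0.002 per input image on image-to-video
const STILL_PRICE = 20;          // $0.02 per still or still edit

// Cost of one video generation; pass inputImages = 1 for image-to-video.
function videoCost(seconds: number, resolution: Resolution, inputImages = 0): number {
  return seconds * VIDEO_RATE_PER_SECOND[resolution] + inputImages * INPUT_IMAGE_SURCHARGE;
}

// Format tenths of a cent as dollars, e.g. 562 -> "$0.562".
function toUsd(tenthsOfCent: number): string {
  return `$${(tenthsOfCent / 1000).toFixed(3)}`;
}

console.log(toUsd(videoCost(15, "720p")));     // $1.050
console.log(toUsd(videoCost(8, "720p", 1)));   // $0.562
console.log(toUsd(60 * videoCost(6, "480p"))); // $18.000
```

Integer units avoid results like `0.42000000000000004` that appear when you multiply floating-point dollar rates directly.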

Q>What resolutions does Grok Imagine v1.0 support?

A>v1.0 caps at 720p for `xai/grok-imagine-video/text-to-video` and `xai/grok-imagine-video/image-to-video`. You can drop to 480p for cheaper iteration at five cents per second. The edit endpoint `xai/grok-imagine-video/edit-video` runs at 854x480 because the pipeline downscales the input clip before re-rendering. If your storyboard calls for 1080p finishes, Seedance 2.0 or Veo 3.1 via fal.ai are better picks. For a 720p ceiling with native audio and fastest turnaround, Grok Imagine is the one.

Q>Why is there a fifteen-second clip limit?

A>xAI tuned v1.0 for tight iteration and social-format output, so `xai/grok-imagine-video/text-to-video` caps at fifteen seconds per generation. That covers most social, ad, and short-dialogue formats without padding. If you need longer, chain clips with the Extend from Frame pattern: render a fifteen-second clip, pull the final frame, and pass it to `xai/grok-imagine-video/image-to-video` with a continuation prompt. Two chained calls give you a thirty-second arc with a consistent subject. Three give you forty-five. Audio continuity across chains is softer than within a single generation, so plan foley or score at the edit stage.
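The chain arithmetic is easy to plan up front: one text-to-video opener, then image-to-video continuations, each capped at fifteen seconds. Below is a small planning sketch. `planChain` is a hypothetical helper, and it assumes each segment adds its full length, ignoring any overlap at the handoff frame.

```typescript
const MAX_SECONDS_PER_CALL = 15; // v1.0 per-generation cap

type ChainEndpoint =
  | "xai/grok-imagine-video/text-to-video"
  | "xai/grok-imagine-video/image-to-video";

interface ChainStep {
  endpoint: ChainEndpoint;
  seconds: number;
}

// Split a target runtime into capped calls: the first from text,
// every later one from the previous segment's last frame.
function planChain(totalSeconds: number): ChainStep[] {
  const steps: ChainStep[] = [];
  let remaining = totalSeconds;
  while (remaining > 0) {
    const seconds = Math.min(remaining, MAX_SECONDS_PER_CALL);
    const endpoint: ChainEndpoint =
      steps.length === 0
        ? "xai/grok-imagine-video/text-to-video"
        : "xai/grok-imagine-video/image-to-video";
    steps.push({ endpoint, seconds });
    remaining -= seconds;
  }
  return steps;
}

console.log(planChain(40).map((s) => s.seconds)); // [ 15, 15, 10 ]
```

If the endpoint enforces a minimum duration, a short tail segment like the ten seconds above may need rebalancing across calls. That minimum is not stated on this page.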

Q>Why does edit-video downscale to 854x480?

A>`xai/grok-imagine-video/edit-video` reframes, re-times, and re-scores an input clip inside one inference pass. To keep that pass fast enough to hit the seventeen-second finish target, the pipeline downscales the input to 854x480 before touching it. If you need a higher-resolution edit, run the edit first to validate the change. Then update the original prompt to carry that change and re-render it through `xai/grok-imagine-video/text-to-video` at 720p with the same parameters. Two calls, one coherent shot, and you keep the 720p ceiling.
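Here is that two-call flow with a review gate between the calls. This is a sketch with an injected client modeled on how `fal.subscribe` is called in the integration example. Three details are assumptions to check against the endpoint schema: the `video_url` input field, the `video.url` output shape, and the `approve` callback (your own review step). `finalInput` is the text-to-video input you want at 720p, typically the original prompt updated with the change you just validated.

```typescript
// Minimal client shape; adapt with a thin wrapper if fal's types don't line up.
interface VideoClient {
  subscribe(
    endpoint: string,
    options: { input: Record<string, unknown> },
  ): Promise<{ data: { video: { url: string } } }>;
}

// Cheap 854x480 edit preview first, then a 720p text-to-video render only if approved.
async function previewEditThenRender(
  client: VideoClient,
  sourceVideoUrl: string,
  editPrompt: string,
  finalInput: Record<string, unknown>,
  approve: (previewUrl: string) => Promise<boolean>, // human or automated review (hypothetical)
): Promise<{ previewUrl: string; finalUrl?: string }> {
  const preview = await client.subscribe("xai/grok-imagine-video/edit-video", {
    input: { video_url: sourceVideoUrl, prompt: editPrompt }, // `video_url` is an assumed field name
  });
  const previewUrl = preview.data.video.url;
  if (!(await approve(previewUrl))) return { previewUrl }; // rejected: skip the 720p spend
  const final = await client.subscribe("xai/grok-imagine-video/text-to-video", {
    input: { ...finalInput, resolution: "720p" },
  });
  return { previewUrl, finalUrl: final.data.video.url };
}
```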

Q>Can I generate stills too, not just video?

A>Yes. `xai/grok-imagine-image` generates stills at two cents per image, and `xai/grok-imagine-image/edit` edits an existing still for the same rate. Stills are useful for storyboarding, character sheets, or keyframe seeding before you commit to a full render pass. A common workflow: generate a subject still with `xai/grok-imagine-image`, iterate with `xai/grok-imagine-image/edit`, then pass the locked still into `xai/grok-imagine-video/image-to-video` to animate. Three endpoints, one seed lineage, and you can scaffold a spot for under a dollar before any video cost lands.
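The still-first workflow chains three endpoints and feeds each output URL into the next call. This is a minimal sketch with an injected client, so it can be exercised without a network. Three details are assumptions to verify against each endpoint's schema: the output shapes (`images[0].url` for stills, `video.url` for video) and the `image_url` input field.

```typescript
// Minimal client shape modeled on fal.subscribe; outputs are left as unknown.
interface ImagineClient {
  subscribe(endpoint: string, options: { input: Record<string, unknown> }): Promise<{ data: unknown }>;
}

// Assumed output shapes: stills as { images: [{ url }] }, video as { video: { url } }.
const firstImageUrl = (data: unknown): string =>
  (data as { images: { url: string }[] }).images[0].url;
const videoUrl = (data: unknown): string => (data as { video: { url: string } }).video.url;

// Generate a subject still, refine it with edits in order, then animate the locked still.
async function storyboardToVideo(
  client: ImagineClient,
  subjectPrompt: string,
  editPrompts: string[],
  motionPrompt: string,
): Promise<string> {
  const generated = await client.subscribe("xai/grok-imagine-image", {
    input: { prompt: subjectPrompt },
  });
  let still = firstImageUrl(generated.data);
  for (const prompt of editPrompts) {
    const edited = await client.subscribe("xai/grok-imagine-image/edit", {
      input: { image_url: still, prompt }, // `image_url` is an assumed field name
    });
    still = firstImageUrl(edited.data);
  }
  const clip = await client.subscribe("xai/grok-imagine-video/image-to-video", {
    input: { image_url: still, prompt: motionPrompt },
  });
  return videoUrl(clip.data);
}
```

At the quoted rates, one still plus two edits costs $0.06 before the image-to-video pass is billed.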

Q>How do I chain clips longer than fifteen seconds?

A>The pattern is Extend from Frame. Render your first clip with `xai/grok-imagine-video/text-to-video` up to fifteen seconds. Pull the final frame of the output. Submit that frame to `xai/grok-imagine-video/image-to-video` with a continuation prompt that references the subject, the motion arc, and the emotional beat. The image-to-video endpoint accepts a still plus prompt and extends forward another fifteen seconds. Repeat as needed. Subject consistency holds across two or three chains before drift accumulates, which is fine for most ad and social formats.
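A sketch of the Extend from Frame loop. It uses an injected client and a hypothetical `lastFrame` helper that you supply. One way to implement that helper is to run `ffmpeg -sseof -1 -i clip.mp4 -update 1 last.png` and then upload the frame, for example with `fal.storage.upload` from `@fal-ai/client`. The `image_url`, `duration`, and `resolution` fields on image-to-video are assumptions to check against the endpoint schema.

```typescript
// Minimal client shape modeled on how fal.subscribe is called in the integration example.
interface VideoClient {
  subscribe(
    endpoint: string,
    options: { input: Record<string, unknown> },
  ): Promise<{ data: { video: { url: string } } }>;
}

// Hypothetical helper: grab a clip's final frame and return a URL the API can fetch.
type LastFrameFn = (videoUrl: string) => Promise<string>;

// One text-to-video opener, then one image-to-video call per continuation prompt.
async function extendFromFrame(
  client: VideoClient,
  lastFrame: LastFrameFn,
  openingPrompt: string,
  continuationPrompts: string[], // each restates subject, motion arc, and emotional beat
  seconds = 15,
): Promise<string[]> {
  const opener = await client.subscribe("xai/grok-imagine-video/text-to-video", {
    input: { prompt: openingPrompt, duration: seconds, resolution: "720p" },
  });
  const clips: string[] = [opener.data.video.url];
  for (const prompt of continuationPrompts) {
    // Seed each segment from the final frame of the one before it.
    const frameUrl = await lastFrame(clips[clips.length - 1]);
    const next = await client.subscribe("xai/grok-imagine-video/image-to-video", {
      input: { image_url: frameUrl, prompt, duration: seconds, resolution: "720p" },
    });
    clips.push(next.data.video.url);
  }
  return clips; // segment URLs in order; join them and smooth audio at the edit stage
}
```

Two or three continuation prompts keep you inside the range where, per the answer above, subject drift stays manageable.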

Q>What is the content policy on Grok Imagine?

A>xAI applies standard safety filters to every `xai/grok-imagine-*` endpoint. The platform blocks CSAM, non-consensual intimate content, and known IP infringement. It is more permissive than most peers on stylized violence, edgy humor, and political caricature, which is consistent with xAI's broader positioning. If a render hits the filter, the response surfaces a flagged status and no credit is charged for that attempt. Re-prompt with softer framing and resubmit. For enterprise workloads with custom policy needs, contact xAI through the fal team for a tightened or loosened allowlist.
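The re-prompt step is easy to automate if you keep a short list of progressively softer rewrites. This sketch assumes you can recognize a flagged attempt from what your client throws. How the flagged status actually surfaces is not specified here, so `isFlagged` is yours to supply. Errors that are not policy flags are rethrown rather than re-prompted.

```typescript
// Try each prompt in order, moving to the next only when the attempt was flagged.
async function renderWithSofterFallback<T>(
  render: (prompt: string) => Promise<T>,
  prompts: string[],                     // original first, then softer rewrites
  isFlagged: (error: unknown) => boolean, // hypothetical policy-flag check
): Promise<{ result: T; promptUsed: string }> {
  let lastError: unknown;
  for (const prompt of prompts) {
    try {
      return { result: await render(prompt), promptUsed: prompt };
    } catch (error) {
      if (!isFlagged(error)) throw error; // network or input errors are not re-prompted
      lastError = error;
    }
  }
  throw lastError ?? new Error("no prompts supplied");
}
```

Since flagged attempts are not charged, the cost of walking this list is mostly latency.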

Q>How does Grok Imagine compare to Kling, Veo, and Sora?

A>Speed beats them all, quality sits in the top tier. Grok Imagine via `xai/grok-imagine-video/text-to-video` finishes a 720p clip in about seventeen seconds. Kling 3.0 Pro delivers 1080p at higher fidelity but runs two to three times slower. Veo 3.1 reaches the cinematic ceiling with native audio but costs twelve cents per second and caps at eight seconds. Sora 2 Pro offers the longest shots at twenty seconds with the best narrative coherence, but runs fifteen cents per second. Pick Grok for iteration-heavy work, Seedance 2.0 or Kling for balanced fidelity, and Sora or Veo when the ceiling matters more than the queue.

Q>Why run Grok Imagine on fal.ai?

A>Running Grok Imagine through fal.ai covers eight practical wins. One, one API key authenticates every `xai/grok-imagine-*` endpoint plus 600-plus sibling models. Two, serverless queues absorb bursts without cold starts. Three, webhook delivery removes the need for polling loops. Four, unified billing means Grok credits land on the same invoice as every other model you call. Five, the fal dashboard gives you per-call logs and retry history. Six, the async queue pattern keeps long jobs off your request path. Seven, client SDKs in TypeScript and Python wrap the raw HTTP with sane defaults. Eight, if xAI rate limits you upstream, fal's proxy layer returns a clean error surface you can retry against instead of opaque timeouts.
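The eighth point pairs naturally with a retry wrapper. This is a generic sketch of exponential backoff with full jitter. Treating an HTTP 429 `status` on the thrown error as retryable is an assumption about the error shape, so confirm it against the `@fal-ai/client` error types before relying on it.

```typescript
interface RetryOptions {
  attempts: number;                         // total tries, including the first
  baseDelayMs: number;                      // upper bound of the first retry delay
  isRetryable: (error: unknown) => boolean; // e.g. a 429 surfaced by the proxy (assumption)
  sleep?: (ms: number) => Promise<void>;    // injectable for tests
}

async function withBackoff<T>(task: () => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let attempt = 0;
  while (true) {
    attempt++;
    try {
      return await task();
    } catch (error) {
      if (attempt >= opts.attempts || !opts.isRetryable(error)) throw error;
      // Full jitter: wait a random time in [0, base * 2^(attempt - 1)).
      await sleep(Math.random() * opts.baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Jitter spreads retries from a burst of workers so they do not all hit the upstream limit again at the same instant.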
