The generative media API we wanted to use

One API for every generative media job — at a fraction of the price.

VisionQ ships two first-party models: Odyssey-image at $0.02 per generation and Odyssey-1-video at $0.10 per generation. Each model covers generation, editing, image-to-image and (for video) 5- and 10-second clips in a single endpoint. On equivalent workloads that's typically 5–10× cheaper than fal.ai and similar providers. Teams that outgrow the hosted models deploy their own checkpoints on the same autoscaling GPU fleet via Custom Inference.
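
To make that concrete, here is a minimal sketch of a generation call in Python with the requests library. The endpoint path, field names and response shape are illustrative placeholders rather than the exact API surface; only the model names and prices above are the real ones.

    import requests

    # Endpoint path and field names are illustrative placeholders.
    API_URL = "https://api.visionq.example/v1/generate"
    HEADERS = {"Authorization": "Bearer vq_YOUR_KEY"}

    # Image generation with Odyssey-image ($0.02 per generation).
    image = requests.post(API_URL, headers=HEADERS, json={
        "model": "odyssey-image",
        "prompt": "a lighthouse at dusk, heavy fog",
    }, timeout=120)

    # Video generation with Odyssey-1-video ($0.10 per generation);
    # the same endpoint accepts a clip duration of 5 or 10 seconds.
    video = requests.post(API_URL, headers=HEADERS, json={
        "model": "odyssey-1-video",
        "prompt": "a lighthouse at dusk, heavy fog",
        "duration": 5,
    }, timeout=300)

    print(image.json(), video.json())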

Headquarters

8 Stoney Lane, London

What we build

Generative image & video API • Custom Inference for user models • autoscaling GPU fleet • SDKs for Python, TypeScript and Go

[Photo: VisionQ engineers working on inference infrastructure]

Who we serve

From solo developers shipping their first image app to ML teams running millions of generations a day.

Why we're doing this

Generative image and video models are moving faster than any team can keep up with internally. Building a reliable inference platform around them — with autoscaling, retries, observability and GPU-level cost attribution — takes months. Most teams don't want that to be their job.

We already built that platform for our own workloads. VisionQ is the product version: the same API, the same SDKs, the same GPU fleet — open to any team that wants to skip the infrastructure and get back to shipping their product.

Built for real workloads

We started VisionQ after running generation pipelines in production ourselves. Every abstraction in the API exists because we needed it — deterministic seeds, signed URLs, region pinning, per-request cost breakdowns.
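
Here is a sketch of how those abstractions surface in a single request and its response. The field names are illustrative placeholders, not the exact API surface.

    import requests

    # Field names are illustrative placeholders.
    resp = requests.post(
        "https://api.visionq.example/v1/generate",
        headers={"Authorization": "Bearer vq_YOUR_KEY"},
        json={
            "model": "odyssey-image",
            "prompt": "product shot, studio lighting",
            "seed": 42,           # deterministic seed: same seed, same output
            "region": "eu-west",  # region pinning
        },
        timeout=120,
    )
    body = resp.json()
    print(body["output_url"])  # time-limited signed URL to the asset
    print(body["cost"])        # per-request cost breakdown, e.g. GPU seconds and USD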

Your model, our GPUs

We believe the best inference platform is one that doesn't lock you into its models. Custom Inference lets you ship any checkpoint, LoRA or pipeline you have rights to — and scale it the same way our hosted endpoints do.
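
Deploying a checkpoint might look like the sketch below; the deployment endpoint and fields are illustrative placeholders rather than the exact API surface.

    import requests

    # Illustrative placeholders for a Custom Inference deployment.
    resp = requests.post(
        "https://api.visionq.example/v1/custom-inference/deployments",
        headers={"Authorization": "Bearer vq_YOUR_KEY"},
        json={
            "name": "my-finetune",
            "checkpoint_url": "s3://my-bucket/sdxl-finetune.safetensors",
            "lora_url": "s3://my-bucket/style-lora.safetensors",  # optional
            "autoscaling": {"min_replicas": 0, "max_replicas": 8},
        },
        timeout=60,
    )
    deployment = resp.json()
    # Once live, the deployment is callable like a hosted model,
    # on the same autoscaling GPU fleet.
    print(deployment["id"])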

Boring infrastructure

GPUs are fun. Production incidents are not. We obsess over the unglamorous parts — retries, cold-starts, observability, quotas — so your on-call rotation can stay quiet.

What guides us

Latency is a feature

We measure p50, p95 and p99 latency on every endpoint and publish the numbers internally every week. A new model ships only when its tail latency is good, not just its average.
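
Averages hide the slow requests, which is why we gate on the tail. A toy illustration in Python:

    import statistics

    # Toy latencies in ms: most requests are fast, a few hit cold-starts.
    latencies = [120] * 95 + [2400] * 5

    mean = statistics.mean(latencies)                 # ~234 ms: looks healthy
    p50 = statistics.quantiles(latencies, n=100)[49]  # 120 ms
    p99 = statistics.quantiles(latencies, n=100)[98]  # 2400 ms

    # The mean looks fine, but 1 request in 100 is 20x slower.
    print(mean, p50, p99)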

Reproducibility matters

Every generation records its seed, model version, GPU fingerprint and exact parameters. Six months later, you can rerun the same request and get the same output. Legal, QA and marketing all appreciate this more than we expected.
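
In practice that means a past generation can be fetched and replayed. A sketch, again with illustrative field names rather than the exact API surface:

    import requests

    BASE = "https://api.visionq.example/v1"
    HEADERS = {"Authorization": "Bearer vq_YOUR_KEY"}

    # Fetch the recorded seed, model version and parameters of an
    # old generation (identifier and fields are illustrative).
    record = requests.get(f"{BASE}/generations/gen_abc123", headers=HEADERS).json()

    # Replay the exact request: the same seed and pinned model
    # version yield the same output, months later.
    replay = requests.post(f"{BASE}/generate", headers=HEADERS, json={
        "model": record["model_version"],
        "seed": record["seed"],
        **record["parameters"],
    }, timeout=120)
    print(replay.json()["output_url"])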

One API, many modalities

Images and video share auth, billing, SDKs and dashboards. You shouldn't need a different vendor for every format, and you definitely shouldn't need a different invoice.

Have a workload we should benchmark?

Send us the model, the request profile and the latency budget. We'll benchmark it on our fleet, share the numbers and help you decide whether moving from self-hosted GPUs to VisionQ actually makes sense.