TurboJPEG and Rust PNG
SIMD-accelerated JPEG decode and a Rust/PyO3 PNG decoder replace PIL. ~2x on 4K JPEG and ~4.6x on 1K PNG.
A patched vLLM image that accelerates vision-language serving end-to-end, from JPEG and PNG decode to fused Triton kernels. Up to 225x faster on cached image batches versus vanilla vLLM.
autovllm patches both the CPU image pipeline and the GPU Triton kernels of vllm/vllm-openai. The same vLLM server you already know, now built for real multi-modal traffic.
SIMD-accelerated JPEG decode and a Rust/PyO3 PNG decoder replace PIL. ~2x on 4K JPEG and ~4.6x on 1K PNG.
A 512-entry decode cache with O(1) fingerprint lookup, plus a base64-level short-circuit that skips b64decode on cache hits.
Eight fused kernels: RMSNorm, residual+norm, SiLU MLP, QK-norm+RoPE, LM head+top-k, DeltaNet recurrent, and more.
Optimized RMSNorm, SwiGLU, and RoPE from Liger with backward-pass support. Drop in without any config changes.
Resolution capped to 4 MP before inter-process transfer. No more shipping full-res pixel buffers across the wire.
Ships as vlmrun/vlmrun-vllm-openai:v0.16.0-* on top of official vLLM. Swap one image, keep everything else.
Quick Start
autovllm ships as a patched Docker image on top of vllm/vllm-openai. Your OpenAI-compatible API surface, schedulers, and configs are unchanged. You just go faster.