Parameters

Temperature: 0.7
Max Tokens: 2048
Top P: 0.9
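
As a rough illustration of what Temperature and Top P conventionally control, here is a minimal sketch of temperature-scaled nucleus (top-p) sampling over a list of raw logits. It is not this playground's implementation: the function name `sample_token` and its signature are hypothetical, and the defaults simply mirror the values shown above. Max Tokens is not modelled here; it is usually just a cap on how many times a step like this is repeated.

```python
import math
import random


def sample_token(logits, temperature=0.7, top_p=0.9, rng=random):
    """Pick one token index (hypothetical helper, for illustration only)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature scaling: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax over the scaled logits.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus filtering: keep the smallest set of most-probable tokens
    # whose cumulative probability reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the surviving tokens; choices() renormalizes the weights.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Passing `rng=random.Random(seed)` makes the draw reproducible, which helps when comparing the effect of different parameter values on the same logits.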

Enterprise Performance

This playground runs on dedicated H100 clusters and uses TensorRT-LLM to deliver sub-millisecond time to first token (TTFT) and throughput of 100+ tokens/sec.

Interactive Inference

TTFT: -- ms
Speed: -- tok/s
Tokens: 0
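
The three readouts above could be computed on the client roughly as follows. This is a sketch under stated assumptions, not the playground's actual code: `measure_stream` is a hypothetical helper, the token stream is any Python iterable, and speed is taken as total tokens divided by total elapsed time (some dashboards exclude the time before the first token instead).

```python
import time


def measure_stream(tokens, clock=time.perf_counter):
    """Consume a token stream and report TTFT, speed, and token count.

    Hypothetical helper: `tokens` is any iterable that yields tokens as
    they arrive; `clock` returns seconds and can be swapped for testing.
    """
    start = clock()
    first_at = None
    count = 0

    for _ in tokens:
        now = clock()
        if first_at is None:
            first_at = now  # arrival time of the first token
        count += 1

    end = clock()
    elapsed = end - start

    # TTFT is undefined if the stream produced nothing.
    ttft_ms = (first_at - start) * 1000 if first_at is not None else None
    # Assumption: speed = all tokens / total wall-clock time.
    tok_per_s = count / elapsed if count and elapsed > 0 else 0.0

    return {"ttft_ms": ttft_ms, "tok_per_s": tok_per_s, "tokens": count}
```

The clock is injectable so the calculation can be checked with fixed timestamps. In real use the default `time.perf_counter` is a reasonable choice because it is monotonic.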

Ready for Inference

Type a prompt below to see the streaming response powered by A3Gate's optimized infrastructure.