Every inference request completed through FAR AI generates a detailed record that developers can inspect: time to first token, end-to-end latency, tokens cached from previous requests, and energy consumed. These metrics are available through the developer portal and via API, enabling developers to monitor quality, optimize prompts, and make informed decisions about model selection and performance.
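As a rough illustration of how such per-request records might be used once retrieved, the sketch below aggregates a batch of them into a few summary figures. The record schema is not specified here, so the `InferenceRecord` class, all of its field names, the units, and the `summarize` helper are assumptions made for illustration only; they are not the actual FAR AI API. In particular, `prompt_tokens` is an assumed extra field, included so a cache-hit rate can be computed.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class InferenceRecord:
    """Hypothetical shape of one per-request record (field names assumed)."""
    ttft_ms: float        # time to first token, in milliseconds
    latency_ms: float     # end-to-end latency, in milliseconds
    prompt_tokens: int    # total prompt tokens (assumed field, not in the text)
    cached_tokens: int    # prompt tokens reused from previous requests
    energy_joules: float  # energy consumed by the request (unit assumed)


def summarize(records: list[InferenceRecord]) -> dict[str, float]:
    """Aggregate a non-empty list of records into summary metrics."""
    if not records:
        raise ValueError("summarize() needs at least one record")
    total_prompt = sum(r.prompt_tokens for r in records)
    total_cached = sum(r.cached_tokens for r in records)
    return {
        "median_ttft_ms": median(r.ttft_ms for r in records),
        "median_latency_ms": median(r.latency_ms for r in records),
        # Share of prompt tokens that were served from cache.
        "cache_hit_rate": total_cached / total_prompt if total_prompt else 0.0,
        "total_energy_joules": sum(r.energy_joules for r in records),
    }
```

A summary like this could be tracked before and after a prompt change: for example, a higher cache-hit rate alongside a lower median time to first token would suggest that the change made better use of cached tokens.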