Most models fit comfortably within the memory of a single high-end consumer GPU. For these models, a single node handles the entire request independently. For the largest and most capable open-weight models, which require more GPU memory than any single consumer card can provide, FAR AI supports distributed inference across multiple machines on the same local network. The model is divided intelligently across participating nodes, with each machine handling its assigned portion of the computation. From the developer's perspective, the response arrives exactly as it would from a single machine: streaming, fast, and complete. This multi-node capability is what allows FAR AI to serve the most powerful open-weight models without requiring enterprise-grade hardware at any single location.

Documentation Index
Fetch the complete documentation index at: https://wp.farlabs.ai/llms.txt
Use this file to discover all available pages before exploring further.
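As a minimal sketch, the index can be fetched and listed programmatically. The URL comes from this page; the helper names (`fetch_index`, `parse_index`) and the assumption that each useful entry occupies its own line are illustrative, not part of the documented API.

```python
# Sketch: download the documentation index and list its entries.
# INDEX_URL is taken from the docs above; parse_index assumes one
# entry per non-empty line (an assumption, not a documented format).
from urllib.request import urlopen

INDEX_URL = "https://wp.farlabs.ai/llms.txt"


def parse_index(text: str) -> list[str]:
    """Return the non-empty lines of an index file, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fetch_index(url: str = INDEX_URL) -> list[str]:
    """Fetch the index over HTTP and parse it (requires network access)."""
    with urlopen(url) as resp:
        return parse_index(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for entry in fetch_index():
        print(entry)
```

From there, each discovered page can be fetched individually before exploring further.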