A potential pitfall in any distributed inference network is that faster GPUs earn far more per hour than slower ones, even when both are equally reliable. This would make the Reliability Score meaningless as an earnings signal for operators with mid-range hardware. FAR AI addresses this differently depending on the inference mode in use.

Single-machine and same-LAN inference

For models that fit within a single node or a local cluster of nodes, FAR AI uses a tiered model catalog. Each model is tagged with a minimum hardware tier, and the orchestrator routes jobs only to nodes in the appropriate tier. Within each tier, nodes have comparable hardware, so a higher Reliability Score translates directly into proportionally more work and higher earnings. When demand temporarily exceeds a tier’s capacity, a hardware-aware rate adjustment ensures that nodes serving above their native tier are compensated fairly for the additional cost.

Cross-network distributed inference

For models too large for any single device, the orchestrator splits the job across multiple nodes on the broader FAR AI network. In this mode there is no tier concept in the single-machine sense. The orchestrator identifies any combination of reachable nodes whose combined available GPU memory is sufficient to hold the model, assigns each node the weight shard that fits its hardware, and compensates each node in proportion to the memory it contributed and the energy it consumed. An operator with a smaller GPU is not below any tier; they contribute the slice they can carry and are paid for exactly that slice. The Reliability Score still governs which nodes are selected when multiple combinations could satisfy the memory requirement.
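The tiered routing described above can be sketched in a few lines. This is a minimal illustration, not the actual FAR AI scheduler: the tier names, node fields, and the rule "selection probability proportional to Reliability Score within a tier" are all assumptions made for the example.

```python
import random

# Hypothetical tier ladder, lowest to highest. Real tier names are an
# assumption for this sketch.
TIER_ORDER = ["consumer", "prosumer", "datacenter"]

def eligible_nodes(nodes, model_tier):
    """Nodes whose hardware tier meets the model's minimum tier tag."""
    min_rank = TIER_ORDER.index(model_tier)
    return [n for n in nodes if TIER_ORDER.index(n["tier"]) >= min_rank]

def pick_node(nodes, model_tier, rng=random):
    """Within the eligible pool, a node's chance of receiving the job
    is weighted by its Reliability Score, so more reliable nodes earn
    proportionally more work."""
    pool = eligible_nodes(nodes, model_tier)
    if not pool:
        raise RuntimeError("no node meets the model's tier requirement")
    weights = [n["reliability"] for n in pool]
    return rng.choices(pool, weights=weights, k=1)[0]

nodes = [
    {"id": "a", "tier": "consumer",   "reliability": 0.99},
    {"id": "b", "tier": "prosumer",   "reliability": 0.90},
    {"id": "c", "tier": "datacenter", "reliability": 0.95},
]
job = pick_node(nodes, "prosumer")  # only nodes b and c are eligible
```

A hardware-aware rate adjustment would then scale the per-job payment for any node serving above its native tier; that pricing step is omitted here.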
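The cross-network mode can be sketched the same way. Assuming, for illustration, a greedy selection that prefers higher Reliability Scores until the combined free GPU memory covers the model, and a payout split proportional to memory contributed (the `energy_kwh` field and `energy_rate` term stand in for whatever energy metering the network actually uses):

```python
def assign_shards(nodes, model_gb):
    """Greedy sketch: take the most reliable nodes first until their
    combined free GPU memory covers the model, giving each node a
    weight shard sized to the memory it can carry."""
    chosen, covered = [], 0.0
    for n in sorted(nodes, key=lambda n: n["reliability"], reverse=True):
        if covered >= model_gb:
            break
        take = min(n["free_gb"], model_gb - covered)
        chosen.append({**n, "shard_gb": take})
        covered += take
    if covered < model_gb:
        raise RuntimeError("reachable nodes cannot hold the model")
    return chosen

def payouts(shards, job_fee, energy_rate=0.0):
    """Each node is paid in proportion to the memory it contributed,
    plus a per-node energy component (a stand-in for real metering)."""
    total = sum(s["shard_gb"] for s in shards)
    return {
        s["id"]: job_fee * s["shard_gb"] / total
                 + energy_rate * s.get("energy_kwh", 0.0)
        for s in shards
    }

nodes = [
    {"id": "x", "free_gb": 80, "reliability": 0.99, "energy_kwh": 1.2},
    {"id": "y", "free_gb": 40, "reliability": 0.95, "energy_kwh": 0.5},
    {"id": "z", "free_gb": 60, "reliability": 0.90, "energy_kwh": 0.8},
]
shards = assign_shards(nodes, model_gb=100)  # x carries 80 GB, y carries 20 GB
pay = payouts(shards, job_fee=10.0)
```

Note how the smaller node `y` is not excluded by any tier: it carries the 20 GB slice it can hold and is paid for exactly that slice, while the Reliability Score decides which combination of nodes is chosen in the first place.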