Anytime inference

Connect internal settling dynamics to product-facing latency choices.

This chapter is where the runtime meets user experience. The model can stop early, keep iterating, or spend extra compute only when the confidence curve says the gain is worth it.

  • Fast mode should feel decisive rather than careless.
  • Balanced mode is the default tradeoff between certainty and response time.
  • Careful mode spends more latency to settle the state more tightly.
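
One way to picture the three modes is as presets over the same two knobs: a confidence threshold and a step budget. The sketch below is illustrative only. `ModePreset`, `PRESETS`, `preset_for`, and every number in it are assumptions, not values taken from the runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePreset:
    """Hypothetical settings for one latency mode."""
    confidence_threshold: float  # stop once confidence reaches this value
    max_steps: int               # hard cap on settling iterations


# Placeholder numbers: the source names the modes but gives no values.
PRESETS = {
    "fast": ModePreset(confidence_threshold=0.70, max_steps=2),
    "balanced": ModePreset(confidence_threshold=0.85, max_steps=6),
    "careful": ModePreset(confidence_threshold=0.95, max_steps=16),
}


def preset_for(mode: str) -> ModePreset:
    """Look up a preset, rejecting unknown mode names explicitly."""
    try:
        return PRESETS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
```

Keeping the presets in one table makes the tradeoff visible: moving from fast to careful raises both the threshold and the step budget, so latency grows only in exchange for a tighter settled state.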

Exit policy

Anytime inference stops when the field is ready.

Threshold: medium
Latency: moderate

Let the model take a few extra steps when the signal is mixed.
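
A threshold-based exit policy of this kind can be sketched as a loop that keeps stepping until confidence clears the threshold or the step budget runs out. This is a minimal sketch under stated assumptions: `run_anytime`, `step_fn`, and `toy_step` are hypothetical names, and the toy confidence curve is invented purely so the example runs.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def run_anytime(
    state: S,
    step_fn: Callable[[S], Tuple[S, float]],
    threshold: float,
    max_steps: int,
) -> Tuple[S, float, int]:
    """Step until confidence >= threshold or max_steps is reached.

    step_fn is an assumed interface: it advances the state by one
    iteration and returns (new_state, confidence in [0, 1]).
    """
    confidence = 0.0
    steps = 0
    while steps < max_steps and confidence < threshold:
        state, confidence = step_fn(state)
        steps += 1
    return state, confidence, steps


def toy_step(n: int) -> Tuple[int, float]:
    """Toy stand-in for a model step: confidence halves its gap to 1.0."""
    n += 1
    return n, 1.0 - 0.5 ** n


# A medium threshold exits after three steps on this toy curve.
state, confidence, steps = run_anytime(0, toy_step, threshold=0.85, max_steps=10)
print(steps, confidence)  # 3 0.875
```

The step cap matters as much as the threshold: if confidence never reaches the threshold, the loop still returns the best state it has once the budget is spent.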

Confidence curve

Confidence rises as latency accumulates.

[Plot: confidence versus latency]

Token-step continuation

Keep iterating only while the expected gain still beats the cost.

step 1 → step 2 → step 3 → stop
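
The continuation rule can be illustrated with a marginal-gain check: take another step only while the expected confidence gain from that step exceeds a fixed per-step cost. In this sketch `steps_worth_taking`, the forecast list, and `step_cost` are all assumptions. A real runtime would have to estimate the next gain rather than read it from a list.

```python
from typing import Sequence


def steps_worth_taking(forecast: Sequence[float], step_cost: float) -> int:
    """Count the steps whose marginal confidence gain beats step_cost.

    forecast[i] is the assumed confidence after step i + 1. Stepping
    stops at the first step whose gain is at or below step_cost.
    """
    taken = 0
    current = 0.0
    for next_confidence in forecast:
        gain = next_confidence - current
        if gain <= step_cost:
            break  # marginal improvement no longer pays for the latency
        current = next_confidence
        taken += 1
    return taken


# Gains of 0.5, 0.25, 0.125, then 0.0625: the fourth step is not worth 0.1.
print(steps_worth_taking([0.5, 0.75, 0.875, 0.9375], step_cost=0.1))  # 3
```

Expressing the cost in confidence units keeps the comparison simple; a product would more likely convert latency into such a cost per mode, with a higher cost for fast mode and a lower one for careful mode.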

Anytime inference connects the internal state machinery to product behavior: the model can return an answer early, keep going for higher confidence, or stop once the marginal improvement fades.