fix: review cost estimation accuracy after v2.1.75 token overcounting fix #228

Closed
opened 2026-03-20 08:26:11 -07:00 by hikari · 0 comments
Owner

Background

CLI v2.1.75 fixed over-counting in the token estimation for thinking and tool_use blocks, which had been causing premature context compaction. As a result, the token counts reported for these block types are now lower than before.

Why It Matters for Hikari Desktop

Hikari Desktop performs its own cost estimation in wsl_bridge.rs when a session is interrupted (before the final result message arrives with accurate usage). The estimation logic uses character-to-token ratios. If the CLI was previously over-counting these blocks, our estimation may have been calibrated against inflated numbers.

We should:

  1. Review the estimation logic for thinking and tool_use blocks in wsl_bridge.rs
  2. Check whether our conservative chars/token ratio and safety margin still produce reasonable estimates against real post-v2.1.75 usage data
  3. Update the ratio/margin if testing shows systematic over-estimation
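To make the review steps concrete, here is a minimal sketch of a chars/token estimator with a safety margin, plus a helper that compares estimates against reported usage. Everything in it is an assumption for illustration: the constant values, the `Sample` struct, and the function names are hypothetical and are not taken from wsl_bridge.rs. Integer arithmetic is used so the rounding is exact and easy to pin in tests.

```rust
// Hypothetical calibration constants. The real values in wsl_bridge.rs may differ.
const CHARS_PER_TOKEN_X10: u64 = 35; // 3.5 characters per token, scaled by 10
const SAFETY_MARGIN_PERCENT: u64 = 110; // 10% margin on top of the raw estimate

/// Estimate the token count of a block from its character count,
/// rounding up so the estimate stays conservative.
fn estimate_tokens(text: &str) -> u64 {
    let chars = text.chars().count() as u64;
    // tokens = ceil(chars / 3.5 * 1.10), computed with integers only
    let numerator = chars * 10 * SAFETY_MARGIN_PERCENT;
    let denominator = CHARS_PER_TOKEN_X10 * 100;
    (numerator + denominator - 1) / denominator
}

/// One observation from a real session (hypothetical shape):
/// our estimate next to the usage the CLI reported.
struct Sample {
    estimated: u64,
    reported: u64,
}

/// Ratio of total estimated tokens to total reported tokens.
/// A value above 1.0 means we over-estimate overall.
/// Returns None when there is no reported usage to compare against.
fn aggregate_ratio(samples: &[Sample]) -> Option<f64> {
    let estimated: u64 = samples.iter().map(|s| s.estimated).sum();
    let reported: u64 = samples.iter().map(|s| s.reported).sum();
    if reported == 0 {
        None
    } else {
        Some(estimated as f64 / reported as f64)
    }
}

/// Decide whether the constants are "materially off" for a chosen tolerance
/// (for example, 0.10 flags anything more than 10% away from 1.0).
fn is_materially_off(ratio: f64, tolerance: f64) -> bool {
    (ratio - 1.0).abs() > tolerance
}
```

With samples collected from sessions on CLI ≥ 2.1.75, a ratio that sits consistently above 1.0 by more than the chosen tolerance would suggest raising the chars/token ratio or lowering the margin. The tolerance itself is a judgment call that this issue does not specify.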

This is a low-urgency calibration task, but worth tracking as part of the v2.1.75 audit.

Acceptance Criteria

  • Cost estimation accuracy reviewed against real sessions on CLI ≥ 2.1.75
  • Estimation constants updated if materially off
  • Tests updated to reflect any changed constants

This issue was created with help from Hikari~ 🌸

naomi closed this issue 2026-03-23 14:28:09 -07:00

Reference: nhcarrigan/hikari-desktop#228