chore: audit frontend error reporting to exclude expected behaviours #73

Closed
opened 2026-03-19 12:07:11 -07:00 by hikari · 0 comments
Owner

Description

The frontend error webhook is firing for conditions that are normal, expected game behaviour — not actual errors. This is causing alert fatigue and burying real issues in noise.

Known False Positives

The following conditions are currently reported as errors but should be silent (or at most a UI-level warning):

  • auto_save ("Failed to fetch"): Network hiccup during a background auto-save. Expected on flaky connections; should not page anyone.
  • auto_prestige ("Not eligible for prestige"): The tick engine attempted auto-prestige but the player hasn't hit the threshold yet. This is normal game state, not an error.
  • challenge_boss ("Boss is not currently available"): Auto-boss attempted to challenge a boss that's already defeated or locked. Expected during normal tick cycles.
  • start_exploration ("An exploration is already in progress"): Auto-quest or tick engine tried to start an exploration when one was running. Expected race condition in the tick loop.
  • collect_exploration ("Exploration is not in progress"): Collect attempted on an area that was already collected or not yet started. Expected timing edge case.
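
The action/message pairs listed above could be captured as a small suppression table that the reporter consults before sending anything to the webhook. This is a minimal TypeScript sketch, not the project's actual code: `EXPECTED_REJECTIONS` and `isKnownFalsePositive` are hypothetical names, and it assumes the reporter receives the action name and the error message as separate strings.

```typescript
// Hypothetical suppression table built from the false positives listed above.
// Keys are action names; values are the messages considered expected for them.
const EXPECTED_REJECTIONS = new Map<string, readonly string[]>([
  ["auto_save", ["Failed to fetch"]],
  ["auto_prestige", ["Not eligible for prestige"]],
  ["challenge_boss", ["Boss is not currently available"]],
  ["start_exploration", ["An exploration is already in progress"]],
  ["collect_exploration", ["Exploration is not in progress"]],
]);

// Returns true when the (action, message) pair is a known expected condition
// and therefore should not be forwarded to the error webhook.
function isKnownFalsePositive(action: string, message: string): boolean {
  const expected = EXPECTED_REJECTIONS.get(action);
  return expected !== undefined && expected.includes(message);
}
```

Any pair not in the table falls through and is still reported, so an unexpected message from `auto_save` would continue to reach the webhook. Exact string matching is brittle if the server wording changes; a stable error code, if the backend exposes one, would likely be a better key.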

Expected Behaviour

Each error source should be audited to determine whether it represents:

  1. A true error — unexpected failure that warrants a webhook notification
  2. An expected business logic rejection — should be caught and silently discarded (or logged at a lower severity)
  3. A network/transient failure — should be retried or silently swallowed depending on context

The webhook should only fire for category 1.
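
One way to express the three categories in code is a small classifier that gates the webhook call. The sketch below is illustrative only: `GameRejectionError`, `classifyError`, and `shouldNotifyWebhook` are hypothetical names, and it assumes business logic rejections are thrown as a dedicated error type. The `TypeError` check relies on the fact that `fetch` rejects with a `TypeError` when the network request itself fails.

```typescript
type ErrorCategory = "true_error" | "expected_rejection" | "transient_failure";

// Hypothetical error type for rejections that are normal game state
// (e.g. "Not eligible for prestige").
class GameRejectionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "GameRejectionError";
  }
}

// Sorts a caught value into one of the three categories.
// `duringFetch` is an assumed flag set by call sites that wrap a fetch() call;
// without it, a TypeError is treated as a genuine bug rather than a network blip.
function classifyError(err: unknown, duringFetch = false): ErrorCategory {
  if (err instanceof Error && err.name === "GameRejectionError") {
    return "expected_rejection";
  }
  if (duringFetch && err instanceof TypeError) {
    return "transient_failure";
  }
  return "true_error";
}

// The webhook fires only for category 1 (true errors).
function shouldNotifyWebhook(err: unknown, duringFetch = false): boolean {
  return classifyError(err, duringFetch) === "true_error";
}
```

Defaulting unknown values to `true_error` keeps the gate conservative: anything the audit has not explicitly classified still produces an alert.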

Impact

  • Email inbox is being flooded with non-actionable alerts
  • Real errors risk being missed due to alert fatigue

This issue was created with help from Hikari~ 🌸

naomi closed this issue 2026-03-19 16:01:22 -07:00

Reference: nhcarrigan/elysium#73