Had another incident this evening, but the system dealt with it: it closed out the unknown order and flattened the position.
Here are the details of what it suggests we do this time to keep the same scenario from happening again:
Root cause (one sentence): order_manager._place_bracket_orders places the stop and then the target sequentially with no flat-check in between, so when an instant stop fill exits the position during bracket placement, the target is placed anyway and rests at the broker as an orphan limit.
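To make the race concrete, here is a minimal, self-contained simulation. Everything in it is hypothetical (ToyOrderManager, the "ES-1" key, the prices, the in-memory "broker"); it is not the project's code and only mirrors the shape of the problem: the await on the stop placement gives the exit handler a chance to remove the position, and the unguarded code then places the target anyway.

```python
import asyncio


class ToyOrderManager:
    """Hypothetical stand-in for the order manager; not the project's code."""

    def __init__(self):
        self.positions = {"ES-1": {"target_price": 5010.0}}
        self.resting_orders = []  # orders left working at the pretend broker

    async def _place_order(self, kind, position_key):
        # Simulated REST round-trip: awaiting hands control to the event loop,
        # which is where a WS fill callback could run in the real system.
        await asyncio.sleep(0)
        if kind == "stop":
            # Simulate an instant stop fill: the exit handler removes the
            # position before bracket placement resumes, and the filled stop
            # is no longer resting.
            self.positions.pop(position_key, None)
            return "stop-id"
        self.resting_orders.append(kind)
        return f"{kind}-id"

    async def place_bracket_unguarded(self, position_key):
        position = self.positions[position_key]
        await self._place_order("stop", position_key)
        # No flat-check here: `position` is a stale reference to a dict that
        # has already been removed from self.positions.
        if position["target_price"] is not None:
            await self._place_order("target", position_key)


om = ToyOrderManager()
asyncio.run(om.place_bracket_unguarded("ES-1"))
print(om.positions, om.resting_orders)
```

The run ends with no tracked position but a target still resting, which is the orphan described above.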
3) Clean fix
Three layered changes in src/live/order_manager.py, all in _place_bracket_orders (line 567+), plus a hardening in _handle_exit_fill:
Layer 1 — pre-check before target placement (catches the common case where stop fills before target is even attempted):
    # Take profit (limit): not critical, stop provides protection
    if position["target_price"] is not None:
        # Stop may have filled instantly during placement, exiting the
        # position via WS callback. Don't place an orphan target.
        if position_key not in self.positions:
            logger.warning(
                "Position %s exited during bracket placement; "
                "skipping target order",
                position_key,
            )
            return
        target_price = round_to_tick(position["target_price"], tick_size)
        target_id = await self._place_order(...)
Layer 2 — post-check after target placement (catches the race where exit fires during the target's REST round-trip):
    if target_id:
        if position_key not in self.positions:
            logger.warning(
                "Position %s exited during target REST call; "
                "cancelling orphan target %s",
                position_key, target_id,
            )
            try:
                await self.conn.cancel_order(target_id)
            except Exception as e:
                logger.error("Failed to cancel orphan target %s: %s",
                             target_id, e)
            return  # don't register
        position["linked_orders"]["target_order_id"] = target_id
        ...
Layer 3 — same two checks around the stop. Rarer (stop is placed first) but possible on retry. Use the same pattern.
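Since the stop and the target need the same pre-check and post-check, one way to apply "the same pattern" is a single guarded helper used for both legs. The sketch below is an assumption about how that could look, not the actual order_manager code: FakeConn, SketchOrderManager, _place_guarded, the during_call hook and the "stop_order_id" key are all hypothetical names; only self.positions, self.conn.cancel_order and linked_orders["target_order_id"] come from the snippets above.

```python
import asyncio
import logging

logger = logging.getLogger("order_manager_sketch")


class FakeConn:
    """Hypothetical broker connection that records working/cancelled orders."""

    def __init__(self):
        self.working = set()
        self.cancelled = []
        self.during_call = None  # test hook: runs mid round-trip
        self._next_id = 0

    async def place_order(self, kind):
        await asyncio.sleep(0)  # simulated REST round-trip
        if self.during_call is not None:
            self.during_call()  # e.g. an exit handler firing mid-call
        self._next_id += 1
        order_id = f"{kind}-{self._next_id}"
        self.working.add(order_id)
        return order_id

    async def cancel_order(self, order_id):
        await asyncio.sleep(0)
        self.working.discard(order_id)
        self.cancelled.append(order_id)


class SketchOrderManager:
    def __init__(self, conn):
        self.conn = conn
        self.positions = {}

    async def _place_guarded(self, position_key, kind, link_field):
        """Place one bracket leg with a flat-check before and after the call.

        Returns the order id if it was registered on the position, else None.
        """
        # Pre-check: the position already exited, so send nothing.
        if position_key not in self.positions:
            logger.warning("Position %s exited before %s placement; skipping",
                           position_key, kind)
            return None
        position = self.positions[position_key]
        order_id = await self.conn.place_order(kind)
        # Post-check: the position exited during the round-trip, so the
        # order that was just accepted is an orphan and must be cancelled.
        if position_key not in self.positions:
            logger.warning("Position %s exited during %s REST call; "
                           "cancelling orphan %s", position_key, kind, order_id)
            try:
                await self.conn.cancel_order(order_id)
            except Exception as e:
                logger.error("Failed to cancel orphan %s: %s", order_id, e)
            return None  # don't register
        position["linked_orders"][link_field] = order_id
        return order_id


conn = FakeConn()
om = SketchOrderManager(conn)
om.positions["ES-1"] = {"linked_orders": {}}

# Normal case: the stop is placed and registered.
stop_id = asyncio.run(om._place_guarded("ES-1", "stop", "stop_order_id"))

# Race case: the position exits while the target call is in flight.
conn.during_call = lambda: om.positions.pop("ES-1", None)
target_id = asyncio.run(om._place_guarded("ES-1", "target", "target_order_id"))

# Pre-check case: the position is already gone, so nothing is sent.
late_id = asyncio.run(om._place_guarded("ES-1", "target", "target_order_id"))
```

If the cancel itself fails, the order is still an orphan at the broker, so the Reconciler's auto-flatten remains the backstop for that path.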
Could it have been faster:
- The orphan was open from 19:16:39 → 19:17:00 = 21 seconds, then closed.
- The Reconciler's poll loop alone would have closed it at ~19:16:55 (15s broker grace from 19:16:40 first poll). It was ~6 seconds slower because the grace clock had only just started.
- If you tighten grace_period_broker_s from 15 → 5 in config/reconciler.yaml, autoflatten would have happened ~10s after the orphan opened. Tradeoff: a tighter grace risks flattening a legitimate position whose registration is racing
with the WS event. 15s was chosen for a reason; I wouldn't tighten it.
- The legacy main.py reconciler has auto_flatten_untracked: false (default) — leave it that way. Two reconcilers fighting over the same flatten is worse than one slow one.
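The timing figures in the bullets above can be checked with a few lines of arithmetic. This is only a sketch: the three timestamps and the two grace values come from the notes, while the assumption that the extra delay after the grace period would stay the same under a 5s grace is mine. Measured from the first poll, that extra delay is 5 seconds; the ~6 seconds quoted above matches measuring from the orphan's open time instead (21s minus the 15s grace).

```python
from datetime import datetime, timedelta


def t(hms):
    """Parse an HH:MM:SS wall-clock time (date part is irrelevant here)."""
    return datetime.strptime(hms, "%H:%M:%S")


orphan_opened = t("19:16:39")
first_poll = t("19:16:40")      # grace clock starts at the first poll
orphan_closed = t("19:17:00")

# How long the orphan actually rested at the broker.
actual_open_s = (orphan_closed - orphan_opened).total_seconds()

# Earliest close the poll loop alone allows with the current 15s grace.
earliest_close_15 = first_poll + timedelta(seconds=15)

# Extra delay observed beyond the grace period (measured from first poll).
overhead_s = (orphan_closed - earliest_close_15).total_seconds()

# Projected close with a 5s grace, ASSUMING the same overhead applies.
projected_close_5 = first_poll + timedelta(seconds=5 + overhead_s)
projected_open_s = (projected_close_5 - orphan_opened).total_seconds()

print(actual_open_s, earliest_close_15.time(), overhead_s, projected_open_s)
```

Under that assumption the orphan would have rested for about 11 seconds with a 5s grace, in line with the ~10s estimate above.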
Could it have been prevented: Yes, completely — that's what the Layer 1+2 fix above does. The orphan target should never have rested at the broker in the first place. The 21-second response is acceptable defense-in-depth, but the right
place to fix this is at the source: don't create orphans.