Fixing Sound Transit's ETAs

When I took Sound Transit this summer, it often gave me a "realtime" ETA of 0 minutes when the bus was 4 minutes away. But how did it end up being displayed as realtime? Long story that starts with

OneBusAway

A Sound Transit wrapper. Founded 2008, written in Java, comes from UW.

Want Sound Transit data? You'll need to use their API: authenticate with a key (you can use TEST or steal soundtransit.org's; yes, even soundtransit.org relies on OneBusAway) and parse the response (choose from JSON or GTFS-RT). Let's try it.

GET

https://api.pugetsound.onebusaway.org/api/where/arrivals-and-departures-for-stop/1_10912.json?key=00000000-0000-0000-0000-000000000000&minutesAfter=240

542 to Redmond 4m ago from 69s ago 📍

49 to Downtown Seattle Broadway 3m ago from 65s ago 📍

372 to Lake City Ravenna in 4s from 70s ago 📍

48 to Mount Baker Transit Center Central District in 87s from 2m ago ❌

48 to Mount Baker Transit Center Central District in 2m from 75s ago 📍

70 to Downtown Seattle Fairview in 4m from 40s ago ❌

44 to University Of Washington Medical Center Wallingford in 4m from 77s ago 📍

271 to Issaquah Bellevue in 6m from 40s ago ❌

372 to Lake City Ravenna in 14m from 67s ago 🔜

48 to Mount Baker Transit Center Central District in 16m from 40s ago ❌

49 to Downtown Seattle Broadway in 17m from 40s ago ❌

70 to Downtown Seattle Fairview in 19m from 70s ago 🔜

542 to Redmond in 20m from 40s ago ❌

44 to University Of Washington Medical Center Wallingford in 21m from 70s ago 📍

44 to University Of Washington Medical Center Wallingford in 27m from 78s ago 📍

372 to Lake City Ravenna in 29m from 3m ago ❌

372 to Lake City Ravenna in 29m from 65s ago 🔜

48 to Mount Baker Transit Center Central District in 31m from 74s ago 🔜

70 to Downtown Seattle Fairview in 34m from 77s ago 🔜

44 to University Of Washington Medical Center Wallingford in 36m from 73s ago 🔜

271 to Issaquah Bellevue schedule: in 36m ❌

49 to Downtown Seattle Broadway in 37m from 67s ago 🔜

372 to Lake City Ravenna in 44m from 40s ago ❌

48 to Mount Baker Transit Center Central District in 46m from 67s ago 🔜

44 to University Of Washington Medical Center Wallingford in 48m from 70s ago 🔜

70 to Downtown Seattle Fairview in 49m from 69s ago 🔜

70 to Downtown Seattle Fairview in 49m from 3m ago ❌

542 to Redmond in 50m from 67s ago 🔜

49 to Downtown Seattle Broadway in 57m from 40s ago ❌

372 to Lake City Ravenna in 59m from 4m ago 🔜

372 to Lake City Ravenna in 59m from 40s ago ❌

44 to University Of Washington Medical Center Wallingford in 60m from 70s ago 🔜

48 to Mount Baker Transit Center Central District in 61m from 40s ago ❌

70 to Downtown Seattle Fairview in 64m from 40s ago ❌

271 to Issaquah Bellevue in 66m from 77s ago 🔜

44 to University Of Washington Medical Center Wallingford in 72m from 73s ago 🔜

372 to Lake City Ravenna in 74m from 78s ago 🔜

48 to Mount Baker Transit Center Central District in 76m from 66s ago 🔜

49 to Downtown Seattle Broadway in 77m from 64s ago 🔜

70 to Downtown Seattle Fairview in 79m from 64s ago 🔜

542 to Redmond in 80m from 40s ago ❌

44 to University Of Washington Medical Center Wallingford in 84m from 40s ago ❌

372 to Lake City Ravenna in 89m from 70s ago 🔜

48 to Mount Baker Transit Center Central District in 91m from 70s ago 🔜

70 to Downtown Seattle Fairview in 94m from 64s ago 🔜

44 to University Of Washington Medical Center Wallingford in 96m from 77s ago 🔜

271 to Issaquah Bellevue in 96m from 75s ago 🔜

49 to Downtown Seattle Broadway in 97m from 65s ago 🔜

372 to Lake City Ravenna in 104m from 67s ago 🔜

48 to Mount Baker Transit Center Central District in 106m from 75s ago 🔜

48 to Mount Baker Transit Center Central District in 106m from 2m ago ❌

44 to University Of Washington Medical Center Wallingford in 108m from 70s ago 🔜

70 to Downtown Seattle Fairview in 109m from 40s ago ❌

542 to Redmond in 110m from 69s ago 🔜

49 to Downtown Seattle Broadway in 117m from 40s ago ❌

372 to Lake City Ravenna in 119m from 65s ago 🔜

372 to Lake City Ravenna in 119m from 3m ago ❌

44 to University Of Washington Medical Center Wallingford in 120m from 78s ago 🔜

48 to Mount Baker Transit Center Central District in 121m from 40s ago ❌

70 to Downtown Seattle Fairview in 124m from 70s ago 🔜

271 to Issaquah Bellevue in 126m from 71s ago 🔜

44 to University Of Washington Medical Center Wallingford in 132m from 73s ago 🔜

48 to Mount Baker Transit Center Central District in 134m from 74s ago 🔜

372 to Lake City Ravenna in 134m from 40s ago ❌

49 to Downtown Seattle Broadway in 137m from 67s ago 🔜

70 to Downtown Seattle Fairview in 139m from 77s ago 🔜

542 to Redmond in 140m from 40s ago ❌

43 to Capitol Hill University Of Washington Medical Center in 143m from 70s ago 🔜

372 to Lake City Ravenna in 149m from 4m ago 🔜

372 to Lake City Ravenna in 149m from 40s ago ❌

48 to Mount Baker Transit Center Central District in 151m from 67s ago 🔜

70 to Downtown Seattle Fairview in 154m from 69s ago 🔜

70 to Downtown Seattle Fairview in 154m from 3m ago ❌

44 to University Of Washington Medical Center Wallingford in 156m from 70s ago 🔜

271 to Issaquah Bellevue in 156m from 77s ago 🔜

49 to Downtown Seattle Broadway in 157m from 40s ago ❌

372 to Lake City Ravenna in 164m from 78s ago 🔜

48 to Mount Baker Transit Center Central District in 166m from 40s ago ❌

44 to University Of Washington Medical Center Wallingford in 167m from 73s ago 🔜

542 to Redmond in 168m from 67s ago 🔜

70 to Downtown Seattle Fairview in 169m from 40s ago ❌

49 to Downtown Seattle Broadway in 177m from 64s ago 🔜

372 to Lake City Ravenna schedule: in 179m ❌

48 to Mount Baker Transit Center Central District in 181m from 66s ago 🔜

44 to University Of Washington Medical Center Wallingford in 183m from 40s ago ❌

70 to Downtown Seattle Fairview in 185m from 64s ago 🔜

271 to Issaquah Bellevue in 186m from 76s ago 🔜

372 to Lake City Ravenna in 194m from 67s ago 🔜

48 to Mount Baker Transit Center Central District in 196m from 70s ago 🔜

49 to Downtown Seattle Broadway in 197m from 65s ago 🔜

44 to University Of Washington Medical Center Wallingford in 198m from 77s ago 🔜

70 to Downtown Seattle Fairview in 200m from 64s ago 🔜

542 to Redmond in 200m from 40s ago ❌

372 to Lake City Ravenna in 209m from 65s ago 🔜

372 to Lake City Ravenna in 209m from 3m ago ❌

48 to Mount Baker Transit Center Central District in 211m from 75s ago 🔜

48 to Mount Baker Transit Center Central District in 211m from 2m ago ❌

44 to University Of Washington Medical Center Wallingford in 212m from 70s ago 🔜

70 to Downtown Seattle Fairview in 215m from 40s ago ❌

49 to Downtown Seattle Broadway in 217m from 40s ago ❌

271 to Issaquah Bellevue in 217m from 40s ago ❌

372 to Lake City Ravenna in 224m from 40s ago ❌

44 to University Of Washington Medical Center Wallingford in 226m from 78s ago 🔜

48 to Mount Baker Transit Center Central District in 226m from 40s ago ❌

542 to Redmond in 229m from 69s ago 🔜

70 to Downtown Seattle Fairview in 230m from 70s ago 🔜

49 to Downtown Seattle Broadway in 237m from 67s ago 🔜

48 to Mount Baker Transit Center Central District in 239m from 74s ago 🔜

372 to Lake City Ravenna in 239m from 4m ago 🔜

372 to Lake City Ravenna in 239m from 40s ago ❌

98.2% of ETAs (108 vs 2) have predicted: true (with 16 actually deviating from schedule). But only 71 predictions are based on a vehicle location (every entry not marked ❌); of those, 20 could be accurate (within an hour); and of those, 7 are likely accurate (marked 📍: the bus is already running the trip in question).
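Here's a rough sketch of how that funnel could be tallied from the JSON. It is a reconstruction, not the exact script behind the list above. It assumes each arrival is a dict with `predicted`, `tripId`, `predictedArrivalTime` (epoch milliseconds) and a `tripStatus` carrying `lastLocationUpdateTime` and `activeTripId`, which is how I understand OneBusAway's arrivals response. The category names and the rule "no location update means ❌" are my own assumptions.

```python
from collections import Counter

HOUR_MS = 60 * 60 * 1000


def classify(arrival: dict) -> str:
    """Label one arrival: 'schedule', 'no_location', 'other_trip' or 'on_trip'."""
    if not arrival.get("predicted"):
        return "schedule"          # plain timetable entry
    status = arrival.get("tripStatus") or {}
    if not status.get("lastLocationUpdateTime"):
        return "no_location"       # assumed meaning of the ❌ marker
    if status.get("activeTripId") != arrival.get("tripId"):
        return "other_trip"        # bus is located, but still on an earlier trip
    return "on_trip"               # bus is already running this trip


def summarize(arrivals: list, now_ms: int) -> dict:
    """Tally the funnel: predicted -> located -> within an hour -> on the trip."""
    labels = [classify(a) for a in arrivals]
    counts = Counter(labels)
    located = [a for a, label in zip(arrivals, labels)
               if label in ("other_trip", "on_trip")]
    soon = [a for a in located if a["predictedArrivalTime"] - now_ms <= HOUR_MS]
    return {
        "predicted": len(arrivals) - counts["schedule"],
        "located": len(located),
        "within_hour": len(soon),
        "on_trip": sum(classify(a) == "on_trip" for a in soon),
    }
```

Feeding it the arrivals list for a stop plus the response's current time would give the four numbers in the paragraph above, assuming the categories really are defined this way.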

Let's view the same data, but instead of using OneBusAway's JSON, use their GTFS-RT, and see what changes.

GET

https://api.pugetsound.onebusaway.org/api/gtfs_realtime/trip-updates-for-agency/[id].pbtext?key=00000000-0000-0000-0000-000000000000&removeAgencyIds=false

372 to Lake City Ravenna in 4s from 70s ago

48 to Mount Baker Transit Center Central District in 87s from 2m ago

48 to Mount Baker Transit Center Central District in 2m from 75s ago

70 to Downtown Seattle Fairview in 4m from 40s ago

44 to University Of Washington Medical Center Wallingford in 4m from 77s ago

271 to Issaquah Bellevue in 6m from 40s ago

372 to Lake City Ravenna in 14m from 67s ago

48 to Mount Baker Transit Center Central District in 16m from 40s ago

49 to Downtown Seattle Broadway in 17m from 40s ago

70 to Downtown Seattle Fairview in 19m from 70s ago

542 to Redmond in 20m from 40s ago

44 to University Of Washington Medical Center Wallingford in 21m from 70s ago

44 to University Of Washington Medical Center Wallingford in 27m from 78s ago

372 to Lake City Ravenna in 29m from 3m ago

372 to Lake City Ravenna in 29m from 65s ago

48 to Mount Baker Transit Center Central District in 31m from 74s ago

70 to Downtown Seattle Fairview in 34m from 77s ago

44 to University Of Washington Medical Center Wallingford in 36m from 73s ago

49 to Downtown Seattle Broadway in 37m from 67s ago

372 to Lake City Ravenna in 44m from 40s ago

48 to Mount Baker Transit Center Central District in 46m from 67s ago

44 to University Of Washington Medical Center Wallingford in 48m from 70s ago

70 to Downtown Seattle Fairview in 49m from 3m ago

70 to Downtown Seattle Fairview in 49m from 69s ago

542 to Redmond in 50m from 67s ago

49 to Downtown Seattle Broadway in 57m from 40s ago

372 to Lake City Ravenna in 59m from 4m ago

372 to Lake City Ravenna in 59m from 40s ago

44 to University Of Washington Medical Center Wallingford in 60m from 70s ago

48 to Mount Baker Transit Center Central District in 61m from 40s ago

44 to University Of Washington Medical Center Wallingford in 72m from 73s ago

44 to University Of Washington Medical Center Wallingford in 84m from 40s ago

It turns out that the GTFS-RT has only 32 ETAs at this stop. But the GTFS-RT is the ground truth. The GTFS-RT doesn't go beyond "in 84m" because we actually don't have data beyond that. So we've just found a bug with OneBusAway's JSON API:

Delay is appearing on all trip of vehicle and trip repeating multiple times Issue #216

"this trip is showing twice in list, and all other trips are carrying the same delay value even we didn't delay on those trips"

Fix block-level cache contamination PR #465

"if any trip on a block had GTFS‑RT data, the cache returned a non‑empty location list, and predicted=true leaked to every trip on that block even if they had no RT data."

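The cross-check itself is just a set difference. A minimal sketch, assuming the JSON arrivals carry a `tripId` and that you have already collected the trip ids of the GTFS-RT TripUpdates that mention this stop. The helper names and the idea of stripping a numeric agency prefix (the `1_` seen in stop id `1_10912`) are illustrative assumptions.

```python
def strip_agency(entity_id: str) -> str:
    """Drop a leading numeric agency prefix: '1_605627320' -> '605627320'."""
    head, sep, tail = entity_id.partition("_")
    return tail if sep and head.isdigit() else entity_id


def unbacked_predictions(json_arrivals: list, rt_trip_ids: set) -> list:
    """JSON arrivals flagged predicted that have no TripUpdate behind them."""
    backed = {strip_agency(t) for t in rt_trip_ids}
    return [a for a in json_arrivals
            if a.get("predicted") and strip_agency(a["tripId"]) not in backed]


def junk_share(json_arrivals: list, rt_trip_ids: set) -> float:
    """Fraction of predicted arrivals that are missing from GTFS-RT."""
    predicted = [a for a in json_arrivals if a.get("predicted")]
    if not predicted:
        return 0.0
    return len(unbacked_predictions(json_arrivals, rt_trip_ids)) / len(predicted)
```

With 108 predicted arrivals and 32 backed by GTFS-RT, `junk_share` would come out at 76/108.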

So we've found out that 70.4% of predicted ETAs (76 of 108) are junk: they don't exist in the upstream GTFS-RT. But is it just 70.4%, or are there more junk ETAs?

While you make a prediction, some fun facts:

Sound Transit's location data is much better than their ETA data.

Sound Transit's ETAs are accurate to the second at the start of the day.

Even though soundtransit.org fetches the next 4 hours' ETAs, it only uses the next hour's data.

That GTFS-RT table had to be cross-referenced with the JSON to include line info, because OneBusAway has another bug (will be fixed by #466) that drops route_id from its GTFS-RT for everything but current trips.


If you said there are more junk ETAs, you're right. Think about it - how could an ETA more than an hour out possibly be accurate? Within GTFS-RT, just 19 ETAs are based on a location; of those, 18 could be accurate (within an hour); and of those, 4 are likely accurate (bus already running the trip in question).

In other terms, 96.4% of ETAs (106 of 110) might be junk, and 82.7% of ETAs (91 of 110) are junk. Our only recourse is to get Sound Transit to fix their GTFS-RT feed so it doesn't include ETAs that aren't actually realtime. And this fix is a two-parter:

Generate and email Sound Transit a technical explanation of why/how to use NO_DATA when they don't have enough information to predict a realtime ETA

Sound Transit's GTFS-RT feed copies static schedule times into stop_time_update's arrival and departure time fields for trips where the vehicle hasn't started the route yet. This makes those stop_time_update blocks indistinguishable from actual real-time predictions. The spec provides NO_DATA for exactly this situation: it means "no real-time timing available, use the static schedule."

Current behavior

Many TripUpdates in the feed look like this:

trip_update {
  trip {
    trip_id: "605627320"
    schedule_relationship: SCHEDULED
  }
  stop_time_update {
    arrival { time: 1783035360 }     # ← copied from static GTFS
    departure { time: 1783035360 }   # ← copied from static GTFS
    stop_id: "LS23561"
  }
  stop_time_update {
    arrival { time: 1783035585 }
    departure { time: 1783035585 }
    stop_id: "38567"
  }
  # ... 14 more stops with copied times ...
  vehicle { id: "8096168" }
  timestamp: 1783035094
}

There's no route_id, no delay field, and no explicit schedule_relationship on the stop_time_update entries (they default to SCHEDULED). The vehicle hasn't started this trip yet: in many cases it's still finishing a previous trip in the block, or it's not broadcasting position at all.

The time values in arrival and departure are exact copies of the static GTFS schedule. They are never anything else. A trip that's 40+ minutes from departure shows the same schedule times as a trip that's actively being tracked and running on time. There is no way to tell them apart by looking at the stop_time_update alone.

This isn't just optimistic extrapolation; it's wrong. The GTFS-RT spec says a populated time field in a StopTimeEvent means "prediction available." A schedule copy is not a prediction. Any consumer (Google Maps, Transit App, OneBusAway) will interpret these schedule echoes as real-time ETAs and display them with real-time indicators.
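To check the "exact copies" claim yourself, you'd convert each static stop_times.txt time to an epoch and compare it with the feed. A minimal sketch follows. The function names are hypothetical, and the 16:36:00 and 16:39:45 static times and the 2026-07-02 service date are back-computed from the feed values above, not read from the actual GTFS. GTFS times count from "noon minus 12h" on the service date, so the conversion goes through local noon. Note that a match only shows the times equal the schedule; a bus that is genuinely on time would match too, which is the whole problem.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo


def gtfs_time_to_epoch(service_date: date, hhmmss: str, tz: tzinfo) -> int:
    """Convert a GTFS HH:MM:SS (may exceed 24:00:00) to epoch seconds."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    noon = datetime.combine(service_date, time(12), tzinfo=tz)
    # GTFS measures times from "noon minus 12h", which stays correct on DST days.
    return int(noon.timestamp()) - 12 * 3600 + h * 3600 + m * 60 + s


def matches_schedule(rt_times: list, static_times: list,
                     service_date: date, tz: tzinfo) -> bool:
    """True if every realtime epoch equals its static schedule time."""
    return len(rt_times) == len(static_times) and all(
        rt == gtfs_time_to_epoch(service_date, st, tz)
        for rt, st in zip(rt_times, static_times)
    )


# Fixed UTC-7 offset as a stand-in for Pacific Daylight Time; in real use you
# would pass zoneinfo.ZoneInfo("America/Los_Angeles") instead.
PDT = timezone(timedelta(hours=-7))
is_copy = matches_schedule(
    [1783035360, 1783035585],       # arrival times from the feed excerpt
    ["16:36:00", "16:39:45"],       # hypothetical static stop_times values
    date(2026, 7, 2), PDT,
)
```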

What the spec provides: NO_DATA

The StopTimeUpdate.schedule_relationship enum includes:

NO_DATA: "No real-time data is given for this stop. It indicates that there is no realtime timing information available. When set NO_DATA is propagated through subsequent stops so this is the recommended way of specifying from which stop you do not have realtime timing information."

And StopTimeEvent says:

time: "Forbidden if StopTimeUpdate.schedule_relationship is NO_DATA."

The correct representation for a trip that hasn't started yet is:

trip_update {
  trip {
    trip_id: "605627320"
    schedule_relationship: SCHEDULED
  }
  vehicle { id: "8096168" }
  timestamp: 1783035094
  stop_time_update {
    schedule_relationship: NO_DATA
    stop_id: "LS23561"
  }
}

This says:

  • Trip exists, vehicle is assigned: the TripUpdate and vehicle field remain
  • No real-time timing available: NO_DATA on the stops
  • Consumer should use the static GTFS for schedule times, which they already have and merge by design

Only one NO_DATA stop is needed per trip; the spec says it propagates to all subsequent stops automatically.
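On the consumer side, the propagation rule can be sketched like this. It is a simplified illustration, not any particular app's implementation. The dict shapes (`stop_id`, `time`, `schedule_relationship`) and the function name are assumptions, and it matches updates by `stop_id`, so it assumes a trip visits each stop once (real feeds may need `stop_sequence`).

```python
def resolve_stop_times(static_stops: list, updates: list) -> list:
    """Merge static schedule with stop_time_updates.

    static_stops: ordered (stop_id, scheduled_epoch) pairs for the trip.
    updates: dicts with 'stop_id' plus either 'time' or
             'schedule_relationship': 'NO_DATA'.
    Returns (stop_id, epoch, is_realtime) for every stop.
    """
    by_stop = {u["stop_id"]: u for u in updates}
    delay = None  # None means "no realtime information applies here"
    resolved = []
    for stop_id, scheduled in static_stops:
        update = by_stop.get(stop_id)
        if update is not None:
            if update.get("schedule_relationship") == "NO_DATA":
                delay = None                          # carries to later stops
            elif "time" in update:
                delay = update["time"] - scheduled    # carried forward too
        if delay is None:
            resolved.append((stop_id, scheduled, False))
        else:
            resolved.append((stop_id, scheduled + delay, True))
    return resolved
```

With a single NO_DATA update on the first stop, every stop comes back as schedule-only, so an app would have no reason to show a realtime indicator.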

Current feed statistics

| Category | Count | Description |
| --- | --- | --- |
| Active TripUpdates (has route_id, has delay field) | 162 | Vehicle is being tracked; these are correct |
| Schedule stubs (no route_id, no delay, schedule times copied) | 71 | These are the problem: vehicle hasn't started the trip, times are fake predictions |
| Of which: vehicle is actively tracked on a different trip (same block) | 20 | Vehicle is currently elsewhere; will serve this trip next. Still no ETA for this trip yet. |
| Of which: vehicle is not currently tracked anywhere | 51 | Vehicle ID is known but not broadcasting position. Assignment may be speculative. |
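For reference, a tally like the one above could be produced roughly as follows. This is a sketch under assumptions: TripUpdates are plain dicts shaped like the text-format messages in this email, "has delay field" is read as a delay on the TripUpdate or on any stop's arrival/departure, and a vehicle counts as "tracked on a different trip" when its id also appears on an active TripUpdate. The real analysis may have used the vehicle positions feed instead.

```python
from collections import Counter


def has_delay(trip_update: dict) -> bool:
    """True if a delay appears on the TripUpdate or on any stop event."""
    if "delay" in trip_update:
        return True
    for stu in trip_update.get("stop_time_update", []):
        for event in ("arrival", "departure"):
            if "delay" in stu.get(event, {}):
                return True
    return False


def is_stub(trip_update: dict) -> bool:
    """Schedule stub: no route_id on the trip and no delay anywhere."""
    return "route_id" not in trip_update.get("trip", {}) and not has_delay(trip_update)


def tally(trip_updates: list) -> Counter:
    """Count active updates and the two kinds of stubs."""
    active_vehicles = {
        tu["vehicle"]["id"] for tu in trip_updates
        if not is_stub(tu) and "vehicle" in tu
    }
    counts = Counter()
    for tu in trip_updates:
        if not is_stub(tu):
            counts["active"] += 1
        elif tu.get("vehicle", {}).get("id") in active_vehicles:
            counts["stub_vehicle_on_other_trip"] += 1
        else:
            counts["stub_vehicle_untracked"] += 1
    return counts
```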

My request

For every stop_time_update on a trip where the vehicle hasn't actually started that trip yet, replace the current pattern (schedule times in arrival/departure, implicit SCHEDULED) with:

stop_time_update {
  schedule_relationship: NO_DATA
  stop_id: "..."
}

  • Keep the TripUpdate as a whole + its vehicle field: consumers should still know which bus is assigned
  • Keep the TripUpdate's trip_id: the trip exists
  • Drop the schedule times from arrival/departure: they aren't predictions
  • When the vehicle actually starts the trip, populate the real arrival/departure with SCHEDULED + time, set route_id on the trip, and include a delay field (as the feed already does for active trips)
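On the producer side, the requested change amounts to a small rewrite of each not-yet-started TripUpdate. A minimal sketch, assuming the TripUpdate is available as a plain dict shaped like the text-format example; `stub_to_no_data` is a hypothetical helper, not part of any existing feed generator.

```python
def stub_to_no_data(trip_update: dict) -> dict:
    """Return a copy whose stops carry NO_DATA instead of copied schedule times.

    The trip, vehicle and timestamp fields are kept as they are; only the
    arrival/departure events are dropped from each stop_time_update.
    """
    fixed = {k: v for k, v in trip_update.items() if k != "stop_time_update"}
    fixed["stop_time_update"] = [
        {"stop_id": stu["stop_id"], "schedule_relationship": "NO_DATA"}
        for stu in trip_update.get("stop_time_update", [])
    ]
    return fixed
```

Since NO_DATA propagates to subsequent stops, emitting only the first entry of that list would say the same thing.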

Static GTFS already provides the schedule. GTFS-RT should only provide data that improves on it. Schedule copies do not improve on it; they actively mislead consumers into thinking real-time predictions exist when they don't.

Observed on the live trip-updates feed at api.pugetsound.onebusaway.org/api/gtfs_realtime/trip-updates-for-agency/40.pb.

Passthrough NO_DATA schedule_relationship for stop_time_updates PR #467

"Passthrough schedule_relationship value directly instead of hardcoding SCHEDULED"


The only thing left to do is wait on Sound Transit and OneBusAway.
