Fixing Sound Transit's ETAs
When I took Sound Transit this summer, it often gave me a "realtime" ETA of 0 minutes when the bus was 4 minutes away. But how did it end up being displayed as realtime? Long story that starts with
OneBusAway
A Sound Transit wrapper. Founded 2008, written in Java, comes from UW.
Want Sound Transit data? You'll need to use their API: authenticate (you can use the TEST key or steal soundtransit.org's; yes, even soundtransit.org relies on OneBusAway) and parse the response (choose from JSON or GTFS-RT). Let's try it.
GET
https://api.pugetsound.onebusaway.org/api/where/arrivals-and-departures-for-stop/1_10912.json?key=00000000-0000-0000-0000-000000000000&minutesAfter=240
542 to Redmond 4m ago from 69s ago π
49 to Downtown Seattle Broadway 3m ago from 65s ago π
372 to Lake City Ravenna in 4s from 70s ago π
48 to Mount Baker Transit Center Central District in 87s from 2m ago β
48 to Mount Baker Transit Center Central District in 2m from 75s ago π
70 to Downtown Seattle Fairview in 4m from 40s ago β
44 to University Of Washington Medical Center Wallingford in 4m from 77s ago π
271 to Issaquah Bellevue in 6m from 40s ago β
372 to Lake City Ravenna in 14m from 67s ago π
48 to Mount Baker Transit Center Central District in 16m from 40s ago β
49 to Downtown Seattle Broadway in 17m from 40s ago β
70 to Downtown Seattle Fairview in 19m from 70s ago π
542 to Redmond in 20m from 40s ago β
44 to University Of Washington Medical Center Wallingford in 21m from 70s ago π
44 to University Of Washington Medical Center Wallingford in 27m from 78s ago π
372 to Lake City Ravenna in 29m from 3m ago β
372 to Lake City Ravenna in 29m from 65s ago π
48 to Mount Baker Transit Center Central District in 31m from 74s ago π
70 to Downtown Seattle Fairview in 34m from 77s ago π
44 to University Of Washington Medical Center Wallingford in 36m from 73s ago π
271 to Issaquah Bellevue schedule: in 36m β
49 to Downtown Seattle Broadway in 37m from 67s ago π
372 to Lake City Ravenna in 44m from 40s ago β
48 to Mount Baker Transit Center Central District in 46m from 67s ago π
44 to University Of Washington Medical Center Wallingford in 48m from 70s ago π
70 to Downtown Seattle Fairview in 49m from 69s ago π
70 to Downtown Seattle Fairview in 49m from 3m ago β
542 to Redmond in 50m from 67s ago π
49 to Downtown Seattle Broadway in 57m from 40s ago β
372 to Lake City Ravenna in 59m from 4m ago π
372 to Lake City Ravenna in 59m from 40s ago β
44 to University Of Washington Medical Center Wallingford in 60m from 70s ago π
48 to Mount Baker Transit Center Central District in 61m from 40s ago β
70 to Downtown Seattle Fairview in 64m from 40s ago β
271 to Issaquah Bellevue in 66m from 77s ago π
44 to University Of Washington Medical Center Wallingford in 72m from 73s ago π
372 to Lake City Ravenna in 74m from 78s ago π
48 to Mount Baker Transit Center Central District in 76m from 66s ago π
49 to Downtown Seattle Broadway in 77m from 64s ago π
70 to Downtown Seattle Fairview in 79m from 64s ago π
542 to Redmond in 80m from 40s ago β
44 to University Of Washington Medical Center Wallingford in 84m from 40s ago β
372 to Lake City Ravenna in 89m from 70s ago π
48 to Mount Baker Transit Center Central District in 91m from 70s ago π
70 to Downtown Seattle Fairview in 94m from 64s ago π
44 to University Of Washington Medical Center Wallingford in 96m from 77s ago π
271 to Issaquah Bellevue in 96m from 75s ago π
49 to Downtown Seattle Broadway in 97m from 65s ago π
372 to Lake City Ravenna in 104m from 67s ago π
48 to Mount Baker Transit Center Central District in 106m from 75s ago π
48 to Mount Baker Transit Center Central District in 106m from 2m ago β
44 to University Of Washington Medical Center Wallingford in 108m from 70s ago π
70 to Downtown Seattle Fairview in 109m from 40s ago β
542 to Redmond in 110m from 69s ago π
49 to Downtown Seattle Broadway in 117m from 40s ago β
372 to Lake City Ravenna in 119m from 65s ago π
372 to Lake City Ravenna in 119m from 3m ago β
44 to University Of Washington Medical Center Wallingford in 120m from 78s ago π
48 to Mount Baker Transit Center Central District in 121m from 40s ago β
70 to Downtown Seattle Fairview in 124m from 70s ago π
271 to Issaquah Bellevue in 126m from 71s ago π
44 to University Of Washington Medical Center Wallingford in 132m from 73s ago π
48 to Mount Baker Transit Center Central District in 134m from 74s ago π
372 to Lake City Ravenna in 134m from 40s ago β
49 to Downtown Seattle Broadway in 137m from 67s ago π
70 to Downtown Seattle Fairview in 139m from 77s ago π
542 to Redmond in 140m from 40s ago β
43 to Capitol Hill University Of Washington Medical Center in 143m from 70s ago π
372 to Lake City Ravenna in 149m from 4m ago π
372 to Lake City Ravenna in 149m from 40s ago β
48 to Mount Baker Transit Center Central District in 151m from 67s ago π
70 to Downtown Seattle Fairview in 154m from 69s ago π
70 to Downtown Seattle Fairview in 154m from 3m ago β
44 to University Of Washington Medical Center Wallingford in 156m from 70s ago π
271 to Issaquah Bellevue in 156m from 77s ago π
49 to Downtown Seattle Broadway in 157m from 40s ago β
372 to Lake City Ravenna in 164m from 78s ago π
48 to Mount Baker Transit Center Central District in 166m from 40s ago β
44 to University Of Washington Medical Center Wallingford in 167m from 73s ago π
542 to Redmond in 168m from 67s ago π
70 to Downtown Seattle Fairview in 169m from 40s ago β
49 to Downtown Seattle Broadway in 177m from 64s ago π
372 to Lake City Ravenna schedule: in 179m β
48 to Mount Baker Transit Center Central District in 181m from 66s ago π
44 to University Of Washington Medical Center Wallingford in 183m from 40s ago β
70 to Downtown Seattle Fairview in 185m from 64s ago π
271 to Issaquah Bellevue in 186m from 76s ago π
372 to Lake City Ravenna in 194m from 67s ago π
48 to Mount Baker Transit Center Central District in 196m from 70s ago π
49 to Downtown Seattle Broadway in 197m from 65s ago π
44 to University Of Washington Medical Center Wallingford in 198m from 77s ago π
70 to Downtown Seattle Fairview in 200m from 64s ago π
542 to Redmond in 200m from 40s ago β
372 to Lake City Ravenna in 209m from 65s ago π
372 to Lake City Ravenna in 209m from 3m ago β
48 to Mount Baker Transit Center Central District in 211m from 75s ago π
48 to Mount Baker Transit Center Central District in 211m from 2m ago β
44 to University Of Washington Medical Center Wallingford in 212m from 70s ago π
70 to Downtown Seattle Fairview in 215m from 40s ago β
49 to Downtown Seattle Broadway in 217m from 40s ago β
271 to Issaquah Bellevue in 217m from 40s ago β
372 to Lake City Ravenna in 224m from 40s ago β
44 to University Of Washington Medical Center Wallingford in 226m from 78s ago π
48 to Mount Baker Transit Center Central District in 226m from 40s ago β
542 to Redmond in 229m from 69s ago π
70 to Downtown Seattle Fairview in 230m from 70s ago π
49 to Downtown Seattle Broadway in 237m from 67s ago π
48 to Mount Baker Transit Center Central District in 239m from 74s ago π
372 to Lake City Ravenna in 239m from 4m ago π
372 to Lake City Ravenna in 239m from 40s ago β
98.2% (108 vs 2) have predicted: true (with 16 actually deviating from schedule). But only 71 predictions are based on a location; from there, 20 could be accurate (within an hour); from there, 7 are likely π accurate (bus already running the trip in question).
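A minimal sketch of how counts like these could be tallied from the JSON response. It assumes the arrivals sit under `data.entry.arrivalsAndDepartures` and carry `predicted`, `predictedArrivalTime`, and `scheduledArrivalTime`; treating "deviating" as "predicted time differs from scheduled time" is my assumption, and the sample records are hand-written, not real feed data.

```python
import json
import urllib.request
from collections import Counter

# URL copied from the request above; the all-zero key is the placeholder shown there.
URL = ("https://api.pugetsound.onebusaway.org/api/where/"
       "arrivals-and-departures-for-stop/1_10912.json"
       "?key=00000000-0000-0000-0000-000000000000&minutesAfter=240")


def fetch(url: str = URL) -> dict:
    """Download and decode one response (never called automatically)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_arrivals(response: dict) -> Counter:
    """Count total arrivals, those flagged predicted, and those off schedule."""
    counts = Counter()
    for arrival in response["data"]["entry"]["arrivalsAndDepartures"]:
        counts["total"] += 1
        if arrival.get("predicted"):
            counts["predicted"] += 1
            predicted_ms = arrival.get("predictedArrivalTime", 0)
            # Assumption: "deviating" means the prediction differs from the schedule.
            if predicted_ms and predicted_ms != arrival["scheduledArrivalTime"]:
                counts["deviating"] += 1
    return counts


# Hand-written sample in the assumed response shape (illustrative only).
sample = {"data": {"entry": {"arrivalsAndDepartures": [
    {"predicted": True, "scheduledArrivalTime": 1000, "predictedArrivalTime": 1060},
    {"predicted": True, "scheduledArrivalTime": 2000, "predictedArrivalTime": 2000},
    {"predicted": False, "scheduledArrivalTime": 3000, "predictedArrivalTime": 0},
]}}}

sample_counts = summarize_arrivals(sample)
print(dict(sample_counts))
```

To run it against the live API, pass `fetch()` to `summarize_arrivals` instead of the sample.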
Let's view the same data, but instead of using OneBusAway's JSON, use their GTFS-RT, and see what changes.
GET
https://api.pugetsound.onebusaway.org/api/gtfs_realtime/trip-updates-for-agency/[id].pbtext?key=00000000-0000-0000-0000-000000000000&removeAgencyIds=false
372 to Lake City Ravenna in 4s from 70s ago
48 to Mount Baker Transit Center Central District in 87s from 2m ago
48 to Mount Baker Transit Center Central District in 2m from 75s ago
70 to Downtown Seattle Fairview in 4m from 40s ago
44 to University Of Washington Medical Center Wallingford in 4m from 77s ago
271 to Issaquah Bellevue in 6m from 40s ago
372 to Lake City Ravenna in 14m from 67s ago
48 to Mount Baker Transit Center Central District in 16m from 40s ago
49 to Downtown Seattle Broadway in 17m from 40s ago
70 to Downtown Seattle Fairview in 19m from 70s ago
542 to Redmond in 20m from 40s ago
44 to University Of Washington Medical Center Wallingford in 21m from 70s ago
44 to University Of Washington Medical Center Wallingford in 27m from 78s ago
372 to Lake City Ravenna in 29m from 3m ago
372 to Lake City Ravenna in 29m from 65s ago
48 to Mount Baker Transit Center Central District in 31m from 74s ago
70 to Downtown Seattle Fairview in 34m from 77s ago
44 to University Of Washington Medical Center Wallingford in 36m from 73s ago
49 to Downtown Seattle Broadway in 37m from 67s ago
372 to Lake City Ravenna in 44m from 40s ago
48 to Mount Baker Transit Center Central District in 46m from 67s ago
44 to University Of Washington Medical Center Wallingford in 48m from 70s ago
70 to Downtown Seattle Fairview in 49m from 3m ago
70 to Downtown Seattle Fairview in 49m from 69s ago
542 to Redmond in 50m from 67s ago
49 to Downtown Seattle Broadway in 57m from 40s ago
372 to Lake City Ravenna in 59m from 4m ago
372 to Lake City Ravenna in 59m from 40s ago
44 to University Of Washington Medical Center Wallingford in 60m from 70s ago
48 to Mount Baker Transit Center Central District in 61m from 40s ago
44 to University Of Washington Medical Center Wallingford in 72m from 73s ago
44 to University Of Washington Medical Center Wallingford in 84m from 40s ago
It turns out that the GTFS-RT has only 32 ETAs at this stop. But the GTFS-RT is the ground truth. The GTFS-RT doesn't go beyond "in 84m" because we actually don't have data beyond that. So we've just found a bug with OneBusAway's JSON API:
Delay is appearing on all trip of vehicle and trip repeating multiple times Issue #216
"this trip is showing twice in list, and all other trips are carrying the same delay value even we didn't delay on those trips"
Fix block-level cache contamination PR #465
"if any trip on a block had GTFS-RT data, the cache returned a non-empty location list, and predicted=true leaked to every trip on that block even if they had no RT data."
So we've found out that 70.4% of ETAs are junk and don't exist in the upstream GTFS-RT. But is it just 70.4%, or are there more junk ETAs?
While you make a prediction, some fun facts:
- Sound Transit's location data is much better than their ETA data.
- Sound Transit's ETAs are accurate to the second at the start of the day.
- Even though soundtransit.org fetches the next 4 hours' ETAs, it only uses the next hour's data.
- That GTFS-RT table had to be cross-referenced with the JSON to include line info, because OneBusAway has another bug (will be fixed by #466) that drops route_id from its GTFS-RT for everything but current trips.
If you said there are more junk ETAs, you're right. Think about it - how could an ETA more than an hour out possibly be accurate? Within GTFS-RT, just 19 ETAs are based on a location; from there, 18 could be accurate (within an hour); from there, 4 are likely accurate (bus already running the trip in question).
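As a quick arithmetic check, the percentages quoted in this post can be reproduced from the counts given so far. The choice of denominators below is my inference from the numbers (they are the only combinations that match), not something stated explicitly.

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_json = 110      # ETAs the JSON API listed at this stop
predicted_json = 108  # of those, flagged predicted: true
total_gtfsrt = 32     # ETAs for this stop present in the GTFS-RT feed
location_based = 19   # GTFS-RT ETAs based on a location
likely_accurate = 4   # bus already running the trip in question

results = {
    # Share of JSON ETAs flagged as realtime.
    "flagged_predicted": pct(predicted_json, total_json),
    # Flagged ETAs with no counterpart in the upstream GTFS-RT.
    "absent_upstream": pct(predicted_json - total_gtfsrt, predicted_json),
    # ETAs not based on a location at all.
    "not_location_based": pct(total_json - location_based, total_json),
    # ETAs not backed by a bus already running the trip.
    "not_likely_accurate": pct(total_json - likely_accurate, total_json),
}
print(results)
# {'flagged_predicted': 98.2, 'absent_upstream': 70.4,
#  'not_location_based': 82.7, 'not_likely_accurate': 96.4}
```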
In other terms, 96.4% of ETAs might be junk, and 82.7% of ETAs are junk. Our only recourse is to get Sound Transit to fix their GTFS-RT feed to not include ETAs when they aren't actually realtime. And this fix is a two parter:
Generate and email Sound Transit a technical explanation of why/how to use NO_DATA when they don't have enough information to predict a realtime ETA
Sound Transit's GTFS-RT feed copies static schedule times into stop_time_update's arrival and departure time fields for trips where the vehicle hasn't started the route yet. This makes those stop_time_update blocks indistinguishable from actual real-time predictions. The spec provides NO_DATA for exactly this situation: it means "no real-time timing available, use the static schedule."
Current behavior
Many TripUpdates in the feed look like this:
trip_update {
  trip {
    trip_id: "605627320"
    schedule_relationship: SCHEDULED
  }
  stop_time_update {
    arrival { time: 1783035360 }    # <- copied from static GTFS
    departure { time: 1783035360 }  # <- copied from static GTFS
    stop_id: "LS23561"
  }
  stop_time_update {
    arrival { time: 1783035585 }
    departure { time: 1783035585 }
    stop_id: "38567"
  }
  # ... 14 more stops with copied times ...
  vehicle { id: "8096168" }
  timestamp: 1783035094
}
There's no route_id, no delay field, and no explicit schedule_relationship on the stop_time_update entries (they default to SCHEDULED). The vehicle hasn't started this trip yet: in many cases it's still finishing a previous trip in the block, or it's not broadcasting position at all.
The time values in arrival and departure are exact copies of the static GTFS schedule. They are never anything else. A trip that's 40+ minutes from departure shows the same schedule times as a trip that's actively being tracked and running on time. There is no way to tell them apart by looking at the stop_time_update alone.
This isn't just optimistic extrapolation; it's wrong. The GTFS-RT spec says a populated time field in a StopTimeEvent means "prediction available." A schedule copy is not a prediction. Any consumer (Google Maps, Transit App, OneBusAway) will interpret these schedule echoes as real-time ETAs and display them with real-time indicators.
What the spec provides: NO_DATA
The StopTimeUpdate.schedule_relationship enum includes:
NO_DATA: "No real-time data is given for this stop. It indicates that there is no realtime timing information available. When set NO_DATA is propagated through subsequent stops so this is the recommended way of specifying from which stop you do not have realtime timing information."
And StopTimeEvent says:
time: "Forbidden if StopTimeUpdate.schedule_relationship is NO_DATA."
The correct representation for a trip that hasn't started yet is:
trip_update {
  trip {
    trip_id: "605627320"
    schedule_relationship: SCHEDULED
  }
  vehicle { id: "8096168" }
  timestamp: 1783035094
  stop_time_update {
    schedule_relationship: NO_DATA
    stop_id: "LS23561"
  }
}
This says:
- Trip exists, vehicle is assigned: the TripUpdate and vehicle field remain
- No real-time timing available: NO_DATA on the stops
- Consumer should use the static GTFS for schedule times, which they already have and merge by design
Only one NO_DATA stop is needed per trip; the spec says it propagates to all subsequent stops automatically.
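On the consumer side, the propagation rule could be handled roughly as follows. This is a simplified sketch, not how any particular app does it: trip updates are modeled as plain dicts (a hypothetical shape, not the protobuf bindings), only SCHEDULED and NO_DATA are handled, and a SCHEDULED update is assumed to carry an absolute `time`.

```python
def merge_stop_times(scheduled, updates):
    """Merge static stop times with realtime updates.

    scheduled: list of (stop_id, scheduled_epoch) in stop order.
    updates:   dict of stop_id -> {"schedule_relationship": str, "time": int}.
    Returns a list of (stop_id, epoch, is_realtime).
    """
    merged = []
    no_data = True  # nothing is realtime until an update says so
    delay = 0
    for stop_id, sched in scheduled:
        update = updates.get(stop_id)
        if update is not None:
            if update.get("schedule_relationship", "SCHEDULED") == "NO_DATA":
                no_data = True  # propagates to every later stop
            else:
                no_data = False
                delay = update["time"] - sched  # carried forward to later stops
        if no_data:
            merged.append((stop_id, sched, False))  # fall back to static GTFS
        else:
            merged.append((stop_id, sched + delay, True))
    return merged


# Illustrative values only: stop A is predicted 60 s late, then data runs out at B.
static_times = [("A", 1000), ("B", 1300), ("C", 1600)]
realtime = {
    "A": {"schedule_relationship": "SCHEDULED", "time": 1060},
    "B": {"schedule_relationship": "NO_DATA"},
}
merged_times = merge_stop_times(static_times, realtime)
print(merged_times)
# [('A', 1060, True), ('B', 1300, False), ('C', 1600, False)]
```

The point of the third tuple element is that a display can show a realtime indicator only where `is_realtime` is true, which is exactly the distinction the copied schedule times erase.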
Current feed statistics
| Category | Count | Description |
|---|---|---|
| Active TripUpdates (has route_id, has delay field) | 162 | Vehicle is being tracked; these are correct |
| Schedule stubs (no route_id, no delay, schedule times copied) | 71 | These are the problem: vehicle hasn't started the trip, times are fake predictions |
| Of which: vehicle is actively tracked on a different trip (same block) | 20 | Vehicle is currently elsewhere; will serve this trip next. Still no ETA for this trip yet. |
| Of which: vehicle is not currently tracked anywhere | 51 | Vehicle ID is known but not broadcasting position. Assignment may be speculative. |
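The categories in the table can be approximated with a small heuristic. The sketch below works on trip updates modeled as plain dicts (a hypothetical shape chosen for illustration) and checks for a delay at both the TripUpdate level and the arrival/departure level, since the description above does not say which one the feed uses. The sample feed is made up.

```python
from collections import Counter


def has_delay(trip_update: dict) -> bool:
    """True if a delay appears on the TripUpdate or any arrival/departure."""
    if "delay" in trip_update:
        return True
    for stu in trip_update.get("stop_time_update", []):
        for event in ("arrival", "departure"):
            if "delay" in stu.get(event, {}):
                return True
    return False


def classify(trip_update: dict) -> str:
    """Label a trip update as 'active', 'stub', or 'unclear'."""
    has_route = bool(trip_update.get("trip", {}).get("route_id"))
    delayed = has_delay(trip_update)
    if has_route and delayed:
        return "active"
    if not has_route and not delayed:
        return "stub"
    return "unclear"


def tally(trip_updates: list) -> Counter:
    """Count categories, splitting stubs by whether their vehicle is tracked elsewhere."""
    kinds = [classify(tu) for tu in trip_updates]
    tracked = {tu.get("vehicle", {}).get("id")
               for tu, kind in zip(trip_updates, kinds) if kind == "active"}
    tracked.discard(None)
    counts = Counter()
    for tu, kind in zip(trip_updates, kinds):
        if kind == "stub":
            vehicle_id = tu.get("vehicle", {}).get("id")
            kind = ("stub_vehicle_tracked" if vehicle_id in tracked
                    else "stub_vehicle_untracked")
        counts[kind] += 1
    return counts


# Made-up feed: one active trip, one stub on the same vehicle, one stub on an unseen vehicle.
sample_feed = [
    {"trip": {"trip_id": "t1", "route_id": "r1"}, "vehicle": {"id": "v1"},
     "stop_time_update": [{"stop_id": "s1", "arrival": {"time": 100, "delay": 30}}]},
    {"trip": {"trip_id": "t2"}, "vehicle": {"id": "v1"},
     "stop_time_update": [{"stop_id": "s1", "arrival": {"time": 900}}]},
    {"trip": {"trip_id": "t3"}, "vehicle": {"id": "v2"},
     "stop_time_update": [{"stop_id": "s1", "arrival": {"time": 950}}]},
]
feed_counts = tally(sample_feed)
print(dict(feed_counts))
```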
My request
For every stop_time_update on a trip where the vehicle hasn't actually started that trip yet, replace the current pattern (schedule times in arrival/departure, implicit SCHEDULED) with:
stop_time_update {
  schedule_relationship: NO_DATA
  stop_id: "..."
}
- Keep the TripUpdate as a whole + its vehicle field: consumers should still know which bus is assigned
- Keep the TripUpdate's trip_id: the trip exists
- Drop the schedule times from arrival/departure: they aren't predictions
- When the vehicle actually starts the trip, populate the real arrival/departure with SCHEDULED + time, set route_id on the trip, and include a delay field (as the feed already does for active trips)
Static GTFS already provides the schedule. GTFS-RT should only provide data that improves on it. Schedule copies do not improve on it; they actively mislead consumers into thinking real-time predictions exist when they don't.
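On the producer side, the requested change amounts to a small transformation of each not-yet-started trip. The sketch below shows it on a plain-dict model of the TripUpdate from the example above; the dict shape and the `to_no_data` helper are hypothetical and do not reflect how Sound Transit's feed generator is actually built.

```python
import copy


def to_no_data(trip_update: dict) -> dict:
    """Return a copy with schedule-copied times replaced by NO_DATA stops."""
    fixed = copy.deepcopy(trip_update)  # leave the input untouched
    fixed["stop_time_update"] = [
        # Keep only the stop_id; arrival/departure are dropped because
        # they are schedule copies, not predictions.
        {"stop_id": stu["stop_id"], "schedule_relationship": "NO_DATA"}
        for stu in trip_update.get("stop_time_update", [])
    ]
    return fixed


# Values taken from the example TripUpdate earlier in this email.
stub = {
    "trip": {"trip_id": "605627320", "schedule_relationship": "SCHEDULED"},
    "stop_time_update": [
        {"stop_id": "LS23561",
         "arrival": {"time": 1783035360}, "departure": {"time": 1783035360}},
        {"stop_id": "38567",
         "arrival": {"time": 1783035585}, "departure": {"time": 1783035585}},
    ],
    "vehicle": {"id": "8096168"},
    "timestamp": 1783035094,
}
fixed_stub = to_no_data(stub)
print(fixed_stub["stop_time_update"])
```

Because NO_DATA propagates to subsequent stops, keeping only the first entry of the list would be an equally valid, more compact variant.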
Observed on the live trip-updates feed at api.pugetsound.onebusaway.org/api/gtfs_realtime/trip-updates-for-agency/40.pb.
Passthrough NO_DATA schedule_relationship for stop_time_updates PR #467
"Passthrough schedule_relationship value directly instead of hardcoding SCHEDULED"
The only thing left to do is wait on Sound Transit and OneBusAway.