Datasets used by Severin Borenstein in Airline Research
Please note that Domestic DB1B data are now available for FREE from the Department of Transportation website. Go to their aviation statistics website and search the page for "DB1B".
All of the datasets listed here are translations/refinements of the U.S. Department of Transportation's O&D Data Bank 1A Ticket Dollar Value Database (DB1A and DB1B). The raw DB1A/DB1B data are available directly from the DOT (if you are a U.S. citizen and after you receive authorization from DOT) for a fee. For more information on this, go to the DOT's airline data webpage. Also, for more information on the DB1A/DB1B, go to the DOT's website. The DB1A/DB1B is a quarterly dataset.
NOTE: The Data Bank 1A was replaced in 2003 by Data Bank 1B. These datasets are identical except that (1) DB1B identifies both the operating and the ticketing carrier, while DB1A assumed they were the same, and (2) DB1B has changed the codes for seat classes. I have used DB1A data through 2002q4 and DB1B data starting with 2003q1. Below I refer to them collectively as DB1A.
I receive many requests for airline data. I make data available only for academic research. This is not a for-profit operation. It is a service to the academic community. If you are expecting hand-holding or customer-oriented service, you've come to the wrong place. Datasets are available for a small fee to cover hassle costs of dealing with requests and purchase of additional data from DOT (and to discourage requests for gigantic data dumps).
All datasets exist for 1979 to 2012, though there are data reliability issues for the first few years, particularly for 1980q4, when EA and DL significantly under-reported.
ASCII Reformatting of DB1A: This is simply a reformatting of the DB1A from the arcane COBOL-based format in which DOT sold it for many years to an ASCII format. It drops a small amount of information by omitting some coupon data for tickets of more than 8 coupons (about 0.3% of all records). This dataset includes international data, so is subject to DOT restrictions. You must have DOT authorization before obtaining this dataset. See the DOT website for details.
Detailed Description:
1-4 Fare
5-6 Reporting Carrier (alpha code)
7-8 Coupons
9-13 Number of Passengers
14-16 Point of Ticket Origin (alpha code)
17-19 World Code of Ticket Origin
The following fields repeat for each coupon, up to 8 coupons:
20-21 Transporting Carrier on the Coupon (alpha code)
22-23 Class of Service on the Coupon (alpha code)
24-26 Destination Airport on the Coupon (alpha code)
27 "X", "Y", or "Z" if designated ticket break point, blank otherwise
28-30 World Code of Coupon Destination Airport
Tickets with more than 8 coupons are included in the dataset, but information on only the first 8 coupons is reported. No records are dropped in the processing of the DOT tape. The file DB1ARPT.??? (where ??? is the period YYQ) gives the breakdown of records and passengers by dom/intl and number of coupons.
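A minimal sketch of reading one record in this layout with Python. It assumes that the per-coupon fields repeat as contiguous 11-character blocks (so the second coupon would occupy columns 31-41, and so on), which the layout above implies but does not state outright. The function name and dictionary keys are made up for illustration, and every value is kept as a string because the padding and units of the numeric fields are not documented here.

```python
COUPON_WIDTH = 11   # columns 20-30 describe one coupon
FIRST_COUPON = 19   # 0-based offset of column 20
MAX_COUPONS = 8     # only the first 8 coupons are reported


def parse_db1a_record(line):
    """Split one reformatted DB1A line into a dict; all values stay strings."""
    record = {
        "fare": line[0:4].strip(),
        "reporting_carrier": line[4:6].strip(),
        "coupons": line[6:8].strip(),
        "passengers": line[8:13].strip(),
        "origin": line[13:16].strip(),
        "origin_world_code": line[16:19].strip(),
        "coupon_list": [],
    }
    for i in range(MAX_COUPONS):
        start = FIRST_COUPON + COUPON_WIDTH * i
        block = line[start:start + COUPON_WIDTH]
        if not block.strip():
            break  # assumed: a blank or missing block means no more coupons
        record["coupon_list"].append({
            "carrier": block[0:2].strip(),
            "service_class": block[2:4].strip(),
            "destination": block[4:7].strip(),
            "break_point": block[7:8].strip(),  # "X", "Y", "Z", or ""
            "destination_world_code": block[8:11].strip(),
        })
    return record
```

Calling parse_db1a_record on each line of a quarterly file would give one dict per record, with the coupons in ticket order under "coupon_list".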
Translation of Domestic DB1A into More Usable Form: This is a translation that drops the more unusual tickets (e.g., any ticket with more than 4 coupons, or any one-way ticket with more than 2 coupons) and all international tickets. It includes numerical indicators for all domestic airports.
SPECIFICATIONS
Format = ASCII
Record Length = 52
Total Records = varies by quarter, see db1aboe8.YYQ (YYQ=year,quarter)
For information from reading original DB1A tape see db1arpt.YYQ
RECORD LAYOUT
1 = Point of Purchase (B=Base Airport, R=Reference Airport)
2-4 = 3-letter code for base airport
5-7 = 3-letter code for reference airport
8-10 = 3-letter code for change-of-plane airport
11-12 = 2-letter code for first-segment carrier
13-14 = 2-letter code for second-segment carrier
15-16 = 2-letter fare code for first segment
17-18 = 2-letter fare code for second segment
19-22 = Fare
23-26 = First-segment distance
27-30 = Second-segment distance
31-34 = Base-to-reference nonstop distance
35-40 = Number of passengers
41-42 = Reporting carrier
43-45 = 3-digit numerical code for base airport (see airport.lst)
46-48 = 3-digit numerical code for reference airport (see airport.lst)
49-51 = 3-digit numerical code for change-of-plane airport (see airport.lst)
52 = Ticket type code
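The record layout above can be turned into a column table and sliced directly. The following is only a sketch: the field names are invented for illustration, the 1-based column ranges are copied from the layout, and values are left as strings since the layout does not say how numeric fields are padded.

```python
# (name, first column, last column), 1-based and inclusive, per the layout above.
FIELDS = [
    ("point_of_purchase", 1, 1),
    ("base_airport", 2, 4),
    ("reference_airport", 5, 7),
    ("change_airport", 8, 10),
    ("carrier_1", 11, 12),
    ("carrier_2", 13, 14),
    ("fare_code_1", 15, 16),
    ("fare_code_2", 17, 18),
    ("fare", 19, 22),
    ("distance_1", 23, 26),
    ("distance_2", 27, 30),
    ("nonstop_distance", 31, 34),
    ("passengers", 35, 40),
    ("reporting_carrier", 41, 42),
    ("base_airport_num", 43, 45),
    ("reference_airport_num", 46, 48),
    ("change_airport_num", 49, 51),
    ("ticket_type", 52, 52),
]


def parse_translated_record(line):
    """Return a dict of stripped string fields from one 52-character record."""
    return {name: line[first - 1:last].strip() for name, first, last in FIELDS}
```

Keeping the layout in one table makes it easy to check against the documentation and to convert selected fields (fare, distances, passengers) to numbers once their format has been confirmed on real data.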
SCREENING
The following records are dropped from the original DB1A:
1. Any record that includes an airport outside the 50-state U.S.
2. Any record with more than 4 coupons
3. Any one-way ticket with more than 2 coupons and any 3 or 4 coupon ticket with more than two trip-break points
OTHER NOTES:
1. Round-trip and open-jaw tickets are broken into two records, one for each directional trip. For round-trip (closed-jaw) tickets, the fare in DB1A is divided in half for each of the directional trips. For open-jaw tickets, the fare is divided by the proportion of ticket miles in each of the directional trips.
2. Some records contain airport codes that are not included in the DOT's Database 5 or for which no location information is present. For these records, no distances are calculated. The records are still included with the 3-letter airport codes, but distances are set to 0. Round-trip open-jaw fares are divided in half, rather than weighted by share of distance, since distance is not known.
3. No screening of data based on fare reasonableness has been done.
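The fare-splitting rule in notes 1 and 2 can be sketched as follows. This is an illustration, not the code used to build the dataset: the function name and arguments are hypothetical, and treating a directional distance of 0 as "distance not known" is an assumption based on note 2.

```python
def split_fare(fare, miles_out, miles_back, open_jaw):
    """Return (outbound fare, return fare) for a two-direction ticket."""
    distance_known = miles_out > 0 and miles_back > 0
    if open_jaw and distance_known:
        # Open-jaw: split in proportion to ticket miles in each direction.
        share_out = miles_out / (miles_out + miles_back)
        return fare * share_out, fare * (1 - share_out)
    # Closed-jaw round trip, or open-jaw with unknown distance: split in half.
    return fare / 2, fare / 2
```

For example, a $400 open-jaw ticket with 1,500 miles outbound and 500 miles back would be recorded as $300 and $100, while the same fare on a closed-jaw round trip would be recorded as $200 each way.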
TRIP TYPE CODES
O = One-way ticket.
R = Part of a round-trip ticket.
U = Part of an "unbalanced" ticket (round-trip ticket with 2 coupons in one direction and 1 coupon in the other direction).
I = Interline ticket. Change of carrier within at least one of the directional trips on the ticket. Used only on round-trip tickets, because interline on one-way tickets is evident from carrier listings. Note: Not used if the outbound trip is entirely on one carrier and the return is entirely on a different carrier. Such tickets are not distinguishable from one-carrier round-trip tickets in this dataset. [Supersedes U or R].
J = Open-jaw ticket. Trip destination on the second directional trip of the ticket is not equal to trip origination on the first directional trip of the ticket. [Supersedes U, R or I].
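The precedence among these codes (J over I, I over U or R) can be written as a short decision chain. The sketch below assumes the caller has already worked out the four yes/no properties of the ticket; the function and argument names are hypothetical.

```python
def trip_type_code(one_way, open_jaw=False, interline=False, unbalanced=False):
    """Return the single-letter code, applying the stated precedence."""
    if one_way:
        return "O"   # I, U and J only apply to two-direction tickets
    if open_jaw:
        return "J"   # takes precedence over U, R and I
    if interline:
        return "I"   # takes precedence over U and R
    if unbalanced:
        return "U"
    return "R"
```

Here interline means a change of carrier within at least one directional trip, as defined above; a ticket that simply uses different carriers outbound and on the return would still be coded R or U.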
Aggregation of Domestic DB1A into One Record per Carrier-Route – “Market Share Dataset” [DISCONTINUED – SEE BROADENED MARKET DATASET BELOW]: This is a relatively compact summary of the domestic DB1A. It compresses all O&D information for a carrier on a route into one record, giving average direct and change-of-plane fares and market shares for the given carrier on the route. Includes numerical indicators for all domestic airports.
I created the "market share" files to have a relatively compact summary of the domestic airline ticket data from the DOT's Databank 1A (DB1A), a 10% sample of all tickets collected by US carriers.
-- Tickets with an international segment are excluded.
-- First-class tickets are excluded.
-- Tickets must be one-way or round-trip; open-jaw, circle trips, etc. are excluded.
-- A ticket must have no more than 2 coupons for a one-way trip, no more than 4 coupons (and no more than 2 coupons each way) for a round-trip ticket.
-- Tickets with fare less than $10 excluded.
-- Extremely high fare tickets (probably keypunch errors) excluded (4.0 times USDOT's Standard Industry Fare Level).
-- Tickets with interline one-way trips excluded.
-- Only 30 largest domestic airlines in each quarter reported, but passenger shares are share of all passengers reported in DB1A.
-- Only routes with at least 90 passengers reported in DB1A included.
To use the file, you must also have two other files, airport.lst and carrier.lst. airport.lst lists all the US airports by three-letter code in order of the numbers I have assigned to them (as well as their longitude, latitude, and DOT numeric airport code). carrier.lst gives a partial list of two-letter alpha codes for airlines and their translations at various times (these codes have been reused when carriers fold so they may refer to other airlines at other times).
All files are in flat ASCII. Each mktshare file, which covers the data for one quarter, is about 1MB in size and compresses using PkZip to about 400KB.
The layout of the mktshare file is
1-3 numeric code of first airport (lower number airport)
5-7 numeric code of second airport
9-10 alpha carrier code
12-15 distance of route
17-23 total passengers on route included in fare calculations
-- first class excluded for all but Southwest (which reports all as FC but has no FC section on its planes)
-- more than one change of plane excluded
-- other weird tickets excluded
25-29 share of total that flew this carrier direct
31-35 share of total that flew this carrier change-of-plane
37-41 avg (one-way equivalent) fare for direct passengers
42-46 avg (one-way equivalent) fare for change-of-plane passengers
Aggregation of Domestic DB1A into Market-Carrier Dataset – “Broadened Market Dataset”: The "Broadened Market Data" files replace the "Market Share" data files that I previously used. Unlike the Market Share files, the Market Data files include interline one-way tickets. These were rare in the 1990s when I created the Market Share files, but have become quite common with regional airlines code-sharing with majors. Also, the Market Data include all carriers, not just the 30 largest, and include all routes. As a result, these are larger than the Market Share datasets. The structure is somewhat different as well. The datasets are created from the domestic airline ticket data from the DOT's Databank 1A/1B (DB1A), a 10% sample of all tickets collected by US carriers.
Criteria for inclusion of tickets:
-- Tickets with an international segment are excluded.
-- First-class tickets are excluded for carriers that report less than 90% of tickets as first class (retains FC for carriers that report all tickets as FC in that quarter).
-- Tickets must be one-way or round-trip; open-jaw, circle trips, etc. are excluded.
-- A ticket must have no more than 2 coupons for a one-way trip, no more than 4 coupons (and no more than 2 coupons each way) for a round-trip ticket.
-- Tickets with fare less than $20 or fares above $9998 excluded.
-- Tickets with fares more than 5 times USDOT's Standard Industry Fare Level for observed trip distance during observed quarter are excluded.
-- Records are for one-way trips, so round-trip tickets are split into two one-way observations.
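The inclusion criteria above amount to a yes/no test per ticket, sketched below. This is not the code used to build the dataset. The function and argument names are hypothetical, sifl_fare stands for the Standard Industry Fare Level value for the trip's distance and quarter (which the caller would have to supply), and the list above does not say whether the fare limits apply to the ticket fare or to the one-way equivalent, so the sketch simply applies them to whatever fare is passed in.

```python
def keep_ticket(fare, sifl_fare, coupons_out, coupons_back=0,
                itinerary="one-way", international=False,
                first_class=False, carrier_fc_share=0.0):
    """Return True if a ticket passes the inclusion criteria listed above."""
    if international:
        return False
    # First class is dropped unless the carrier reports (nearly) all tickets as FC.
    if first_class and carrier_fc_share < 0.90:
        return False
    if itinerary == "one-way":
        if coupons_back != 0 or not 1 <= coupons_out <= 2:
            return False
    elif itinerary == "round-trip":
        if not (1 <= coupons_out <= 2 and 1 <= coupons_back <= 2):
            return False
    else:
        return False  # open-jaw, circle trips, etc.
    if fare < 20 or fare > 9998:
        return False
    if fare > 5 * sifl_fare:
        return False
    return True
```

A round-trip ticket that passes would then be split into two one-way observations, as the last criterion describes.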
The Stata dataset has one record per route/carrier-set/dir-cop where
-- route is a pair of airports without regard to direction
-- carrier-set is one carrier and a blank if the trip is one-coupon. If the trip is two-coupon, carrier-set is the pair of airlines with codes listed in alphabetical order. Information about the order of flights is not retained.
-- dir-cop is a distinction between one-coupon (direct) and two-coupon (change-of-plane) tickets. On two-coupon tickets, the location of the change-of-plane is not retained, though the average total routing distance for all c-o-p tickets collapsed into a single record is reported.
The dataset includes the following variables
yr -- year
qtr -- quarter
ap1 -- 3-letter alpha code of the first airport (by alphabetical ordering)
ap2 -- 3-letter alpha code of the second airport (by alphabetical ordering) - blank if one-coupon ticket
cr1 -- 2-letter alpha code of the first carrier (by alphabetical ordering)
cr2 -- 2-letter alpha code of the second carrier (by alphabetical ordering) - blank if one-coupon ticket
pax -- number of passengers reported in record
nsdst -- non-stop distance from airport ap1 to airport ap2
avdst -- average total routing distance of passengers in this record - equal to nsdst if one-coupon ticket
avprc -- average one-way equivalent price paid by passengers reported in this record
cop -- 0 if one-coupon ticket, 1 if two-coupon ticket
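To make the record structure concrete, here is a rough sketch of how one-way trips could be collapsed into route/carrier-set/dir-cop records with the variable names above. It is not the procedure actually used: the input format (dicts with origin, dest, carriers, pax, price, dist) is invented for illustration, yr, qtr and nsdst are omitted, and passenger-weighted averaging for avprc and avdst is an assumption, since the description only says "average".

```python
from collections import defaultdict


def collapse_trips(trips):
    """Collapse one-way trips into one record per (ap1, ap2, cr1, cr2, cop)."""
    sums = defaultdict(lambda: [0, 0.0, 0.0])  # pax, pax*price, pax*distance
    for t in trips:
        # Route ignores direction: airports in alphabetical order.
        ap1, ap2 = sorted((t["origin"], t["dest"]))
        # Carrier-set ignores flight order: carriers in alphabetical order.
        carriers = sorted(t["carriers"])
        cr1 = carriers[0]
        cr2 = carriers[1] if len(carriers) == 2 else ""
        cop = len(carriers) - 1  # 0 = one coupon, 1 = two coupons
        acc = sums[(ap1, ap2, cr1, cr2, cop)]
        acc[0] += t["pax"]
        acc[1] += t["pax"] * t["price"]
        acc[2] += t["pax"] * t["dist"]
    return [
        {"ap1": k[0], "ap2": k[1], "cr1": k[2], "cr2": k[3], "cop": k[4],
         "pax": pax, "avprc": price_sum / pax, "avdst": dist_sum / pax}
        for k, (pax, price_sum, dist_sum) in sorted(sums.items())
    ]
```

With this keying, SFO to BOS and BOS to SFO trips on the same carrier fall into one record, and a two-coupon trip flown on carriers UA then OO lands in the same record as one flown on OO then UA.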