Processed Datasets used by Severin Borenstein in Airline Research

Please note that Domestic DB1B data are now available for FREE!! from the Department of Transportation website. Go to their aviation statistics website and search the page for "DB1B".

All of the datasets listed here are translations/refinements of the U.S. Department of Transportation's  O&D Data Bank 1A Ticket Dollar Value Database (DB1A and DB1B).  The raw DB1A/DB1B data are available directly from the DOT (if you are a U.S. citizen and after you receive authorization from DOT) for a fee.  For more information on this, go to the DOT's airline data webpage. Also, for more information on the DB1A/DB1B, go to the DOT's website.  The DB1A/DB1B is a quarterly dataset.

NOTE: The Data Bank 1A was replaced in 2003 by Data Bank 1B.  These datasets are identical except that (1) DB1B identifies both the operating and the ticketing carrier, while DB1A assumed they were the same, and (2) DB1B has changed the codes for seat classes.  I have used DB1A data through 2002q4 and DB1B data starting with 2003q1.  Below I refer to them collectively as DB1A.

I receive many requests for airline data.  I make data available only for academic research.  This is not a for-profit operation.  It is a service to the academic community.  If you are expecting hand-holding or customer-oriented service, you've come to the wrong place. Datasets are available for a small fee to cover hassle costs of dealing with requests and purchase of additional data from DOT (and to discourage requests for gigantic data dumps).

All datasets exist for 1979 to 2012, though there are data reliability issues for the first few years, and particularly for 1980q4, when EA and DL significantly under-reported.

 

ASCII Reformatting of DB1A: This is simply a reformatting of the DB1A from the arcane COBOL-based format in which DOT sold it for many years to an ASCII format.  It drops a small amount of information by omitting some coupon data for tickets of more than 8 coupons (about 0.3% of all records).  This dataset includes international data, so is subject to DOT restrictions.  You must have DOT authorization before obtaining this dataset.  See the DOT website for details. 

Detailed Description:

1-4    Fare

5-6    Reporting Carrier (alpha code)

7-8    Coupons

9-13   Number of Passengers

14-16  Point of Ticket Origin (alpha code)

17-19  World Code of Ticket Origin

 

The following fields repeat for each coupon, up to 8 coupons (the columns shown are for the first coupon):

20-21  Transporting Carrier on the Coupon (alpha code)

22-23  Class of Service on the Coupon (alpha code)

24-26  Destination Airport on the Coupon (alpha code)

27     "X", "Y", or "Z" if designated ticket break point, blank otherwise

28-30  World Code of Coupon Destination Airport

 

Tickets with more than 8 coupons are included in the dataset,

but information on only the first 8 coupons is reported.
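The layout above can be read with plain string slicing. The following is a minimal sketch, not the code used to build the files. It assumes that numeric fields are digits (possibly blank-padded), that each coupon occupies 11 consecutive columns starting at column 20, and that lines may have trailing blanks trimmed. The names parse_ticket, Ticket and Coupon are hypothetical.

```python
from dataclasses import dataclass
from typing import List

HEADER_LEN = 19    # columns 1-19: ticket-level fields
COUPON_LEN = 11    # columns 20-30 for the first coupon, repeated per coupon
MAX_COUPONS = 8    # only the first 8 coupons are reported


@dataclass
class Coupon:
    carrier: str
    service_class: str
    dest: str
    break_flag: str        # "X", "Y", "Z", or "" when blank
    dest_world_code: str


@dataclass
class Ticket:
    fare: int
    reporting_carrier: str
    coupons: int
    passengers: int
    origin: str
    origin_world_code: str
    legs: List[Coupon]


def parse_ticket(line: str) -> Ticket:
    # Pad so that lines with trimmed trailing blanks still slice cleanly
    # (assumption: unused coupon slots are blank or absent).
    line = line.rstrip("\r\n").ljust(HEADER_LEN + MAX_COUPONS * COUPON_LEN)
    coupons = int(line[6:8])
    legs = []
    # Tickets with more than 8 coupons carry data for the first 8 only.
    for i in range(min(coupons, MAX_COUPONS)):
        start = HEADER_LEN + i * COUPON_LEN
        block = line[start:start + COUPON_LEN]
        legs.append(Coupon(
            carrier=block[0:2].strip(),
            service_class=block[2:4].strip(),
            dest=block[4:7].strip(),
            break_flag=block[7].strip(),
            dest_world_code=block[8:11].strip(),
        ))
    return Ticket(
        fare=int(line[0:4]),
        reporting_carrier=line[4:6].strip(),
        coupons=coupons,
        passengers=int(line[8:13]),
        origin=line[13:16].strip(),
        origin_world_code=line[16:19].strip(),
        legs=legs,
    )
```

Comparing the coupons field with the length of legs shows which tickets were truncated at 8 coupons.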

 

No records are dropped in the processing of the DOT tape.

 

The file DB1ARPT.??? (where ??? is the period YYQ) gives the breakdown of

records and passengers by dom/intl and number of coupons.

 

Translation of Domestic DB1A into More Usable Form:  This is a translation that drops the more unusual tickets (e.g., any ticket with more than 4 coupons, or any one-way ticket with more than 2 coupons) and all international tickets.  It includes numerical indicators for all domestic airports.

SPECIFICATIONS

Format = ASCII

Record Length = 52

Total Records = varies by quarter, see db1aboe8.YYQ (YYQ=year,quarter)

For information from reading original DB1A tape see db1arpt.YYQ

 

RECORD LAYOUT

1     = Point of Purchase (B=Base Airport, R=Reference Airport)

2-4   = 3-letter code for base airport

5-7   = 3-letter code for reference airport

8-10  = 3-letter code for change-of-plane airport

11-12 = 2-letter code for first-segment carrier

13-14 = 2-letter code for second-segment carrier

15-16 = 2-letter fare code for first segment

17-18 = 2-letter fare code for second segment

19-22 = Fare

23-26 = First-segment distance

27-30 = Second-segment distance

31-34 = Base-to-reference nonstop distance

35-40 = Number of passengers

41-42 = Reporting carrier

43-45 = 3-digit numerical code for base airport  (see airport.lst)

46-48 = 3-digit numerical code for reference airport   (see airport.lst)

49-51 = 3-digit numerical code for change-of-plane airport   (see airport.lst)

52    = Ticket type code
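As an illustration of the record layout, here is a small sketch of a reader for one 52-character record. It assumes numeric fields are digits that may be blank-padded or left blank (for example, on nonstop trips); blank numeric fields are returned as None. The field names and the function parse_record are hypothetical, chosen for this example only.

```python
# 1-based inclusive column ranges, copied from the record layout.
FIELDS = {
    "point_of_purchase": (1, 1),
    "base": (2, 4),
    "ref": (5, 7),
    "cop_airport": (8, 10),
    "carrier1": (11, 12),
    "carrier2": (13, 14),
    "fare_code1": (15, 16),
    "fare_code2": (17, 18),
    "fare": (19, 22),
    "dist1": (23, 26),
    "dist2": (27, 30),
    "nonstop_dist": (31, 34),
    "passengers": (35, 40),
    "reporting_carrier": (41, 42),
    "base_num": (43, 45),
    "ref_num": (46, 48),
    "cop_num": (49, 51),
    "ticket_type": (52, 52),
}

NUMERIC = {"fare", "dist1", "dist2", "nonstop_dist", "passengers",
           "base_num", "ref_num", "cop_num"}

RECORD_LENGTH = 52


def parse_record(line: str) -> dict:
    # Strip only the line terminator; blanks inside the record are data.
    line = line.rstrip("\r\n")
    if len(line) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} characters, got {len(line)}")
    rec = {}
    for name, (start, end) in FIELDS.items():
        raw = line[start - 1:end].strip()
        if name in NUMERIC:
            rec[name] = int(raw) if raw else None
        else:
            rec[name] = raw
    return rec
```

Keeping the column ranges in one table makes it easy to check the reader against the layout line by line.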

 

 

SCREENING

 

The following records are dropped from the original DB1A:

 

1. Any record that includes an airport outside the 50-state U.S.

 

2. Any record with more than 4 coupons

 

3. Any one-way ticket with more than 2 coupons, and any ticket of

3 or 4 coupons with more than two trip-break points

 

 

OTHER NOTES:

 

1. Round-trip and open-jaw tickets are broken into two records,

one for each directional trip. For round-trip (closed-jaw)

tickets, the fare in DB1A is divided in half for each of the

directional trips.  For open-jaw tickets, the fare is divided by

the proportion of ticket miles in each of the directional trips.

 

2. Some records contain airport codes that are not included in the DOT's

Database 5 or for which no location information is present.  For these

records,  no distances are calculated.  The records are still included

with the 3-letter airport codes, but distances are set to 0.

Round-trip open-jaw fares are divided in half, rather than weighted by

share of distance, since distance is not known.

 

3. No screening of data based on fare reasonableness has been

done.
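The fare-splitting rules in notes 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the original processing code: the function name split_fare and the labels "round_trip" and "open_jaw" are hypothetical, and a distance of 0 is taken to mean "unknown", as described in note 2.

```python
def split_fare(fare, ticket_kind, out_miles=0, ret_miles=0):
    """Return (outbound_fare, return_fare) for a two-direction ticket."""
    if ticket_kind == "round_trip":
        # Closed-jaw round trip: half of the fare to each directional trip.
        return fare / 2, fare / 2
    if ticket_kind == "open_jaw":
        if out_miles <= 0 or ret_miles <= 0:
            # Distance unknown (stored as 0): fall back to an even split.
            return fare / 2, fare / 2
        # Otherwise weight by each direction's share of ticket miles.
        outbound = fare * out_miles / (out_miles + ret_miles)
        return outbound, fare - outbound
    raise ValueError(f"unsupported ticket kind: {ticket_kind!r}")
```

For example, a $300 open-jaw ticket with 1,000 miles outbound and 500 miles on the return is split into $200 and $100.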

 

 

TRIP TYPE CODES

 

O = One-way ticket.

 

R = Part of a round-trip ticket.

 

U = Part of an "unbalanced" ticket (round trip ticket with

    2-coupons in one direction and 1-coupon in other direction).

 

I = Interline ticket.  Change of carrier within at least one of the

    directional trips on the ticket.  Used only on round-trip tickets,

    because interline on one-way tickets is evident from carrier listings.

    Note: Not used if outbound trip entirely on one carrier and return

    entirely on a different carrier.  Such tickets are not distinguishable

    from one-carrier round-trip tickets in this dataset.

    [Supersedes U or R].

 

J = Open Jaw ticket.  Trip destination on second directional trip of the

    ticket is not equal to trip origination on first directional trip of

    the ticket. [Supersedes U, R or I].
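The precedence among the codes (J over I, I over U or R) can be written as a short decision chain. This sketch assumes the four boolean flags have already been derived from the ticket's coupons; the function name and parameter names are hypothetical.

```python
def ticket_type_code(is_one_way, open_jaw=False, interline=False, unbalanced=False):
    """Map already-derived ticket properties to a one-letter type code."""
    if is_one_way:
        return "O"   # interline one-way trips are visible from the carrier fields
    if open_jaw:
        return "J"   # takes precedence over U, R and I
    if interline:
        return "I"   # takes precedence over U and R
    if unbalanced:
        return "U"   # 2 coupons in one direction, 1 in the other
    return "R"
```

An interline open-jaw ticket is therefore coded J, and an unbalanced interline round trip is coded I.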

 

 

 

Aggregation of Domestic DB1A into One Record per Carrier-Route – “Market Share Dataset” [DISCONTINUED – SEE BROADENED MARKET DATASET BELOW]: This is a relatively compact summary of the domestic DB1A.  It compresses all O&D information for a carrier on a route into one record, giving average direct and change-of-plane fares and market shares for the given carrier on the route. Includes numerical indicators for all domestic airports.

 

I created the "market share" files to have a relatively compact summary of

the domestic airline ticket data from the DOT's Databank 1A (DB1A), a 10%

sample of all tickets collected by US carriers. 

 

  -- Tickets with an international segment are excluded. 

  -- First-class tickets are excluded. 

  -- Tickets must be one-way or round-trip; open-jaw, circle trips,

     etc are excluded.

  -- A ticket must have no more than 2 coupons for a one-way trip, no more

     than 4 coupons (and no more than 2 coupons each way) for a round-trip

     ticket.

  -- Tickets with fare less than $10 excluded.

  -- Extremely high fare tickets (probably keypunch errors) excluded

     (4.0 times USDOT's Standard Industry Fare Level).

  -- Tickets with interline one-way trips excluded.

  -- Only 30 largest domestic airlines in each quarter reported, but

     passenger shares are share of all passengers reported in DB1A.

  -- Only routes with at least 90 passengers reported in DB1A included.

 

To use the file, you must also have two other files, airport.lst and

carrier.lst.  airport.lst lists all the US airports by three-letter code in

order of the numbers I have assigned to them (as well as their longitude,

latitude, and DOT numeric airport code).  carrier.lst gives a partial list

of two-letter alpha codes for airlines and their translations at various

times (these codes have been reused when carriers fold so they may refer

to other airlines at other times).

 

All files are in flat ascii. Each mktshare file, which covers the data for

one quarter, is about 1 MB in size and compresses using PkZip to about 400k. 

 

The layout of the mktshare file is

 

1-3   numeric code of first airport (lower number airport) 

5-7   numeric code of second airport.

9-10  alpha carrier code

12-15 distance of route

17-23 total passengers on route included in fare calculations

      -- first class excluded for all but Southwest

         (which reports all as FC but has no FC section on its planes)

      -- more than one change of plane excluded

      -- other weird tickets excluded

25-29 share of total that flew this carrier direct

31-35 share of total that flew this carrier change-of-plane

37-41 avg (one-way equivalent) fare for direct passengers

42-46 avg (one-way equivalent) fare for change-of-plane passengers

 

 

Aggregation of Domestic DB1A into Market-Carrier Dataset – “Broadened Market Dataset”:   The "Broadened Market Data" files replace the "Market Share" data files that I previously used.  Unlike the Market Share files, the Market Data files include interline one-way tickets.  These were rare in the 1990s when I created the Market Share files, but have become quite common with regional airlines code-sharing with majors.  Also, the Market Data include all carriers, not just the 30 largest, and include all routes.  As a result, these are larger than the Market Share datasets.  The structure is somewhat different as well.  These datasets are created from the domestic airline ticket data from the DOT's Databank 1A/1B (DB1A), a 10% sample of all tickets collected by US carriers. 

 

Criteria for inclusion of tickets:

  -- Tickets with an international segment are excluded. 

  -- First-class tickets are excluded for carriers that report less than 90% of tickets as first class (retains FC for carriers that report all tickets as FC in that quarter). 

  -- Tickets must be one-way or round-trip; open-jaw, circle trips, etc are excluded.

  -- A ticket must have no more than 2 coupons for a one-way trip, no more than 4 coupons (and no more than 2 coupons each way) for a round-trip ticket.

  -- Tickets with fare less than $20 or fares above $9998 excluded.

  -- Tickets with fares more than 5 times USDOT's Standard Industry Fare Level for observed trip distance during observed quarter are excluded.

  -- Records are for one-way trips, so round-trip tickets are split into two one-way observations
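The inclusion criteria above can be expressed as a single filter. The sketch below is illustrative only and rests on several assumptions: the caller supplies sifl_fare (the Standard Industry Fare Level fare for the observed distance and quarter, whose formula is not given here), fare and sifl_fare are measured on the same basis (whole ticket or one-way equivalent), and carrier_fc_share is the share of that carrier's tickets reported as first class in the quarter. All names are hypothetical.

```python
def include_ticket(*, international, kind, coupons_out, coupons_back,
                   fare, sifl_fare, first_class=False, carrier_fc_share=0.0):
    """Return True if a ticket passes the inclusion criteria listed above."""
    if international:
        return False
    # First class is dropped unless the carrier reports nearly everything as FC.
    if first_class and carrier_fc_share < 0.90:
        return False
    if kind == "one_way":
        if coupons_back != 0 or not 1 <= coupons_out <= 2:
            return False
    elif kind == "round_trip":
        if not (1 <= coupons_out <= 2 and 1 <= coupons_back <= 2):
            return False
    else:
        return False          # open-jaw, circle trips, etc.
    if fare < 20 or fare > 9998:
        return False
    if fare > 5 * sifl_fare:
        return False
    return True
```

Tickets that pass the filter would then be split into one-way observations, as the last criterion describes.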

 

The Stata dataset has one record per route/carrier-set/dir-cop where

-- route is a pair of airports without regard to direction

-- carrier-set is one carrier and a blank if the trip is one-coupon.  If the trip is two-coupon, carrier-set is the pair of airlines with codes listed in alphabetical order.  Information about the order of flights is not retained.

-- dir-cop is a distinction between one-coupon (direct) and two-coupon (change-of-plane) tickets.  On two-coupon tickets, the location of the change-of-plane is not retained, though the average total routing distance for all c-o-p tickets collapsed into a single record is reported.
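To make the record definition concrete, the grouping key for a one-way trip could be built as below. This is a sketch under the assumption that airports and carriers are both ordered alphabetically by code, as the variable descriptions suggest; the function name record_key is hypothetical.

```python
def record_key(origin, dest, carriers):
    """Build the (ap1, ap2, cr1, cr2, cop) key for a one-way trip.

    carriers holds one code for a one-coupon trip or two for a two-coupon trip.
    """
    # Route: airport pair without regard to direction.
    ap1, ap2 = sorted((origin, dest))
    if len(carriers) == 1:
        cr1, cr2, cop = carriers[0], "", 0      # direct: second carrier blank
    elif len(carriers) == 2:
        cr1, cr2 = sorted(carriers)             # flight order is not retained
        cop = 1
    else:
        raise ValueError("a one-way trip must have 1 or 2 coupons")
    return (ap1, ap2, cr1, cr2, cop)
```

Under this key, SFO to ATL on UA then DL and ATL to SFO on DL then UA fall into the same record.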

 

The dataset includes the following variables

yr -- year

qtr -- quarter

ap1 -- 3-letter alpha code of the first airport (by alphabetical ordering)

ap2 -- 3-letter alpha code of the second airport (by alphabetical ordering) - blank if one-coupon ticket

cr1 -- 2-letter alpha code of the first carrier (by alphabetical ordering)

cr2 -- 2-letter alpha code of the second carrier (by alphabetical ordering) - blank if one-coupon ticket

pax -- number of passengers reported in record

nsdst -- non-stop distance from airport ap1 to airport ap2

avdst -- average total routing distance of passengers in this record - equal to nsdst if one-coupon ticket

avprc -- average one-way equivalent price paid by passengers reported in this record

cop -- 0 if one-coupon ticket, 1 if two-coupon ticket
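As a usage example, the variables above are enough to compute route-level passenger totals and passenger-weighted average fares. The sketch assumes the records have been exported from Stata into Python dictionaries keyed by the variable names listed; the function name route_summary is hypothetical.

```python
from collections import defaultdict


def route_summary(records):
    """Aggregate records to one entry per (yr, qtr, ap1, ap2)."""
    pax = defaultdict(float)
    revenue = defaultdict(float)
    for r in records:
        key = (r["yr"], r["qtr"], r["ap1"], r["ap2"])
        pax[key] += r["pax"]
        # avprc is an average, so weight it by the passengers in the record.
        revenue[key] += r["pax"] * r["avprc"]
    return {
        key: {"pax": pax[key], "avprc": revenue[key] / pax[key]}
        for key in pax
        if pax[key] > 0
    }
```

Dividing a record's pax by the route total gives that carrier-set's passenger share on the route for the quarter.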