CPT to BRU: A (Very) Cursory Google Flights Analysis

RStudio Google Flights Aviation

Google Flights is a useful platform with which to find the cheapest flights for a given trip. However, a user like me could find it difficult to peruse prices for return flights across multiple dimensions simultaneously. I capture information from the platform’s “Date Grid” and perform some very simple eyeballing.

Wihan Marais
2023-09-25

This document was last updated on 2023-09-25.

Introduction

Google Flights is a useful platform with which to find the cheapest flights for a given trip. The site offers many tools and levers for the user to fine-tune their search. However, a user like me could find it difficult to peruse prices for return flights across multiple dimensions simultaneously, specifically a) departure or return dates, and b) trip duration. Prices are surprisingly sensitive to slight changes in either dimension. To address this problem in my search for cheap tickets between Cape Town International (CPT) and Brussels Airport (BRU), I exploit– and capture information from the platform’s “Date Grid”. This data is compiled and cleaned, and investigated using a few simple techniques. This has aided my search for return trips I should be tracking.

Data Collection

  1. On the Google Flights landing page (Figure 1), I key in my “Where from?” and “Where to?” destinations, and provide departure and return dates at the extremes of the set of feasible dates for my hypothetical trip, namely 13 October and 16 November.
Google Flights landing page

Figure 1: Google Flights landing page

  1. A search will present options based on this criteria. I have no intention of undertaking a 34-day trip. I am, however, interested in some configuration between these dates. To learn more, I select the Date grid tool in Figure 2.
Search results

Figure 2: Search results

  1. This produces a price matrix by departure and return dates which displays the price of the cheapest flight ticket for each date combination (Figure 3). This is extremely useful to determine your personalised best date combination, but only for a limited range of trip duration. In other words, only 49 combinations in my range are being displayed, with relative trip duration varying only slightly.
Date grid

Figure 3: Date grid

  1. Thus, I elicit as many date combinations or grids between 13 October and 16 November by using the Departure and Return toggles on the Date grid. From my initial starting point, I toggle either Return or Departure in increments of 7 days. For example, for a given week of Return dates, I will shift the grid left or right (i.e. Departure dates) by a week, whilst staying within the bounds of 13 October and 16 November. I do the same for Return dates, but then keep Departure dates fixed. In turn, I take a screenshot of each resulting grid. Naturally, there existed plenty of overlap between the contents of the screenshots.

  2. Screenshots were pasted to a single .docx file, which was subsequently saved as a .pdf. Using this free Image To CSV Converter, I was able to tabulate the data from the .pdf containing all of the prices for the date combinations contained in the screenshots. The resulting .csv file needed some cleaning, which was surprisingly time consuming. Most importantly, I removed duplicate entries and illogical date combinations (“no flights”).

Caveat

By now it should be obvious to the reader that this approach is far from optimal. An API approach or, at the very least, a Date grid with more rows and columns would have made my life quite a bit easier. I am not suggesting that other approaches are not possible, but instead that this was my approach, which emerged haphazardly and as a result of some path dependence related to using Google Flights as my starting point. It seems that the Google Flights API was shut down some years ago and I did not attempt to find any suitable alternatives. I also tried web-scraping the data, but I gave up pretty quickly when I could not easily determine the appropriate HTML attributes and nodes to use when the Date grid was determined dynamically. Please, feel free to recommend more sensible paths.

Data

The aforementioned screenshots were captured on 21 September 2023, thereby providing only a static representation of prices which are inherently dynamic. Duplicate and “no flight” entries were also removed. The resulting data is presented below.

There are 595 potential combinations of dates in the range of \([2023/10/13;\ 2023/11/16]\). That is, all unique combinations of Departure and Return dates where Departure does not exceed Return. Furthermore, trips should be no shorter than 7 days and no longer than 21 days. Filtering the data in this fashion leaves 315 potential combinations.

Of those remaining combinations, 28 price observations are omitted. In other words, these are the observations which I failed to capture with screenshots. Seemingly, as illustrated in Figure 4, I failed to capture the grid of Departures of \([2023/10/13;\ 2023/10/19]\) with Returns of \([2023/10/20;\ 2023/10/26]\). In fact, I neglected these observations because I did not deem it feasible to depart during the week of 13 to 19 October, particularly given the improbability of obtaining a Schengen Visa on such short notice. Ultimately, the dataset encompasses prices for 287 different trips.

Figure 4: Missing prices in range of potential trip dates

Analysis

Table 1 presents the cheapest possible return trips by departure date and trip length given my stylised criteria. At the time of sampling, the lowest price of any trip was R10356. 18 different trips of varying departure dates and duration were priced at this level.

Table 1: Cheapest Return Trips at R10356
DepartReturnTripDepartReturnTrip
2023-10-152023-11-01172023-10-222023-11-1019
2023-10-152023-11-03192023-10-222023-11-1120
2023-10-172023-11-01152023-10-242023-11-018
2023-10-172023-11-03172023-10-242023-11-0310
2023-10-172023-11-06202023-10-242023-11-0613
2023-10-222023-11-01102023-10-242023-11-0815
2023-10-222023-11-03122023-10-242023-11-1017
2023-10-222023-11-06152023-10-242023-11-1118
2023-10-222023-11-08172023-10-242023-11-1320
21 September 2023. Price in ZAR.

Figure 5 presents the distribution of prices by trip duration and a corresponding linear regression of price on trip duration. There is quite clearly a statistically significant \((\text{t-statistic} = -4.326)\) inverse relationship between price and trip duration \((\beta = -91.81),\) although the variation in duration alone explains only a relatively small portion of the variation in price \((\text{R-squared} = 0.058).\)

Figure 5: Price distribution of return trips by trip duration

Figure 6 presents the price distribution of return trips by week of departure. At first glance, prices are decreasing in the amount of time between the date of data collection (21 September) and the departure data, although it is doubtful that this is a statistically significant inverse relationship (see Figure 7). This corresponds with the notion that trips booked on shorter notice are likely to be more expensive. The week beginning 30 October appears to be an exception. Presumably, surge pricing may apply during the first week of November. In all cases, the cheapest trips in the dataset are to be found at the extreme tails of price distributions of each week. In other words, the cheapest tickets—the ones I am most interested in–are exceptional cases.

Figure 6: Price distribution of trips by ISO week

Figure 7: The relationship between price and departure date

Remarks

This post has enabled me to take a closer look at some of the useful tools and features Google Flights has to offer. In particular, the platform is able to present the user with extraordinarily cheap tickets for a given return trip in a given time span. However, when users’ convictions about time frames and trip duration are loosely held, they may find it difficult to search for and compare promising options varying across both dimensions. I have tried–although quite clunkily–to overcome this issue by capturing trip data across multiple date grids. I have uncovered the cheapest trips given my idiosyncratic preferences and performed some cursory analyses. I show that prices are quite sensitive to slight changes in these parameters and that longer trips command lower prices. Moreover, departure dates determine prices in as far as a shorter lead time is likely to entail higher prices, and departure dates are subject to higher demand and (presumably) surge pricing. The sample, of course, is limited in size and only representative of a single route for a specific period spanning slightly more than a month. Next steps: APIs, time-varying prices and improved coverage.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/WihanZA/wihan_distill, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Marais (2023, Sept. 25). Wihan Marais: CPT to BRU: A (Very) Cursory Google Flights Analysis. Retrieved from https://www.wihanza.com/posts/2023-09-25-cpt-to-bru-a-very-cursory-google-flights-analysis/

BibTeX citation

@misc{marais2023cpt,
  author = {Marais, Wihan},
  title = {Wihan Marais: CPT to BRU: A (Very) Cursory Google Flights Analysis},
  url = {https://www.wihanza.com/posts/2023-09-25-cpt-to-bru-a-very-cursory-google-flights-analysis/},
  year = {2023}
}