Scraping International Investment Agreement Data

International Economics, Foreign Direct Investment, Web Scraping

Comprehensive and up-to-date data on international investment agreements (IIAs) concluded between (groups of) countries are hard to come by, making it difficult to study their effects on bilateral investment flows. In this post, I propose and describe a simple approach to obtain and manipulate data from UNCTAD’s IIA Mapping Project using R.

Wihan Marais
2024-05-06

Background

Bilateral investment treaties (BITs) and related bilateral tax treaties (BTTs) are intended to promote, among other things, capital flows between countries. For example, such treaties facilitate cross-border investment by multinationals by eliminating double taxation. On the other hand, these treaties may deter foreign direct investment (FDI) in some cases by limiting investors’ ability to circumvent taxes by investing elsewhere. Understandably, the potentially ambiguous effects of these treaties on capital flows, such as aggregate FDI and mergers and acquisitions (M&A), have received much attention in the empirical international economics literature.

Blonigen and Davies (2004), one of the earliest studies on this front, could not find persuasive evidence that BTTs facilitated US inbound and outbound FDI activity over the period 1980–1999. Although it is not the primary focus of the study, Di Giovanni (2005) shows that capital tax treaties promoted cross-border M&A deals over the period 1990–1999, beyond the advantages gleaned from countries’ capital tax rates, perhaps as a result of improved transparency in doing business across borders.1 Blonigen and Piger (2014) use Bayesian statistical techniques to highlight negotiated bilateral agreements, like BITs and service agreements, as important determinants of M&A activity worldwide. Specifically, they suggest that these treaties are consequential drivers of FDI into poorer countries.

However, keeping track of the status of BITs and other treaties with investment provisions (TIPs) remains rather difficult. Ever-evolving lists of signatories and economic union members, as well as differences between ratification, enforcement and termination dates, complicate matters further. In an effort to construct my own BIT measure to include in the model specification of my research paper, “Gravity and Cross-Border Influence: The Importance of Distance in Mobile Telecommunications”, I turned to web scraping. In what follows, I describe the corresponding data source and methodology.

Data

I apply a simple and undemanding web scraping method to the International Investment Agreements (IIA) Navigator hosted by the United Nations Conference on Trade and Development (UNCTAD).2 UNCTAD provides this and other tools to aid the monitoring, analysis and improvement of international investment policymaking. The IIA Navigator comprises a database of IIAs, a product of the collaborative IIA Mapping Project,3 which is regularly updated as the mapping of IIA content continues. For more information, see the relevant project description and methodology.4

IIAs are categorised as either bilateral investment treaties (BITs) or treaties with investment provisions (TIPs). The Navigator conceives of this distinction as follows:

“A bilateral investment treaty is an agreement between two countries regarding promotion and protection of investments made by investors from respective countries in each other’s territory. The great majority of IIAs are BITs.

The category of treaties with investment provisions brings together various types of investment treaties that are not BITs. Three main types of TIPs can be distinguished:

  1. broad economic treaties that include obligations commonly found in BITs (e.g. a free trade agreement with an investment chapter);

  2. treaties with limited investment-related provisions (e.g. only those concerning establishment of investments or free transfer of investment-related funds); and

  3. treaties that only contain “framework” clauses such as the ones on cooperation in the area of investment and/or for a mandate for future negotiations on investment issues.”

Although not considered in this post, a comparable resource for bilateral tax treaties can be found in the Tax Treaties Explorer provided by the International Centre for Tax and Development (ICTD).5

Methodology

In this section, I describe and provide the code employed to obtain and clean IIA data. First, I’ll load the necessary packages. In addition, I set the plan for executing futures across multiple sessions in parallel.

library(pacman)
p_load(tidyverse,
       rvest,
       countrycode,
       skimr,
       furrr)

plan(multisession, workers = 6)

Scraping and Parsing IIA Table

The focus of my web scraping is merely the table on the Mapping of IIA Content webpage. I am primarily concerned with the timing of countries’ IIA affiliations. Hence, it is important to standardise the coding of countries’ names, as well as all date variables.

[Figure: Scraping the table on the Mapping of IIA Content webpage.]

Using rvest, I scrape the webpage and parse the table. I save it as an RDS file for future use, instead of repeatedly scraping and parsing.

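iia_url <- "https://investmentpolicy.unctad.org/international-investment-agreements/iia-mapping"
iia_webpage <- read_html(iia_url)
# The first table on the page holds the mapped IIAs
iia_table <- html_table(iia_webpage, header = TRUE, trim = TRUE)[[1]]
# Cache the parsed table; assumes the data/ directory already exists
saveRDS(iia_table, "data/iia_table.rds")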
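
The save-once, read-later pattern can be wrapped in a small helper so that the page is only requested when no cached copy exists. What follows is a minimal sketch in base R, not the code used for this post: cached_scrape, fake_scrape and the temporary cache path are hypothetical names introduced for illustration, and the stand-in scraper replaces the real read_html call so that the example runs offline.

```r
# Hypothetical helper: run `scrape_fun` only when no cached copy exists.
cached_scrape <- function(path, scrape_fun, refresh = FALSE) {
  if (!refresh && file.exists(path)) {
    return(readRDS(path))
  }
  result <- scrape_fun()
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(result, path)
  result
}

# Stand-in for the real scrape, so the sketch runs offline.
n_calls <- 0
fake_scrape <- function() {
  n_calls <<- n_calls + 1
  data.frame(Index = 1:2,
             Parties = c("Afghanistan, Germany", "Albania, Austria"),
             stringsAsFactors = FALSE)
}

cache_path <- file.path(tempdir(), "iia_cache_demo", "iia_table.rds")
unlink(cache_path)  # start from a clean slate for the demonstration

first  <- cached_scrape(cache_path, fake_scrape)  # scrapes and saves
second <- cached_scrape(cache_path, fake_scrape)  # reads the cached copy
```

Passing refresh = TRUE forces a new scrape, which is useful because the Navigator is updated as the mapping continues.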

Data Overview and Wrangling

The table is then read back into the R environment. I skim the resulting table to get an overview of the data. At first glance, there seem to be plenty of empty cells and clunky variable names. I create parsimonious variable names and replace empty cells with missing values.6

iia_table <- read_rds("data/iia_table.rds") %>% 
  rename(Index = No.,
         Full = `Full title`,
         Short = `Short title`,
         Signature = `Date of signature`,
         Entry = `Date of entry into force`,
         Termination = `Termination date`) %>% 
  select(-Text) %>% 
  mutate(across(where(is.character), ~ifelse(. == "", NA, .)))

skim(iia_table)
Table 1: Data summary

Name                     iia_table
Number of rows           2591
Number of columns        9
Column type frequency:
  character              8
  numeric                1
Group variables          None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
Full                2503           0.03   21  207      0        88           0
Short                  0           1.00   10   96      0      2591           0
Type                   0           1.00    4    4      0         2           0
Status                 0           1.00    8   21      0         3           0
Parties                0           1.00   10  165      0      2504           0
Signature              0           1.00   10   10      0      2104           0
Entry                317           0.88   10   10      0      1897           0
Termination         2121           0.18    7   10      0       301           0

Variable type: numeric

skim_variable  n_missing  complete_rate  mean  sd     p0  p25    p50   p75     p100  hist
Index                  0              1  1296  748.1   1  648.5  1296  1943.5  2591  ▇▇▇▇▇

The resulting data frame comprises 2591 rows or distinct IIAs. The table below presents a sample of five of these rows.

iia_table %>% 
  select(Index, Parties, Signature, Entry, Termination) %>% 
  head(5)
Index  Parties                                           Signature   Entry       Termination
1      Afghanistan, Germany                              20/04/2005  12/10/2007  NA
2      Afghanistan, Türkiye                              10/07/2004  19/07/2005  NA
3      Albania, Austria                                  18/03/1993  01/08/1995  NA
4      Albania, Azerbaijan                               09/02/2012  13/07/2012  NA
5      Albania, BLEU (Belgium-Luxembourg Economic Union) 01/02/1999  18/10/2002  NA

The tables above highlight some important issues with the data, which I will address in turn:

  1. Some variables have many missing observations.

    • Most importantly, missing Termination values suggest that most IIAs have not been terminated and have no scheduled termination. In addition, missing Entry values indicate that some IIAs have never entered into force.

  2. Date variables are given as character variables.

  3. Individual parties are delimited by commas.

  4. Individual parties are sometimes given by their non-English names.

  5. Parties include economic blocs and unions, which are referred to as “Country Groupings” by the Navigator.

Dates

I convert the date columns into the appropriate format with the lubridate package. The conditional mutate of Termination is necessary because some observations are given in the mm/yyyy format; these are padded with a leading day of 01 before parsing.

iia_table <- iia_table %>% 
  mutate(Signature = dmy(Signature),
         Entry = dmy(Entry),
         Termination = gsub("-","/", Termination),
         Termination = case_when(
           nchar(Termination) > 7 ~ Termination,
           nchar(Termination) <= 7 ~ paste("01", Termination, sep = "/"),
           TRUE ~ NA),
         Termination = dmy(Termination))
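
To make the padding step concrete, here is a small sketch in base R rather than lubridate. The input values are made up for illustration; in particular, the hyphenated value is an assumption suggested by the gsub call above, not a record taken from the Navigator.

```r
# Toy termination dates: a full date, two month-year values, and a missing one.
termination <- c("20/04/2005", "07-2019", "03/2021", NA)

# Harmonise separators, then pad month-year values with a leading day.
termination <- gsub("-", "/", termination)
is_month_year <- !is.na(termination) & nchar(termination) <= 7
termination[is_month_year] <- paste0("01/", termination[is_month_year])

# Parse day/month/year strings; missing values stay missing.
parsed <- as.Date(termination, format = "%d/%m/%Y")
print(parsed)
```

Month-year terminations are thereby dated to the first day of the month, which matters later only when the month sits on either side of a cut-off date.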

Parties

Parties to a given IIA are presented as a character string delimited by commas. For each IIA, I transform this value into a list containing elements representing each party. However, some parties’ names contain commas too, e.g., Republic of Korea is given as “Korea, Republic of”. Before splitting, I therefore correct the names of parties that contain commas. These names and their replacement values were identified manually and are stored in the list correctnames.7 The function correctname_function then standardises the country names.

correctnames <- list(
  # list(original, replacement)
    list("Korea, Republic of","Republic of Korea"),
    list("Moldova, Republic of", "Republic of Moldova"),
    list("Iran, Islamic Republic of", "Islamic Republic of Iran"),
    list("Bolivia, Plurinational State of", "Plurinational State of Bolivia"),
    list("Korea, Dem. People's Rep. of", "Dem. People's Rep. of Korea"),
    list("Venezuela, Bolivarian Republic of", "Bolivarian Republic of Venezuela"),
    list("Congo, Democratic Republic of the", "Democratic Republic of the Congo"),
    list("Tanzania, United Republic of", "United Republic of Tanzania"),
    list("Micronesia, Federated States of", "Federated States of Micronesia")
)

correctname_function <- function(incorrectnamevector){
  temp_names <- incorrectnamevector
  # Apply each (original, replacement) pair in turn
  for (i in seq_along(correctnames)) {
    temp_names <- gsub(
      pattern = correctnames[[i]][[1]],
      replacement = correctnames[[i]][[2]],
      x = temp_names,
      ignore.case = TRUE
    )
  }
  return(temp_names)
}

# gsub is vectorised, so the function can be applied to the whole column at once
iia_table$Parties <- correctname_function(iia_table$Parties)
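
The reason for correcting names before splitting is easiest to see on a single value. The sketch below uses a hypothetical Parties string and base R only; it shows how a naive split miscounts the parties and how rewriting the comma-bearing name first resolves this.

```r
# Hypothetical raw value; the naive split yields three pieces instead of two.
raw_parties <- "Korea, Republic of, Japan"
naive <- strsplit(raw_parties, ", ", fixed = TRUE)[[1]]

# Rewrite the comma-bearing name first, then split.
fixed_parties <- gsub("Korea, Republic of", "Republic of Korea",
                      raw_parties, fixed = TRUE)
clean <- strsplit(fixed_parties, ", ", fixed = TRUE)[[1]]

print(naive)
print(clean)
```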

Country Groupings

As it stands, the Parties column consists of character strings containing countries’ standardised names, as well as the names of country groupings, all delimited by commas. I want to replace country groupings in Parties with their corresponding sets of countries.

[Figure: Scraping the table on the IIAs by Country Grouping webpage.]

Hence, I obtain a list of all country groupings by scraping the table on the IIAs by Country Grouping webpage with an approach akin to that used before. Each grouping in the table contains a link to its dedicated webpage. I use a CSS selector to identify the elements in groupings_webpage which contain these links.8 I parse these links and merge them with the table listing all country groupings, i.e., groupings_table.

groupings_url <- "https://investmentpolicy.unctad.org/international-investment-agreements/by-country-grouping"
groupings_webpage <- read_html(groupings_url)
groupings_table <- html_table(groupings_webpage, header = T, trim = T)[[1]] %>% 
  select(Index = No., Name)

# Extract the href attribute of each grouping's link
groupings_table$Links <- html_elements(groupings_webpage, ".min-one-line a") %>% 
  html_attr("href")

Groupings’ webpages, like that of the ACP, are devoted to providing additional information regarding their members, and the status of their IIA affiliations. For now, I am only interested in determining the country compositions of each grouping. I write a function to access groupings’ webpages using their corresponding links, which are relative to the website’s base URL.9 On each webpage, the function scrapes a vector of members, again with the use of the appropriate CSS selector.

[Figure: Scraping country compositions from each country grouping’s dedicated webpage.]

I use future_map from the furrr package to execute the function across multiple sessions to speed up operations. For each link, the function returns a single character string containing a grouping’s members delimited by commas, as in the original table of IIAs or iia_table. As before, I save the table as an RDS file for later use, and to avoid repeatedly scraping multiple pages.

grouping_members_function <- function(groupinglink){
  paste("https://investmentpolicy.unctad.org", groupinglink, sep = "") %>% 
    read_html(.) %>% 
    html_elements("#general a") %>% 
    html_text2() %>% 
    paste(collapse = ", ") %>% 
    return(.)
}

groupings_table$Members <- groupings_table$Links %>% 
  future_map(~grouping_members_function(groupinglink = .x)) %>% 
    unlist()

saveRDS(groupings_table, "data/groupings_table.rds")
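
When many pages are scraped in one go, a single failed request stops the whole run. One way to guard against this, sketched here under my own assumptions rather than taken from the code above, is to wrap the scraper in tryCatch so that a failing page yields a missing value. The names safe_members and fake_scraper and the two links are hypothetical; the stand-in scraper replaces grouping_members_function so that the example runs offline.

```r
# Hypothetical wrapper: return NA instead of stopping when one page fails.
safe_members <- function(link, scrape_fun) {
  tryCatch(
    scrape_fun(link),
    error = function(e) {
      message("Failed for ", link, ": ", conditionMessage(e))
      NA_character_
    }
  )
}

# Stand-in scraper so the sketch runs offline; the second link fails on purpose.
fake_scraper <- function(link) {
  if (grepl("broken", link)) stop("page not found")
  "Belgium, Luxembourg"
}

links <- c("/groupings/ok", "/groupings/broken")
members <- vapply(links, safe_members, character(1),
                  scrape_fun = fake_scraper, USE.NAMES = FALSE)
print(members)
```

Rows with a missing Members value can then be retried separately instead of repeating every request.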

I read groupings_table back into the global environment and, as before, clean non-standard country names with correctname_function. Some country groupings’ members are themselves country groupings. I write a small function to flag the groupings concerned.

groupings_table <- read_rds("data/groupings_table.rds")

groupings_table$Members <- groupings_table$Members %>% 
  lapply(correctname_function) %>% unlist()

# Create short name to look for matches in members
groupings_table <- groupings_table %>% 
  mutate(Short = str_extract(Name, "\\(([^)]+)\\)"),
         Short = gsub("\\(|\\)", "", Short))

groupingroup_function <- function(groupingname){
  member_vector <- groupings_table %>% 
    filter(Name == groupingname) %>% 
    pull(Members) %>% 
    str_split(", ") %>% 
    unlist() %>% 
    unique()
  member_vector <- member_vector[member_vector %in% (groupings_table$Short)]
  if(length(member_vector) > 0){
    return(data.frame(Name = groupingname, Issue = paste(member_vector, collapse = ", ")))
  } else{
    return(data.frame(Name = groupingname, Issue = NA))
  }
}

groupings_table$Name %>% 
  map_dfr(groupingroup_function) %>% 
  filter(!is.na(Issue))
Name                           Issue
Energy Charter Treaty members  European Union
EU (European Union)            European Union

Seemingly, the European Union is the only country grouping listed as a member of other groupings. For example, the Energy Charter Treaty lists the European Union as one of its members. To address this, I first remove European Union as a member of the EU (European Union) country grouping itself. I then replace every remaining mention of European Union in the other groupings’ member lists with the EU’s member countries.

groupings_table <- groupings_table %>% 
  mutate(Members = case_when(
    Short == "European Union" ~ gsub("European Union,", "", Members),
    TRUE ~ Members
  ))

groupings_table <- groupings_table %>% 
  mutate(Members = gsub(
    "European Union",
    (groupings_table %>% filter(Short == "European Union") %>% pull(Members)),
    x = Members))

# test if cleaning worked
groupings_table$Name %>% 
  map_dfr(groupingroup_function) %>% 
  filter(!is.na(Issue)) %>% 
  nrow(.) == 0
[1] TRUE
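
The two-step substitution can be traced on a toy table. The sketch below uses base R, and its grouping names and member lists are illustrative only; they are not the memberships recorded by the Navigator.

```r
# Toy groupings table; names and members are illustrative only.
groupings <- data.frame(
  Short = c("European Union", "ECT"),
  Members = c("European Union, France, Germany", "European Union, Japan"),
  stringsAsFactors = FALSE
)

is_eu <- groupings$Short == "European Union"

# Step 1: drop the self-reference from the EU's own member list.
groupings$Members[is_eu] <- trimws(
  sub("European Union,", "", groupings$Members[is_eu], fixed = TRUE)
)

# Step 2: substitute the EU's members wherever the EU is listed elsewhere.
eu_members <- groupings$Members[is_eu]
groupings$Members <- gsub("European Union", eu_members,
                          groupings$Members, fixed = TRUE)

print(groupings)
```

The order matters: running step 2 first would copy the self-reference into every grouping that lists the EU.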

Now, I need to replace the country groupings, as they appear in the Parties column of iia_table, with their corresponding Members. To this end, I first transform Parties into character vectors by delimiting individual parties and removing any surplus white space. I create and execute a function to identify country groupings in party vectors and, in turn, replace them with the groupings’ members. The result is iia_table with a Parties column consisting strictly of vectors of countries.

iia_table$Parties <- iia_table$Parties %>% 
  str_split(", ") %>% 
  lapply(str_squish)

append_groupings_function <- function(partylist){
  temp_vector <- unlist(partylist)
  for (i in 1:length(temp_vector)) {
    if (temp_vector[i] %in% groupings_table$Name) {
      
      temp_members <- groupings_table %>% 
        filter(Name == temp_vector[i]) %>% 
        pull(Members) %>% 
        str_split(", ") %>% 
        unlist()
      temp_vector <- temp_vector %>% append(temp_members)
    }
  }
  temp_vector <- temp_vector[!temp_vector %in% groupings_table$Name]
  return(temp_vector)
}

iia_table$Parties <- iia_table$Parties %>% 
  lapply(append_groupings_function)
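
For comparison, the same replacement can be written as a lookup. This is a sketch of an alternative, not the function used above: expand_parties and grouping_members are hypothetical names, and the lookup contents are illustrative. Unlike append_groupings_function, it keeps members at the position of the grouping they replace and drops duplicates immediately.

```r
# Toy lookup from grouping name to members; contents are illustrative only.
grouping_members <- list(
  "BLEU (Belgium-Luxembourg Economic Union)" = c("Belgium", "Luxembourg")
)

# Replace each grouping by its members and keep plain countries as they are.
expand_parties <- function(parties, lookup) {
  expanded <- lapply(parties, function(p) {
    if (p %in% names(lookup)) lookup[[p]] else p
  })
  unique(unlist(expanded))
}

result <- expand_parties(
  c("Albania", "BLEU (Belgium-Luxembourg Economic Union)"),
  grouping_members
)
print(result)
```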

Subsequently, I use the formidable countrycode package to convert the country names in each vector in Parties to their corresponding ISO 3166-1 alpha-3 codes (iso3c). Once again, I execute the function using futures.

iia_table$Parties <- iia_table$Parties %>%
  future_map(~countrycode(.x,
                          origin = "country.name",
                          destination = "iso3c"))

Bilateral Matrix of IIA Involvement

The product of all operations thus far is a data frame comprising IIAs already mapped by UNCTAD. The processed iia_table presents each IIA alongside information about its type, signatories, dates, and status. It is now possible to construct a time-varying, bilateral matrix of countries’ mutual involvement in IIAs.

I consider only those IIAs that are or have been in force, as opposed to those that have been signed but have not entered into force. Of the 2591 IIAs, 317 have been signed without ever entering into force. I exclude these IIAs from the construction of the bilateral matrix.

iia_table <- iia_table %>% 
  filter(!is.na(Entry))

The matrix consists of a dummy variable indicating the presence of an IIA between an origin and destination country in a given year. I take 1 July as the cut-off date to convert Entry and Termination into year columns, Start and End. In other words, if an IIA takes effect before 1 July of year \(t\), the particular IIA is indicated for \(t\). If the IIA takes effect on or after this cut-off, it is only indicated from \(t+1\). Similarly, if an IIA is terminated on or after 1 July of year \(t\), it is still deemed in place during \(t\). If the IIA is terminated before the cut-off date in \(t\), it is deemed in place until the end of \(t-1\). Where no termination date is given, I take the current year as the end of the enforcement period.

iia_table <- iia_table %>% 
  mutate(Start = case_when(
    Entry < ymd(paste(year(Entry), "-07-01", sep = "")) ~ year(Entry),
    Entry >= ymd(paste(year(Entry), "-07-01", sep = "")) ~ year(Entry) + 1,
    TRUE ~ NA)) %>% 
  mutate(End = case_when(
    is.na(Termination) ~ year(Sys.Date()),
    Termination < ymd(paste(year(Termination), "-07-01", sep = "")) ~ year(Termination) - 1,
    Termination >= ymd(paste(year(Termination), "-07-01", sep = "")) ~ year(Termination),
    TRUE ~ NA))
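
The cut-off rule can be checked on a few dates. The sketch below is written in base R instead of lubridate; start_year, end_year and the helper names are hypothetical, and the second and third date pairs are made up. It relies on the fact that a date falls before 1 July exactly when its month is earlier than July.

```r
# Helpers for the 1 July cut-off (hypothetical names, base R only).
year_of <- function(d) as.integer(format(d, "%Y"))
before_cutoff <- function(d) as.integer(format(d, "%m")) < 7L

# Entry on or after 1 July counts from the following year.
start_year <- function(entry) {
  ifelse(before_cutoff(entry), year_of(entry), year_of(entry) + 1L)
}

# Termination before 1 July ends coverage in the previous year;
# a missing termination date is replaced by the current year.
end_year <- function(termination, current_year) {
  ifelse(is.na(termination), current_year,
         ifelse(before_cutoff(termination),
                year_of(termination) - 1L, year_of(termination)))
}

entry <- as.Date(c("2007-10-12", "1995-06-30", "2000-08-01"))
termination <- as.Date(c(NA, "2010-07-01", "2001-03-15"))

start <- start_year(entry)
end <- end_year(termination, current_year = 2024L)

# An agreement that ends before it starts covers no full year under this rule.
valid <- end >= start
print(data.frame(entry, termination, start, end, valid))
```

The third row is worth noting: an agreement entering into force in August and terminated the following March receives a Start later than its End. A call such as seq.int(2001, 2000, by = 1) raises an error, so rows like this would need to be filtered out before enforcement periods are built, should they occur in the data.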

Thus, I have determined the start and end years of each IIA’s enforcement. As with the Parties variable, I create a vector containing all years of enforcement for each IIA.

iia_table <- iia_table %>% 
  mutate(Period = future_map2(Start, End, ~seq.int(.x, .y, by = 1))) 

For example, the first IIA in iia_table, titled Afghanistan - Germany BIT (2005), was signed on 2005-04-20, entered into force on 2007-10-12, and has an undefined termination date. Hence, its enforcement period is as follows:

iia_table$Period[[1]]
 [1] 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
[14] 2021 2022 2023 2024

Using Index, Parties and Period columns from iia_table, I can begin to construct this matrix, for example:

iia_matrix <- iia_table %>% 
  select(Index, Origin = Parties, Destination = Parties, Period)

# Example
print(filter(iia_matrix, Index == 1))
# A tibble: 1 × 4
  Index Origin    Destination Period    
  <int> <list>    <list>      <list>    
1     1 <chr [2]> <chr [2]>   <dbl [17]>

As opposed to imposing some structure on a matrix of countries, and subsequently applying a function to determine a country pair’s joint affiliation to the same IIA in a given year, I instead allow for the matrix to spring from IIA data. The unnest function is very useful in this regard. I create a data frame explicitly containing all possible Origin-Destination combinations of countries, which appear in mapped IIAs. In addition, I create the indicator variable of interest agree_iia. Following the typical structure of gravity covariate datasets, agree_iia need not be dyadic across Origin-Destination pairs.

iia_matrix <- iia_matrix %>% 
  unnest(Origin) %>% 
  unnest(Destination) %>% 
  distinct() %>%  # to eliminate duplicate country pairs per IIA
  mutate(agree_iia = 1)

# Example
print(filter(iia_matrix, Index == 1))
# A tibble: 4 × 5
  Index Origin Destination Period     agree_iia
  <int> <chr>  <chr>       <list>         <dbl>
1     1 AFG    AFG         <dbl [17]>         1
2     1 AFG    DEU         <dbl [17]>         1
3     1 DEU    AFG         <dbl [17]>         1
4     1 DEU    DEU         <dbl [17]>         1
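
The effect of unnesting the same party vector twice is a Cartesian product of the parties with themselves. A sketch of that idea in base R, using a hypothetical three-party agreement, shows how the number of rows grows: n parties yield n × n ordered pairs, of which n × (n − 1) involve two different countries.

```r
# Parties of a hypothetical three-party agreement (ISO3 codes).
parties <- c("BEL", "LUX", "ALB")

# All ordered Origin-Destination combinations, including self-pairs.
pairs <- expand.grid(Origin = parties, Destination = parties,
                     stringsAsFactors = FALSE)

# Self-pairs such as BEL-BEL carry no bilateral information.
directed_pairs <- pairs[pairs$Origin != pairs$Destination, ]

print(nrow(pairs))
print(nrow(directed_pairs))
```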

I follow the same unnesting procedure for the enforcement periods of each IIA. Subsequently, I aggregate agree_iia from the IIA-Origin-Destination-Year level to the Origin-Destination-Year level. Thus, agree_iia is now indicated for a given year if at least one IIA is in force between a particular Origin-Destination pair in said year.

iia_matrix <- iia_matrix %>% 
  unnest(Period) %>% 
  group_by(Origin, Destination, Period) %>% 
  summarise(agree_iia = max(agree_iia, na.rm = T)) %>% 
  ungroup()

There are 127833 Origin-Destination-Year combinations arising from mapped IIAs. The earliest enforcement period starts in 1962, and the latest ends, by construction, in 2024. There are 188 unique countries covered by previously mapped IIAs, and IIAs cover 5558 actual combinations of Origins and Destinations.

Using the spread and gather functions, I transform the data frame into a balanced matrix format. If a particular Origin-Destination-Year observation of agree_iia was not present in the original set of 127833 observations, agree_iia now takes the value of zero for said combination. In addition, I set agree_iia to zero where Origin and Destination are the same country, which is standard practice in the construction of bilateral trade facilitation variables.

iia_matrix <- iia_matrix %>% 
  spread(Period, agree_iia, fill = 0) %>% 
  gather(Year, agree_iia, 3:ncol(.)) %>% 
  spread(Destination, agree_iia, fill = 0) %>% 
  gather(Destination, agree_iia, 3:ncol(.)) %>% 
  select(Year, Origin, Destination, agree_iia) %>% 
  mutate(agree_iia = case_when(
    Origin == Destination ~ 0,
    TRUE ~ agree_iia
  ))
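
The balancing step amounts to building every Origin-Destination-Year combination and filling the unobserved ones with zero. The sketch below does this in base R with expand.grid and merge instead of spread and gather; it is an alternative formulation under my own assumptions, and the country codes and years are illustrative only.

```r
# Toy observed pair-years; codes and years are illustrative only.
observed <- data.frame(
  Origin = c("AFG", "DEU", "AFG"),
  Destination = c("DEU", "AFG", "AFG"),
  Year = c(2008L, 2008L, 2008L),
  agree_iia = 1,
  stringsAsFactors = FALSE
)

# TUR is added as a country that never appears in an observed pair-year.
countries <- sort(unique(c(observed$Origin, observed$Destination, "TUR")))
years <- 2007:2008

# Every Origin-Destination-Year combination, then attach the observed ones.
panel <- expand.grid(Origin = countries, Destination = countries,
                     Year = years, stringsAsFactors = FALSE)
panel <- merge(panel, observed,
               by = c("Origin", "Destination", "Year"), all.x = TRUE)

# Unobserved combinations and self-pairs are set to zero.
panel$agree_iia[is.na(panel$agree_iia)] <- 0
panel$agree_iia[panel$Origin == panel$Destination] <- 0

print(nrow(panel))            # 3 countries x 3 countries x 2 years
print(sum(panel$agree_iia))   # only AFG-DEU and DEU-AFG in 2008 remain
```

The same arithmetic gives the size of the full matrix: 188 countries over the 63 years from 1962 to 2024 yield 188 × 188 × 63 = 2226672 rows.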

The final product is a balanced bilateral matrix of countries’ joint affiliation to at least one IIA, or lack thereof, in a given year. It comprises 2226672 observations of agree_iia, spanning the 63 years from 1962 to 2024. It encompasses 35344 unique combinations of Origin and Destination countries, i.e., all ordered pairs of the 188 countries covered by mapped IIAs. An illustrative subset of this matrix is tabulated below; that is, observations of agree_iia for 2024. The tabulation is accompanied by a download link for the corresponding .csv data.

Limitations

I want to urge caution when adopting a similar approach to the one given here, or when employing the resulting dataset in empirical research. Please note the following limitations:

I kindly invite willing and eager readers to reach out to me regarding any flaws, concerns, suggestions, etc. I would love to hear them!

Update

This post was last updated on 23 May 2024.

Session Information

─ Session info ─────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31 ucrt)
 os       Windows 11 x64 (build 22631)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_South Africa.utf8
 ctype    English_South Africa.utf8
 tz       Africa/Johannesburg
 date     2024-05-23
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ─────────────────────────────────────────────────────────
 package       * version date (UTC) lib source
 base64enc       0.1-3   2015-07-28 [1] CRAN (R 4.3.0)
 bslib           0.7.0   2024-03-29 [1] CRAN (R 4.3.3)
 cachem          1.0.8   2023-05-01 [1] CRAN (R 4.3.1)
 chromote        0.2.0   2024-02-12 [1] CRAN (R 4.3.3)
 cli             3.6.2   2023-12-11 [1] CRAN (R 4.3.2)
 codetools       0.2-20  2024-03-31 [1] CRAN (R 4.3.3)
 colorspace      2.1-0   2023-01-23 [1] CRAN (R 4.3.1)
 countrycode   * 1.6.0   2024-03-22 [1] CRAN (R 4.3.3)
 crosstalk       1.2.1   2023-11-23 [1] CRAN (R 4.3.2)
 digest          0.6.35  2024-03-11 [1] CRAN (R 4.3.3)
 distill       * 1.6     2023-10-06 [1] CRAN (R 4.3.2)
 downlit         0.4.3   2023-06-29 [1] CRAN (R 4.3.1)
 dplyr         * 1.1.4   2023-11-17 [1] CRAN (R 4.3.3)
 DT            * 0.33    2024-04-04 [1] CRAN (R 4.3.3)
 evaluate        0.23    2023-11-01 [1] CRAN (R 4.3.2)
 fansi           1.0.6   2023-12-08 [1] CRAN (R 4.3.2)
 fastmap         1.1.1   2023-02-24 [1] CRAN (R 4.3.1)
 fontawesome     0.5.2   2023-08-19 [1] CRAN (R 4.3.1)
 forcats       * 1.0.0   2023-01-29 [1] CRAN (R 4.3.3)
 furrr         * 0.3.1   2022-08-15 [1] CRAN (R 4.3.1)
 future        * 1.33.2  2024-03-26 [1] CRAN (R 4.3.3)
 generics        0.1.3   2022-07-05 [1] CRAN (R 4.3.1)
 ggplot2       * 3.5.0   2024-02-23 [1] CRAN (R 4.3.3)
 globals         0.16.3  2024-03-08 [1] CRAN (R 4.3.3)
 glue            1.7.0   2024-01-09 [1] CRAN (R 4.3.2)
 gtable          0.3.4   2023-08-21 [1] CRAN (R 4.3.1)
 here          * 1.0.1   2020-12-13 [1] CRAN (R 4.3.3)
 highr           0.10    2022-12-22 [1] CRAN (R 4.3.1)
 hms             1.1.3   2023-03-21 [1] CRAN (R 4.3.1)
 htmltools     * 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.2)
 htmlwidgets     1.6.4   2023-12-06 [1] CRAN (R 4.3.2)
 httr            1.4.7   2023-08-15 [1] CRAN (R 4.3.3)
 jquerylib       0.1.4   2021-04-26 [1] CRAN (R 4.3.1)
 jsonlite        1.8.8   2023-12-04 [1] CRAN (R 4.3.2)
 kableExtra    * 1.4.0   2024-01-24 [1] CRAN (R 4.3.3)
 knitr         * 1.46    2024-04-06 [1] CRAN (R 4.3.3)
 later           1.3.2   2023-12-06 [1] CRAN (R 4.3.2)
 lifecycle       1.0.4   2023-11-07 [1] CRAN (R 4.3.2)
 listenv         0.9.1   2024-01-29 [1] CRAN (R 4.3.2)
 lubridate     * 1.9.3   2023-09-27 [1] CRAN (R 4.3.3)
 magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.3.1)
 memoise         2.0.1   2021-11-26 [1] CRAN (R 4.3.1)
 MetBrewer     * 0.2.0   2022-03-21 [1] CRAN (R 4.3.3)
 munsell         0.5.1   2024-04-01 [1] CRAN (R 4.3.3)
 pacman        * 0.5.1   2019-03-11 [1] CRAN (R 4.3.3)
 parallelly      1.37.1  2024-02-29 [1] CRAN (R 4.3.3)
 pillar          1.9.0   2023-03-22 [1] CRAN (R 4.3.1)
 pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.3.1)
 printr        * 0.3     2023-03-08 [1] CRAN (R 4.3.3)
 processx        3.8.4   2024-03-16 [1] CRAN (R 4.3.3)
 promises        1.3.0   2024-04-05 [1] CRAN (R 4.3.2)
 ps              1.7.6   2024-01-18 [1] CRAN (R 4.3.2)
 purrr         * 1.0.2   2023-08-10 [1] CRAN (R 4.3.3)
 R6              2.5.1   2021-08-19 [1] CRAN (R 4.3.1)
 ragg            1.3.0   2024-03-13 [1] CRAN (R 4.3.3)
 Rcpp            1.0.12  2024-01-09 [1] CRAN (R 4.3.2)
 readr         * 2.1.5   2024-01-10 [1] CRAN (R 4.3.3)
 repr            1.1.7   2024-03-22 [1] CRAN (R 4.3.3)
 rlang           1.1.3   2024-01-10 [1] CRAN (R 4.3.2)
 rmarkdown       2.26    2024-03-05 [1] CRAN (R 4.3.3)
 rprojroot       2.0.4   2023-11-05 [1] CRAN (R 4.3.2)
 rstudioapi      0.16.0  2024-03-24 [1] CRAN (R 4.3.3)
 rvest         * 1.0.4   2024-02-12 [1] CRAN (R 4.3.2)
 sass            0.4.9   2024-03-15 [1] CRAN (R 4.3.3)
 scales          1.3.0   2023-11-28 [1] CRAN (R 4.3.2)
 sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
 skimr         * 2.1.5   2022-12-23 [1] CRAN (R 4.3.1)
 stringi         1.8.3   2023-12-11 [1] CRAN (R 4.3.2)
 stringr       * 1.5.1   2023-11-14 [1] CRAN (R 4.3.3)
 svglite         2.1.3   2023-12-08 [1] CRAN (R 4.3.2)
 systemfonts     1.0.6   2024-03-07 [1] CRAN (R 4.3.3)
 textshaping     0.3.7   2023-10-09 [1] CRAN (R 4.3.2)
 tibble        * 3.2.1   2023-03-20 [1] CRAN (R 4.3.3)
 tidyr         * 1.3.1   2024-01-24 [1] CRAN (R 4.3.3)
 tidyselect      1.2.1   2024-03-11 [1] CRAN (R 4.3.3)
 tidyverse     * 2.0.0   2023-02-22 [1] CRAN (R 4.3.3)
 timechange      0.3.0   2024-01-18 [1] CRAN (R 4.3.2)
 tzdb            0.4.0   2023-05-12 [1] CRAN (R 4.3.1)
 utf8            1.2.4   2023-10-22 [1] CRAN (R 4.3.2)
 uuid            1.2-0   2024-01-14 [1] CRAN (R 4.3.2)
 vctrs           0.6.5   2023-12-01 [1] CRAN (R 4.3.2)
 viridisLite     0.4.2   2023-05-02 [1] CRAN (R 4.3.3)
 websocket       1.4.1   2021-08-18 [1] CRAN (R 4.3.3)
 withr           3.0.0   2024-01-16 [1] CRAN (R 4.3.2)
 xaringanExtra * 0.7.0   2022-07-16 [1] CRAN (R 4.3.2)
 xfun            0.43    2024-03-25 [1] CRAN (R 4.3.3)
 xml2            1.3.6   2023-12-04 [1] CRAN (R 4.3.2)
 yaml            2.3.8   2023-12-11 [1] CRAN (R 4.3.2)

 [1] C:/Users/marai/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.2/library

────────────────────────────────────────────────────────────────────
  1. Tax treaty data was obtained from Tax Analysts (2001).

  2. https://investmentpolicy.unctad.org/international-investment-agreements

  3. https://investmentpolicy.unctad.org/international-investment-agreements/iia-mapping

  4. https://investmentpolicy.unctad.org/uploaded-files/document/Mapping%20Project%20Description%20and%20Methodology.pdf

  5. https://www.treaties.tax/en/

  6. I remove the Text column, because I am not interested in IIAs’ original documentation.

  7. This was done by inspecting the parties with the longest names, since names containing additional commas are typically longer.

  8. I like the user-friendly and simple-to-use CSS selector tool offered by the SelectorGadget Chrome extension.

  9. For example, /international-investment-agreements/groupings/11/acp-african-caribbean-and-pacific-group-of-states-.

References

Blonigen, Bruce A., and Ronald B. Davies. 2004. “The Effects of Bilateral Tax Treaties on U.S. FDI Activity.” International Tax and Public Finance 11 (5): 601–22. https://doi.org/10.1023/B:ITAX.0000036693.32618.00.
Blonigen, Bruce A., and Jeremy Piger. 2014. “Determinants of Foreign Direct Investment.” Canadian Journal of Economics/Revue Canadienne d’économique 47 (3): 775–812. https://doi.org/10.1111/caje.12091.
Di Giovanni, Julian. 2005. “What Drives Capital Flows? The Case of Cross-Border M&A Activity and Financial Deepening.” Journal of International Economics 65 (1): 127–49. https://doi.org/10.1016/j.jinteco.2003.11.007.
Tax Analysts. 2001. “Worldwide Tax Treaty Index.” Washington, DC.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/WihanZA/wihan_distill, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Marais (2024, May 6). Wihan Marais: Scraping International Investment Agreement Data. Retrieved from https://www.wihanza.com/posts/2024-05-06-scraping-international-investment-agreement-data/

BibTeX citation

@misc{marais2024scraping,
  author = {Marais, Wihan},
  title = {Wihan Marais: Scraping International Investment Agreement Data},
  url = {https://www.wihanza.com/posts/2024-05-06-scraping-international-investment-agreement-data/},
  year = {2024}
}