Wihan Marais: Scraping International Investment Agreement Data

Wihan Marais

Background

Bilateral investment treaties (BITs) and related bilateral tax treaties (BTTs) are intended to promote, among other things, capital flows between countries. For example, such treaties facilitate cross-border investment by multinationals by eliminating double taxation. On the other hand, these treaties deter foreign direct investment (FDI) in some cases by limiting investors’ ability to circumvent taxes by investing elsewhere. Understandably, the potentially ambiguous effects of these treaties on capital flows, such as aggregate FDI and mergers and acquisitions (M&A), have received much attention in the empirical international economics literature.

Blonigen and Davies (2004), one of the earliest studies on this front, could not find persuasive evidence for the facilitating role of BTTs with respect to US inbound and outbound FDI activity over the period 1980–1999. Although it is not the primary focus of the paper, Di Giovanni (2005) shows that capital tax treaties promoted cross-border M&A deals over the period 1990–1999, beyond the advantages gleaned from countries’ capital tax rates, perhaps as a result of improved transparency in doing business across borders.¹ Blonigen and Piger (2014) use Bayesian statistical techniques to highlight negotiated bilateral agreements, like BITs and service agreements, as important determinants of M&A activity worldwide. Specifically, they suggest that these treaties are consequential drivers of FDI into poorer countries.

However, keeping track of the status of BITs and other treaties with investment provisions (TIPs) remains rather difficult. Ever-evolving lists of signatories and economic union members, as well as differences between ratification, enforcement and termination dates, complicate the task. In an effort to construct my own BIT measure to include in the model specification of my research paper, “Gravity and Cross-Border Influence: The Importance of Distance in Mobile Telecommunications”, I turned to web scraping. In what follows, I describe the corresponding data source and methodology.

Data

I apply a simple and undemanding web scraping method to the International Investment Agreements (IIA) Navigator hosted by the United Nations Conference on Trade and Development (UNCTAD).² UNCTAD provides this and other tools to aid the monitoring, analysis and improvement of international investment policymaking. The IIA Navigator comprises a database of IIAs, a product of the collaborative IIA Mapping Project,³ which is regularly updated as the mapping of IIA content continues. For more information, see the relevant project description and methodology.⁴

IIAs are categorised as either bilateral investment treaties (BITs) or treaties with investment provisions (TIPs). The Navigator conceives of this distinction as follows:

“A bilateral investment treaty is an agreement between two countries regarding promotion and protection of investments made by investors from respective countries in each other’s territory. The great majority of IIAs are BITs.

The category of treaties with investment provisions brings together various types of investment treaties that are not BITs. Three main types of TIPs can be distinguished:

broad economic treaties that include obligations commonly found in BITs (e.g. a free trade agreement with an investment chapter);

treaties with limited investment-related provisions (e.g. only those concerning establishment of investments or free transfer of investment-related funds); and

treaties that only contain “framework” clauses such as the ones on cooperation in the area of investment and/or for a mandate for future negotiations on investment issues.”

Although not considered in this post, a comparable resource for bilateral tax treaties can be found in the Tax Treaties Explorer provided by the International Centre for Tax and Development (ICTD).⁵

Methodology

In this section, I describe and provide the code employed to obtain and clean IIA data. First, I’ll load the necessary packages. In addition, I set the plan for executing futures across multiple sessions in parallel.

library(pacman)
p_load(tidyverse,
       rvest,
       countrycode,
       skimr,
       furrr)

plan(multisession, workers = 6)

Scraping and Parsing IIA Table

The focus of my web scraping is merely the table on the Mapping of IIA Content webpage. I am primarily concerned with the timing of countries’ IIA affiliations. Hence, it is important to standardise the coding of countries’ names, as well as all date variables.

Scraping the table on the Mapping of IIA Content webpage.

Using rvest, I scrape the webpage and parse the table. I save it as an RDS file for future use, instead of repeatedly scraping and parsing.

iia_url <- "https://investmentpolicy.unctad.org/international-investment-agreements/iia-mapping"
iia_webpage <- read_html(iia_url)
iia_table <- html_table(iia_webpage, header = T, trim = T)[[1]]
saveRDS(iia_table, "data/iia_table.rds")
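The scrape-once-then-cache pattern described above can be wrapped in a small helper. The following is only a minimal sketch: read_cached and its arguments are hypothetical names (not part of rvest), and fake_scrape is a stand-in for the read_html and html_table calls so that the example runs without a network connection.

```r
# Hypothetical helper: return the cached object if the RDS file exists,
# otherwise call `fetch()`, save its result, and return it.
read_cached <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch()
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  saveRDS(result, path)
  result
}

# Stand-in for the real scrape (made-up rows); counts how often it is called.
n_calls <- 0
fake_scrape <- function() {
  n_calls <<- n_calls + 1
  data.frame(Index = 1:2,
             Parties = c("Afghanistan, Germany", "Albania, Austria"),
             stringsAsFactors = FALSE)
}

cache_path <- file.path(tempdir(), "iia_cache_demo", "iia_table.rds")
unlink(cache_path)  # start from a clean slate

first  <- read_cached(cache_path, fake_scrape)  # calls fake_scrape and saves
second <- read_cached(cache_path, fake_scrape)  # reads from disk instead
```

With a real scrape, the fetch argument would be a function wrapping the rvest calls, so the host is contacted only when no cached copy exists.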

Data Overview and Wrangling

The table is then read back into the R environment. I skim the resulting table to get an overview of the data. At first glance, there seem to be plenty of empty cells and clunky variable names. I create parsimonious variable names and replace empty cells with missing values.⁶

iia_table <- read_rds("data/iia_table.rds") %>% 
  rename(Index = No.,
         Full = `Full title`,
         Short = `Short title`,
         Signature = `Date of signature`,
         Entry = `Date of entry into force`,
         Termination = `Termination date`) %>% 
  select(-Text) %>% 
  mutate(across(where(is.character), ~ifelse(. == "", NA, .)))

skim(iia_table)

Table 1: Data summary
Name	iia_table
Number of rows	2591
Number of columns	9
_______________________
Column type frequency:
character	8
numeric	1
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
Full	2503	0.03	21	207	88
Short	0	1.00	10	96	2591
Type	0	1.00	4	4	2
Status	0	1.00	8	21	3
Parties	0	1.00	10	165	2504
Signature	0	1.00	10	10	2104
Entry	317	0.88	10	10	1897
Termination	2121	0.18	7	10	301

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
Index	0	1	1296	748.1	1	648.5	1296	1943.5	2591	▇▇▇▇▇

The resulting data frame comprises 2591 rows or distinct IIAs. The table below presents a sample of five of these rows.

iia_table %>% 
  select(Index, Parties, Signature, Entry, Termination) %>% 
  head(5)

Index	Parties	Signature	Entry	Termination
1	Afghanistan, Germany	20/04/2005	12/10/2007	NA
2	Afghanistan, Türkiye	10/07/2004	19/07/2005	NA
3	Albania, Austria	18/03/1993	01/08/1995	NA
4	Albania, Azerbaijan	09/02/2012	13/07/2012	NA
5	Albania, BLEU (Belgium-Luxembourg Economic Union)	01/02/1999	18/10/2002	NA

The tables above highlight some important issues with the data, which I will address in turn:

Some variables are prone to missing many observations.
- Most importantly, the missing Termination values suggest that most IIAs have neither been terminated nor have a scheduled termination. In addition, missing Entry values indicate that some IIAs have been signed but have not entered into force.
Date variables are given as character variables.
Individual parties are delimited by commas.
Individual parties are sometimes given by their non-English names.
Parties include economic blocs and unions, which are referred to as “Country Groupings” by the Navigator.

Dates

I convert date columns into the appropriate format with the lubridate package. The conditional mutate of Termination is necessary because some observations are given in the mm/yyyy format, without a day.

iia_table <- iia_table %>% 
  mutate(Signature = dmy(Signature),
         Entry = dmy(Entry),
         Termination = gsub("-","/", Termination),
         Termination = case_when(
           nchar(Termination) > 7 ~ Termination,
           nchar(Termination) <= 7 ~ paste("01", Termination, sep = "/"),
           TRUE ~ NA),
         Termination = dmy(Termination))
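To make the padding logic concrete, here is a minimal sketch on a handful of made-up Termination values (they are illustrative, not taken from the scraped table). It uses base R's as.Date as a stand-in for lubridate's dmy, so it runs without any packages.

```r
# Toy Termination values: a full date, two month-year values, and a gap.
termination <- c("12/10/2007", "03-2019", "07/2021", NA)

# Step 1: harmonise the separator.
termination <- gsub("-", "/", termination)

# Step 2: pad month-year values (7 characters or fewer) with the first day.
short <- !is.na(termination) & nchar(termination) <= 7
termination[short] <- paste("01", termination[short], sep = "/")

# Step 3: parse as day/month/year.
parsed <- as.Date(termination, format = "%d/%m/%Y")
print(parsed)
# "2007-10-12" "2019-03-01" "2021-07-01" NA
```

Month-year values are therefore dated to the first day of the month, which matters later when dates are compared with the 1 July cut-off.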

Parties

Parties to a given IIA are presented as a character string delimited by commas. For each IIA, I transform this value into a list containing elements representing each party. However, some parties’ names contain commas too, e.g., Republic of Korea is given as “Korea, Republic of”. To delimit parties correctly, I first correct the names of parties which contain commas. I identified these names and their replacement values manually and collected them in the list correctnames.⁷ The correctname_function then applies these replacements to standardise country names.

correctnames <- list(
  # list(original, replacement)
    list("Korea, Republic of","Republic of Korea"),
    list("Moldova, Republic of", "Republic of Moldova"),
    list("Iran, Islamic Republic of", "Islamic Republic of Iran"),
    list("Bolivia, Plurinational State of", "Plurinational State of Bolivia"),
    list("Korea, Dem. People's Rep. of", "Dem. People's Rep. of Korea"),
    list("Venezuela, Bolivarian Republic of", "Bolivarian Republic of Venezuela"),
    list("Congo, Democratic Republic of the", "Democratic Republic of the Congo"),
    list("Tanzania, United Republic of", "United Republic of Tanzania"),
    list("Micronesia, Federated States of", "Federated States of Micronesia")
)

correctname_function <- function(incorrectnamevector){
  temp_names <- incorrectnamevector
  # Apply each (original, replacement) pair in turn
  for (i in seq_along(correctnames)) {
    temp_names <- gsub(
      pattern = correctnames[[i]][[1]],
      replacement = correctnames[[i]][[2]],
      x = temp_names,
      ignore.case = TRUE
    )
  }
  return(temp_names)
}

iia_table$Parties <- iia_table$Parties %>% 
  lapply(correctname_function) %>% unlist()
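The reason for correcting names before splitting can be seen in a small example. This is a minimal sketch using a hypothetical Parties string and base R only; it is not the function used above.

```r
# Hypothetical Parties value containing a name with an embedded comma.
parties <- "Japan, Korea, Republic of"

# Naive split: the embedded comma yields three "parties" instead of two.
naive <- strsplit(parties, ", ", fixed = TRUE)[[1]]

# Correct the name first (literal, not regex, replacement), then split.
corrected <- sub("Korea, Republic of", "Republic of Korea", parties, fixed = TRUE)
clean <- strsplit(corrected, ", ", fixed = TRUE)[[1]]

print(naive)  # "Japan" "Korea" "Republic of"
print(clean)  # "Japan" "Republic of Korea"
```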

Country Groupings

As it stands, the Parties column consists of character strings containing countries’ standardised names, as well as the names of country groupings, all delimited by commas. I want to replace country groupings in Parties with their corresponding sets of countries.

Hence, I obtain a list of all country groupings by scraping the table on the IIAs by Country Grouping webpage with an approach akin to that used before. Each grouping in the table contains a link to its dedicated webpage. I use a CSS selector to identify the elements in groupings_webpage which contain these links.⁸ I parse these links and merge them with the table listing all country groupings, i.e., groupings_table.

groupings_url <- "https://investmentpolicy.unctad.org/international-investment-agreements/by-country-grouping"
groupings_webpage <- read_html(groupings_url)
groupings_table <- html_table(groupings_webpage, header = T, trim = T)[[1]] %>% 
  select(Index = No., Name)

groupings_table$Links <- html_elements(groupings_webpage, ".min-one-line a") %>% 
  html_attrs() %>% 
  unlist() %>% 
  unname()

Groupings’ webpages, like that of the ACP, are devoted to providing additional information regarding their members, and the status of their IIA affiliations. For now, I am only interested in determining the country compositions of each grouping. I write a function to access groupings’ webpages using their corresponding links, which are relative to the website’s base URL.⁹ On each webpage, the function scrapes a vector of members, again with the use of the appropriate CSS selector.

Scraping country compositions from each country grouping’s dedicated webpage.

I use future_map from the furrr package to execute the function across multiple sessions to speed up operations. For each link, the function returns a single character string containing a grouping’s members delimited by commas, as in the original table of IIAs or iia_table. As before, I save the table as an RDS file for later use, and to avoid repeatedly scraping multiple pages.

grouping_members_function <- function(groupinglink){
  paste("https://investmentpolicy.unctad.org", groupinglink, sep = "") %>% 
    read_html(.) %>% 
    html_elements("#general a") %>% 
    html_text2() %>% 
    paste(collapse = ", ") %>% 
    return(.)
}

groupings_table$Members <- groupings_table$Links %>% 
  future_map(~grouping_members_function(groupinglink = .x)) %>% 
    unlist()

saveRDS(groupings_table, "data/groupings_table.rds")
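Because this step requests one page per grouping, it may be worth pausing between requests rather than sending them in parallel. The following is a minimal sequential sketch, assuming a fixed delay is acceptable to the host; polite_map, fetch and delay are hypothetical names, and fake_fetch is a stand-in that does not touch the network.

```r
# Hypothetical helper: apply `fetch` to each link, pausing between requests.
polite_map <- function(links, fetch, delay = 1) {
  results <- vector("list", length(links))
  for (i in seq_along(links)) {
    results[[i]] <- fetch(links[[i]])
    if (i < length(links)) Sys.sleep(delay)  # pause before the next request
  }
  results
}

# Stand-in for a function that scrapes one grouping page.
fake_fetch <- function(link) paste("members of", link)

demo_links <- c("/groupings/a", "/groupings/b")  # made-up relative links
members <- unlist(polite_map(demo_links, fake_fetch, delay = 0.1))
print(members)
```

This trades speed for a lighter load on the host; which of the two matters more depends on the number of pages and the site's terms of use.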

I read groupings_table back into the global environment and, as before, clean non-standard country names with correctname_function. Some country groupings’ members are country groupings in and of themselves. I create a minor function to highlight the country groupings of concern.

groupings_table <- read_rds("data/groupings_table.rds")

groupings_table$Members <- groupings_table$Members %>% 
  lapply(correctname_function) %>% unlist()

# Create short name to look for matches in members
groupings_table <- groupings_table %>% 
  mutate(Short = str_extract(Name, "\\(([^)]+)\\)"),
         Short = gsub("\\(|\\)", "", Short))

groupingroup_function <- function(groupingname){
  member_vector <- groupings_table %>% 
    filter(Name == groupingname) %>% 
    pull(Members) %>% 
    str_split(", ") %>% 
    unlist() %>% 
    unique()
  member_vector <- member_vector[member_vector %in% (groupings_table$Short)]
  if(length(member_vector) > 0){
    return(data.frame(Name = groupingname, Issue = paste(member_vector, collapse = ", ")))
  } else{
    return(data.frame(Name = groupingname, Issue = NA))
  }
}

groupings_table$Name %>% 
  map_dfr(groupingroup_function) %>% 
  filter(!is.na(Issue))

Name	Issue
Energy Charter Treaty members	European Union
EU (European Union)	European Union

Seemingly, the European Union is the only country grouping to be listed as a member of other groupings. For example, the Energy Charter Treaty lists the European Union as one of its members. To address this, I must first remove European Union as a member of the EU (European Union) country grouping itself, and then replace it with its member countries wherever it appears in other groupings.

groupings_table <- groupings_table %>% 
  mutate(Members = case_when(
    Short == "European Union" ~ gsub("European Union,", "", Members),
    TRUE ~ Members
  ))

groupings_table <- groupings_table %>% 
  mutate(Members = gsub(
    "European Union",
    (groupings_table %>% filter(Short == "European Union") %>% pull(Members)),
    x = Members))

# test if cleaning worked
groupings_table$Name %>% 
  map_dfr(groupingroup_function) %>% 
  filter(!is.na(Issue)) %>% 
  nrow(.) == 0

[1] TRUE

Now, I need to replace the country groupings, as they appear in the Parties column of iia_table, with their corresponding Members. To this end, I first transform Parties into character vectors by delimiting individual parties and removing any excess white space. I create and execute a function to identify country groupings in party vectors and, in turn, replace them with the groupings’ members. The result is iia_table with a Parties column consisting of vectors of strictly countries.

iia_table$Parties <- iia_table$Parties %>% 
  str_split(", ") %>% 
  lapply(str_squish)

append_groupings_function <- function(partylist){
  temp_vector <- unlist(partylist)
  for (i in 1:length(temp_vector)) {
    if (temp_vector[i] %in% groupings_table$Name) {
      
      temp_members <- groupings_table %>% 
        filter(Name == temp_vector[i]) %>% 
        pull(Members) %>% 
        str_split(", ") %>% 
        unlist()
      temp_vector <- temp_vector %>% append(temp_members)
    }
  }
  temp_vector <- temp_vector[!temp_vector %in% groupings_table$Name]
  return(temp_vector)
}

iia_table$Parties <- iia_table$Parties %>% 
  lapply(append_groupings_function)
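The replacement of groupings by their members can be illustrated on toy data. This is a minimal sketch with a hypothetical lookup table and made-up names; unlike the function above, it keeps members in the position of the grouping they replace and drops duplicates.

```r
# Hypothetical lookup: grouping name -> comma-delimited members.
toy_groupings <- data.frame(
  Name    = "Toy Union",
  Members = "Country A, Country B",
  stringsAsFactors = FALSE
)

# Replace any grouping in `parties` with its members; keep countries as-is.
expand_parties <- function(parties, groupings) {
  out <- character(0)
  for (p in parties) {
    hit <- match(p, groupings$Name)
    if (is.na(hit)) {
      out <- c(out, p)  # ordinary country
    } else {
      out <- c(out, strsplit(groupings$Members[hit], ", ", fixed = TRUE)[[1]])
    }
  }
  unique(out)
}

expanded <- expand_parties(c("Toy Union", "Country C"), toy_groupings)
print(expanded)  # "Country A" "Country B" "Country C"
```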

Subsequently, I use the formidable countrycode package to convert the country names in each vector of countries in Parties to their corresponding ISO 3166-1 alpha-3 codes (iso3c). Once again, I execute the function using futures.

iia_table$Parties <- iia_table$Parties %>%
  future_map(~countrycode(.x,
                          origin = "country.name",
                          destination = "iso3c"))
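Names that countrycode cannot match are returned as NA, so it may be useful to list them before moving on. The following is a minimal sketch with a made-up vector of parties, in which "Atlantis" is deliberately not a country; the variable names are illustrative.

```r
library(countrycode)

parties <- c("Afghanistan", "Germany", "Republic of Korea", "Atlantis")

# warn = FALSE silences the console warning; unmatched names come back as NA.
iso3 <- countrycode(parties,
                    origin = "country.name",
                    destination = "iso3c",
                    warn = FALSE)

# Collect the names that still need manual attention.
unmatched <- parties[is.na(iso3)]

print(iso3)
print(unmatched)
```

Any name that shows up in unmatched would otherwise enter the bilateral matrix as a missing country code.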

Bilateral Matrix of IIA Involvement

The product of all operations thus far is a data frame comprising IIAs already mapped by UNCTAD. The processed iia_table presents each IIA alongside information about its type, signatories, dates, and status. It is now possible to construct a time-varying, bilateral matrix of countries’ mutual involvement in IIAs.

I consider only those IIAs which are or have been enforced, as opposed to those which have been signed but not yet enforced. 317 out of 2591 IIAs have been signed without ever entering into force. I exclude these IIAs from the construction of the bilateral matrix.

iia_table <- iia_table %>% 
  filter(!is.na(Entry))

The matrix consists of a dummy variable indicating the presence of an IIA between an origin and destination country in a given year. I take 1 July as the cut-off date to convert Entry and Termination into year columns, Start and End. In other words, if an IIA takes effect before 1 July of year \(t\), the IIA is indicated for \(t\). If the IIA takes effect on or after this cut-off, it is only indicated from \(t+1\). Conversely, if an IIA is terminated on or after 1 July of year \(t\), it is still deemed in place during \(t\). If the IIA is terminated before the cut-off date in \(t\), it is deemed in place until the end of \(t-1\). Where no termination date is given, I take the current year as the end of the enforcement period.

iia_table <- iia_table %>% 
  mutate(Start = case_when(
    Entry < ymd(paste(year(Entry), "-07-01", sep = "")) ~ year(Entry),
    Entry >= ymd(paste(year(Entry), "-07-01", sep = "")) ~ year(Entry) + 1,
    TRUE ~ NA)) %>% 
  mutate(End = case_when(
    is.na(Termination) ~ year(Sys.Date()),
    Termination < ymd(paste(year(Termination), "-07-01", sep = "")) ~ year(Termination) - 1,
    Termination >= ymd(paste(year(Termination), "-07-01", sep = "")) ~ year(Termination),
    TRUE ~ NA))
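The cut-off rule can be checked on a few dates. This is a minimal base-R sketch; start_year and end_year are hypothetical helper names, and apart from 2007-10-12 (the entry date of the first IIA in the table) the dates are made up for illustration.

```r
# Year in which an IIA is first counted: entry before 1 July -> same year,
# entry on or after 1 July -> following year.
start_year <- function(entry) {
  yr <- as.integer(format(entry, "%Y"))
  cutoff <- as.Date(paste0(yr, "-07-01"))
  ifelse(entry < cutoff, yr, yr + 1L)
}

# Last year in which an IIA is counted: termination before 1 July -> previous
# year, termination on or after 1 July -> same year.
end_year <- function(termination) {
  yr <- as.integer(format(termination, "%Y"))
  cutoff <- as.Date(paste0(yr, "-07-01"))
  ifelse(termination < cutoff, yr - 1L, yr)
}

starts <- start_year(as.Date(c("2007-10-12", "2005-06-30", "2005-07-01")))
ends   <- end_year(as.Date(c("2019-03-01", "2019-07-01")))

print(starts)  # 2008 2005 2006
print(ends)    # 2018 2019
```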

Thus, I have determined the start and end years of each IIA’s enforcement. As with the Parties variable, I create a vector containing all years of enforcement for each IIA.

iia_table <- iia_table %>% 
  mutate(Period = future_map2(Start, End, ~seq.int(.x, .y, by = 1)))

For example, the first IIA in iia_table, titled Afghanistan - Germany BIT (2005), was signed on 2005-04-20, entered into force on 2007-10-12, and has an undefined termination date. Hence, its enforcement period is as follows:

iia_table$Period[[1]]

 [1] 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
[14] 2021 2022 2023 2024
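One edge case may be worth guarding against. Under the 1 July rule, an IIA that enters into force in the second half of one year and is terminated in the first half of the next would have End one year before Start, and seq.int with by = 1 stops with an error in that case. The following minimal sketch assumes such an IIA should contribute no years; safe_period is a hypothetical helper, and whether any such case occurs in the scraped data is not checked here.

```r
# Hypothetical guard: return an empty vector instead of failing
# when the end year precedes the start year (or either is missing).
safe_period <- function(start, end) {
  if (is.na(start) || is.na(end) || end < start) {
    return(integer(0))  # no full year of coverage under the 1 July rule
  }
  seq.int(start, end, by = 1L)
}

normal <- safe_period(2008L, 2012L)
empty  <- safe_period(2019L, 2018L)

print(normal)  # 2008 2009 2010 2011 2012
print(empty)   # integer(0)
```

When unnested, an empty period simply drops out, so such an IIA would never be indicated in the matrix.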

Using Index, Parties and Period columns from iia_table, I can begin to construct this matrix, for example:

iia_matrix <- iia_table %>% 
  select(Index, Origin = Parties, Destination = Parties, Period)

# Example
print(filter(iia_matrix, Index == 1))

# A tibble: 1 × 4
  Index Origin    Destination Period    
  <int> <list>    <list>      <list>    
1     1 <chr [2]> <chr [2]>   <dbl [17]>

As opposed to imposing some structure on a matrix of countries, and subsequently applying a function to determine a country pair’s joint affiliation to the same IIA in a given year, I instead allow for the matrix to spring from IIA data. The unnest function is very useful in this regard. I create a data frame explicitly containing all possible Origin-Destination combinations of countries, which appear in mapped IIAs. In addition, I create the indicator variable of interest agree_iia. Following the typical structure of gravity covariate datasets, agree_iia need not be dyadic across Origin-Destination pairs.

iia_matrix <- iia_matrix %>% 
  unnest(Origin) %>% 
  unnest(Destination) %>% 
  distinct() %>%  # to eliminate duplicate country pairs per IIA
  mutate(agree_iia = 1)

# Example
print(filter(iia_matrix, Index == 1))

# A tibble: 4 × 5
  Index Origin Destination Period     agree_iia
  <int> <chr>  <chr>       <list>         <dbl>
1     1 AFG    AFG         <dbl [17]>         1
2     1 AFG    DEU         <dbl [17]>         1
3     1 DEU    AFG         <dbl [17]>         1
4     1 DEU    DEU         <dbl [17]>         1
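The same set of ordered pairs can be produced in base R, which may help to see what the double unnest does. This is only an illustrative equivalent with hypothetical variable names, not the code used above.

```r
# Parties of a single IIA, as ISO codes.
parties <- c("AFG", "DEU")

# All ordered Origin-Destination pairs, including self-pairs.
pairs <- expand.grid(Origin = parties, Destination = parties,
                     stringsAsFactors = FALSE)
pairs$agree_iia <- 1

print(pairs)
n_pairs <- nrow(pairs)
print(n_pairs)  # 4
```

An IIA with n parties therefore yields n^2 rows, of which n are self-pairs such as AFG-AFG.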

I follow the same unnesting procedure for the enforcement periods of each IIA. Subsequently, I aggregate agree_iia from the IIA-Origin-Destination-Year level to the Origin-Destination-Year level. Thus, agree_iia is now indicated for a given year if at least one IIA is enforced among a particular Origin-Destination pair in said year.

iia_matrix <- iia_matrix %>% 
  unnest(Period) %>% 
  group_by(Origin, Destination, Period) %>% 
  summarise(agree_iia = max(agree_iia, na.rm = T)) %>% 
  ungroup()

There are 127833 Origin-Destination-Year combinations arising from mapped IIAs. The earliest enforcement period starts in 1962, and the latest end, by construction, is 2024. There are 188 unique countries covered by previously mapped IIAs, and IIAs cover 5558 actual combinations of Origins and Destinations.

Using the spread and gather functions, I transform the data frame into a balanced matrix format. If a particular Origin-Destination-Year observation of agree_iia was not present in the original set of 127833 observations, agree_iia now takes the value of zero for said combination. In addition, I set agree_iia to zero where the Origin and Destination are the same country, which is standard practice in the construction of bilateral trade facilitation variables.

iia_matrix <- iia_matrix %>% 
  spread(Period, agree_iia, fill = 0) %>% 
  gather(Year, agree_iia, 3:ncol(.)) %>% 
  spread(Destination, agree_iia, fill = 0) %>% 
  gather(Destination, agree_iia, 3:ncol(.)) %>% 
  select(Year, Origin, Destination, agree_iia) %>% 
  mutate(agree_iia = case_when(
    Origin == Destination ~ 0,
    TRUE ~ agree_iia
  ))

The final product is a balanced bilateral matrix of countries’ joint affiliation to at least one IIA, or lack thereof, in a given year. It comprises 2226672 observations of agree_iia, spanning the years 1962 to 2024. It encompasses 35344 unique combinations of Origin and Destination countries, that is, all ordered pairs of the 188 countries covered by mapped IIAs. An illustrative subset of this matrix is tabulated below; that is, observations of agree_iia for 2024. The tabulation is accompanied by a download link for the corresponding .csv data.
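The balancing step and the reported dimensions can be reproduced on toy data. This is a minimal sketch that uses base R's expand.grid and merge as a stand-in for the spread and gather calls; the countries AAA, BBB and CCC and the years are made up.

```r
# Toy sparse data: only pairs and years with an IIA in force are present.
sparse <- data.frame(
  Origin      = c("AAA", "BBB", "BBB", "CCC"),
  Destination = c("BBB", "AAA", "CCC", "BBB"),
  Year        = c(2001L, 2001L, 2002L, 2002L),
  agree_iia   = 1,
  stringsAsFactors = FALSE
)

countries <- sort(unique(c(sparse$Origin, sparse$Destination)))
years <- seq(min(sparse$Year), max(sparse$Year))

# Every Origin-Destination-Year combination, then left-join the observed 1s.
full <- expand.grid(Origin = countries, Destination = countries, Year = years,
                    stringsAsFactors = FALSE)
balanced <- merge(full, sparse, all.x = TRUE)
balanced$agree_iia[is.na(balanced$agree_iia)] <- 0
balanced$agree_iia[balanced$Origin == balanced$Destination] <- 0

n_rows <- nrow(balanced)           # 3^2 countries pairs * 2 years = 18
n_ones <- sum(balanced$agree_iia)  # 4

# The same arithmetic with the figures reported in the text.
n_pairs_reported <- 188^2                      # 35344
n_obs_reported   <- 188^2 * length(1962:2024)  # 2226672
```

The size of the balanced matrix is thus the squared number of countries multiplied by the number of years, regardless of how many IIAs are in force.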

Limitations

I want to urge caution when adopting a similar approach to the one given here, or when employing the resulting dataset in empirical research. Please note the following limitations:

As it stands, the bilateral matrix of IIA involvement does not yet account for the entry and exit of countries into and out of country groupings, such as economic blocs and unions. Doing so would entail acquiring a membership timeline for each grouping addressed in the Country Groupings section. Groupings’ lists of members should then be made time-varying by accounting for deviations from current lists over time, and these time-adjusted lists should replace the country groupings wherever they appear in the IIA data, prior to constructing the bilateral matrix.
Thanks to the great work of the team at UNCTAD and their collaborators, we are able to construct a matrix such as this one. However, the mapping of IIAs has not yet been completed and is updated regularly. Hence, the matrix is not exhaustive (in a retrospective sense), and will not necessarily remain up to date.
Related to the point above, the outcome of these operations is only as comprehensive as the coverage of previously mapped IIAs permits. It may be that agree_iia should actually be indicated for a particular Origin-Destination-Year combination, but has instead been computed as \(0\), or not computed at all, because the relevant IIAs have not yet been mapped. Even though constructing the matrix from mapped IIAs (from the ground up, as I do here) is computationally efficient, the resulting coverage of countries, country pairs, and enforcement periods is similarly constrained. This is an important limitation to consider when using the matrix in empirical research, as it may not be representative of the universe of IIAs or of countries’ investment relationships.
Web scraping allows us to stand on the shoulders of giants when it comes to acquiring useful data in the public domain. However, unless performed ethically and within reasonable limits, web scraping may come at a great cost to the host. Read more about ethical data collection here.
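As a pointer for the first limitation above, time-varying membership could be represented as a table of member spells and queried by year. The following is only a sketch of one possible structure: the grouping, countries, years and the helper members_in_year are all hypothetical, and Left is assumed to be the first year in which a country is no longer a member.

```r
# Hypothetical membership timeline: one row per member spell.
membership <- data.frame(
  Grouping = "Toy Union",
  Member   = c("AAA", "BBB", "CCC"),
  Joined   = c(1990L, 1990L, 2005L),
  Left     = c(NA, 2010L, NA),  # NA means still a member
  stringsAsFactors = FALSE
)

# Members of `grouping` in a given year.
members_in_year <- function(grouping, year, timeline) {
  in_group <- timeline$Grouping == grouping
  joined   <- timeline$Joined <= year
  not_left <- is.na(timeline$Left) | timeline$Left > year
  timeline$Member[in_group & joined & not_left]
}

members_2000 <- members_in_year("Toy Union", 2000, membership)
members_2012 <- members_in_year("Toy Union", 2012, membership)

print(members_2000)  # "AAA" "BBB"
print(members_2012)  # "AAA" "CCC"
```

Replacing a grouping with the members returned for each enforcement year, rather than with a single current list, would make the resulting matrix sensitive to accessions and exits.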

I kindly invite willing and eager readers to reach out to me regarding any flaws, concerns, suggestions, etc. I would love to hear them!

Update

This post was last updated on 23 May 2024.

Session Information

─ Session info ─────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31 ucrt)
 os       Windows 11 x64 (build 22631)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_South Africa.utf8
 ctype    English_South Africa.utf8
 tz       Africa/Johannesburg
 date     2024-05-23
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ─────────────────────────────────────────────────────────
 package       * version date (UTC) lib source
 base64enc       0.1-3   2015-07-28 [1] CRAN (R 4.3.0)
 bslib           0.7.0   2024-03-29 [1] CRAN (R 4.3.3)
 cachem          1.0.8   2023-05-01 [1] CRAN (R 4.3.1)
 chromote        0.2.0   2024-02-12 [1] CRAN (R 4.3.3)
 cli             3.6.2   2023-12-11 [1] CRAN (R 4.3.2)
 codetools       0.2-20  2024-03-31 [1] CRAN (R 4.3.3)
 colorspace      2.1-0   2023-01-23 [1] CRAN (R 4.3.1)
 countrycode   * 1.6.0   2024-03-22 [1] CRAN (R 4.3.3)
 crosstalk       1.2.1   2023-11-23 [1] CRAN (R 4.3.2)
 digest          0.6.35  2024-03-11 [1] CRAN (R 4.3.3)
 distill       * 1.6     2023-10-06 [1] CRAN (R 4.3.2)
 downlit         0.4.3   2023-06-29 [1] CRAN (R 4.3.1)
 dplyr         * 1.1.4   2023-11-17 [1] CRAN (R 4.3.3)
 DT            * 0.33    2024-04-04 [1] CRAN (R 4.3.3)
 evaluate        0.23    2023-11-01 [1] CRAN (R 4.3.2)
 fansi           1.0.6   2023-12-08 [1] CRAN (R 4.3.2)
 fastmap         1.1.1   2023-02-24 [1] CRAN (R 4.3.1)
 fontawesome     0.5.2   2023-08-19 [1] CRAN (R 4.3.1)
 forcats       * 1.0.0   2023-01-29 [1] CRAN (R 4.3.3)
 furrr         * 0.3.1   2022-08-15 [1] CRAN (R 4.3.1)
 future        * 1.33.2  2024-03-26 [1] CRAN (R 4.3.3)
 generics        0.1.3   2022-07-05 [1] CRAN (R 4.3.1)
 ggplot2       * 3.5.0   2024-02-23 [1] CRAN (R 4.3.3)
 globals         0.16.3  2024-03-08 [1] CRAN (R 4.3.3)
 glue            1.7.0   2024-01-09 [1] CRAN (R 4.3.2)
 gtable          0.3.4   2023-08-21 [1] CRAN (R 4.3.1)
 here          * 1.0.1   2020-12-13 [1] CRAN (R 4.3.3)
 highr           0.10    2022-12-22 [1] CRAN (R 4.3.1)
 hms             1.1.3   2023-03-21 [1] CRAN (R 4.3.1)
 htmltools     * 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.2)
 htmlwidgets     1.6.4   2023-12-06 [1] CRAN (R 4.3.2)
 httr            1.4.7   2023-08-15 [1] CRAN (R 4.3.3)
 jquerylib       0.1.4   2021-04-26 [1] CRAN (R 4.3.1)
 jsonlite        1.8.8   2023-12-04 [1] CRAN (R 4.3.2)
 kableExtra    * 1.4.0   2024-01-24 [1] CRAN (R 4.3.3)
 knitr         * 1.46    2024-04-06 [1] CRAN (R 4.3.3)
 later           1.3.2   2023-12-06 [1] CRAN (R 4.3.2)
 lifecycle       1.0.4   2023-11-07 [1] CRAN (R 4.3.2)
 listenv         0.9.1   2024-01-29 [1] CRAN (R 4.3.2)
 lubridate     * 1.9.3   2023-09-27 [1] CRAN (R 4.3.3)
 magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.3.1)
 memoise         2.0.1   2021-11-26 [1] CRAN (R 4.3.1)
 MetBrewer     * 0.2.0   2022-03-21 [1] CRAN (R 4.3.3)
 munsell         0.5.1   2024-04-01 [1] CRAN (R 4.3.3)
 pacman        * 0.5.1   2019-03-11 [1] CRAN (R 4.3.3)
 parallelly      1.37.1  2024-02-29 [1] CRAN (R 4.3.3)
 pillar          1.9.0   2023-03-22 [1] CRAN (R 4.3.1)
 pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.3.1)
 printr        * 0.3     2023-03-08 [1] CRAN (R 4.3.3)
 processx        3.8.4   2024-03-16 [1] CRAN (R 4.3.3)
 promises        1.3.0   2024-04-05 [1] CRAN (R 4.3.2)
 ps              1.7.6   2024-01-18 [1] CRAN (R 4.3.2)
 purrr         * 1.0.2   2023-08-10 [1] CRAN (R 4.3.3)
 R6              2.5.1   2021-08-19 [1] CRAN (R 4.3.1)
 ragg            1.3.0   2024-03-13 [1] CRAN (R 4.3.3)
 Rcpp            1.0.12  2024-01-09 [1] CRAN (R 4.3.2)
 readr         * 2.1.5   2024-01-10 [1] CRAN (R 4.3.3)
 repr            1.1.7   2024-03-22 [1] CRAN (R 4.3.3)
 rlang           1.1.3   2024-01-10 [1] CRAN (R 4.3.2)
 rmarkdown       2.26    2024-03-05 [1] CRAN (R 4.3.3)
 rprojroot       2.0.4   2023-11-05 [1] CRAN (R 4.3.2)
 rstudioapi      0.16.0  2024-03-24 [1] CRAN (R 4.3.3)
 rvest         * 1.0.4   2024-02-12 [1] CRAN (R 4.3.2)
 sass            0.4.9   2024-03-15 [1] CRAN (R 4.3.3)
 scales          1.3.0   2023-11-28 [1] CRAN (R 4.3.2)
 sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
 skimr         * 2.1.5   2022-12-23 [1] CRAN (R 4.3.1)
 stringi         1.8.3   2023-12-11 [1] CRAN (R 4.3.2)
 stringr       * 1.5.1   2023-11-14 [1] CRAN (R 4.3.3)
 svglite         2.1.3   2023-12-08 [1] CRAN (R 4.3.2)
 systemfonts     1.0.6   2024-03-07 [1] CRAN (R 4.3.3)
 textshaping     0.3.7   2023-10-09 [1] CRAN (R 4.3.2)
 tibble        * 3.2.1   2023-03-20 [1] CRAN (R 4.3.3)
 tidyr         * 1.3.1   2024-01-24 [1] CRAN (R 4.3.3)
 tidyselect      1.2.1   2024-03-11 [1] CRAN (R 4.3.3)
 tidyverse     * 2.0.0   2023-02-22 [1] CRAN (R 4.3.3)
 timechange      0.3.0   2024-01-18 [1] CRAN (R 4.3.2)
 tzdb            0.4.0   2023-05-12 [1] CRAN (R 4.3.1)
 utf8            1.2.4   2023-10-22 [1] CRAN (R 4.3.2)
 uuid            1.2-0   2024-01-14 [1] CRAN (R 4.3.2)
 vctrs           0.6.5   2023-12-01 [1] CRAN (R 4.3.2)
 viridisLite     0.4.2   2023-05-02 [1] CRAN (R 4.3.3)
 websocket       1.4.1   2021-08-18 [1] CRAN (R 4.3.3)
 withr           3.0.0   2024-01-16 [1] CRAN (R 4.3.2)
 xaringanExtra * 0.7.0   2022-07-16 [1] CRAN (R 4.3.2)
 xfun            0.43    2024-03-25 [1] CRAN (R 4.3.3)
 xml2            1.3.6   2023-12-04 [1] CRAN (R 4.3.2)
 yaml            2.3.8   2023-12-11 [1] CRAN (R 4.3.2)

 [1] C:/Users/marai/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.2/library

────────────────────────────────────────────────────────────────────

Blonigen, Bruce A., and Ronald B. Davies. 2004. “The Effects of Bilateral Tax Treaties on U.S. FDI Activity.” International Tax and Public Finance 11 (5): 601–22. https://doi.org/10.1023/B:ITAX.0000036693.32618.00.

Blonigen, Bruce A., and Jeremy Piger. 2014. “Determinants of Foreign Direct Investment.” Canadian Journal of Economics/Revue Canadienne d’économique 47 (3): 775–812. https://doi.org/10.1111/caje.12091.

Di Giovanni, Julian. 2005. “What Drives Capital Flows? The Case of Cross-Border M&A Activity and Financial Deepening.” Journal of International Economics 65 (1): 127–49. https://doi.org/10.1016/j.jinteco.2003.11.007.

Tax Analysts. 2001. “Worldwide Tax Treaty Index.” Washington, DC.

¹ Tax treaty data was obtained from Tax Analysts (2001).
² https://investmentpolicy.unctad.org/international-investment-agreements
³ https://investmentpolicy.unctad.org/international-investment-agreements/iia-mapping
⁴ https://investmentpolicy.unctad.org/uploaded-files/document/Mapping%20Project%20Description%20and%20Methodology.pdf
⁵ https://www.treaties.tax/en/
⁶ I remove the Text column, because I am not interested in IIAs’ original documentation.
⁷ This was done by observing the parties with the longest names, for those containing additional commas typically have longer names.
⁸ I like the user-friendly and simple-to-use CSS selector offered by the SelectorGadget Chrome extension.
⁹ For example, /international-investment-agreements/groupings/11/acp-african-caribbean-and-pacific-group-of-states-.
