Building an Intuitive API Wrapper - US Commodities

Introduction & Motive

Application Programming Interfaces or APIs make it easy to access data from external systems, but repeated calls could prove to be somewhat cumbersome. In some situations, it may save a lot of time if you construct an API wrapper that makes repeated API calls as simple as calling a function, and supplying relevant arguments. I’ve already written an article on basic programming in R that you can find here, this article will build off of it by outlining a method that could be used to build an intuitive API wrapper.

The API we’ll be using in this article contains commodity data, which is hosted by Census.gov, and is listed under “International Trade”. I’ll go ahead and link it right here. Specifically, it returns time series data on international trade, for various commodities. Only yearly data is contained back to the year 2005, but we’re able to access monthly data from 2013 - present.

Navigating to the documentation here, we see that there are various endpoints that we could utilize, depending on what kind of information we want. For now, we’ll go ahead and use the simple Import/Export data by Harmonized System (HS). This endpoint gives us access to a myriad of import/export data. Not only can we query data about general commodities, and their import/export quantities through time, but we can delve even deeper. As an example, we can query the API for roasted decaffinated coffee, or we can get a little more detailed and query it for roasted decaffinated coffee in retail containers weighing less than 2kg. That’s pretty cool!

It’s worth noting that most APIs have a daily, weekly, or monthly call limit, so it would be wise to consult the documentation to get a better understanding on the limitations you may encounter. In the case of this international commodity API there’s a public limit of 500 daily calls. This means if you plan on calling the API fewer than 500 times, an API key is unnecessary. However if you plan on exceeding 500 API calls you can go ahead and create an API key as per the documentation.

API: Request & Response Structure

Before we get started, we have to become familiar with the request and response structure of the pertinent end-points. In this article we’ll mainly be using the general “Harmonized System” import/export endpoints, and not the state, port, or NAICs end points. Although if you wish to use those, you can use a lot of the methods outline in this article for such an application.

Let’s take a look at the salient parameters when making an API call. Following the documentation, we can see all of the input/output arguments pertinent to this end-point here. The only required argument that has to be supplied for data retrieval is time, which is required to be in the YYYY-MM format.We are interested in obtaining metrics of particular commodities over time, so we’ll go ahead and make the HS code for the commodity mandatory as well.

Prior to building the wrapper, let’s take a look at a typical API call using the fromJSON function. This call is for monthly general values of the commodity “09” in the Harmonized System, starting in January of 2018. You can see that the commodity code 09, is noted in the segment of the string as I_COMMODITY=09, and the time is supplied in time=from+2018-01 separated by ampersands &. For a look-up table of commodities by their HS code, you can use this handy reference.

testcall <- fromJSON("https://api.census.gov/data/timeseries/intltrade/imports/hs?get=CTY_NAME,CTY_CODE,I_COMMODITY_LDESC,GEN_VAL_MO&I_COMMODITY=09&time=from+2018-02")
head(testcall) %>% kable("html") %>% kable_styling("striped") %>% scroll_box(width = "100%", height = "10%")
CTY_NAME CTY_CODE I_COMMODITY_LDESC GEN_VAL_MO I_COMMODITY time
OPEC 0001 COFFEE, TEA, MATE AND SPICES 1491493 09 2018-02
EUROPEAN UNION 0003 COFFEE, TEA, MATE AND SPICES 33892339 09 2018-02
PACIFIC RIM COUNTRIES 0014 COFFEE, TEA, MATE AND SPICES 59865480 09 2018-02
CAFTA-DR 0017 COFFEE, TEA, MATE AND SPICES 80370999 09 2018-02
NAFTA 0020 COFFEE, TEA, MATE AND SPICES 72497925 09 2018-02

What gets returned by the call? Well, as we see in the call string, after ?get=, we have requested the country name, country code, commodity description, general monthly value, and commodity code. Why did we include the commodity code? Later on we may wish to call the API for multiple commodies, and not include the description. We also receive the year and month of the monthly values, which we’ll go ahead and add “-01” to, so we can easily coerce it into a date variable later on.

Let’s go ahead and take a look at the class of the object returned by the fromJSON call.

class(testcall)
## [1] "matrix" "array"

That’s not the class we’d like to have. If we’re going to use this data for analytics, let’s go ahead and coerce this object into a data frame. We can do that by wrapping the call in as.data.frame. Let’s go ahead and use the first row as the column names, clearing it out afterwards.

get.imports.simple <- function(){
  
  df <- as.data.frame(fromJSON("https://api.census.gov/data/timeseries/intltrade/imports/hs?get=CTY_NAME,CTY_CODE,I_COMMODITY_LDESC,GEN_VAL_MO&I_COMMODITY=09&time=from+2018-02"), stringsAsFactors = FALSE)
  names(df) <- df[1,]
  df <- df[-1,]
  df$time <- paste0(df$time,"-01")
  return(df)
  
}

head(get.imports.simple()) %>% kable("html") %>% kable_styling("striped") %>% scroll_box(width = "100%", height = "10%")
CTY_NAME CTY_CODE I_COMMODITY_LDESC GEN_VAL_MO I_COMMODITY time
2 OPEC 0001 COFFEE, TEA, MATE AND SPICES 1491493 09 2018-02-01
3 EUROPEAN UNION 0003 COFFEE, TEA, MATE AND SPICES 33892339 09 2018-02-01
4 PACIFIC RIM COUNTRIES 0014 COFFEE, TEA, MATE AND SPICES 59865480 09 2018-02-01
5 CAFTA-DR 0017 COFFEE, TEA, MATE AND SPICES 80370999 09 2018-02-01
6 NAFTA 0020 COFFEE, TEA, MATE AND SPICES 72497925 09 2018-02-01
7 TWENTY LATIN AMERICAN REPUBLICS 0021 COFFEE, TEA, MATE AND SPICES 313937591 09 2018-02-01

Building The Wrapper

Now that we’re familiar with the structure of the returned data, and the call string, we can go ahead and start building our wrapper by adding the arguments we wish to pass to the call string.

Let’s go ahead and start with the required call parameters noted before. The commodity HS code, and the starting date. This endpoint requires that the months be encoded in double digits (i.e 01-12). To avoid any errors that could occur when calling the API with a start date that includes a month earlier than October, we’ll insert a 0 before the month number.

You can see we have declared the arguments com_hs, strt_mnth, and strt_year, for the commodity code, starting month, and starting year respectively.

get.imports <- function(com_hs, strt_mnth, strt_year){
      ##If start month is a single digit quantity, we need to insert a 0 to prevent an error.
      if(str_length(strt_mnth) < 2){strt_mnth <- paste0("0",strt_mnth)}
    callstr <- paste0("https://api.census.gov/data/timeseries/intltrade/imports/hs?get=CTY_NAME,CTY_CODE,I_COMMODITY_LDESC,GEN_VAL_MO&I_COMMODITY=",com_hs,"&time=from+",strt_year,"-",strt_mnth)
  df <- as.data.frame(fromJSON(callstr), stringsAsFactors = FALSE)
  names(df) <- df[1,]
  df <- df[-1,]
  df$time <- paste0(df$time,"-01")
 
  return(df)
  
}

let’s go ahead and test it out.

get.imports("09", "1", "2019") %>% head() %>% kable("html") %>% kable_styling("striped") %>% scroll_box(width = "100%", height = "10%")
CTY_NAME CTY_CODE I_COMMODITY_LDESC GEN_VAL_MO I_COMMODITY time
2 TOTAL FOR ALL COUNTRIES
COFFEE, TEA, MATE AND SPICES 669268480 09 2019-01-01
3 OPEC 0001 COFFEE, TEA, MATE AND SPICES 2001224 09 2019-01-01
4 EUROPEAN UNION 0003 COFFEE, TEA, MATE AND SPICES 30047745 09 2019-01-01
5 PACIFIC RIM COUNTRIES 0014 COFFEE, TEA, MATE AND SPICES 81013213 09 2019-01-01
6 CAFTA-DR 0017 COFFEE, TEA, MATE AND SPICES 37008241 09 2019-01-01
7 NAFTA 0020 COFFEE, TEA, MATE AND SPICES 54163613 09 2019-01-01

Great! We haven’t ran into any errors, and the data correctly reflects our function arguments. We can go ahead and make our wrapper even more robust, by allowing users to retreive a set of commodities if they supply a vector of strings. We can achieve this by allowing com_hs to take in a singular string, or array of strings. We’ll insert our API call into a loop that iteratively calls the API, with the same starting date, cycling through the commodity code array.

get.imports <- function(com_hs, strt_mnth, strt_year){
  if(str_length(strt_mnth) < 2){strt_mnth <- paste0("0",strt_mnth)}
  df.app <- data.frame()
  for(code in com_hs){
    callstr <- paste0("https://api.census.gov/data/timeseries/intltrade/imports/hs?get=CTY_NAME,CTY_CODE,I_COMMODITY_LDESC,GEN_VAL_MO&I_COMMODITY=",code,"&time=from+",strt_year,"-",strt_mnth)
    df <- as.data.frame(fromJSON(callstr), stringsAsFactors = FALSE)
    names(df) <- df[1,]
    df <- df[-1,]
    df$time <- paste0(df$time,"-01")
    df.app <- rbind(df.app, df)
  }
  return(df.app)
  
}

We’ll call it for general commodities (04-09). Additionally, we’ll go ahead filter for the country Mexico, and plot the time-series trend of the total monthly value as a line graph, faceted by the commodity.

comm <- get.imports(c("07", "08", "09", "06", "05", "04"), "01", "2019")
comm %>% mutate(time = as.Date(ymd(time)), GEN_VAL_MO = as.numeric(GEN_VAL_MO)) %>% filter(CTY_NAME == "MEXICO") %>% ggplot(aes(x = time, y = GEN_VAL_MO)) + geom_line() + facet_wrap(~I_COMMODITY_LDESC, nrow = 3, scales = "free_y") + ggtitle("General Value Of Imports From Mexico",subtitle = "Monthly Totals In USD") + ylab("Monthly Value") + xlab("Date")

Awesome! In that short bit of code, we have managed to query the API for relevant data, filter, coerce and visualize!

Simple Error Handling

Let’s try and improve our API wrapper one more time, by implementing some simple error handling. When a user makes a typo, or inserts the arguments in the wrong order without explicitly declaring them, they’ll be greeted with a HTTP 400 error. What is that? Well, a quick google search will tell you it’s a bad request response. Some individuals that may be using this code may not be familiar with what that means. To remedy that, every time we receive a "HTTP 400 error* we’ll let the user know that they should check the arguments they’ve supplied.

Let’s go ahead and look at what a bad request returns. Let’s switch the year and months, as well as inserting “00” for the HS commodity code (there’s no 00 category).

get.imports("00", "2018", "001")
## Error in open.connection(con, "rb"): HTTP error 400.

Using the base R function tryCatch, we can catch the error that could occur if a user doesn’t supply the arguments in order, or supplied incorrect HS codes. In conjunction with tryCatch, we can use grepl on the string of the error to convert the error into something much more intuitive to an end user.

This is accomplished in the following block of code. We have our previous function get.imports, but we’ve wrapped the actual API call in tryCatch. If the API call is successful, a dataframe is returned. If the API call is unsuccessful, we return a much more intuitive error that notifies the end-user of the potential cause.

get.imports <- function(com_hs, strt_mnth, strt_year){
  if(str_length(strt_mnth) < 2){strt_mnth <- paste0("0",strt_mnth)}
  df.app <- data.frame()
  for(code in com_hs){
    callstr <- paste0("https://api.census.gov/data/timeseries/intltrade/imports/hs?get=CTY_NAME,CTY_CODE,I_COMMODITY_LDESC,GEN_VAL_MO&I_COMMODITY=",code,"&time=from+",strt_year,"-",strt_mnth)
    
tryCatch(expr = {
    df <- as.data.frame(fromJSON(callstr), stringAsFactors = FALSE)
}       ,error = function(e){
            if(grepl("400",e["message"])){
              stop("Error: Your arguments may be incorrect, or in the wrong order.")
            }   else if(grepl("fromJSON",e["message"])){
                  stop("Error: This function requires you load jsonlite")
                }   else if(grepl("500",e["message"])){
                      stop("Error: A server-side error has occured.")
                  } else {stop(e["message"])
                    }
            
        }
)    
    
    names(df) <- df[1,]
    df <- df[-1,]
    df$time <- paste0(df$time,"-01")
    df.app <- rbind(df.app, df)
  }
  return(df.app)
  
}

Let’s try an API call with a wrong HS code.

get.imports("00", "2018", "001")
## Error in value[[3L]](cond): Error: Your arguments may be incorrect, or in the wrong order.

Let’s try one without loading the package jsonlite

detach("package:tidyverse", unload = TRUE)
detach("package:jsonlite", unload = TRUE)
get.imports("09", "01", "2019")
## Error in value[[3L]](cond): Error: This function requires you load jsonlite
library(jsonlite)
## 
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
## 
##     flatten

Export End-Point

Fortunately, the export API end-point utilizes a very similar scheme. Therefore, interchanging the call string and creating the wrapper is a breeze. Here’s the resulting function for the export end-point. Here’s our function that calls the API for export data, it also allows you to query multiple commodity HS codes.

get.exports <- function(com_hs, strt_mnth, strt_year){
  if(str_length(strt_mnth) < 2){strt_mnth <- paste0("0",strt_mnth)}
  df.app <- data.frame()
  for(code in com_hs){
    callstr <- paste0("https://api.census.gov/data/timeseries/intltrade/exports/hs?get=CTY_NAME,CTY_CODE,E_COMMODITY_LDESC,ALL_VAL_MO&E_COMMODITY=",code,"&time=from+",strt_year,"-",strt_mnth)
    
tryCatch(expr = {
    exp.df <- as.data.frame(fromJSON(callstr), stringsAsFactors = FALSE)
}       ,error = function(e){
            if(grepl("400",e["message"])){
              stop("Error: Your arguments may be incorrect, or in the wrong order.")
            }   else if(grepl("fromJSON",e["message"])){
                  stop("Error: This function requires you load jsonlite")
                }   else if(grepl("500",e["message"])){
                      stop("Error: A server-side error has occured.")
                  } else {stop(e["message"])
                    }
            
        }
)    
    names(exp.df) <- exp.df[1,]
    exp.df <- exp.df[-1,]
    exp.df$time <- paste0(exp.df$time,"-01")
    df.app <- rbind(df.app, exp.df)
  }
  return(df.app)
  
}

Let’s call it just to test.

exp.df <- get.exports("09", "01", "2019") 
exp.df %>% head() %>% kable("html") %>% kable_styling("striped") %>% scroll_box(width = "100%", height = "10%")
CTY_NAME CTY_CODE E_COMMODITY_LDESC ALL_VAL_MO E_COMMODITY time
2 TOTAL FOR ALL COUNTRIES
COFFEE, TEA, MATE AND SPICES 89234940 09 2019-01-01
3 OPEC 0001 COFFEE, TEA, MATE AND SPICES 1998779 09 2019-01-01
4 EUROPEAN UNION 0003 COFFEE, TEA, MATE AND SPICES 5047706 09 2019-01-01
5 PACIFIC RIM COUNTRIES 0014 COFFEE, TEA, MATE AND SPICES 19963838 09 2019-01-01
6 CAFTA-DR 0017 COFFEE, TEA, MATE AND SPICES 881756 09 2019-01-01
7 NAFTA 0020 COFFEE, TEA, MATE AND SPICES 56167463 09 2019-01-01
exp.df %>% mutate(time = as.Date(ymd(time)), ALL_VAL_MO = as.numeric(ALL_VAL_MO)) %>% filter(CTY_NAME %in% c("JAPAN", "MEXICO", "KOREA,SOUTH", "EUROPE", "CANADA")) %>% ggplot(aes(x = time, y = ALL_VAL_MO)) + geom_line() + facet_wrap(~CTY_NAME, scale = "free_y") 

Additional work & conclusion

This article was primarily meant to be an example that illustrates the usefulness of building an API wrapper. However, theres a multitude of ways you can improve upon this wrapper. You can allow the user to input only the variables they wish to retreive from the API, allow the user to filter by country, or country code, etc. Regardless of the API you use, this method should help speed things up, and avoid the convolusion of repeated calls and string modifications.

If you liked this article, please feel free to check out the other articles on my website! There’s are several about analytics and programming that could help you in your day to day tasks.