Car parks, ticketing systems, hourly rates, no vacancies. In the cities I have visited, I was able to avoid most of these issues by riding the metro or taking a cab. The public transportation in Tokyo was so punctual, and so ingrained in the city’s infrastructure, that owning a car seemed completely meritless. In most cases it is! Walking around Sumida and Taito City, I saw strange vertical solutions to parking lots more than once. I remember doing a double take the first time I saw cars stacked on shelf-like apparatuses.
That brings us to the inspiration for this article. I found this gem of an API on data.gov.sg. It gives you the lot availability per hour for thousands of car parks in Singapore. Navigating to this page, you can obtain further characteristics of each car park. The information includes the unique car park number, its address, the x and y coordinates in SVY21, the type of parking system employed in the car park, whether it allows short-term parking (the only type we’ll be looking at), and whether it allows night-time parking.
Our objective in this analysis is to find parking lots with discernible patterns in availability. While doing this, we’ll also try to answer questions such as:
By answering these questions, we hope to outline a possible way to build a powerful tool that gives drivers a better sense of availability for car parks near them or near their destination. During our analysis, we’ll make some broad assumptions. We’ll assume most car parks have an influx and outflow of traffic around the bounds of business hours, when most people commute to work. It is also reasonable to assume that availability throughout the day will be similar from one weekday to the next. Conversely, we’ll assume that traffic at some car parks stays flat or increases during weekends and evenings, when people are shopping or dining.
Our initial data set, CarParkAttrib, contains the static attributes of each car park; the total and available lot counts come from the hourly measurements collected later. Here’s a small glimpse at the structure of the car park data.
kable(head(CarParkAttrib), "html") %>% kable_styling("striped") %>% scroll_box(width = "100%", height = "10%")
carpark_number | address | x_coord | y_coord | car_park_type | type_of_parking_system | short_term_parking | free_parking | night_parking | car_park_decks | gantry_height | car_park_basement | LatLong |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ACB | BLK 270/271 ALBERT CENTRE BASEMENT CAR PARK | 30314.79 | 31490.49 | BASEMENT CAR PARK | ELECTRONIC PARKING | WHOLE DAY | NO | YES | 1 | 1.8 | Y | 1.3010571246531804,103.85411804988033 |
ACM | BLK 98A ALJUNIED CRESCENT | 33758.41 | 33695.52 | MULTI-STOREY CAR PARK | ELECTRONIC PARKING | WHOLE DAY | SUN & PH FR 7AM-10.30PM | YES | 5 | 2.1 | N | 1.3209980485169983,103.88506094748588 |
AH1 | BLK 101 JALAN DUSUN | 29257.72 | 34500.36 | SURFACE CAR PARK | ELECTRONIC PARKING | WHOLE DAY | SUN & PH FR 7AM-10.30PM | YES | 0 | 0.0 | N | 1.3282772191512717,103.84461988911718 |
AK19 | BLOCK 253 ANG MO KIO STREET 21 | 28185.44 | 39012.67 | SURFACE CAR PARK | COUPON PARKING | 7AM-7PM | NO | NO | 0 | 0.0 | N | 1.3690847475169592,103.83498485033613 |
AK31 | BLK 302/348 ANG MO KIO ST 31 | 29482.03 | 38684.18 | SURFACE CAR PARK | COUPON PARKING | NO | NO | NO | 0 | 0.0 | N | 1.3661139697165128,103.84663563158024 |
AK52 | BLOCK 513 ANG MO KIO ST 53 | 29889.35 | 39382.81 | SURFACE CAR PARK | COUPON PARKING | WHOLE DAY | NO | YES | 0 | 0.0 | N | 1.3724321490118094,103.85029569737912 |
Some car parks don’t allow short-term parking. Since our goal is to build a tool for short-term parking, we will filter out all parking lots that don’t support this type of transaction. Let’s take a look at the count of car parks for each short-term parking type.
total.lots <- CarParkAttrib %>% nrow()
trans.types <- CarParkAttrib %>% ggplot(aes(x = short_term_parking, fill = short_term_parking)) + geom_bar() + ggtitle(paste0("Count of Transaction Types | ", total.lots, " Total Car Parks"))
ggplotly(trans.types)
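The bar heights can also be read off as a plain table. Here is a minimal base R sketch of that count, using a small made-up vector in place of the real short_term_parking column. The category labels mirror the ones in the table above, but the counts are illustrative only.

```r
# Hypothetical stand-in for CarParkAttrib$short_term_parking (illustrative values only).
short_term_parking <- c("WHOLE DAY", "WHOLE DAY", "WHOLE DAY",
                        "7AM-7PM", "NO", "WHOLE DAY")

# Count how many car parks fall into each category, most common first.
type_counts <- sort(table(short_term_parking), decreasing = TRUE)
print(type_counts)

# Share of car parks that do not support short-term parking at all.
share_no <- mean(short_term_parking == "NO")
print(round(share_no, 3))
```

On the real data, the same two lines applied to CarParkAttrib$short_term_parking should give the numbers behind the bar chart.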
We can see that only 70 parking lots do not support short-term parking. That’s a minuscule amount, considering our sample size of 2126 unique car parks. Let’s try to gain a better understanding of the car parks that support short-term parking, and of their spatial distribution across Singapore. To accomplish this, we’ll utilize mapview. This package is great for geo-spatial visualizations, and even supports some interactive elements.
plotmapview <- function(df, strcol = NULL, graph = FALSE, graphobj){
  ## Split the LatLong column into two columns named Lat and Long, and convert them to numeric.
  CarParkAttribsp <- df %>%
    separate(LatLong, into = c("Lat", "Long"), sep = ",") %>%
    mutate(Lat = as.numeric(Lat), Long = as.numeric(Long))
  ## Specify which columns we'll use as geo-spatial coordinates
  coordinates(CarParkAttribsp) <- ~ Long + Lat
  ## Declare the coordinate reference system: WGS84 latitude/longitude (EPSG:4326)
  proj4string(CarParkAttribsp) <- "+init=epsg:4326"
  ## Plot, coloring the points by the column named in the strcol argument
  if(graph == FALSE){
    mapview(CarParkAttribsp, zcol = strcol, burst = TRUE)
  } else {
    ## Same map, with the supplied graph objects attached as pop-ups
    mapview(CarParkAttribsp, zcol = strcol, burst = FALSE, legend = TRUE, popup = popupGraph(graphobj))
  }
}
We can go ahead and call this function after filtering out the car parks that don’t support short term parking, and selecting specified columns of interest in the pop-up option.
CarParkAttrib %<>% filter(short_term_parking != "NO")
plotmapview(df = CarParkAttrib, "short_term_parking")
It’s apparent that most lots support short-term parking during the entire day, with 7:00 AM - 7:00 PM & 7:00 AM - 10:00 PM car parks peppered throughout the island. It’s worth noting that you’re able to filter out each type by clicking the icon in the top left that looks similar to a stack of papers. Once it’s clicked, you’ll see checkboxes for each type of short_term_parking. You can also click each point to obtain more data about the car park.
Now that we’re familiar with the structure of our car park data, we turn to collecting the hourly measurement data. As I mentioned earlier, you’re able to get hourly car park measurements from an API on data.gov.sg. This is the webpage where the API documentation can be accessed.
To collect the data used in this article, I wrote a quick script that loops between dates and hours. I’ve included a copy of the script below, as well as some comments in the code that describe each line.
## Set up a vector of hours, incrementing by one, zero-padded to a fixed length of 2 characters.
hour <- formatC(seq(0, 23, by = 1), width = 2, flag = "0")
## Set up a vector of days
day <- c(15,16,17,18,19,20,21)
## loading jsonlite, so we can retrieve the data from the API
library(jsonlite)
## set up a blank dataframe, that we'll be repeatedly appending to
df.append <- data.frame()
## loop through each day and hour
for(j in day){
  for(i in hour){
    ## request the snapshot for half past the hour (":" is URL-encoded as %3A)
    day1_i <- fromJSON(paste0("https://api.data.gov.sg/v1/transport/carpark-availability?date_time=2020-02-",j,"T", i, "%3A30%3A00"))
    ## pull out the car park table and flatten the nested carpark_info column
    lots <- day1_i[["items"]][["carpark_data"]][[1]] %>% nest(carpark_info, data = c(carpark_info))
    lots %<>% unnest(data) %>% unnest(carpark_info) %>% mutate(hour = i)
    ## append this hour's records to the running dataframe
    df.append <- rbind(df.append, lots)
    ## print a preview to monitor progress
    print(head(lots))
  }
  ## print the finished date to the command line to monitor progress
  print(paste0("2020-02-",j))
}
We start by declaring a vector for the hours of interest (in a 24-hour format), as well as a vector for the dates of interest. After loading jsonlite and declaring an empty dataframe, df.append, we loop through these dates and hours, calling the API in each iteration. We then unnest the car park data and append it to the dataframe declared earlier. The rest of the code is optional; I just included it for debugging.
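The script above hard-codes the month and pastes the day number into the URL, which works for 15 to 21 February but would need padding for single-digit days and would not cross a month boundary. As a hedged alternative, here is a small base R sketch that builds the same request URLs from a date range. The helper name build_urls is my own and is not part of the original script; it only constructs strings and makes no network calls.

```r
# Build one request URL per hour between two dates (inclusive).
# `build_urls` is a hypothetical helper, not part of the original script.
build_urls <- function(start_date, end_date) {
  base_url <- "https://api.data.gov.sg/v1/transport/carpark-availability?date_time="
  days <- seq(as.Date(start_date), as.Date(end_date), by = "day")
  # Every day/hour combination; hour varies fastest, matching the nested loop order.
  grid <- expand.grid(hour = 0:23,
                      day = format(days, "%Y-%m-%d"),
                      stringsAsFactors = FALSE)
  # Timestamp at half past each hour, e.g. 2020-02-15T00:30:00
  stamps <- sprintf("%sT%02d:30:00", grid$day, grid$hour)
  # URL-encode the colons, as the original script does by hand.
  paste0(base_url, gsub(":", "%3A", stamps, fixed = TRUE))
}

urls <- build_urls("2020-02-15", "2020-02-21")
print(length(urls))  # 7 days x 24 hours
print(urls[1])
```

Each element of urls could then be passed to fromJSON in a single loop instead of two nested ones.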
Using the script above, I was able to obtain a dataframe that contains the update date-time, total lots, lot type, and available lots. I’ve included a snapshot of the dataframe below.
kable(head(CarParkRec), "html") %>% kable_styling("striped") %>% scroll_box(width = "100%", height = "10%")
carpark_number | update_datetime | total_lots | lot_type | lots_available | hour | update_date | count |
---|---|---|---|---|---|---|---|
HE12 | 2020-02-15 00:27:46 | 91 | C | 88 | 0 | 2020-02-15 | 168 |
HLM | 2020-02-15 00:28:35 | 583 | C | 484 | 0 | 2020-02-15 | 168 |
RHM | 2020-02-15 00:28:16 | 322 | C | 183 | 0 | 2020-02-15 | 168 |
BM29 | 2020-02-15 00:28:15 | 97 | C | 88 | 0 | 2020-02-15 | 168 |
Q81 | 2020-02-15 00:28:29 | 96 | C | 74 | 0 | 2020-02-15 | 168 |
C20 | 2020-02-15 00:28:32 | 173 | C | 118 | 0 | 2020-02-15 | 168 |
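The raw counts are easier to compare across car parks of different sizes once they are turned into a ratio. The sketch below types in the first three rows of the snapshot above by hand and derives an availability fraction, the hour of the update, and a business-hours label (8:00 to 17:59, the same bounds used throughout this article). It uses base R only, and the fixed UTC time zone is an assumption made so the example is reproducible.

```r
# First three rows of the snapshot above, typed in by hand.
sample_rows <- data.frame(
  carpark_number  = c("HE12", "HLM", "RHM"),
  update_datetime = as.POSIXct(c("2020-02-15 00:27:46",
                                 "2020-02-15 00:28:35",
                                 "2020-02-15 00:28:16"), tz = "UTC"),
  total_lots      = c(91, 583, 322),
  lots_available  = c(88, 484, 183)
)

# Fraction of lots that are free (0 = full, 1 = empty).
sample_rows$availability <- sample_rows$lots_available / sample_rows$total_lots

# Hour of the day (0-23) in which the measurement was updated.
sample_rows$update_hour <- as.POSIXlt(sample_rows$update_datetime)$hour

# Label each measurement as inside or outside business hours.
sample_rows$business_hours <- ifelse(
  sample_rows$update_hour >= 8 & sample_rows$update_hour <= 17,
  "Business", "Off"
)

print(sample_rows[, c("carpark_number", "availability", "update_hour", "business_hours")])
```

HE12 comes out at roughly 0.97 and RHM at roughly 0.57, even though RHM has twice as many free lots in absolute terms.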
Earlier, we assumed that availability throughout the day would be similar from one weekday to the next. Weekends may be similar as well, but we’re assuming that the fluctuation in availability can be modeled by distinguishing weekdays from weekends, and business hours from off hours. What’s my basis for this assumption? Well, it’s my own anecdotal evidence. Let’s go ahead and check whether our assumptions are correct.
CarParkRec %>%
  mutate(availability = lots_available/total_lots,
         update_hour = hour(update_datetime),
         business_hours = if_else((update_hour >= 8 & update_hour <= 17), "Business", "Off")) %>%
  filter(carpark_number %in% c("HLM", "RHM", "Q81", "C20", "C6")) %>%
  select(update_datetime, availability, carpark_number, update_hour, business_hours, total_lots) %>%
  ggplot(aes(x = update_datetime, y = availability)) +
  geom_line() +
  geom_point(aes(color = business_hours)) +
  facet_wrap(~paste0(carpark_number, "| Total Lots: ", total_lots), ncol = 1)
Awesome, we see that there are periodic patterns repeating each day. To further observe the daily patterns, we can draw a scatter plot of availability by hour for weekdays only. This effectively overlays the availability of each car park, hour by hour, across all of the weekdays.
weekdays <- CarParkRec %>%
  mutate(Availability = lots_available/total_lots,
         update_hour = hour(update_datetime),
         business_hours = if_else((update_hour >= 8 & update_hour <= 17), "Business", "Off"),
         weekday = weekdays(update_date)) %>%
  filter(!(weekday %in% c("Saturday", "Sunday"))) %>%
  filter(carpark_number %in% c("HLM", "RHM", "Q81", "C20", "C6")) %>%
  ggplot(aes(x = hour, y = Availability, color = business_hours, name = update_date)) +
  ## alpha belongs on the point layer; passed to ggplot() it has no effect
  geom_point(alpha = 0.1) +
  facet_wrap(~paste0(carpark_number, "| Total Lots: ", total_lots), ncol = 2)
ggplotly(weekdays)
Some car parks have wider bands, and thus a higher variability in availability for each hour. Let’s turn our attention to lot C6 in the upper right-hand corner, as well as lot RHM in the lower left-hand corner. We can see that lot C6 has a much narrower band than lot RHM throughout the day. This means that we can reliably expect C6 to have the same availability at a given hour on weekdays, irrespective of the day, while lot RHM would be a bit of a gamble (not really though, since it never seems to reach 0.0 availability).
Let’s take a look at the weekend.
weekends <- CarParkRec %>%
  mutate(Availability = lots_available/total_lots,
         update_hour = hour(update_datetime),
         business_hours = if_else((update_hour >= 8 & update_hour <= 17), "Business", "Off"),
         weekday = weekdays(update_date)) %>%
  filter((weekday %in% c("Saturday", "Sunday"))) %>%
  filter(carpark_number %in% c("HLM", "RHM", "Q81", "C20", "C6")) %>%
  ggplot(aes(x = hour, y = Availability, color = business_hours, name = update_date)) +
  ## alpha belongs on the point layer; passed to ggplot() it has no effect
  geom_point(alpha = 0.5) +
  facet_wrap(~carpark_number, ncol = 2)
ggplotly(weekends)
Our sample size for the weekend is quite small, since we only have two weekend dates. One notable feature is that the patterns are very similar to the weekday samples for these five car parks. Nevertheless, for good measure we’ll segregate the two.
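One caveat about the weekday/weekend split used above: weekdays() returns day names in the current locale, so comparing against "Saturday" and "Sunday" assumes an English locale. A locale-independent alternative, sketched here in base R, is to use the numeric day of the week. The variable names are my own.

```r
# Dates covered by the collection script (15 to 21 February 2020).
dates <- seq(as.Date("2020-02-15"), as.Date("2020-02-21"), by = "day")

# $wday is 0 for Sunday and 6 for Saturday, regardless of locale.
is_weekend <- as.POSIXlt(dates)$wday %in% c(0, 6)

# Label each date and count how many fall into each group.
day_type <- ifelse(is_weekend, "Weekend", "Weekday")
print(data.frame(date = dates, day_type = day_type))
print(table(day_type))
```

For this collection window the split is two weekend dates and five weekdays, which is why the weekend sample is so thin.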
Before we start transforming and analyzing, it’s imperative to refer back to the goals previously mentioned in the objective section. We’d like to build a tool that allows drivers to assess which car parks would be the most accessible during certain peak times. To do this, we have to consider a multitude of metrics, such as a car park’s mean availability for each hour, the peak availability and occupancy during business hours, and outside of business hours.
Initially, it was my intention to use LOESS to model the fluctuation of availability for weekdays and weekends, but I eventually settled on a simple average and standard deviation per hour. We’ll start with this, and you can examine the visualization of the mean availability for each car park (during weekdays) below. The ribbons extend two standard deviations above and below the mean for each hour.
weekday_avgs <- CarParkRec %>%
  mutate(Availability = lots_available/total_lots,
         update_hour = hour(update_datetime),
         business_hours = if_else((update_hour >= 8 & update_hour <= 17), "Business", "Off"),
         weekday = weekdays(update_date)) %>%
  filter(!(weekday %in% c("Saturday", "Sunday"))) %>%
  filter(carpark_number %in% c("HLM", "RHM", "Q81", "C20", "C6")) %>%
  group_by(carpark_number, update_hour, business_hours) %>%
  summarize(mean_availability = mean(Availability), sd_availability = sd(Availability))
weekday <- weekday_avgs %>%
  ggplot(aes(x = update_hour, y = mean_availability)) +
  geom_ribbon(aes(ymin = mean_availability - 2*sd_availability,
                  ymax = mean_availability + 2*sd_availability),
              alpha = 0.2, fill = "black") +
  geom_line() +
  geom_point(aes(color = business_hours)) +
  facet_wrap(~carpark_number, ncol = 2)
ggplotly(weekday)
What exactly does the standard deviation illustrate in this particular instance? Simply put, it’s a good measure of how reliable the vacancies are. If the standard deviation (ribbon) is large for a given hour, availability at that hour swings from day to day: on some days there may be plenty of vacancies, and on others very few.
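To make the ribbon concrete, here is a small worked example in base R. The two vectors are made-up availability values for a single hour across five weekdays, one for a steady car park and one for a volatile one; they are not taken from the data set. The helper band is my own name for the mean plus or minus two standard deviations drawn by the ribbon.

```r
# Hypothetical availability of two car parks at the same hour on five weekdays.
steady   <- c(0.50, 0.52, 0.48, 0.51, 0.49)
volatile <- c(0.20, 0.80, 0.35, 0.65, 0.50)

# Mean and two-standard-deviation band, as drawn by the ribbon.
band <- function(x) {
  m <- mean(x)
  s <- sd(x)
  c(mean = m, sd = s, lower = m - 2 * s, upper = m + 2 * s)
}

summary_table <- rbind(steady = band(steady), volatile = band(volatile))
print(round(summary_table, 3))
```

Both car parks average 0.5, but the steady one has a band only a few percentage points wide, while the volatile one spans almost the whole range from 0 to 1. Note that the band is symmetric around the mean, so nothing stops it from extending below 0 or above 1.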
We can go ahead and take a look at the weekend availability per hour. There will be no standard deviation ribbon, since most hours have only 2 measurements.
It seems like the curves are very similar for this group of car parks.
We can also develop a method (albeit a rudimentary one) for summarizing car park availability per hour throughout weekdays and weekends using this average curve. We proceed by identifying a particular metric of interest: the lowest availability on weekdays during business hours, and outside business hours. We can obtain this using which.min in conjunction with slice, and we can apply the same approach to the weekends. Note that we’re ignoring the off-business hours in the morning (12:00 AM - 8:00 AM) and concentrating on the evening hours.
weekday.mins <- CarParkRec %>%
  mutate(Availability = lots_available/total_lots,
         update_hour = hour(update_datetime),
         business_hours = if_else((update_hour >= 8 & update_hour <= 17), "Business", "Off"),
         weekday = weekdays(update_date)) %>%
  filter(!(weekday %in% c("Saturday", "Sunday"))) %>%
  filter(carpark_number %in% c("HLM", "RHM", "Q81", "C20", "C6") & update_hour > 8 & update_hour < 23) %>%
  group_by(carpark_number, update_hour, business_hours) %>%
  summarize(mean_availability = round(mean(Availability),2)) %>%
  ungroup() %>%
  group_by(carpark_number, business_hours) %>%
  slice(which.min(mean_availability))
weekday <- weekday_avgs %>%
  mutate(mean_availability = round(mean_availability,2)) %>%
  ggplot(aes(x = update_hour, y = mean_availability)) +
  geom_line() +
  geom_point(aes(color = business_hours)) +
  geom_vline(data = weekday.mins, aes(xintercept = update_hour, color = business_hours)) +
  facet_wrap(~carpark_number, ncol = 2) +
  ggtitle("Hourly Availability (7 - Day Average)")
ggplotly(weekday)
We can go ahead and do the same to find the hours of maximum occupancy during business hours, and off business hours.
Using these minimum and maximum values for the weekdays and weekends, we can construct a handy little tool that identifies car parks with high, medium, and low availability during business hours. There’s a myriad of methods we could use to identify car parks with low availability during business hours. In this particular instance, I have binned the minimum availability during business hours, labeling values below 20% as Low, values at or above 20% but below 60% as Mid, and values at or above 60% as High.
weekday.mins %>%
  filter(business_hours == "Business") %>%
  mutate(Availability = if_else(mean_availability < 0.2, "Low",
                                if_else(mean_availability < 0.6, "Mid", "High"))) %>%
  inner_join(CarParkAttrib, by = "carpark_number") %>%
  plotmapview(df = ., strcol = "Availability")
Here’s the same visualization for Off Business hours, after additional filtering for hours in the evening.
weekday.mins %>%
  filter(business_hours == "Off") %>%
  mutate(Availability = if_else(mean_availability < 0.2, "Low",
                                if_else(mean_availability < 0.6, "Mid", "High"))) %>%
  inner_join(CarParkAttrib, by = "carpark_number") %>%
  plotmapview(df = ., strcol = "Availability")
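The Low / Mid / High binning can also be written with base R's cut, which keeps the thresholds in one place and returns an ordered set of levels. This is a sketch of an equivalent formulation, not the code used for the maps above; the function name bin_availability and the sample values are my own.

```r
# Bin a vector of availability fractions into Low / Mid / High.
# right = FALSE makes each interval closed on the left: [0.2, 0.6) is Mid.
bin_availability <- function(x) {
  cut(x,
      breaks = c(-Inf, 0.2, 0.6, Inf),
      labels = c("Low", "Mid", "High"),
      right = FALSE)
}

# Illustrative minimum-availability values, including both thresholds.
mins <- c(0.05, 0.20, 0.59, 0.60, 0.95)
binned <- data.frame(min_availability = mins,
                     Availability = bin_availability(mins))
print(binned)
```

The boundary cases behave the same way as the nested if_else: exactly 0.20 is Mid and exactly 0.60 is High.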
Avail.Exp <- CarParkRec %>%
  mutate(Availability = lots_available/total_lots,
         update_hour = hour(update_datetime),
         business_hours = if_else((update_hour >= 8 & update_hour <= 17), "Business", "Off"),
         weekday = weekdays(update_date)) %>%
  filter(!(weekday %in% c("Saturday", "Sunday"))) %>%
  group_by(carpark_number, update_hour, business_hours) %>%
  summarize(mean_availability = mean(Availability), sd_availability = sd(Availability))
Avail.Exp %>% write.csv(., file = "WeekdayAverages.csv",row.names = FALSE)
The mapview package is great for visualizing geo-spatial data, but unfortunately its interactivity is limited. To give an end user the power to dig through the data themselves, we’re going to need a tool that’s a bit more interactive. I’ve used Tableau in previous articles since it’s a powerful tool, so I went ahead and utilized it for this purpose as well.
The dashboard is displayed below through an iframe, but you can find a better, mobile-friendly version of it here.
The geospatial scatter plot on the right-hand side contains all of the car park locations, color-coded by the availability for the hour selected in the filter in the upper right-hand corner. I have also used the standard deviation of the availability mentioned previously as a measure of how dependable that availability is, binned as “High”, “Somewhat” and “Low” with arbitrarily chosen intervals. A user will be able to sort by lots with free parking, lots that support short-term parking, and lots that support night-time parking. Furthermore, as the user hovers over (or, on mobile, taps) a car park location on the map, they’re able to see the average fluctuation on a bar chart in the lower right-hand corner.
We’ve managed to build a tool that helps drivers make informed decisions about which car park to use across Singapore. Our process was extremely simple, and could be expanded upon for additional robustness.
In summary we:
Thanks for taking the time to read my article!