Why can't I breathe?
Background
For the past 3 weeks I’ve found it difficult to breathe, and for the first time in my life I felt like my asthma was not under control. I couldn’t stop coughing, my chest constantly felt tight, and sometimes even talking was totally exhausting.
As someone who’s lived in many cities including Baltimore, Syracuse, and Sydney, I know that allergies and asthma can change with different climates and geographic locations. My most recent move has brought me to Houston, Texas.
Having lived in the States most of my life, I was prepared for my allergies to be bad in the fall. So I was pleasantly surprised when I hardly noticed any allergy symptoms during my first fall in Houston. I did end up getting Covid last October, but even then I didn’t experience any trouble breathing.
So what has triggered my asthma these past few weeks? In this blog post, I investigate whether it could be related to air quality and/or allergies.
Air Quality in Houston
I’ll start off with exploring air quality because I happened to listen to a Science Vs podcast episode the other week called “A Mystery in the Air”. This episode tells the story of Ella Adoo-Kissi-Debrah, who tragically died at only 9 years old. Ella was a healthy girl until she started experiencing severe asthma triggered by poor air quality, which ultimately led to her death. Ella was the first person in the UK to have air pollution listed as a cause of death. I recommend listening to the podcast to hear Ella’s story or checking out this BBC article for more information.
This podcast made me think more seriously about air quality, something I’ve always taken for granted. Houston is a major city… with a lot of traffic! I’ve only lived here for about 6 months, but I live in the heart of the city. I’ve never bothered to check air quality before. After hearing Ella’s story, and knowing how much my asthma has been acting up lately, I was curious to learn more about the air quality around me.
So let’s look at some data! I’ll get my data from the Open Weather API. It’s free, but you need to sign up to get an API key.
Check out my previous blog on “My First API: Which NFL games should I watch?” for a more thorough walk through of using APIs in R.
Calling the Open Weather API
The first thing to do is call the API to get the historic data. In the first code chunk below, I wrap the API call in a function because I’ll reuse this code later.
This function makes 2 GET requests:
- The first GET request gets the longitude and latitude of the city passed in the function.
- The second GET request uses the longitude and latitude data from the first GET request along with start and end dates to pull the historic data.
The function returns the historic data.
# loading libraries
library(tidyverse)
library(httr)
library(glue)
library(gt)
library(lubridate)
library(ggeasy)
library(ggrepel)
library(RColorBrewer)
library(here)
setwd(here("content/blog/air_quality/"))
source("api_key.R") # this source code contains my API key
city_function <- function(api_key, my_city, my_state, my_country, start_time, end_time) {
  # api_key is taken from the sourced api_key.R file
  # GET request #1: look up the latitude and longitude of the city
  res_lat_long <- GET(glue('http://api.openweathermap.org/geo/1.0/direct?q={my_city},{my_state},{my_country}&appid={api_key}'))
  stop_for_status(res_lat_long) # stop early if the request failed (e.g. bad API key)
  lat_long <- content(res_lat_long, as = "text", encoding = "UTF-8")
  lat_long <- jsonlite::fromJSON(lat_long, flatten = TRUE)
  lat <- lat_long$lat
  lon <- lat_long$lon
  # GET request #2: pull the historic air pollution data for that location
  # start_time and end_time are unix timestamps (seconds since 1970-01-01 UTC)
  res_historic_data <- GET(glue('http://api.openweathermap.org/data/2.5/air_pollution/history?lat={lat}&lon={lon}&start={start_time}&end={end_time}&appid={api_key}'))
  stop_for_status(res_historic_data)
  historic_data <- content(res_historic_data, as = "text", encoding = "UTF-8")
  historic_data <- jsonlite::fromJSON(historic_data, flatten = TRUE)
  # one row per hour, with the AQI and the pollutant concentrations
  return(historic_data$list)
}
Getting Houston Data
Now that I have my function ready to go, let’s get data from Houston between August 12, 2022 and February 19, 2023.
- Looking at the API documentation, I noticed the start and end times were in a weird format. For example, it says: Start date (unix time, UTC time zone), e.g. start=1606488670.
- I had never heard of unix time before, so naturally I Googled it. Turns out, unix time is a way to track time as the number of seconds elapsed since the Unix Epoch, which is midnight UTC on January 1st, 1970. Fortunately, I also found this website to easily convert dates to unix time.
- August 12, 2022 = 1660280400
- February 19, 2023 = 1676786400
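The same conversion can also be done directly in R instead of on a converter website. Here is a minimal sketch using base R; the helper name `to_unix` is hypothetical, and it assumes each date means midnight in Houston's time zone (America/Chicago), which is what the two values above correspond to.

```r
# Hypothetical helper: convert a "YYYY-MM-DD" date to unix time,
# taking the date to mean midnight in the given time zone.
to_unix <- function(date_string, tz = "America/Chicago") {
  as.numeric(as.POSIXct(date_string, tz = tz))
}

start_time <- to_unix("2022-08-12")  # 1660280400
end_time   <- to_unix("2023-02-19")  # 1676786400

# Going the other way: turn a unix timestamp back into a readable local time
as.POSIXct(start_time, origin = "1970-01-01", tz = "America/Chicago")
```

Because unix time always counts from midnight UTC, the time zone only matters when converting to or from a calendar date, so it is worth being explicit about it.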
houston <- city_function(api_key, my_city = "Houston", my_state = "TX", my_country = "US", start_time = 1660280400, end_time = 1676786400)
houston %>%
head(10) %>%
gt()
dt | main.aqi | components.co | components.no | components.no2 | components.o3 | components.so2 | components.pm2_5 | components.pm10 | components.nh3 |
---|---|---|---|---|---|---|---|---|---|
1660280400 | 1 | 323.77 | 0.14 | 26.05 | 15.38 | 5.13 | 5.72 | 6.52 | 0.40 |
1660284000 | 1 | 283.72 | 0.01 | 18.68 | 25.75 | 4.95 | 5.04 | 5.61 | 0.22 |
1660287600 | 1 | 250.34 | 0.00 | 14.05 | 33.62 | 5.90 | 4.63 | 5.12 | 0.12 |
1660291200 | 1 | 250.34 | 0.00 | 15.08 | 29.68 | 6.85 | 5.11 | 5.69 | 0.10 |
1660294800 | 1 | 253.68 | 0.01 | 16.62 | 25.39 | 6.79 | 5.70 | 6.35 | 0.11 |
1660298400 | 1 | 257.02 | 0.02 | 17.65 | 21.28 | 5.84 | 6.27 | 6.96 | 0.15 |
1660302000 | 1 | 267.03 | 0.08 | 19.36 | 16.09 | 4.89 | 6.86 | 7.66 | 0.23 |
1660305600 | 1 | 307.08 | 1.19 | 26.39 | 6.44 | 4.23 | 7.97 | 8.98 | 0.49 |
1660309200 | 1 | 377.18 | 9.95 | 29.47 | 3.26 | 3.93 | 9.39 | 10.65 | 0.80 |
1660312800 | 1 | 407.22 | 14.53 | 29.47 | 12.16 | 4.23 | 9.87 | 11.25 | 0.82 |
Understanding the Data
- The table above contains the first 10 rows of the dataset.
- Each row contains data for one hour of one day. The first row (1660280400) is for August 12th 2022 12am (Central Time). The second row (1660284000) is for August 12th 2022 1am and so on.
- There’s a lot of data here, but for the purposes of this exploratory analysis and blog post, we’ll focus on the main.aqi variable, which is the Air Quality Index (AQI) with the following possible values: 1, 2, 3, 4, 5, where 1 = Good, 2 = Fair, 3 = Moderate, 4 = Poor, 5 = Very Poor.
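To make the hourly AQI values easier to read, the numeric codes can be mapped to the labels listed above. This is a small sketch in base R, not part of the original analysis; `aqi_labels` and `label_aqi` are hypothetical names.

```r
# AQI levels as described above: 1 = Good ... 5 = Very Poor
aqi_labels <- c("Good", "Fair", "Moderate", "Poor", "Very Poor")

# Hypothetical helper: turn integer AQI codes into an ordered factor
label_aqi <- function(aqi) {
  factor(aqi, levels = 1:5, labels = aqi_labels, ordered = TRUE)
}

label_aqi(c(1, 2, 5))
```

This only works on the raw hourly values, which are whole numbers. Averaged scores such as 1.62 do not match any level and would come back as NA.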
Cleaning the data
In this function, I clean the data in a few ways:
- I convert unix time to a date format using the lubridate package.
- I group the data by year, month, and day and summarize the results to get an average AQI score per day.
- Finally, I create a new date column.
clean_data_function <- function(my_data) {
  clean_data <- my_data %>%
    as_tibble() %>%
    # convert unix time to a Central Time datetime and split out its parts
    mutate(date = as_datetime(dt, tz = "America/Chicago"),
           year = lubridate::year(date),
           month = lubridate::month(date),
           day = lubridate::day(date)) %>%
    # average every measure within each calendar day
    group_by(year, month, day) %>%
    summarise_at(vars(main.aqi:components.pm10), mean) %>%
    mutate(across(main.aqi:components.pm10, ~ round(.x, 2)),
           date = make_date(year = year, month = month, day = day)) %>%
    relocate(date, .before = main.aqi)
  return(clean_data)
}
houston_clean <- clean_data_function(houston)
houston_clean %>%
head(10) %>%
gt()
day | date | main.aqi | components.co | components.no | components.no2 | components.o3 | components.so2 | components.pm2_5 | components.pm10 |
---|---|---|---|---|---|---|---|---|---|
2022 - 8 | |||||||||
12 | 2022-08-12 | 1.62 | 270.23 | 1.65 | 18.04 | 68.42 | 7.30 | 7.85 | 8.54 |
13 | 2022-08-13 | 1.25 | 217.93 | 0.32 | 12.85 | 45.19 | 4.18 | 5.44 | 5.92 |
14 | 2022-08-14 | 1.00 | 202.22 | 0.85 | 10.69 | 22.37 | 3.79 | 2.83 | 3.74 |
15 | 2022-08-15 | 1.42 | 229.69 | 4.63 | 12.28 | 42.58 | 4.26 | 5.86 | 10.64 |
16 | 2022-08-16 | 1.17 | 254.23 | 6.69 | 10.91 | 39.74 | 2.93 | 4.78 | 8.38 |
17 | 2022-08-17 | 1.29 | 241.44 | 5.01 | 10.88 | 40.15 | 4.14 | 5.42 | 8.86 |
18 | 2022-08-18 | 1.00 | 218.49 | 0.52 | 13.75 | 30.06 | 4.45 | 3.79 | 5.89 |
19 | 2022-08-19 | 1.08 | 288.31 | 8.89 | 14.86 | 19.55 | 4.76 | 5.48 | 6.65 |
20 | 2022-08-20 | 1.00 | 252.50 | 5.22 | 11.11 | 19.83 | 3.19 | 4.26 | 5.59 |
21 | 2022-08-21 | 1.42 | 223.15 | 0.35 | 9.36 | 26.55 | 2.06 | 8.89 | 10.48 |
Graphing the Results
An AQI score of 2 is “fair” and it’s recommended that “Unusually sensitive individuals should consider limiting prolonged outdoor exertion”, so I’ve added a black dashed line to the graph where the AQI score equals 2.
ggplot(houston_clean, aes(date, main.aqi)) +
  # set the line color outside aes() so the hex code is used as the actual color
  geom_line(color = "#D95F02") +
  labs(title = "Houston AQI Aug 2022 - Feb 2023", x = "Date", y = "AQI") +
  # dashed reference line at AQI = 2 ("Fair")
  geom_hline(yintercept = 2, color = "black", linetype = 2) +
  theme_bw() +
  easy_text_size(15) +
  easy_remove_legend()
Interpreting the Results
- It doesn’t look like there are any noticeable increases in the AQI scores in the past month, so it is unlikely that air quality explains my recent asthma problems.
- However, it is interesting (and perhaps a little bit concerning) to note that there are some spikes in the data where air quality exceeds a score of 2, especially last fall (September - November).
- So I could stop here, but now I’m curious: how does Houston’s AQI compare to other cities that I’ve lived in?… Let’s find out!
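The spikes above a score of 2 can also be counted rather than read off the graph. Below is a minimal sketch of that idea. It uses made-up daily averages in the same shape as houston_clean (the values are illustrative, not real readings), and the names `toy_daily` and `days_above_fair` are hypothetical.

```r
library(dplyr)
library(lubridate)

# Made-up daily averages shaped like houston_clean (not real readings)
toy_daily <- tibble(
  date = as.Date(c("2022-09-01", "2022-09-02", "2022-09-03",
                   "2022-10-01", "2022-10-02", "2023-02-01")),
  main.aqi = c(1.8, 2.4, 2.1, 2.6, 1.2, 1.3)
)

# Count, per month, the days whose average AQI exceeds 2 ("Fair")
days_above_fair <- toy_daily %>%
  mutate(month = floor_date(date, "month")) %>%
  group_by(month) %>%
  summarise(days_over_2 = sum(main.aqi > 2),
            n_days = n(),
            .groups = "drop")

days_above_fair
```

Swapping `toy_daily` for the real daily data would show which months had the most days above the threshold.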
Air Quality Over Time in Cities I’ve Lived
Although I originally hoped to explore the data over the past 30 years (since I’m 30), I discovered that this API only has historical data dating back to November 27th 2020. So, that’s what I’ll use!
My goal here is to graph average AQI scores in the cities where I’ve lived (Baltimore, Syracuse, Sydney, and Houston) since November 2020. Because this is a blog post, I’ll spare the details of the code and simply show you the data and the results (but I use the same functions as above).
City Data: Baltimore, Houston, Sydney, and Syracuse
- First, I made separate API calls to get data for Baltimore, Houston, Sydney and Syracuse.
- Then, I combined all the data into one data set called city_data.
- Finally, I summarized the data to get an average AQI score for every month in every city.
- Here are the first 10 rows:
city_data %>%
head(10) %>%
gt()
city | month | year | mean_aqi | date |
---|---|---|---|---|
baltimore | 11 | 2020 | 1.69 | 2020-11-01 |
houston | 11 | 2020 | 1.15 | 2020-11-01 |
sydney | 11 | 2020 | 1.32 | 2020-11-01 |
syracuse | 11 | 2020 | 1.08 | 2020-11-01 |
baltimore | 12 | 2020 | 1.53 | 2020-12-01 |
houston | 12 | 2020 | 1.68 | 2020-12-01 |
sydney | 12 | 2020 | 1.21 | 2020-12-01 |
syracuse | 12 | 2020 | 1.21 | 2020-12-01 |
baltimore | 1 | 2021 | 1.76 | 2021-01-01 |
houston | 1 | 2021 | 1.61 | 2021-01-01 |
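For readers curious how a monthly summary like the one above could be produced, here is a minimal sketch. It is not the code used for this post: the hourly values are made up, `toy_hourly` and `monthly_aqi` are hypothetical names, and all timestamps are treated as UTC for simplicity, whereas a real analysis might use each city's own time zone.

```r
library(dplyr)
library(lubridate)

# Made-up hourly rows standing in for the combined API results
toy_hourly <- tibble(
  city = rep(c("houston", "sydney"), each = 4),
  # two hours in December 2020 and two hours in January 2021 (unix time)
  dt = rep(c(1606802400, 1606806000, 1609480800, 1609484400), times = 2),
  main.aqi = c(1, 2, 2, 3, 1, 1, 2, 1)
)

# Average AQI for every month in every city
monthly_aqi <- toy_hourly %>%
  mutate(date = as_datetime(dt, tz = "UTC"),
         month = month(date),
         year = year(date)) %>%
  group_by(city, month, year) %>%
  summarise(mean_aqi = round(mean(main.aqi), 2), .groups = "drop") %>%
  # represent each month by its first day, as in the table above
  mutate(date = make_date(year = year, month = month, day = 1)) %>%
  arrange(date, city)

monthly_aqi
```

With the real data, the `toy_hourly` table would be replaced by the stacked results of one API call per city, each tagged with a city column.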
Graphing the Results
- In general, it looks like AQI scores have been increasing over the past couple of years, with the most noticeable increases in Houston and Syracuse.
- Remember, higher AQI means poorer air quality, so this is not a good thing!
- Out of these cities, it looks like Sydney has done the best job at maintaining a relatively low and stable AQI score.
- It would be really interesting to see these results over the past 30 years… so if anyone knows of another open source API with air quality scores, please let me know!
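The plotting code for this comparison isn't shown, but a graph of this kind could be sketched as follows. This is an assumption about the approach, not the original code. It uses only the first eight rows of city_data from the table above, and `city_sample` and `city_plot` are hypothetical names.

```r
library(ggplot2)

# First eight rows of city_data, copied from the table above
city_sample <- data.frame(
  city = rep(c("baltimore", "houston", "sydney", "syracuse"), times = 2),
  mean_aqi = c(1.69, 1.15, 1.32, 1.08, 1.53, 1.68, 1.21, 1.21),
  date = as.Date(rep(c("2020-11-01", "2020-12-01"), each = 4))
)

# One line per city; mapping color inside aes() creates the legend
city_plot <- ggplot(city_sample, aes(date, mean_aqi, color = city)) +
  geom_line() +
  geom_point() +
  labs(title = "Monthly mean AQI by city", x = "Date", y = "Mean AQI") +
  theme_bw()

city_plot
```

Here color is mapped inside aes() on purpose, since each city should get its own color and a legend entry.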
Allergies in Houston
Unfortunately, I couldn’t find an open source API that has allergy data. But, I was able to find some useful websites: Pollen.com and asthmaforecast.com.
Images are screenshots taken directly from the respective websites.
Allergies
- I immediately discovered that we are approaching peak allergy season (at least for trees) in Houston.
- Reading more on the website, I learned that the most significant allergens are: ash-leaf maple, green ash, red mulberry, and white ash.
- This is unfortunate for me because according to my allergy scratch test in 2016, I am very allergic to maple and ash and it looks like my allergies may get even worse in March 😫.
Asthma
- After learning that winter is a peak season for allergies in Houston, I was not all that surprised to see that asthma levels are also on the rise this time of year.
So, why can’t I breathe?
It seems clear that my difficulty breathing is because my allergies are bad this time of year in Houston. Even though I’ve never experienced bad allergies in winter, Houston has a completely different climate from the other cities I’ve lived in. For example, as we can see from the allergy map, tree allergy is low this time of year in Baltimore, where I grew up.
While it seems unlikely that air quality contributed to my difficulty breathing, I still learned a lot from the data!
Takeaways
Here are a few key takeaways I’d like to share:
- I should be prepared for bad allergies in the winter, at least while I’m living in Houston.
- I learned that the air quality in Houston can get bad, which could trigger asthma symptoms. So, just as a precaution I try to always have my rescue inhaler with me wherever I go.
- In general, I’m now more conscious of air quality. For example, I love running, but I’ve recently downloaded the AirVisual app and I check the air quality before setting off for a run (especially if it’s a longer run). If the AQI is anything other than “Good”, I’ll either do a different indoor workout or I’ll postpone my run until the AQI has improved.
- Seeing the increase in AQI over the past few years in some of the cities where I’ve lived, I wondered how climate change could be impacting our air quality. Here are two articles from the CDC and EPA that discuss how climate change negatively affects our air quality:
As always, thanks for reading!