---
title: 'Guided Project: New York Solar Resource Data'
author: "Dataquest"
date: "11/26/2020"
output: html_document
---
# Introduction
- Title: Analyzing New York solar data.
- APIs give us access to a large amount of data that is only available online. In this exercise, we extract solar resource data for New York City. Such data can, for example, help us determine which periods of the year are, on average, the most productive for solar panels.
# Finding the Suitable Endpoint and Parameters to Query the API
```{r}
# Storing the API key in a variable
the_key <- "" # TODO: store your API key here
# Identifying the API URL
url <- "https://developer.nrel.gov/api/solar/solar_resource/v1.json"
# Specifying the necessary parameters to request the New York City solar data
parameters_list <- list(api_key = the_key, lat = 41, lon = -75)
```
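Typing the key directly into the report makes it easy to leak when the file is shared. As a minimal sketch of one alternative, the key could be read from an environment variable. The variable name `NREL_API_KEY`, the helper `get_api_key()`, and the `demo_*` names are hypothetical and not part of the original solution.

```r
# Hypothetical helper: read the API key from an environment variable
# instead of typing it into the report. `NREL_API_KEY` is an assumed name.
get_api_key <- function(var_name = "NREL_API_KEY") {
  key <- Sys.getenv(var_name, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var_name, " is not set.", call. = FALSE)
  }
  key
}

# Setting a placeholder value for the current session only, for illustration
Sys.setenv(NREL_API_KEY = "placeholder-key")
demo_key <- get_api_key()

# Building the same kind of parameter list as above with the retrieved key
demo_parameters <- list(api_key = demo_key, lat = 41, lon = -75)
names(demo_parameters)
```

In practice the variable would be set outside the report, for example in a `.Renviron` file, so the key never appears in the source.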
# Extracting the New York Solar Resource Data
```{r}
# Loading the `httr` package
library(httr)
# Using the `GET()` function to request the data from the API with `url` and `parameters_list`
response <- GET(url, query = parameters_list)
# Tracking errors
## Displaying the status code with the `status_code()` function
status <- status_code(response)
status
## Displaying the API response format
response_type <- http_type(response)
response_type
# Extracting the API response content as text (stating the encoding explicitly)
content <- content(response, "text", encoding = "UTF-8")
# Displaying this content to check how it looks visually
print(content)
```
# Parsing the JSON into an R Object
```{r}
# Parsing the JSON text stored in `content` into an R object using the `jsonlite::fromJSON()` function
json_lists <- jsonlite::fromJSON(content)
# Displaying the structure of the R object using the `str()` function
str(json_lists)
```
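To see what `jsonlite::fromJSON()` does with nested JSON without calling the API, here is a small sketch on a made-up string. It assumes that `monthly` is a JSON object keyed by month; the name `sample_json` and all numbers are placeholders, not real API output.

```r
# A made-up JSON string shaped like a small part of the response;
# the numbers are placeholders, not real API output.
sample_json <- '{"outputs": {"avg_dni": {"annual": 4.0, "monthly": {"jan": 3.1, "feb": 3.9}}}}'

sample_lists <- jsonlite::fromJSON(sample_json)

# JSON objects become named lists, so `monthly` is a list with one element per key
class(sample_lists$outputs$avg_dni$monthly)
names(sample_lists$outputs$avg_dni$monthly)
sample_lists$outputs$avg_dni$monthly$feb
```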
# How to Create a Dataframe from a Complex List
# Building a Dataframe from a Complex List
```{r}
# Extracting the outputs data
outputs_list <- json_lists$outputs
# Extracting the monthly vector (`monthly`) from the (`avg_dni`) list in the outputs data
avg_dni <- outputs_list$avg_dni$monthly
# Extracting the monthly vector (`monthly`) from the (`avg_ghi`) list in the outputs data
avg_ghi <- outputs_list$avg_ghi$monthly
# Extracting the monthly vector (`monthly`) from the (`avg_lat_tilt`) list in the outputs data
avg_lat_tilt <- outputs_list$avg_lat_tilt$monthly
# Combining the monthly vectors into a dataframe using the `tibble::tibble()` function
## Adding the `month` column containing month abbreviations: `Jan`, `Feb`,...,`Dec`
dataframe <- tibble::tibble("month" = month.abb,
                            "avg_dni" = avg_dni,
                            "avg_ghi" = avg_ghi,
                            "avg_lat_tilt" = avg_lat_tilt)
# Displaying the dataframe
dataframe
```
- (Instruction 4's answer)
We can see that the `avg_dni`, `avg_ghi`, and `avg_lat_tilt` columns are still list columns in which each cell holds a single value. Before using this dataframe further, we would likely need to convert these columns to a numeric data type.
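A minimal sketch of that conversion, using a short placeholder list instead of the real API values (the names `monthly_list`, `monthly_numeric`, and `small_df` are hypothetical):

```r
# Placeholder monthly values stored as a list, one number per element
monthly_list <- list(jan = 3.1, feb = 3.9, mar = 4.5)

# `unlist()` flattens the list into a named numeric vector;
# `unname()` drops the month names so only the values remain
monthly_numeric <- unname(unlist(monthly_list))

# The resulting column is a plain numeric vector rather than a list
small_df <- data.frame(month = month.abb[1:3], avg_dni = monthly_numeric)
str(small_df)
```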
# Extracting a Dataframe from a Complex List
```{r}
# Extracting the outputs list
outputs_list <- json_lists$outputs
# Simplifying the outputs list
simplified_outputs_list <- unlist(outputs_list)
# Restructuring the simplified vector into a matrix of 13 rows (the annual value and the 12 monthly values)
data_matrix <- matrix(data = simplified_outputs_list, nrow = 13)
# Removing the annual values from the data matrix
data_matrix <- data_matrix[-1, ]
# Converting the matrix into a dataframe using the `as.data.frame()` function
another_dataframe <- as.data.frame(data_matrix)
# Displaying the dataframe
another_dataframe
```
- (Instruction 6's answer)
We can see that all the columns are numeric. However, we haven't appended the `month` column yet.
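One way to finish this second approach is sketched below on a placeholder matrix. It assumes the three columns come out in the order `avg_dni`, `avg_ghi`, `avg_lat_tilt`, which should be checked against the real `str()` output; the `demo_*` names and values are hypothetical.

```r
# Placeholder 13 x 3 matrix: row 1 stands for the annual values,
# rows 2 to 13 for the monthly values of each variable
demo_matrix <- matrix(seq_len(39), nrow = 13)
demo_matrix <- demo_matrix[-1, ]

demo_df <- as.data.frame(demo_matrix)
# Replacing the default names (V1, V2, V3) with descriptive ones
# (assumed column order, to be verified on the real data)
names(demo_df) <- c("avg_dni", "avg_ghi", "avg_lat_tilt")
# Adding the month abbreviations as the first column
demo_df <- data.frame(month = month.abb, demo_df)
head(demo_df, 3)
```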
# Putting It All Together
```{r}
library(httr)
library(dplyr)
the_key <- "" # TODO: store your API key here
# Creating the custom `nrel_api_json_get_df()` function, inspired by what we did in the previous missions
## The function has two parameters
### The `endpoint` parameter represents the endpoint we need
### The `queries` parameter represents the list of API request parameters.
nrel_api_json_get_df <- function(endpoint, queries = list()) {
  ## Preparing the URL
  url <- modify_url("https://developer.nrel.gov", path = endpoint)
  ## Querying the API
  response <- GET(url, query = queries)
  ## Tracking errors
  if (http_error(response)) {
    print(status_code(response))
    print(http_status(response))
    stop("Something went wrong.", call. = FALSE)
  }
  if (http_type(response) != "application/json") {
    stop("API did not return json", call. = FALSE)
  }
  ## Extracting content
  json_text <- content(response, "text")
  ## Converting content into Dataframe
  table_lst <- jsonlite::fromJSON(json_text)
  dataframe <- tibble::tibble("month" = month.abb,
                              "avg_dni" = as.numeric(table_lst$outputs$avg_dni$monthly),
                              "avg_ghi" = as.numeric(table_lst$outputs$avg_ghi$monthly),
                              "avg_lat_tilt" = as.numeric(table_lst$outputs$avg_lat_tilt$monthly))
  ## Returning the dataframe
  dataframe
}
# Using the custom `nrel_api_json_get_df()` function to extract the solar resource as a dataframe
## Providing the `"api/solar/solar_resource/v1.json"` as the `endpoint` parameter
## Providing the `parameters_list` variable as `queries` parameter
solar_resource_df <- nrel_api_json_get_df("api/solar/solar_resource/v1.json", parameters_list)
# Printing the output dataframe
solar_resource_df
```
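With the data in this shape, the question raised in the introduction (which month is, on average, the most productive) takes one line of base R. The sketch below uses an invented data frame with the same column names as the function's output; `demo_solar_df` and its values are hypothetical.

```r
# Placeholder data frame with the same columns as the function's output;
# the values are invented for illustration
demo_solar_df <- data.frame(
  month = month.abb,
  avg_dni = c(3.0, 3.8, 4.2, 4.6, 5.0, 5.3, 5.6, 5.1, 4.7, 4.0, 3.1, 2.7)
)

# `which.max()` returns the position of the largest value,
# which is used here to pick the matching row
best_row <- demo_solar_df[which.max(demo_solar_df$avg_dni), ]
best_row$month
```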
# Visualizing New York City Solar Resource Data
```{r}
# Loading the `ggplot2` and `dplyr` packages
library(ggplot2)
library(dplyr)
# Using the `ggplot()` function to plot the `avg_dni` value for each month
ggplot(data = solar_resource_df,
       aes(x = month, y = avg_dni, group = 1)) +
  geom_line() +
  geom_point() +
  theme_bw()
# Converting the `month` column into a factor whose levels follow calendar order
solar_resource_df <- solar_resource_df %>%
  mutate(month = factor(month, levels = month.abb))
# Replotting the `avg_dni` value for each month
ggplot(data = solar_resource_df,
       aes(x = month, y = avg_dni, group = 1)) +
  geom_line() +
  geom_point() +
  theme_bw()
```
- (Instruction 5's answer)
The first plot's x-axis is ordered alphabetically, while the second is ordered chronologically from January to December.
Converting `month` to a factor with explicit levels lets us order the axis labels as we wish.
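The effect of the `levels` argument can be checked without plotting. This is a small base R sketch; the name `ordered_months` is hypothetical.

```r
# Without explicit levels, a factor built from month abbreviations
# gets its levels in alphabetical order
levels(factor(month.abb))

# Supplying `levels` fixes the order, which ggplot2 then follows on the axis
ordered_months <- factor(month.abb, levels = month.abb)
levels(ordered_months)
```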