4 Part I: Moose Populations in Newfoundland
Install the dplyr package using the install.packages() function. You only need to do this once.
Question 1
Then, with dplyr installed, use the library() function to load dplyr so that its functions are available to use.
Question 2
Next, import the MoosePopulation.csv dataset using read.csv(). Name this dataset moosedata. If you use the programmatic way (i.e., the command line), your command may look something like this:
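A minimal sketch of the import step follows. So that it runs without the real file, it first writes a two-row stand-in CSV. The values are made up, and the names demo_path and moose_demo are hypothetical. The commented-out last line shows what the call might look like for the assignment, assuming MoosePopulation.csv sits in your working directory.

```r
# Write a tiny stand-in file so the sketch runs anywhere.
# These rows are made up; the real file is MoosePopulation.csv.
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Ecoregion,Year,Area,Estimated_Moose_Pop",
  "Western_Forests,2019,100,150",
  "Western_Forests,2020,100,NA"
), demo_path)

# read.csv() takes the path to a file and returns a data frame.
moose_demo <- read.csv(demo_path)
str(moose_demo)

# For the assignment, the call might look like this
# (assumes the file is in your working directory):
# moosedata <- read.csv("MoosePopulation.csv")
```

If R reports that it cannot find the file, checking the working directory with getwd() is usually the first thing to try.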
Question 3
Take a moment to look through the data you just imported using the View() function. You might notice that some cells are empty or “NA”; these are missing values. When dealing with large biological datasets, it is common to start by removing missing data.
Write a line of code where you use na.omit() to remove rows with missing values. Save the cleaned dataset as moose_clean (as in the example code below).
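To see what na.omit() does, here is a small sketch on a made-up data frame. The name toy and all of its values are hypothetical and are not taken from MoosePopulation.csv.

```r
# Made-up rows for illustration only.
toy <- data.frame(
  Ecoregion = c("Western_Forests", "Western_Forests", "Other_Region"),
  Year = c(2019, 2020, 2020),
  Estimated_Moose_Pop = c(150, NA, 250)
)

# na.omit() drops every row that has an NA in any column.
toy_clean <- na.omit(toy)

nrow(toy)        # 3 rows before
nrow(toy_clean)  # 2 rows after
```

Note that na.omit() only removes values R recognises as NA. If empty cells in a text column were imported as empty strings, one option is to read the file with read.csv(..., na.strings = c("", "NA")) so that they count as missing too.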
Question 4
Next, let’s simplify the dataset to only include the columns of interest. Add a line of code where you use the select() function to select the following columns: Ecoregion, Year, Area, Estimated_Moose_Pop.
Remember to save the resulting dataset as moose_sel.
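As a rough illustration of select(), the sketch below uses a made-up data frame with one extra column. The Notes column and all values are hypothetical. It assumes dplyr is installed.

```r
library(dplyr)

# Made-up data with one extra column (Notes) that we do not need.
toy <- data.frame(
  Ecoregion = c("Western_Forests", "Other_Region"),
  Year = c(2020, 2020),
  Area = c(100, 200),
  Estimated_Moose_Pop = c(150, 500),
  Notes = c("aerial survey", "ground survey")
)

# select() keeps only the named columns, in the order given.
toy_sel <- select(toy, Ecoregion, Year, Area, Estimated_Moose_Pop)

names(toy_sel)
```

The first argument is the dataset; every argument after it is a column name, written without quotes.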
Question 5
a. What is the oldest observation in the dataset? Add a line where you use the min() function to find the oldest Year and save the result as year_min.
b. What is the highest Estimated_Moose_Pop recorded? Use the max() function to find it and save the result as moose_max.
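Both functions take a single column, which you can pull out of a data frame with $. A small sketch on made-up values (the names toy, toy_year_min and toy_moose_max are hypothetical):

```r
# Made-up values for illustration only.
toy <- data.frame(
  Year = c(2018, 2019, 2020),
  Estimated_Moose_Pop = c(150, 500, 320)
)

# min() and max() each return a single number from the column.
toy_year_min <- min(toy$Year)
toy_moose_max <- max(toy$Estimated_Moose_Pop)

toy_year_min   # 2018
toy_moose_max  # 500
```

If a column still contains missing values, min() and max() return NA unless you add na.rm = TRUE, which is one reason to clean the data first.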
Question 6
Different ecoregions cover different land areas, so comparing raw population numbers is not very helpful. Standardize your data by calculating moose density for each ecoregion. Using the mutate() function, create a new column called MooseDensity in the moose_sel dataset you made above. Save the result as moosedata2. Remember density is equal to population divided by area.
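The density calculation can be sketched on a made-up data frame as follows. The names toy_sel and toy_density and all values are hypothetical. It assumes dplyr is installed.

```r
library(dplyr)

# Made-up values for illustration only.
toy_sel <- data.frame(
  Ecoregion = c("Western_Forests", "Other_Region"),
  Year = c(2020, 2020),
  Area = c(100, 200),
  Estimated_Moose_Pop = c(150, 500)
)

# mutate() adds a new column computed row by row from existing columns.
toy_density <- mutate(toy_sel, MooseDensity = Estimated_Moose_Pop / Area)

toy_density$MooseDensity  # 1.5 2.5
```

Inside mutate(), the name on the left of = becomes the new column, and the expression on the right can refer to existing columns directly by name.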
Question 7
Now, let’s visualize the data. Using the plot() function, make a line graph to show the changes in MooseDensity over Year. Full marks for including axis labels and a figure title (xlab, ylab, main).
plot(moosedata2$Year, moosedata2$MooseDensity,
     xlab = "year",
     ylab = "Moose per sq km",
     main = "Moose density in Newfoundland ecoregions over time")
Question 8
The research team was particularly interested in how moose populations have changed over time in the Western Forests Ecoregion.
a) Create a new dataset, using the filter() function to include only observations from the Western_Forests ecoregion. Save the result as moose_west.
b) Use the plot() function to make a line graph (type = "l") showing how moose density has changed over time in the Western_Forests region. Full marks for including axis labels and a figure title (xlab, ylab, main). HINT: Adapt the code given to you in Question 7 above.
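The two steps, filtering to one ecoregion and then drawing a line graph, might look like this on a made-up data frame. The names toy_density and toy_west, the Other_Region label, and all values are hypothetical. It assumes dplyr is installed.

```r
library(dplyr)

# Made-up values for illustration only.
toy_density <- data.frame(
  Ecoregion = c("Western_Forests", "Other_Region",
                "Western_Forests", "Other_Region"),
  Year = c(2019, 2019, 2020, 2020),
  MooseDensity = c(1.5, 2.5, 1.8, 2.2)
)

# Keep only rows whose Ecoregion matches exactly (note the double ==).
toy_west <- filter(toy_density, Ecoregion == "Western_Forests")

# type = "l" joins the points with a line instead of drawing symbols.
plot(toy_west$Year, toy_west$MooseDensity,
     type = "l",
     xlab = "Year",
     ylab = "Moose per sq km",
     main = "Toy example: density in one ecoregion")
```

A line graph connects points in the order the rows appear, so it reads best when the rows are sorted by Year.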
Question 9
The research team was interested in trends for all ecoregions for just recent years.
- Using the original, unfiltered dataset you created in Question 6 above, use the filter() function to filter for the year 2020, and save the dataset as moose_2020. HINT: Adapt the code given to you in Question 8 above.
- The research team considered moose densities above 2.0 moose/km² to be high. Using the dataset you just created, use filter() on MooseDensity to show only ecoregions where moose density is greater than 2.0. Save the dataset as moose_2020_high. HINT: Adapt the code given to you in Question 8.
- With the dataset you just created, use the arrange() function to sort the MooseDensity column in descending order. Save the result as moose_2020_high_byD. HINT: You can follow this template:
arrange(Dataset, desc(ColumnName))
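The three steps above can be sketched one at a time on a made-up data frame. The region labels, the toy_ names and all values are hypothetical. It assumes dplyr is installed.

```r
library(dplyr)

# Made-up values for illustration only.
toy_density <- data.frame(
  Ecoregion = c("Region_A", "Region_B", "Region_C", "Region_A"),
  Year = c(2020, 2020, 2020, 2019),
  MooseDensity = c(2.4, 1.1, 3.0, 2.8)
)

# Step 1: keep only rows from 2020.
toy_2020 <- filter(toy_density, Year == 2020)

# Step 2: of those, keep only rows with density above 2.0.
toy_2020_high <- filter(toy_2020, MooseDensity > 2.0)

# Step 3: sort from highest to lowest density.
toy_2020_high_byD <- arrange(toy_2020_high, desc(MooseDensity))

toy_2020_high_byD$Ecoregion  # "Region_C" "Region_A"
```

Each step takes the result of the previous one as its first argument, which is why a new name is saved at every stage.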
Question 10
Pipes %>% allow you to connect one line of code to the next, so you don’t have to save the dataset under a new name for each step.
Repeat the steps from the previous question using a pipe at the end of each line. Using your cleaned dataset moosedata2 from Question 6 above, filter() for the year 2020, then filter() for MooseDensity above 2.0, then arrange() in desc() order, then finally print() the output. Code showing how to use pipes %>% is below; however, you may need to adapt it to the way you have named your dataframes. Save the final result as moosefinal.
moosefinal <- moosedata2 %>%
  filter(Year == 2020) %>%
  filter(MooseDensity > 2.0) %>%
  arrange(desc(MooseDensity)) %>%
  print()
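To see what the pipe is doing, it can help to compare it with the same steps written as nested function calls. The sketch below uses a made-up data frame. The region labels, the names toy_density, nested and piped, and all values are hypothetical. It assumes dplyr is installed.

```r
library(dplyr)

# Made-up values for illustration only.
toy_density <- data.frame(
  Ecoregion = c("Region_A", "Region_B", "Region_C", "Region_A"),
  Year = c(2020, 2020, 2020, 2019),
  MooseDensity = c(2.4, 1.1, 3.0, 2.8)
)

# Nested form: the same steps, read from the inside out.
nested <- arrange(
  filter(filter(toy_density, Year == 2020), MooseDensity > 2.0),
  desc(MooseDensity)
)

# Piped form: x %>% f(y) means f(x, y), so the steps read top to bottom.
piped <- toy_density %>%
  filter(Year == 2020) %>%
  filter(MooseDensity > 2.0) %>%
  arrange(desc(MooseDensity)) %>%
  print()

# Compare the two results.
identical(nested, piped)
```

Ending the pipe with print() still lets the assignment work, because print() hands back the data frame it was given after displaying it.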