4 Part I: Moose Populations in Newfoundland
Question 1
Install the dplyr package using the install.packages() function. You only need to do this once.
Then, use the library() function to load dplyr so that its functions are available to use.
Question 2
Next, import the MoosePopulation.csv dataset using the read.csv() function. You may call your dataset Moosedata or give it a different name. If you import it programmatically (i.e., from the command line), a single read.csv() call with the path to the file is enough.
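A minimal sketch of the programmatic import. The real MoosePopulation.csv is not reproduced here, so the sketch first writes a small made-up file to a temporary location. The values and the name csv_path are assumptions for illustration only; with the real file you would pass its path on your computer to read.csv() instead.

```r
# Write a tiny, made-up CSV to a temporary file so the example is runnable.
# The numbers are invented for illustration, not taken from MoosePopulation.csv.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Ecoregion,Year,Area,Estimated_Moose_Pop",
  "Western_Forests,2019,100,250",
  "Western_Forests,2020,100,NA"
), csv_path)

# Import the file. With the real data, replace csv_path with the path to
# MoosePopulation.csv on your computer.
Moosedata <- read.csv(csv_path)

# Quick checks on what was imported
str(Moosedata)
head(Moosedata)
```

By default read.csv() treats the text NA as a missing value, which is why the last row shows NA in the population column.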
Question 3
Take a moment to look through the data you just imported using the View() function. You might notice that some cells are empty or “NA”; these are missing values. When dealing with large biological datasets, it is common to start by removing missing data.
Write a line of code that uses na.omit() to remove rows with missing values. Remember to save the cleaned dataset under a new name (or overwrite the old one).
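As a rough illustration of what na.omit() does, the sketch below builds a small made-up data frame with one missing value and compares the row counts before and after cleaning. The values, the ecoregion name Example_Region, and the name Moose_clean are hypothetical.

```r
# Made-up data with one missing value (illustration only)
Moosedata <- data.frame(
  Ecoregion = c("Western_Forests", "Western_Forests", "Example_Region"),
  Year = c(2019, 2020, 2020),
  Area = c(100, 100, 200),
  Estimated_Moose_Pop = c(250, NA, 300)
)

nrow(Moosedata)                      # rows before cleaning
Moose_clean <- na.omit(Moosedata)    # drop every row that contains an NA
nrow(Moose_clean)                    # rows after cleaning
```

Note that na.omit() removes a row if any of its columns is missing, so it is worth checking how many rows are lost.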
Question 4
Next, let’s simplify the dataset to include only the columns of interest. Add a line of code that uses the select() function to keep the following columns: Ecoregion, Year, Area, Estimated_Moose_Pop.
Remember to save the reduced dataset under a new name (or overwrite the old one).
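A short sketch of select() on made-up data. The column Extra_Column and the name Moose_selected are hypothetical and stand in for whatever additional columns the real file contains.

```r
library(dplyr)

# Made-up data with one column we do not need (illustration only)
Moosedata <- data.frame(
  Ecoregion = c("Western_Forests", "Example_Region"),
  Year = c(2020, 2020),
  Area = c(100, 200),
  Estimated_Moose_Pop = c(250, 300),
  Extra_Column = c("x", "y")
)

# Keep only the four columns of interest, in the order listed
Moose_selected <- select(Moosedata, Ecoregion, Year, Area, Estimated_Moose_Pop)
names(Moose_selected)
```

If another loaded package also defines select() (MASS is a common case), writing dplyr::select() makes the intended function explicit.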
Question 5
a. What is the oldest observation in the dataset? Add a line where you use the min() function to find the oldest Year. Then write your answer as a comment (#).
b. Use the max() function to find the highest Estimated_Moose_Pop recorded. Then write your answer as a comment (#). Is this number for a particular ecoregion or for all of Newfoundland?
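The sketch below shows min() and max() applied to single columns of a made-up data frame, plus one possible way to see which row a maximum belongs to. The values and the names oldest_year and highest_pop are assumptions for illustration.

```r
# Made-up data (illustration only)
Moosedata <- data.frame(
  Ecoregion = c("Western_Forests", "Western_Forests", "Example_Region"),
  Year = c(2018, 2020, 2020),
  Estimated_Moose_Pop = c(250, 260, 300)
)

oldest_year <- min(Moosedata$Year)                  # earliest year in the data
highest_pop <- max(Moosedata$Estimated_Moose_Pop)   # largest population estimate

# which.max() returns the position of the maximum, so this prints the full
# row (ecoregion and year) that the highest estimate belongs to
Moosedata[which.max(Moosedata$Estimated_Moose_Pop), ]
```

Both functions return NA if the column still contains missing values, unless na.rm = TRUE is supplied.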
Question 6
Different ecoregions cover different land areas, so comparing raw population numbers is not very helpful. Standardize your data by calculating moose density for each ecoregion. Using the mutate() function, create a new column called MooseDensity. Remember that density equals population divided by area.
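The density calculation can be sketched as follows on made-up numbers (the values and the ecoregion name Example_Region are hypothetical). mutate() adds the new column and leaves the existing ones unchanged.

```r
library(dplyr)

# Made-up data (illustration only)
Moosedata <- data.frame(
  Ecoregion = c("Western_Forests", "Example_Region"),
  Year = c(2020, 2020),
  Area = c(100, 200),
  Estimated_Moose_Pop = c(250, 300)
)

# density = population / area, computed row by row
Moosedata <- mutate(Moosedata, MooseDensity = Estimated_Moose_Pop / Area)
Moosedata
```

In this toy example the second region has more moose but a lower density, which is the reason for standardizing by area.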
Question 7
Now, let’s visualize the data. Using the plot() function, make a line graph to show the changes in MooseDensity over Year. Full marks for including axis labels and a figure title (xlab, ylab, main).
plot(Moosedata$Year, Moosedata$MooseDensity,
     xlab = "year",
     ylab = "Moose per sq km",
     main = "Moose density in Newfoundland ecoregions over time")
Question 8
The research team was particularly interested in how moose populations have changed over time in the Western Forests ecoregion.
a. Create a new dataset, where you use the filter() function to include only observations from the Western_Forests ecoregion.
b. Use the plot() function to make a line graph (type = "l") showing how moose density has changed over time in the Western_Forests region. Full marks for including axis labels and a figure title (xlab, ylab, main). HINT: Adapt the code given to you in Question 7 above.
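A minimal sketch of both parts on made-up data: filter() keeps the rows for one ecoregion, and plot() with type = "l" joins the points with a line. The density values, the name Moose_west, and the label wording are assumptions for illustration.

```r
library(dplyr)

# Made-up data (illustration only)
Moosedata <- data.frame(
  Ecoregion = c("Western_Forests", "Western_Forests", "Western_Forests", "Example_Region"),
  Year = c(2018, 2019, 2020, 2020),
  MooseDensity = c(2.1, 2.3, 2.5, 1.5)
)

# Keep only the rows whose Ecoregion matches exactly
Moose_west <- filter(Moosedata, Ecoregion == "Western_Forests")

# type = "l" draws a line through the points in row order
plot(Moose_west$Year, Moose_west$MooseDensity,
     type = "l",
     xlab = "Year",
     ylab = "Moose per sq km",
     main = "Moose density in Western_Forests over time")
```

Because the line follows row order, it may help to sort the data by Year first if the rows are not already in chronological order.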
Question 9
The research team was interested in trends for all ecoregions for just recent years.
a. Using the original, unfiltered dataset you created in Question 6 above, use the filter() function to filter for the year 2020, and save the dataset under a new name (e.g., MooseData_2020). HINT: Adapt the code given to you in Question 8 above.
b. The research team considered moose densities above 2.0 moose/km² to be high. Using the dataset you just created, filter() the MooseDensity column to show only ecoregions where moose density is greater than 2.0. Save the dataset under a new name (e.g., MooseData_2020_b). HINT: Adapt the code given to you in Question 8.
c. With the dataset you just created, use the arrange() function to sort the MooseDensity column in descending order. HINT: You can follow this template:
arrange(Dataset, desc(ColumnName))
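The three steps above, written out one at a time on made-up data, might look like the sketch below. The region names, density values, and the name MooseData_2020_sorted are hypothetical; each step saves its result under a new name.

```r
library(dplyr)

# Made-up data (illustration only)
Moosedata <- data.frame(
  Ecoregion = c("Region_A", "Region_B", "Region_C", "Region_A"),
  Year = c(2020, 2020, 2020, 2019),
  MooseDensity = c(2.4, 1.2, 3.1, 2.0)
)

# Step 1: keep only observations from 2020
MooseData_2020 <- filter(Moosedata, Year == 2020)

# Step 2: keep only rows with density strictly greater than 2.0
MooseData_2020_b <- filter(MooseData_2020, MooseDensity > 2.0)

# Step 3: sort from highest to lowest density
MooseData_2020_sorted <- arrange(MooseData_2020_b, desc(MooseDensity))
MooseData_2020_sorted
```

Note that == tests for equality, whereas a single = would be read as an argument name inside filter() and produce an error.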
Question 10
Pipes (%>%) allow you to connect one line of code to the next, so you don’t have to save the dataset under a new name at each step.
Repeat the steps from the previous question using a pipe at the end of each line. Using your initial dataset, filter() for the year 2020, then filter() for MooseDensity above 2.0, then arrange() in desc() order, and finally print() the output. Code showing how to use pipes %>% is below; however, you may need to adapt it to the way you have named your data frames.
MooseData_final <- Moosedata %>%
  filter(Year == 2020) %>%
  filter(MooseDensity > 2.0) %>%
  arrange(desc(MooseDensity)) %>%
  print()