4 Part I: Moose Populations in Newfoundland

Install the dplyr package using the install.packages() function. You only need to do this once.

Question 1
Then, with dplyr installed, use the library() function to load dplyr so that its functions are available to use.
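One way to combine the one-time install with the per-session load is sketched below. The requireNamespace() guard is an addition for illustration and is not required by the assignment; it simply skips the install when dplyr is already on your computer.

```r
# Install dplyr only if it is not already available (one-time step).
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}

# Load dplyr for this R session (needed every time you restart R).
library(dplyr)
```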

Question 2
Next, import the MoosePopulation.csv dataset and name it moosedata. If you import it programmatically (i.e., from the command line), your command may look something like this:

moosedata <- read.csv("MoosePopulation.csv")

Question 3
Take a moment to look through the data you just imported by using the View() function. You might notice that some cells are empty or “NA”; these are missing values. When dealing with large biological datasets, it is common to start by removing missing data.

Write a line of code where you use na.omit() to remove rows with missing values. Save the cleaned dataset as moose_clean (as in the example code below).

moose_clean <- na.omit(moosedata)
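To see what na.omit() does, here is a small sketch on a made-up three-row dataset (the values and the name Region_B are invented for illustration and are not from MoosePopulation.csv). It also shows one thing to watch for: read.csv() turns empty numeric cells into NA automatically, but empty text cells are read as "" unless you pass the na.strings argument.

```r
# Hypothetical three-row CSV, invented for illustration only.
toy_csv <- "Ecoregion,Year,Area,Estimated_Moose_Pop
Western_Forests,1904,6800,5
,1904,6300,
Region_B,1904,8400,4"

# na.strings = c("", "NA") makes empty text cells count as NA too.
toy <- read.csv(text = toy_csv, na.strings = c("", "NA"))

# na.omit() drops every row that contains at least one NA.
toy_clean <- na.omit(toy)

nrow(toy)        # 3 rows before cleaning
nrow(toy_clean)  # 2 rows after cleaning
```

Comparing nrow() before and after cleaning is a quick way to check how many rows were removed from your own data.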

Question 4
Next, let’s simplify the dataset to only include the columns of interest. Add a line of code where you use the select() function to select the following columns: Ecoregion, Year, Area, Estimated_Moose_Pop.

Remember to save the result as moose_sel.

moose_sel <- select(moose_clean, Ecoregion, Year, Area, Estimated_Moose_Pop)

Question 5
a. What is the oldest observation in the dataset? Add a line where you use the min() function to find the oldest Year and save the result as year_min.

b. What is the highest Estimated_Moose_Pop recorded? Use the max() function to find it and save the result as moose_max.
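As a sketch of how min() and max() behave on a single column, the example below uses a made-up data frame (the values and the toy_ names are invented for illustration). It also shows why removing missing values first matters: with an NA present, both functions return NA unless na.rm = TRUE is set.

```r
# Hypothetical values, invented for illustration only.
toy <- data.frame(
  Year = c(1904, 1960, 2020),
  Estimated_Moose_Pop = c(4, 12000, 41250)
)

# min() and max() each return a single value from a column.
toy_year_min  <- min(toy$Year)
toy_moose_max <- max(toy$Estimated_Moose_Pop)

# If a column still contains NA, min()/max() return NA unless na.rm = TRUE.
max(c(4, NA, 12000))                # NA
max(c(4, NA, 12000), na.rm = TRUE)  # 12000
```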

Question 6
Different ecoregions cover different land areas, so comparing raw population numbers is not very helpful. Standardize your data by calculating moose density for each ecoregion. Using the mutate() function, create a new column called MooseDensity in the moose_sel dataset you made above. Save the result as moosedata2. Remember density is equal to population divided by area.

moosedata2 <- mutate(moose_sel, MooseDensity = Estimated_Moose_Pop / Area)

Question 7

Now, let’s visualize the data. Using the plot() function, make a line graph to show the changes in MooseDensity over Year. Full marks for including axis labels and a figure title (xlab, ylab, main).

plot(moosedata2$Year, moosedata2$MooseDensity, 
     xlab = "Year", 
     ylab = "Moose per sq km", 
     main = "Moose density in Newfoundland ecoregions over time")

Question 8
The research team was particularly interested in how moose populations have changed over time in the Western Forests ecoregion.

a. Create a new dataset, where you use the filter() function to only include observations from the Western_Forests ecoregion. Save the result as moose_west.

moose_west <- filter(moosedata2, Ecoregion == "Western_Forests")

b. Use the plot() function to make a line graph (type = "l") showing how moose density has changed over time in the Western_Forests region. Full marks for including axis labels and a figure title (xlab, ylab, main). HINT: Adapt the code given to you in Question 7 above.
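The sketch below shows the general pattern on a made-up data frame (the values and the name Region_B are invented for illustration). Because type = "l" joins points in row order, the toy rows are sorted by Year first using arrange(), which is introduced in Question 9; this extra step is an assumption about what your data may need, and it does no harm if your rows are already in year order.

```r
library(dplyr)

# Hypothetical values, invented for illustration only.
toy <- data.frame(
  Ecoregion    = c("Western_Forests", "Region_B", "Western_Forests", "Western_Forests"),
  Year         = c(2020, 2020, 1980, 2000),
  MooseDensity = c(2.4, 1.1, 0.9, 1.8)
)

# Keep one ecoregion, then sort by Year so the line runs left to right.
toy_west <- filter(toy, Ecoregion == "Western_Forests")
toy_west <- arrange(toy_west, Year)

# type = "l" joins the points in row order to draw a line.
plot(toy_west$Year, toy_west$MooseDensity,
     type = "l",
     xlab = "Year",
     ylab = "Moose per sq km",
     main = "Moose density in Western_Forests over time (toy data)")
```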

Question 9
The research team was also interested in trends across all ecoregions in recent years only.

  a. Using the original, unfiltered dataset you created in Question 6 above, use the filter() function to filter for the year 2020, and save the dataset as moose_2020. HINT: Adapt the code given to you in Question 8 above.
  b. The research team considered moose densities above 2.0 moose/km² to be high. Using the dataset you just created, filter() the MooseDensity to only show ecoregions where moose density is greater than 2.0. Save the dataset as moose_2020_high. HINT: Adapt the code given to you in Question 8.
  c. With the dataset you just created, use the arrange() function to sort the MooseDensity column in descending order. Save the result as moose_2020_high_byD. HINT: You can follow this template:

arrange(Dataset, desc(ColumnName))

Question 10
Pipes (%>%) pass the result of one line of code to the next, so you don’t have to save the dataset under a new name for each step.
Repeat the steps from the previous question using a pipe at the end of each line. Using your dataset moosedata2 from Question 6 above, filter() for the year 2020, then filter() for MooseDensity above 2.0, then arrange() in desc() order, and finally print() the output. Save the final result as moosefinal. Code showing how to use pipes is below; you may need to adapt it to the way you have named your data frames.

moosefinal <- moosedata2 %>%
  filter(Year == 2020) %>%
  filter(MooseDensity > 2.0) %>%
  arrange(desc(MooseDensity)) %>%
  print()
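To check that a piped version gives the same answer as the step-by-step version, you can compare the two with identical(). The sketch below does this on a made-up data frame (the region names, values, and step1/step2/step3 names are invented for illustration).

```r
library(dplyr)

# Hypothetical values, invented for illustration only.
toy <- data.frame(
  Ecoregion    = c("Region_A", "Region_B", "Region_C", "Region_A"),
  Year         = c(2020, 2020, 2020, 2010),
  MooseDensity = c(2.1, 3.4, 1.2, 2.8)
)

# Step-by-step version: a new name at every step (as in Question 9).
step1 <- filter(toy, Year == 2020)
step2 <- filter(step1, MooseDensity > 2.0)
step3 <- arrange(step2, desc(MooseDensity))

# Piped version: each %>% passes the result on as the first argument.
piped <- toy %>%
  filter(Year == 2020) %>%
  filter(MooseDensity > 2.0) %>%
  arrange(desc(MooseDensity))

identical(step3, piped)  # TRUE
```

Ending a pipe with print() still lets you save the result, because print() returns the object it was given after displaying it.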