
Course Content

This course combines lectures on theory and concepts with substantial hands-on practice using data science tools in the R environment. This is not a statistics course; rather, we aim to build a flexible toolbox for ecologists and environmental scientists who want to better manage and use their data. We will explore current methods of data management in R, from data acquisition through wrangling and reporting.

In this course, we will cover the following topics in depth:

  • Modern principles of data management 
  • Applied data normalization and wrangling using tidyr and dplyr 
  • Best practices in modern R coding 
  • Memory management 
  • Environments in R: What they are and how to use them intentionally
  • Joins and queries: Working with tidy data using dplyr 
  • Working with dates and times using lubridate and hms
  • Strings: Using regular expressions and stringr to query and mutate strings 
  • Iteration with purrr map functions 

We will provide introductory content on: 

  • Evaluating code performance with lobstr 
  • Building reports using Quarto 
  • Working with big data with arrow and dbplyr 
  • Streamlining your workflow by creating your own R packages 
  • Version control with GitHub using bash and RStudio tools 

More advanced topics may be covered, depending on available time and participant feedback.

Course Format

Participants should expect to spend at least 10 hours per week working through the course material and assignments; this estimate assumes comfort working in the R environment. Because this is a half-semester course, those taking it for credit through George Mason (who are required to submit the weekly assignments and final project) should expect to spend up to 16 hours per week, depending on their previous experience with R. Participants seeking a completion certificate must successfully complete (with a score of 70% or higher) three of the six weekly problem sets and a final comprehensive problem set. Each week will include:

  • 1-2 hours of recorded presentations introducing new theory, concepts, and applications 
  • One or more HTML documents, web applications, or well-commented demonstration scripts that teach new tools and applications
  • At least one assignment in which participants adapt the demonstration material to complete a novel process and answer a series of conceptual questions
  • An optional live Q&A session with the instructor
  • An optional live review session to walk through assignment solutions
  • Two opportunities for “virtual office hours” with the instructor, during which individual participants can reserve time blocks

Software and Programs

All exercises and some lectures will be conducted in R using the RStudio interface. Suggested pre-course work will be emailed at least 3 weeks before the course begins to all participants who are not yet fully comfortable with R. All participants should be comfortable working in the R environment by the time the course begins.
