Introduction to Data Science in R

Content overview

R is a powerful, open-source programming language for data science. It is used widely in both academia and industry for data analysis, visualization, and modelling. Many cutting-edge statistical and machine learning approaches are first released in R before being adapted to other software.

This workshop introduces you to the fundamentals of R programming and prepares you for a deeper understanding of how R works with your data. We will then introduce a series of R packages that make data manipulation and transformation accessible to users of all skill levels.

We will use examples and exercises relevant to data science throughout the course and provide explanations of all solutions. By the end of the one-day workshop, you will be comfortable wrangling your own data in R, and you’ll know how to find resources for more advanced tasks.

Prerequisites: A basic understanding of data science fundamentals, such as descriptive statistics and linear regression, is recommended but not required. We assume little or no prior experience with R.

Topics

Part 1: Getting comfortable in R
  • R and RStudio
  • Best practices in data science
  • Object types
  • Data types
  • Getting and setting values
  • Subsetting using vectors
  • Must-know base R commands
Part 2: Easier data manipulation and transformation
  • Data manipulation with the dplyr package
  • Data reshaping with the tidyr package
  • Date-time handling with the lubridate package
  • Chaining commands with piping

Format

Fully virtual teaching with interactive examples and exercises. Question periods and breaks are scheduled throughout the workshop.

Workshop material

Slides, exercises, and solutions will be made available to all participants.