Introduction to Data Science in R

Content overview

R is a powerful, open-source programming language for data science. It is used widely in both academia and industry for data analysis, visualization, and modelling. Many cutting-edge statistical and machine learning approaches are first released in R before being adapted to other software.

In this workshop, aimed at beginner R users, we will introduce the fundamentals of R programming and set you up for a deeper understanding of how R works with your data. We will then introduce a series of R packages that make data manipulation and transformation easy and accessible to users of all skill levels.

We will use examples and exercises relevant to data science throughout the course and provide explanations of all solutions. By the end of the one-day workshop, you will be comfortable wrangling your own data in R, and you’ll know how to find resources for more advanced tasks.

Prerequisites: A basic understanding of data science fundamentals, such as descriptive statistics and linear regression, is recommended but not required. We assume little or no prior experience with R.

Topics

Part 1: Getting comfortable in R

R and RStudio
Best practices in data science
Object types
Data types
Getting and setting values
Subsetting using vectors
Must-know base R commands

Part 2: Easier data manipulation and transformation

Data manipulation with the dplyr package
Data reshaping with the tidyr package
Date-time handling with the lubridate package
Chaining commands with piping

Format

The workshop is taught fully online, with interactive examples and exercises. Question periods and breaks are scheduled throughout the day.

Workshop material

Slides, exercises, and solutions will be made available to all participants.
