Having spent a while looking at speed in PowerPivot, I wanted to start our work in R in the same place. This will be a nice (possibly boring) introduction to using R while also showing how work in Excel can be replicated in R (maybe even how it can be easier). Today will cover how to bring an excel sheet into RStudio.
Let’s go forward on the basis we have the same sets of data we were working in previously, which were:
- Player Info
- Season Info
First lets quickly explain the RStudio interface which is split into 4 sections:
- Top Left is where you will write your script into
- Bottom Left is where the output of your script will be shown
- Top Right is where any files, data frames, lists, functions you bring into RStudio will be shown
- Bottom Right has a bit more to it then
- Packages will show installed packages with a tick in the box next to it
- Plots will show any graphs or charts you have produced recently
- Files will show where you have set your working directory to
I briefly mentioned the idea of packages previously and today we will look at loading and using a few of these. In order to utilise packages we have to install them once but load them every time we want to use them. As we are going to be bringing in an excel sheet into R today, lets go through installing and loading the “Readxl” package.
When we are installing packages the function we want to use is: “install.packages”. For us to install readxl, it would look like the following:
To run the line you can either select run in the top right or press CTRL+ENTER (CMD+enter for Macs). Any time you are installing a package into RStudio for the first time, it must have the quotation marks around.
As packages can use functions which rely on other packages, we must have those packages installed as well. We can ensure this happens when we are installing a package by adding another piece to the above code:
- install.packages(“readxl”, dependencies = TRUE)
Next we must load the package with the following:
- NOTE no quotation marks as we have the package installed
Now we are in a position where we can fully utilise the Readxl package!
To bring an excel file into RStudio we must reference the files location on our computer and give it a name. Which we can through using the following:
- FileName = read_excel(“C:\\Users\\admin\\Desktop\\HR_GPS Data.xlsx”)
- NOTE quotation marks around filename and double backspaces
- For Mac users the format is different:
- FileName = read_excel(“Users/admin/Desktop/HR_GPS Data.xlsx”)
If you are loading in from an excel spreadsheet with more than one sheet in it, we must specify which sheet we want using the following:
- FileName = read_excel(“C:\\Users\\admin\\Desktop\\HR_GPS Data.xlsx”, sheet=1)
- Sheet number from order they appear in the spreadsheet
- Hidden sheets are still relevant here
- Default is to load in the first sheet
Now your file should be loaded into RStudio and able to be viewed from the top right section.
First lets go through some basic info about the file:
- Number of Rows
- number of columns
- First 6 rows
- head(data, 10)
- first 10 rows
- last 6 rows
- tail(data, 10)
- last 10 rows
- Basic descriptives about your dataframe
- Info on your data within your dataframe
If you are ever confused about how a function is used or should be written you can place a question mark before it, then when you run the line info about the package will appear i.e ?str
The info for your function will then appear in the console area of your RStudio.
So there we covered how to load data into R from an excel sheet and some of the basics around looking at your data. Next we will look into how we can ensure the data is formatted how we want and basics for filtering the data.
Note: I have skipped past basics around matrices and vectors in R, as the majority of the time you will be loading data in from a file to create a dataframe. As we come across instances where using matrices are beneficial we will look into them more.