Picking up where we left off, with the file loaded into R, there is a good chance that one or more columns have loaded in the wrong format (time- or date-related columns are the main culprits here).
In my file, the session duration column has loaded in ‘DateTime’ format rather than as a time only. To fix this, we extract just the time and specify how we want it formatted. There are two options for the Duration column: overwrite it so that it shows the time only, or create a separate column that shows the time only:
- New Column
- FileName$NewColumn <- strftime(FileName$Duration, format = "%H:%M:%S")
- Overwrite Duration Column
- FileName$Duration <- strftime(FileName$Duration, format = "%H:%M:%S")
If you are unsure how to specify the time format, ?strftime will give you more information, and this site also gives a good breakdown.
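As a rough sketch of the new-column option, the snippet below builds a small stand-in dataframe and extracts the time from it. The sample values and the use of tz = "UTC" are assumptions made for illustration; your own file may load with a different time zone.

```r
# Hypothetical stand-in for the imported file; only the Duration column is needed here.
FileName <- data.frame(
  Duration = as.POSIXct(c("1899-12-31 01:32:10", "1899-12-31 00:45:05"), tz = "UTC")
)

# Keep the original column and add a time-only copy.
# Passing the same tz that the data uses avoids the clock time being shifted.
FileName$NewColumn <- strftime(FileName$Duration, format = "%H:%M:%S", tz = "UTC")

print(FileName$NewColumn)
print(class(FileName$NewColumn))
```

One thing to be aware of: strftime returns text (a character column), so the new column displays as a time but is no longer stored as a date-time value.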
Note: Dates will probably be loaded into R in YYYY-MM-DD format. It is best to leave them in this format while working in R, as it is the format R handles best and other formats can cause issues. See here for more info.
A column can also load with more decimal places than you need, which is easily remedied. For now we will discuss rounding individual columns, and in another post we will look at how to round multiple columns together.
- Individual Columns
- FileName$`Max Velocity` <- round(FileName$`Max Velocity`, 2)
- FileName$`Total Distance` <- round(FileName$`Total Distance`, 2)
NOTE: The column names in this example are wrapped in backticks because each name contains a space between its two words.
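To see the rounding in action, here is a minimal sketch on a made-up dataframe. The values are invented, and check.names = FALSE is used only so that the stand-in data keeps the spaces in its column names, as an imported file typically would.

```r
# Hypothetical data with spaces in the column names.
FileName <- data.frame(
  `Max Velocity`   = c(8.23456, 7.91234),
  `Total Distance` = c(10432.5678, 9876.54321),
  check.names = FALSE
)

# Round each column to 2 decimal places, overwriting the original values.
FileName$`Max Velocity`   <- round(FileName$`Max Velocity`, 2)
FileName$`Total Distance` <- round(FileName$`Total Distance`, 2)

print(FileName)
```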
Because spaces in column names can cause the kind of problems highlighted above, you may decide that your first step should be to rename your columns. We have a few options when it comes to renaming columns in R:
- Individual Columns
- By Name
- names(FileName)[names(FileName) == "Max Velocity"] <- "MaxVelocity"
- names(FileName)[names(FileName) == "Total Distance"] <- "TotalDistance"
- By Column Number
- names(FileName)[8] <- "MaxVelocity"
- names(FileName)[7] <- "TotalDistance"
- All Columns
- colnames(FileName) <- c("Name1", "Name2", "Name3", ..., "Name_n")
- NOTE: This method needs a name for every column, in the same order as the columns in your dataframe. If you supply too few names, the remaining columns are left with missing (NA) names. It is only practical for a small number of columns and will not be shown here as a result.
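The two individual-column options can be sketched as follows. The dataframe is a made-up three-column stand-in, so the position used below (2) applies only to this example and not to the column numbers in a real file.

```r
# Hypothetical three-column stand-in for the imported file.
FileName <- data.frame(
  `Player Name`    = c("Player 1", "Player 2"),
  `Total Distance` = c(10432.57, 9876.54),
  `Max Velocity`   = c(8.23, 7.91),
  check.names = FALSE
)

# By name: find the column called "Max Velocity" and rename it.
names(FileName)[names(FileName) == "Max Velocity"] <- "MaxVelocity"

# By column number: in this stand-in, Total Distance is the 2nd column.
names(FileName)[2] <- "TotalDistance"

print(names(FileName))
```

Renaming by name is the safer of the two if the column order of your export might change, since a hard-coded position would then silently rename the wrong column.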
We can also use the rename function from a package called “plyr” to rename columns (see here for install info):
- FileName <- rename(FileName, c("Max Velocity" = "MaxVelocity", "Total Distance" = "TotalDistance"))
Finally, we will take a quick look at filtering data in your dataframe. It’s not unusual to want to look at an individual athlete separately from everyone else, and in R it’s a quick process to filter out an individual’s data. Again, there are a number of ways to do it.
- Basic Approach:
- Individual <- FileName[FileName$`Player Name` == "Player 1", ]
Note: the condition before the comma selects the rows for Player 1, and leaving the space after the comma empty tells R to keep all columns.
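Here is a small sketch of the basic approach on invented data, so you can check how many rows come back. The player names and distances are placeholders.

```r
# Hypothetical session data with two players.
FileName <- data.frame(
  `Player Name`    = c("Player 1", "Player 2", "Player 1"),
  `Total Distance` = c(10432.57, 9876.54, 8765.43),
  check.names = FALSE
)

# Rows: only those where Player Name equals "Player 1".
# Columns: nothing after the comma, so every column is kept.
Individual <- FileName[FileName$`Player Name` == "Player 1", ]

print(Individual)
print(nrow(Individual))
```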
Throughout the blog I will start to reference ‘base r’ approaches and ‘tidyverse’ approaches. ‘base r’ approaches are ones that generally do not rely on packages, while ‘tidyverse’ approaches rely on a group of packages known collectively as the ‘tidyverse’. The above is a ‘base r’ approach and next is a ‘tidyverse’ approach. You need either the ‘tidyverse’ or ‘dplyr’ package installed and loaded for this approach.
- Single Athlete
- Individual <- FileName %>% filter(`Player Name` == "Player 1")
There are a few things to look at above. The %>% symbol is known as the pipe operator. It comes from the ‘magrittr’ package and is made available when you load ‘dplyr’ or the ‘tidyverse’. It passes the dataframe on its left into the function on its right, so we do not have to reference the dataframe name repeatedly. We will come across variations of this quite frequently. A column name must be surrounded by backticks (``), not double ("") or single ('') quotes, if it contains spaces or begins with a number.
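The same filter can be sketched end to end with ‘dplyr’. This assumes the package is already installed, and the data is again an invented stand-in.

```r
# Assumes dplyr is installed; loading it also makes the %>% pipe available.
library(dplyr)

# Hypothetical session data with two players.
FileName <- data.frame(
  `Player Name`    = c("Player 1", "Player 2", "Player 1"),
  `Total Distance` = c(10432.57, 9876.54, 8765.43),
  check.names = FALSE
)

# The pipe hands FileName to filter(), which keeps rows where the condition is TRUE.
# Backticks are needed because the column name contains a space.
Individual <- FileName %>% filter(`Player Name` == "Player 1")

print(Individual)
```

The result should match the ‘base r’ version: the same two rows for Player 1, with all columns kept.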

That’s all for today!