In the previous blog we looked at the initial steps taken to import data using PowerQuery. Here we will cover how to use PowerQuery (See here to activate) to manipulate our data or add custom columns.
We will start off where we left the data in the last blog, with the data loaded, removed some top rows and made our column headers. Here are a few of the options PowerQuery gives you:
Remove Columns (Remove unwanted metrics from your export to reduce data storage needs)
- We have two ways of doing this: select the columns we want to keep, right click, remove other columns OR select the columns you want to remove, right click, remove columns.
Remove Rows (Filter out invalid data or specific types of data)
- Remove rows is slightly more complicated: You have the button at the top where you can remove specific rows (Top, Bottom, Alternate, Duplicate, Errors) as used previously
- OR you can specify data within the row that means you want to remove it or remove all the other data. To do this highlight the cell, right click, Equals, then you have the option how you want to interact with this data to affect the table (=, <, >, =<, =>, <>)
- Add Columns Tab (Top of Page)
- Add Custom column
- Generally we be looking to add three types of custom columns: one which involves a calculation carried out on other columns; a column in which we want to add custom data or words; a blank column for us to use later.
- To add a calculated column (Useful for: distance covered in multiple speed bands; time above %HR; change time formats (Hours -> Mins))
- Select custom column
- Name the column
- Select the required columns from the menu on the right with the calculation carried out on them, ensure the green tick is showing under the calculation
- Select OK and your new column should appear at the far right of your data
- Add custom info or blank column (Add info specific to your environment or leave space to add later)
- Select custom column
- Name the column
- If you are adding some number based information then simply typing it out will work, ie 20182019 for a season column. However if you are adding letter based information then it will need to be surrounded by quotation marks, i.e. “Hello”.
- Similarly, for a blank column you just need the two quotation marks so the formula will be = “”
It’s not unusual to be in a situation where all the above are combined such as:
- Step 1 – Create: Metric X = [Metric A]+[Metric B]
- Step 2 – Remove [Metric A] & [Metric B]
As PowerQuery will work through your data in step-by-step process every time, you can remove columns from your data at the end of the Query without affecting steps involving them at the start.
Hopefully that gave you an insight into how to use PowerQuery to automate the initial data manipulation process. In the next blog we look at how to bring two separate queries together to produce one table.
NB - It's important that anything you add in at this stage involves changes to the data you want to make on a regular basis but not any changes which will need daily review. In other words, you shouldn't have to go back into the Query once its setup to change parts around regularly.
PS – If you have put together a long query and need to make adjustments to early steps within the query, at times it can be easier to select “Advanced Editor” in the top left of your screen and manipulate the script in there. Note query uses the programming language called M, however you shouldn’t need to learn about this unless you go quite far down the Query road!
PPS – If you are unfortunate enough to deal with very messy data exports, which no matter how much you play around with in Excel, are never in a format useful for analysis I would suggest looking into the R software. In particular the “tidyverse‘ package within R. Although rare, due to data export formats I have come across an instance where data analysis involved:
Software –> R –> Excel/Query –> R –> Excel
Lucky after the initial suffering all of this was automated for future efforts!