Initial Steps in a Data Analysis Pathway: PowerQuery

Hi all, so following on from our previous post where we outlined what the data might include for our regular data collection, today we will look how to use PowerQuery (here for how to activate) in Excel to automate the breakdown of data exports, which can include GPS, HR, VBT, AMS etc. The benefit of using a Query to perform the initial analysis is we can automate and standardise the initial steps in our data analysis while ensuring data is standardised across different sources.

In this case when I speak about standardising how the data looks, I am referring to the format (number, decimal, string/character, date, date/time etc.), layout and the spelling of any character based metrics. While these could be considered basic aspects as we can automate this through a Query, it can be carried out quickly and prevent these aspects forcing us to go back into the data and look for these errors later on in the analysis.

  • The first step depends on which version of Excel you are running
    • Newer Excel
      • Select Get Data, under the Data Tab, then From File (Most softwares will export using CSV/Text format, some may use Excel)
    • Older Excel
      • Select Power Query from Ribbon
      • New Query
      • Once the Query Editor page opens, select Get Data, then From File (Most softwares will export using CSV/Text format, some may use Excel)
  • Find the export file you wish to use, then select open and load
  • As you can see in our example the file imports with some info along the top, which is not the easiest for us to work with.
  • However, the Query bar along the right hand side of the excel sheet should have appeared now, and if you right click on the Query you have just created and select Edit we can stand to alter the import
  • Using the Remove Top Rows function, we can remove the info along the top.
  • Use First Row as Headers and now the data is in a much better format for us to use.
  • Here it is worth scanning through the data and seeing if anything stands out as abnormal, data involving time/duration/dates needs to be checked and data type changed by selecting the icon the left of the column name.
  • If you are happy,  select Close & Load, making sure to save any changes.
  • Your data should now load as a table into the Excel sheet.

That’s a very basic way to format a data export and now our data is in a more usable format, however we may want to add some information to the dataset before storing such as season or sessions type. Next we will cover how to use Query to automate this process every day (Hint: its the filename & location) and adding in some customisations.

EDIT 30/01/2019

The original version of this blog included images to help however following a change to WordPress these have disappeared.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.