Initial Steps in a Data Analysis Pathway: 4

Today we will cover PowerQuery to merge two separate data-sets to produce one final table. See here for information on how to load the data-sets into PowerQuery and then you can go here for ideas about what PowerQuery can do in terms of manipulating your data.

In order to combine data-sets, there must be one or more columns with identical data in both data-sets such as a date column as seen below.

  • If we start with 2 data-sets loaded into an Excel spreadsheet through 2 separate Queries in PowerQuery but now we want to combine them:
    • In the Query with the majority of the data you want present, select merge query.
    • Now we have the option of how to merge: single column such as Date OR through more than one column
      • For one column, highlighting the relevant columns and ensure information at the bottom shows all rows merged.
      • For 2 or more two columns, such as Season then Date, we must select in the order we want the Query to look through the data.
    • Once you are happy with how you want the merge to occur, select load and the merged data-set should appear as the last column on far right of your data.
    • If you select the icon to the right of the column name, you can identify which of the columns (or all)  to import. Unless you are looking to merge data-sets with very similar or identical data then deselecting “Use Column Name as Prefix” option will help prevent having to rename columns later.

With our example, you can see the information from the merged data-set gets repeated every time a Date match is found.

The above is a small example of how you might bring some information specific to a training or match day into a data export.

NB - This is where ensuring the same data format is used across multiple platforms, i.e. spelling of names, data formats etc. If the structure is different, then Power Query may not merge all the data correctly. A common place for this to happen is athletes with double barrel surnames and with hyphens (-) or apostrophes (') in their name.

Once the required columns are loaded into your data set then you can re-position them by highlighting and dragging then into the right position.

If you are finished with Power Query and have moved onto looking at the table it has produced yet both data-sets are present. You can stop the Query from loading into a sheet by:

  • Right clicking on it in the Query bar
  • Select “Load To”
  • Select “Load to Query only”
    • This prevents data present in the Query from loading into your spreadheet

Perfect, now we’ve briefly covered manipulating data-sets in power query, next we will cover using pivot tables with the data before storing it.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.