Pivoting Stacked Data with Power Query

No matter the program you use to build your self-service BI solution, data will be at the heart of it.  And the challenge with data – of course – is that it’s seldom ready to be consumed.  Even if it’s as simple as just changing a column name, you’ll no doubt want to modify something.

Unfortunately, not everything is as easy as just changing a column name, is it?  Take the following example, for instance:

Who would provide data in a format like this?  Credit card statements are famous for it, but they’re far from the only one!

So how would you convert this into a nice clean tabular format so that you can feed it into a chart, Pivot Table or data model table?  Your credit card provider certainly isn’t going to do it for you, are they?

The key thing to recognize about this data is that there is a pattern.  After that header row we can see that the file has a repeating pattern of Date, Vendor, Amount, Space.  It then repeats in blocks of 4 rows through the entire file.  And that – right there – is the key.  The pattern is consistent through the entire file.

Let’s look at how to clean this data format using Power Query.  The great news is that this will work in either Excel or Power BI, as both softwares use Power Query to Get & Transform data today.

We start by creating a new query against the data source.  Assuming this data is stored in a text file, we would do this by navigating to:

Excel 2016: New Query -> From File -> From Text
Excel 2019/365: Get Data -> From File -> From Text/CSV
Power BI Desktop: Get Data -> Text/CSV

We’d then navigate to and select the file, click Import and then click the button to the left of cancel.  (It will read Edit or Transform Data, depending on the program you’re using.)

At this point, you’ll be looking at the data table. Power Query will automatically promote the first row to be a header, leaving us to sort out the rest of the data.

The first step is to add a column which contains a Line ID number for the file:
Go to Add Column tab -> Index Column -> From 0

This creates a new column called “Index” with sequential numbers starting at 0.  



 
Before we move forward, we need to identify the “magic number” for this pattern.  We identify this number by looking in the Index column for the value associated with the first line of the second transaction.  In this case, given that Power Query started counting from 0, the first line of the second transaction shows a value of 4.  Remember that we need the number from the Index column, NOT the row number!

Armed with our magic number, we can now continue to the next piece of the pattern that will let us create a Transaction ID number:

Select the [Index] column -> Add Column tab -> Standard -> Integer Divide -> *Magic Number

This creates a new column where each of the first four rows will show a value of 0, the second four rows will show 1 and so on.

Next up, we need to convert the File Line ID to be the Transaction Line ID number, as we’ll need this to finish our data set.  Make sure that you run the next command from the Transform tab, and not the Add Column tab, as we want to convert the column, not create a new one

Select the [Index] column -> Transform tab -> Standard -> Modulo -> *Magic Number

We’re now set to make some real magic happen:

Select the [Index] column -> Transform tab -> Pivot Column
Set the Values field to our data column (Transactions in this case)
Click the arrow to expand the Advanced options and choose “Don’t Aggregate”
Click OK

And boom!

 

How cool is that?  Now all that is left to do is:
Remove the Integer-Divide column
Remove any unnecessary columns
Rename the columns as desired
Set your Data Types
Choose a Load destination

These steps are actually a pattern that can be applied to any data set that exhibits this stacked repeating pattern, and it’s one of the most useful data transformation patterns to know.  It can be applied to unpivot single column sets like this one, stacked pairs, horizontal pairs and others.  And the best part is – since it’s using Power Query technology – it’s 100% refreshable.  Just save next month’s data file over the old one, hit the Refresh button, and all that’s left to do is enjoy the amount of time you saved!

Let’s face it… manual cleanup is error-prone, time consuming and boring.  If you’re still doing manual data cleanup and copying and pasting into your BI solutions, you are wasting your time.  You need to learn Power Query and get your life back.

Would you like more useful data patterns?  Check out our Power Query Recipes collection.  And if you really want to Master Your Data, we also have the most advanced Power Query Academy on the planet to take you from newbie to master.