I have a data set that has columns for number of units sold in a given month - the problem being that the monthly units columns are named in MM/yyyy format, meaning that I have 12 columns of units information per record.
So for instance, my data looks like:
ProductID | CustomerID | 04/2018 | 03/2018 | 02/2018 | FileDate |
a1032 | c1576 | 36 | 12 | 19 | 04/20/2018 |
What causes this to be problematic is that a new file comes in every month, with the same file name, but different column headers for the units information based on the last 12 months.
What I would like to do, is rename the monthly units columns to Month1, Month2, Month3... based on a simple regex such as ([0-9]*)/([0-9]*) that will result in the output:
ProductID | CustomerID | Month1 | Month2 | Month3 | FileDate |
a1032 | c1576 | 36 | 12 | 19 | 04/20/2018 |
I know that this should be possible using Python, but as I have never used Python before (I am an old .Net developer) I honestly have no idea how to achieve this.
I have done a bit of research on renaming columns in Python, but none of them mentioned pattern matching to rename a column, eg:
df = df.rename(columns={'oldName1': 'newName1', 'oldName2': 'newName2'})
UPDATE: The data that I am showing in my example is only a subset of the columns; total, in my data set I have 120 columns, only 12 of which need to be renamed, this is why I thought that regex might be the simplest way to go.