How to add a MultiIndex after loading csv data into a pandas dataframe?

Question

I am trying to add additional index rows to an existing pandas dataframe after loading csv data into it.

So let's say I load my data like this:

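import pandas as pd
from io import StringIO
columns = ['Relative_Pressure', 'Volume_STP']
# sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent pandas
df = pd.read_csv(StringIO(contents), skiprows=4, sep=r'\s+', index_col=False, header=None)
df.columns = columns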

where contents is a string in CSV format. The resulting DataFrame has the two columns Relative_Pressure and Volume_STP.
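To make the loading step concrete, here is a minimal sketch with a made-up contents string. The four header lines and the numbers are assumptions for illustration only; the point is that skiprows=4 drops the header block and sep=r"\s+" splits on any run of whitespace.

```python
import pandas as pd
from io import StringIO

# Hypothetical file contents: four header lines followed by two
# whitespace-separated numeric columns
contents = """Sample: example isotherm
Adsorbate: N2
Temperature: 77 K
p/p0  V(STP)
0.05  100.2
0.10  110.5
0.20  120.9
"""

columns = ["Relative_Pressure", "Volume_STP"]
df = pd.read_csv(
    StringIO(contents),
    skiprows=4,      # skip the four header lines
    sep=r"\s+",      # any run of whitespace separates fields
    index_col=False,
    header=None,     # the remaining lines contain no header row
)
df.columns = columns
print(df)
```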

For clarity, I would now like to add additional index rows (extra column levels) to the DataFrame, as shown in the linked example.

However, in the linked example these extra index rows are created at the same time as the DataFrame. I would like to add them to an existing DataFrame instead, e.g. rows for the unit or description of each column.

How could I do this?

The solution at your link looks ingenious, perhaps too much so. Using a MultiIndex to store metadata has non-trivial performance costs and makes future updates harder to maintain. The easiest solution is to provide a README for the data. A better solution is to create a subclass that only adds a metadata property and a print_metadata method to print it. You can optionally override __str__ and __unicode__ to print the metadata first, followed by super().__str__() and super().__unicode__(). But if you are distributing a library with data, it's easier to give users a text README.
Pik-Mai Hui, commented Mar 27, 2019 at 13:42
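The subclass idea from this comment could be sketched roughly as follows. The class name MetaFrame, the attribute name units and the unit strings are hypothetical; _metadata and _constructor are the hooks pandas documents for subclassing, and attributes listed in _metadata are carried over to the results of many operations.

```python
import pandas as pd


class MetaFrame(pd.DataFrame):
    """DataFrame subclass carrying per-column metadata (hypothetical name)."""

    # pandas copies attributes listed in _metadata to the results of
    # many operations (via __finalize__)
    _metadata = ["units"]

    @property
    def _constructor(self):
        # Make operations such as copy() return MetaFrame, not DataFrame
        return MetaFrame

    def metadata_text(self):
        # Fall back to an empty dict if no units were attached yet
        units = getattr(self, "units", None) or {}
        return "\n".join(f"{col}: {units.get(col, '?')}" for col in self.columns)

    def print_metadata(self):
        print(self.metadata_text())


df = MetaFrame({"Relative_Pressure": [0.05, 0.10], "Volume_STP": [100.2, 110.5]})
df.units = {"Relative_Pressure": "-", "Volume_STP": "cm3/g"}  # hypothetical units
df.print_metadata()
```

The columns stay a plain Index, so selection and arithmetic work as usual, while the metadata lives beside the data rather than inside the column labels.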

willk · Accepted Answer · 2019-03-27 13:32:24Z

You can create a MultiIndex on the columns by explicitly creating the index and then assigning it to the columns, separately from reading in the data.

I'll use the example from the link you provided. The first method is to create the MultiIndex when you make the dataframe:

import pandas as pd

df = pd.DataFrame({('A',1,'desc A'):[1,2,3],('B',2,'desc B'):[4,5,6]})
df.columns.names=['NAME','LENGTH','DESCRIPTION']
df

NAME             A      B
LENGTH           1      2
DESCRIPTION desc A desc B
0                1      4
1                2      5
2                3      6

As stated, this is not what you are after. Instead, you can make the dataframe (from your file for example) and then make the MultiIndex from a set of lists and then assign it to the columns:

df = pd.DataFrame({'desc A':[1,2,3], 'desc B':[4,5,6]})

# Output
   desc A  desc B
0       1       4
1       2       5
2       3       6

# Create a multiindex from lists
index = pd.MultiIndex.from_arrays((['A', 'B'],  [1, 2], ['desc A', 'desc B']))

# Assign to the columns
df.columns = index

# Output
       A      B
       1      2
  desc A desc B
0      1      4
1      2      5
2      3      6

# Name the columns
df.columns.names = ['NAME','LENGTH','DESCRIPTION']

# Output
NAME             A      B
LENGTH           1      2
DESCRIPTION desc A desc B
0                1      4
1                2      5
2                3      6
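Applied to the question's own columns, the same from_arrays approach might look like the sketch below. The data values, units and descriptions are placeholders, not taken from the question; the existing column labels are reused as the top level, and droplevel gives back flat columns when the extra rows get in the way of computation.

```python
import pandas as pd

# Stand-in for the DataFrame loaded from CSV in the question
df = pd.DataFrame({
    "Relative_Pressure": [0.05, 0.10, 0.20],
    "Volume_STP": [100.2, 110.5, 120.9],
})

# Hypothetical units and descriptions, one entry per existing column
units = ["-", "cm3/g"]
descriptions = ["p/p0", "adsorbed volume at STP"]

# Reuse the existing column labels as the top level of the new MultiIndex
df.columns = pd.MultiIndex.from_arrays(
    [df.columns, units, descriptions],
    names=["NAME", "UNIT", "DESCRIPTION"],
)
print(df)

# Read back one metadata row, or drop the extra levels for plain columns
unit_row = list(df.columns.get_level_values("UNIT"))
plain = df.droplevel(["UNIT", "DESCRIPTION"], axis=1)
```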

There are other ways to construct a MultiIndex, for example MultiIndex.from_tuples and MultiIndex.from_product. You can read more about MultiIndexes in the pandas documentation.
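As a short sketch of those alternatives: from_tuples builds the same index as the from_arrays call above when each column's labels are given as one tuple, while from_product creates every combination of the given levels. The level values "value" and "error" are made up for illustration.

```python
import pandas as pd

names = ["NAME", "LENGTH", "DESCRIPTION"]

# One list per level
by_arrays = pd.MultiIndex.from_arrays(
    [["A", "B"], [1, 2], ["desc A", "desc B"]], names=names
)
# One tuple per column
by_tuples = pd.MultiIndex.from_tuples(
    [("A", 1, "desc A"), ("B", 2, "desc B")], names=names
)

# Cartesian product: the first level varies slowest
product = pd.MultiIndex.from_product([["A", "B"], ["value", "error"]])
print(list(product))
```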
