I am trying to add additional index rows to an existing pandas dataframe after loading csv data into it.

So let's say I load my data like this:

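from io import StringIO
import pandas as pd

columns = ['Relative_Pressure','Volume_STP']
# sep=r'\s+' splits on whitespace (replaces the deprecated delim_whitespace=True)
df = pd.read_csv(StringIO(contents), skiprows=4, sep=r'\s+', index_col=False, header=None)
df.columns = columns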

where contents is a string in csv format. The resulting DataFrame might look something like this:

(image: imported csv data)

For clarity reasons I would now like to add additional index rows to the DataFrame as shown here:

MultiIndex dataframe

However, in the linked example these additional index rows are generated at the moment the DataFrame is created. I would like to add them afterwards, e.g. rows for unit or descr above the existing columns.

How could I do this?

  • The solution provided at your link looks ingenious, perhaps a bit too much so. Using a multi-index for metadata storage has a non-trivial impact on performance and makes future updates harder to maintain. The easiest solution is to provide a README for the data. A better solution is to create a subclass that only adds a metadata property and a print_metadata method to print it. You can optionally override __str__ and __unicode__ to print the metadata first, followed by super().__str__ and super().__unicode__. But if you are distributing a library with data, it's easier to give users a text README. Commented Mar 27, 2019 at 13:42
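A minimal sketch of the subclass idea from this comment, in case it helps to see it spelled out. The class name AnnotatedFrame, the attribute name metadata and the unit/description values are all made up for illustration; _metadata and _constructor are the hooks pandas documents for subclassing, but the extra attribute is not guaranteed to survive every operation.

```python
import pandas as pd


class AnnotatedFrame(pd.DataFrame):
    """DataFrame subclass carrying a free-form metadata dict (hypothetical name)."""

    # Extra attribute names that pandas should treat as subclass properties
    _metadata = ['metadata']

    @property
    def _constructor(self):
        # Make pandas operations return AnnotatedFrame instead of DataFrame
        return AnnotatedFrame

    def print_metadata(self):
        # Fall back to an empty dict if no metadata has been attached yet
        for column, info in (getattr(self, 'metadata', None) or {}).items():
            print(f"{column}: {info}")


df = AnnotatedFrame({'Relative_Pressure': [0.01, 0.05],
                     'Volume_STP': [10.5, 12.1]})

# Assumed units and descriptions, purely illustrative
df.metadata = {
    'Relative_Pressure': {'unit': 'p/p0', 'descr': 'relative pressure'},
    'Volume_STP': {'unit': 'cm3/g', 'descr': 'adsorbed volume at STP'},
}

df.print_metadata()
```

The columns stay a plain flat Index here, so ordinary selection such as df['Volume_STP'] keeps working unchanged.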

1 Answer

You can create a MultiIndex on the columns by specifically creating the index and then assigning it to the columns separately from reading in the data.

I'll use the example from the link you provided. The first method is to create the MultiIndex when you make the dataframe:

df = pd.DataFrame({('A',1,'desc A'):[1,2,3],('B',2,'desc B'):[4,5,6]})
df.columns.names=['NAME','LENGTH','DESCRIPTION']
df

NAME             A      B
LENGTH           1      2
DESCRIPTION desc A desc B
0                1      4
1                2      5
2                3      6

As stated, this is not what you are after. Instead, you can create the dataframe first (from your file, for example), then build the MultiIndex from a set of lists and assign it to the columns:

df = pd.DataFrame({'desc A':[1,2,3], 'desc B':[4,5,6]})

# Output
   desc A  desc B
0       1       4
1       2       5
2       3       6

# Create a multiindex from lists
index = pd.MultiIndex.from_arrays((['A', 'B'],  [1, 2], ['desc A', 'desc B']))

# Assign to the columns
df.columns = index


# Output
       A      B
       1      2
  desc A desc B
0      1      4
1      2      5
2      3      6


# Name the columns
df.columns.names = ['NAME','LENGTH','DESCRIPTION']

# Output
NAME             A      B
LENGTH           1      2
DESCRIPTION desc A desc B
0                1      4
1                2      5
2                3      6
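Applied to the columns from the question, the same steps might look like the sketch below. The sample contents string and the unit and descr values are assumptions made up for illustration (the real file also needs skiprows=4, which this short sample omits).

```python
from io import StringIO

import pandas as pd

# Hypothetical whitespace-delimited sample standing in for `contents`
contents = "0.01 10.5\n0.05 12.1\n0.10 13.8\n"

columns = ['Relative_Pressure', 'Volume_STP']
df = pd.read_csv(StringIO(contents), sep=r'\s+', header=None, names=columns)

# Assumed metadata rows, one entry per existing column
units = ['p/p0', 'cm3/g']
descr = ['relative pressure', 'adsorbed volume at STP']

# Stack the existing column names with the new rows and assign the result
df.columns = pd.MultiIndex.from_arrays(
    [df.columns, units, descr],
    names=['name', 'unit', 'descr'],
)

print(df)
```

A column is then selected with the full tuple, e.g. df[('Volume_STP', 'cm3/g', 'adsorbed volume at STP')], or by its top level alone with df['Volume_STP'].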

There are other ways to construct a MultiIndex, for example from_tuples and from_product. You can read more about MultiIndex in the pandas documentation.
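For reference, a short sketch of those two constructors using the same example data. The second index, grid, and its 'min'/'max' labels are only there to show what from_product generates and are not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'desc A': [1, 2, 3], 'desc B': [4, 5, 6]})

# from_tuples: one tuple per column, listing its value at every level
df.columns = pd.MultiIndex.from_tuples(
    [('A', 1, 'desc A'), ('B', 2, 'desc B')],
    names=['NAME', 'LENGTH', 'DESCRIPTION'],
)

# from_product: every combination of the given level values,
# so it fits regular grids rather than per-column metadata
grid = pd.MultiIndex.from_product(
    [['A', 'B'], ['min', 'max']],
    names=['NAME', 'STAT'],
)

print(df)
print(grid)
```

from_tuples is usually the closer fit for attaching units or descriptions, since each column gets exactly one value per level.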
