I am trying to merge time-course data from different participants. I extract one dataframe per participant in a loop and concatenate them at the end of the loop. Before I concatenate, I would like to add each participant's ID as an additional index level.

This seems REALLY straightforward, but I was unable to find anything on this issue :(

I would like to turn this

    col
0     1
1   1.1
2   NaN

Into:

          col
ID    0     1
      1   1.1
      2   NaN

I know I could make a new index like:

multindex = [np.array([ID] * len(data)), np.arange(len(data))]

But that's hopelessly inelegant, and, seeing as I am measuring at high frequency for half an hour, it would even get kind of slow :/

I would like to mention that I have recently found my question to be a duplicate of this other question. However, mine apparently has more upvotes and better answers. “Prepend” doesn't seem to draw as many hits.

  • It might be better to just add the ID as a column, then add it to the index when you concat? Commented Nov 20, 2013 at 0:59
  • Andy, that is indeed the answer given in the duplicate as well. Seems to me like a better answer than the one accepted here. Commented Oct 8, 2014 at 5:49
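
The column-first approach suggested in these comments could look roughly like the sketch below. The participant IDs and the stand-in dataframe are hypothetical placeholders for the real extraction step; only the `set_index(..., append=True)` and `swaplevel` calls matter here.

```python
import numpy as np
import pandas as pd

# Hypothetical participant IDs; the real loop would load each recording.
participant_ids = ["P01", "P02"]

frames = []
for participant_id in participant_ids:
    # Stand-in for the per-participant time course from the question.
    data = pd.DataFrame({"col": [1.0, 1.1, np.nan]})
    data["ID"] = participant_id  # the scalar is broadcast to every row
    frames.append(data)

# Concatenate once, then move ID into the index and make it the outer level.
combined = (
    pd.concat(frames)
    .set_index("ID", append=True)
    .swaplevel(0, 1)
)
print(combined)
```

Concatenating once after the loop, rather than inside it, avoids copying the growing dataframe on every iteration.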

1 Answer

You can use the keys argument of concat:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.rand(3, 2))
df2 = pd.DataFrame(np.random.rand(4, 2))
df3 = pd.DataFrame(np.random.rand(5, 2))

print(pd.concat([df1, df2, df3], keys=["A", "B", "C"]))

output:

            0         1
A 0  0.863774  0.794880
  1  0.578503  0.418619
  2  0.215317  0.146167
B 0  0.655829  0.116917
  1  0.862316  0.812847
  2  0.500126  0.689218
  3  0.653439  0.270427
C 0  0.825213  0.882963
  1  0.579436  0.332047
  2  0.456948  0.718893
  3  0.795074  0.826773
  4  0.049676  0.697471

If you want to append other dataframes later:

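# Keep the keyed result so that later frames can be appended to it.
df = pd.concat([df1, df2, df3], keys=["A", "B", "C"])
df4 = pd.DataFrame(np.random.rand(6, 2))
# Wrapping df4 in its own concat gives it the outer key "D" first.
df = pd.concat([df, pd.concat([df4], keys=["D"])])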
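
Applied to the loop described in the question, the same keys idea might be sketched as follows. The `load_participant` helper, the IDs, and the level names "ID" and "sample" are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical loader standing in for the per-participant extraction step.
def load_participant(n_samples):
    return pd.DataFrame(np.random.rand(n_samples, 2))

# Hypothetical participant IDs mapped to recording lengths.
recordings = {"A": 3, "B": 4, "C": 5}

ids = []
frames = []
for participant_id, n_samples in recordings.items():
    ids.append(participant_id)
    frames.append(load_participant(n_samples))

# A single concat after the loop; keys become the outer index level,
# and names labels both levels of the resulting MultiIndex.
combined = pd.concat(frames, keys=ids, names=["ID", "sample"])
print(combined.loc["B"])  # rows belonging to participant "B"
```

Collecting the frames in a list and calling concat once keeps the IDs and the data paired without building the index arrays by hand.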

1 Comment

Though the keys can be useful at times, I don't think this actually answers the question. This way another concat cannot be made afterwards, which the question seemed to imply would be necessary. So, e.g. I don't see how you could easily append df4 = pd.DataFrame(np.random.rand(3,2)) with a new key "D" after having done the above. This answer seems more suited to this purpose.
