I'm looking for something equivalent to pd.read_table(path/to/file, index_col=[0,1]) for an existing pd.DataFrame.

I frequently encounter pd.DataFrames that have the following format:

import numpy as np
import pandas as pd

# Index Data
iters = 3*[1] + 3*[2] + 3*[3]
clusters = 3*[1,2,3]

# Recreate DataFrame
DF_A = pd.DataFrame([iters, clusters], index = ["iteration", "cluster"]).T
DF_B = pd.DataFrame(np.random.RandomState(0).normal(size=(100,9)), index = ["attr_%d"%_ for _ in range(100)]).T
DF_concat = pd.concat([DF_A, DF_B], axis=1).set_index("iteration", drop=True)
DF_concat.head()

If I were loading these from a file, I would just pass index_col=[0,1] as described above. How can I convert the existing index of a pd.DataFrame into a pd.MultiIndex so that iteration is the outer index level and cluster is the inner index level?
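For reference, this is the file-loading behaviour being described: passing index_col=[0, 1] to pd.read_table makes the first two columns a two-level MultiIndex. The tab-separated sample text below is hypothetical and only stands in for path/to/file.

```python
import io

import pandas as pd

# Hypothetical tab-separated data standing in for path/to/file:
# the first two columns hold the iteration and cluster labels.
text = (
    "iteration\tcluster\tattr_0\n"
    "1\t1\t10\n"
    "1\t2\t20\n"
    "2\t1\t30\n"
    "2\t2\t40\n"
)

# index_col=[0, 1] turns the first two columns into a two-level MultiIndex
df = pd.read_table(io.StringIO(text), index_col=[0, 1])
print(list(df.index.names))  # ['iteration', 'cluster']
print(df.loc[(2, 1), "attr_0"])  # 30
```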

I tried the following, but the labels were assigned incorrectly. In this simple example each iteration should have exactly 3 rows:

DF_B.index = pd.MultiIndex(levels=[DF_concat["cluster"].index.tolist(), DF_concat["cluster"].tolist()], labels=[DF_concat["cluster"].index.tolist(), DF_concat["cluster"].tolist()], names=["iteration", "cluster"])
DF_B
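The pd.MultiIndex constructor expects levels to hold each level's unique values and codes (called labels in pandas versions before 0.24) to hold integer positions into those levels, so passing the raw per-row labels for both is a likely reason the assignments came out wrong. A minimal sketch of a working variant uses pd.MultiIndex.from_arrays, which takes the raw labels for each level and derives the levels and codes itself; it rebuilds the frames from the question first.

```python
import numpy as np
import pandas as pd

# Rebuild the frames from the question
iters = 3 * [1] + 3 * [2] + 3 * [3]
clusters = 3 * [1, 2, 3]
DF_A = pd.DataFrame([iters, clusters], index=["iteration", "cluster"]).T
DF_B = pd.DataFrame(
    np.random.RandomState(0).normal(size=(100, 9)),
    index=["attr_%d" % i for i in range(100)],
).T
DF_concat = pd.concat([DF_A, DF_B], axis=1).set_index("iteration")

# from_arrays takes one array of raw labels per level and works out
# the unique level values and integer codes on its own
DF_B.index = pd.MultiIndex.from_arrays(
    [DF_concat.index, DF_concat["cluster"]],
    names=["iteration", "cluster"],
)
print(DF_B.groupby(level="iteration").size())
```

Grouping on the outer level should then report 3 rows for each iteration.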

Answer

How about this? Pass the existing index and the column name together to set_index:

DF_concat.set_index([DF_concat.index, 'cluster'])
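A short check of this approach, rebuilding DF_concat from the question. set_index accepts a mix of column keys and array-like objects such as an existing Index, and it keeps that Index's name ("iteration") as the outer level name. The sketch also compares it with set_index(..., append=True), which is an equivalent way to add "cluster" as the inner level while keeping the current index.

```python
import numpy as np
import pandas as pd

# Rebuild DF_concat from the question (iteration is already the index)
iters = 3 * [1] + 3 * [2] + 3 * [3]
clusters = 3 * [1, 2, 3]
DF_A = pd.DataFrame([iters, clusters], index=["iteration", "cluster"]).T
DF_B = pd.DataFrame(
    np.random.RandomState(0).normal(size=(100, 9)),
    index=["attr_%d" % i for i in range(100)],
).T
DF_concat = pd.concat([DF_A, DF_B], axis=1).set_index("iteration")

# The answer's approach: the existing Index object plus a column key
via_list = DF_concat.set_index([DF_concat.index, "cluster"])

# Equivalent: keep the current index and append "cluster" as the inner level
via_append = DF_concat.set_index("cluster", append=True)

print(list(via_list.index.names))  # ['iteration', 'cluster']
print(via_list.equals(via_append))  # True
```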
1 Comment

I didn't know you could reference the index while you're setting it. Thanks!