I would like to reconstruct a DataFrame from a contingency table that is itself stored as a DataFrame. For example, from ctab I would like to build df1 (one row per cell, with its frequency) or df2 (one row per observation). Is there a command to do that, or do I need a loop?

import pandas as pd
ctab = pd.DataFrame([[1,2], [2, 1]], columns=["A", "B"], index=["A", "B"])
print(ctab)
df1 = pd.DataFrame([["A","A", 1], ["A","B", 2], ["B","A", 2], ["B","B", 1]], columns=["col", "index", "freq"])
print(df1)
df2 = pd.DataFrame([["A","A"], ["A","B"], ["A","B"], ["B","A"], ["B","A"], ["B","B"]], columns=["col", "index"])
print(df2)
1 Answer

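You can use rename_axis, stack, and reset_index: rename_axis names the row and column axes, stack moves the columns into an inner index level, and reset_index(name='freq') turns both levels back into ordinary columns next to the counts: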

out = ctab.rename_axis(index='index', columns='col').stack().reset_index(name='freq')

Output:

  index col  freq
0     A   A     1
1     A   B     2
2     B   A     2
3     B   B     1
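
As an alternative that is not part of the answer above, the same long format can probably also be reached with melt. The sketch below assumes the ctab from the question; the name melted is only illustrative. Note that melt walks the table column by column, so the row order differs from the stack version.

```python
import pandas as pd

ctab = pd.DataFrame([[1, 2], [2, 1]], columns=["A", "B"], index=["A", "B"])

# Turn the row labels into a regular column named "index",
# then unpivot the remaining columns into (col, freq) pairs.
melted = (
    ctab.rename_axis(index="index")
    .reset_index()
    .melt(id_vars="index", var_name="col", value_name="freq")
)
print(melted)
```

If the row-major order of the stack version matters, sorting with melted.sort_values(["index", "col"]) should restore it.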

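For the second one (df2), repeat each row as many times as its count with Index.repeat; out.pop('freq') removes the freq column and hands its values to repeat in one step: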

out = ctab.rename_axis(index='index', columns='col').stack().reset_index(name='freq')

out = out.loc[out.index.repeat(out.pop('freq'))]

Output:

  index col
0     A   A
1     A   B
1     A   B
2     B   A
2     B   A
3     B   B
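
The repeated rows keep their original labels (0, 1, 1, 2, 2, 3), as the output above shows. A minimal self-contained sketch of the whole pipeline follows, assuming the ctab from the question; it renumbers the rows with reset_index(drop=True) and uses pd.crosstab as a round-trip check. The names long_df, expanded, and rebuilt are illustrative and not from the original answer.

```python
import pandas as pd

ctab = pd.DataFrame([[1, 2], [2, 1]], columns=["A", "B"], index=["A", "B"])

# Long form: one row per (index, col) cell, with its count in "freq".
long_df = (
    ctab.rename_axis(index="index", columns="col")
    .stack()
    .reset_index(name="freq")
)

# Repeat each row "freq" times, drop the count, and renumber the rows 0..n-1.
expanded = (
    long_df.loc[long_df.index.repeat(long_df["freq"])]
    .drop(columns="freq")
    .reset_index(drop=True)
)
print(expanded)

# Round-trip check: cross-tabulating the expanded rows should give the counts back.
rebuilt = pd.crosstab(expanded["index"], expanded["col"])
print(rebuilt)
```

Using drop(columns="freq") instead of pop leaves long_df untouched, which can be handy if both the frequency table and the expanded rows are needed.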