
I am trying to merge multiple columns in a CSV into a single column, with each original column's header repeated next to its values, as shown below.

userA   userB
A1  B1
A2  B2
A2  B3
A2  B4

Into this:

userA   A1
userA   A2
userA   A3
userA   A4
userB   B1
userB   B2
userB   B3
userB   B4

Does anyone have suggestions on how to do this? I have some experience with pandas, but I'm currently at a loss.

UPDATE: I found a way to merge the columns:

import pandas as pd
df = pd.read_csv(filename, sep='\t')
# Append all values, stacked column by column, as a new 'merged' column
df = df.combine_first(pd.Series(df.values.ravel('F')).to_frame('merged'))
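Here is a quick check of what the combine_first line produces, using the sample data from the question typed in by hand (an assumption, since the original reads it from a file). The values end up stacked in a new `merged` column, but nothing records which header each value came from, which is why melt() was still needed:

```python
import pandas as pd

# Sample data from the question, entered by hand instead of read from the CSV
df = pd.DataFrame({
    "userA": ["A1", "A2", "A2", "A2"],
    "userB": ["B1", "B2", "B3", "B4"],
})

# ravel("F") flattens the 4x2 array column by column (Fortran order)
merged = pd.Series(df.values.ravel("F")).to_frame("merged")

# combine_first aligns on the index: rows 0-3 keep userA/userB,
# rows 4-7 exist only in `merged`, so userA/userB are NaN there
result = df.combine_first(merged)
print(result)
```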

FINAL UPDATE: Solved using melt()

df = pd.melt(df)
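A self-contained sketch of the melt() solution, with the tab-separated file replaced by an in-memory string (an assumption for illustration). The `var_name` and `value_name` arguments are optional; they just give the two output columns friendlier names than the defaults `variable` and `value`:

```python
import io

import pandas as pd

# Stand-in for the tab-separated file; in practice pass the real filename
csv_text = "userA\tuserB\nA1\tB1\nA2\tB2\nA2\tB3\nA2\tB4\n"
df = pd.read_csv(io.StringIO(csv_text), sep="\t")

# melt() turns every column into rows: one row per (header, value) pair,
# going through the columns left to right
long_df = pd.melt(df, var_name="user", value_name="value")
print(long_df)
```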
  • With a dataframe of just those two columns, you could do df.stack().reset_index(level=1) Commented Apr 19, 2018 at 0:35
  • @cmaher This is great, but the entries are not ordered properly: they now alternate between userA and userB. Any idea how to produce the order shown above? Commented Apr 19, 2018 at 0:49
  • That's not what the output in your question indicates. Can you update the expected output in your question first? Commented Apr 19, 2018 at 0:51
  • Good work finding a solution, can you post that as the answer and accept it so we can close this question? Commented Apr 19, 2018 at 0:55
  • @cmaher I solved it! Your first comment was all I needed. I then used df.sort_values(by=[0]) to sort properly. Thanks! Commented Apr 19, 2018 at 0:55

3 Answers


You can use melt:

df.melt()
Out[702]: 
  variable value
0    userA    A1
1    userA    A2
2    userA    A2
3    userA    A2
4    userB    B1
5    userB    B2
6    userB    B3
7    userB    B4
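To get exactly the two-column layout from the question (header, then value, with no index or column names), the melted frame can be written back out as tab-separated text. This is a sketch that assumes tab separation like the original file; `to_csv` returns a string when no path is given:

```python
import pandas as pd

df = pd.DataFrame({
    "userA": ["A1", "A2", "A2", "A2"],
    "userB": ["B1", "B2", "B3", "B4"],
})

# Write the melted frame without the index or the variable/value header
# so each line is just "<original header>\t<value>"
csv_text = df.melt().to_csv(sep="\t", header=False, index=False)
print(csv_text)
```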

2 Comments

df=pd.melt(df) is exactly what I needed! Nothing more is required. Thank you.
@testac1234 yw :-) happy coding

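Construct with ravel and repeat. Flattening in Fortran order ('F') walks the values column by column, so they line up with the repeated headers:

# ravel('F') flattens column by column to match columns.repeat()
pd.Series(df.values.ravel('F'), df.columns.repeat(len(df)))

userA    A1
userA    A2
userA    A2
userA    A2
userB    B1
userB    B2
userB    B3
userB    B4
dtype: object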
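The two flattening orders are easy to compare side by side. This sketch types the question's sample data in by hand (an assumption). Only the Fortran-order version pairs each value with its own header when the index comes from `columns.repeat`; the default C order interleaves the columns:

```python
import pandas as pd

df = pd.DataFrame({
    "userA": ["A1", "A2", "A2", "A2"],
    "userB": ["B1", "B2", "B3", "B4"],
})

# Default C order walks the array row by row, interleaving the two columns
row_major = df.values.ravel()

# Fortran order walks column by column, so all userA values come first
col_major = df.values.ravel("F")

# columns.repeat(len(df)) gives userA x4 then userB x4, which only
# lines up with the column-by-column (Fortran) ordering
s = pd.Series(col_major, index=df.columns.repeat(len(df)))
print(s)
```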


Solved first using:

With a dataframe of just those two columns, you could do df.stack().reset_index(level=1) – cmaher

Followed by a simple sort to put the rows in order:

df.sort_values(by=[0])

See pd.melt(df) above for a better answer.
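Here is a variation on the stack approach that sorts on the header column rather than on the values. Sorting by column 0 works here only because every userA value happens to sort before every userB value. A stable sort on the header column (`kind="stable"`, a swapped-in choice rather than what was posted) keeps the grouping without relying on that. The column names `user` and `value` are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "userA": ["A1", "A2", "A2", "A2"],
    "userB": ["B1", "B2", "B3", "B4"],
})

# stack() yields one entry per (row, column); reset_index(level=1)
# moves the column labels out of the index into their own column
out = df.stack().reset_index(level=1)
out.columns = ["user", "value"]

# A stable sort on the header column groups userA before userB while
# keeping each user's values in their original row order
out = out.sort_values("user", kind="stable").reset_index(drop=True)
print(out)
```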
