I have to create a DataFrame from a file in which some column names are repeated and their values are split across the repeats, as follows:

[image: sample of the input, with repeated column names]

As you can see, c1 for example is split into 3 parts and c2 into 2.

What I want to get is something like:

[image: desired output, with one column per name]

I know that I can merge columns with:

df.sum(axis=1) or df.max(axis=1)

but I don't know how to restrict this to specific columns.
Another possibility would be to create DataFrames containing only the repeated columns, apply either sum or max, and then merge everything.

But I was wondering if there is something less "ugly".

1 Answer

A much simpler approach is to use groupby.

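In [1]: import numpy as np
   ...: import pandas as pd
   ...: # Random integers from 0 to 10 inclusive; note the repeated column names
   ...: df = pd.DataFrame(np.random.randint(0, 11, (3, 8)), columns=['C1','C2','C3','C1','C4','C1','C5','C2'])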

In [2]: df
Out[2]:
    C1  C2  C3  C1  C4  C1  C5  C2
0   5   0   9   1   7   3   3   8
1   3   1   10  7   1   2   3   8
2   1   0   0   0   4   10  6   10

In [3]: # Group by level 0 on axis 1 (the column labels) and apply a sum
   ...: df.groupby(level=0, axis=1).sum()

Out[3]:
    C1  C2  C3  C4  C5
0   9   8   9   7   3
1   12  9   10  1   3
2   11  10  0   4   6
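
As far as I know, recent pandas releases deprecate the `axis=1` argument of `groupby`, so the call above may emit a warning or fail depending on your version. A minimal sketch of an equivalent that avoids `axis=1` is to transpose, group on the index, and transpose back. The sample values below are copied from the output above rather than generated randomly, and the variable names `summed`, `maxed` and `c1_total` are only illustrative.

```python
import pandas as pd

# Same values as the example above, hard-coded so the result is reproducible.
df = pd.DataFrame(
    [[5, 0, 9, 1, 7, 3, 3, 8],
     [3, 1, 10, 7, 1, 2, 3, 8],
     [1, 0, 0, 0, 4, 10, 6, 10]],
    columns=["C1", "C2", "C3", "C1", "C4", "C1", "C5", "C2"],
)

# Transpose so the duplicated labels become the index, group on that
# index, aggregate, then transpose back to the original orientation.
summed = df.T.groupby(level=0).sum().T
maxed = df.T.groupby(level=0).max().T

# To merge only one specific repeated column, select it with a boolean
# mask on the column labels and reduce across the row.
c1_total = df.loc[:, df.columns == "C1"].sum(axis=1)

print(summed)
print(maxed)
print(c1_total)
```

Swapping `.sum()` for `.max()` gives the other merge the question mentions, and the boolean mask handles the case where only some of the repeated columns should be combined.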