I have a CSV file that looks like this:

header1 header2 header3 header4 header5 header6
row1    row1    row1    row1    row1    row1
row2    row2    row2    row2    row2    row2
row3    row3    row3    row3    row3    row3

What I want to achieve is to merge each row's values into a single string separated by #. For example, the output would be:

row1#row1#row1#row1#row1#row1
row2#row2#row2#row2#row2#row2
row3#row3#row3#row3#row3#row3

I have already done this using this code:

import pandas as pd

# na_filter=False keeps empty fields as '' instead of NaN
df = pd.read_csv("test.csv",
                 na_filter=False)
test = df.stack().groupby(level=0).apply('#'.join)
print(test.to_dict())

The only issue with the code above is that an empty value in a row still adds a "#" to the output. For example, if header5 in row 1 is empty, the result looks like this:

row1#row1#row1#row1##row1

When header5 in row 1 is empty, the output should instead look like this:

row1#row1#row1#row1#row1

Does anyone know how I can fix this?

1 Answer

You need to replace the empty strings with NaN, so that DataFrame.stack, which drops missing values by default, removes them:

print (df)
  header1 header2 header3 header4 header5 header6
0    row1    row1    row1    row1            row1
1    row2                            row2    row2
2    row3    row3    row3    row3    row3    row3

import numpy as np

test = df.replace('', np.nan).stack().groupby(level=0).apply('#'.join)
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1',
 1: 'row2#row2#row2', 
 2: 'row3#row3#row3#row3#row3#row3'}
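For anyone who wants to run the stack approach end to end, here is a minimal sketch. The DataFrame is built by hand to mirror the frame printed above (that construction is an assumption, not the asker's actual file), and the explicit dropna() is an extra safeguard in case stack() does not drop the NaN values itself in your pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the frame printed above.
df = pd.DataFrame({
    "header1": ["row1", "row2", "row3"],
    "header2": ["row1", "", "row3"],
    "header3": ["row1", "", "row3"],
    "header4": ["row1", "", "row3"],
    "header5": ["", "row2", "row3"],
    "header6": ["row1", "row2", "row3"],
})

# Turn '' into NaN, reshape to one long Series, and drop the missing entries.
stacked = df.replace("", np.nan).stack().dropna()

# Group by the original row label (index level 0) and join what is left.
result = stacked.groupby(level=0).apply("#".join).to_dict()
print(result)
```

One thing to keep in mind: a row whose values are all empty has no entries left after stacking, so it disappears from the result instead of mapping to an empty string.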

Or use:

test = df.replace('', np.nan).apply(lambda x: '#'.join(x.dropna()), axis=1)
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1', 
 1: 'row2#row2#row2', 
 2: 'row3#row3#row3#row3#row3#row3'}

Or:

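# strip('#') removes a stray separator left behind when the first or last value is empty
test = df.apply('#'.join, axis=1).str.replace('#+', '#', regex=True).str.strip('#')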
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1', 
 1: 'row2#row2#row2', 
 2: 'row3#row3#row3#row3#row3#row3'}
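A caveat with the regex variant: collapsing repeated "#" fixes empty values in the middle of a row, but an empty first or last value leaves a separator at the edge of the string. A small sketch of that edge case follows, using a made-up three-column frame (the column names a, b, c are only for illustration) and Series.str.strip to trim the edges.

```python
import pandas as pd

# Hypothetical frame with an empty string at the start, middle, and end of a row.
df = pd.DataFrame({
    "a": ["", "x", "x"],
    "b": ["y", "", "y"],
    "c": ["z", "z", ""],
})

# Plain join keeps every separator, e.g. '#y#z', 'x##z', 'x#y#'.
joined = df.apply("#".join, axis=1)

# Collapse runs of '#' and then trim any '#' left at either end.
cleaned = joined.str.replace("#+", "#", regex=True).str.strip("#")
print(cleaned.to_dict())
```

Note that this treats every "#" the same way, so it is only safe when the values themselves never contain "#".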

Thanks @Corralien for another solution:

df.apply(lambda x: '#'.join(i for i in x if i != ''), axis=1).to_dict()
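The same idea can be tried from reading the file through to the final dict. This is only a sketch: the comma-separated text stands in for test.csv, and join_non_empty is a hypothetical helper name, not something from pandas.

```python
import io

import pandas as pd

# Hypothetical comma-separated input standing in for test.csv.
csv_text = """header1,header2,header3,header4
row1,row1,,row1
row2,,,row2
row3,row3,row3,row3
"""

# na_filter=False keeps empty fields as '' rather than converting them to NaN.
df = pd.read_csv(io.StringIO(csv_text), na_filter=False)


def join_non_empty(row, sep="#"):
    """Join the non-empty string values of one row with sep."""
    return sep.join(value for value in row if value != "")


result = df.apply(join_non_empty, axis=1).to_dict()
print(result)
```

Because the empty strings are filtered out before joining, this version needs neither NaN conversion nor a cleanup pass on the joined string.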