Pandas - Merge each row values into a string ignoring the empty values

Question

I have a CSV file that looks like this:

header1 header2 header3 header4 header5 header6
row1    row1    row1    row1    row1    row1
row2    row2    row2    row2    row2    row2
row3    row3    row3    row3    row3    row3

What I want to achieve is to merge each row's values into a single string separated by #. For example, the output would be:

row1#row1#row1#row1#row1#row1
row2#row2#row2#row2#row2#row2
row3#row3#row3#row3#row3#row3

I have already done this using this code:

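import pandas as pd

# na_filter=False keeps empty cells as '' instead of converting them to NaN
df = pd.read_csv("test.csv", na_filter=False)
# stack into one long Series, then join each original row's values with '#'
test = df.stack().groupby(level=0).apply('#'.join)
print(test.to_dict())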

The only issue with the code above is that an empty value in a row still contributes a "#" to the output. Assuming row 1, header 5 is empty, the result looks like this:

row1#row1#row1#row1##row1

It should look like this instead:

row1#row1#row1#row1#row1

Does anyone know how I can fix this?

jezrael · Accepted Answer · 2021-10-07 06:12:46Z

You need to replace the empty strings with NaN, so that DataFrame.stack, which removes missing values by default, drops these empty cells:

print (df)
  header1 header2 header3 header4 header5 header6
0    row1    row1    row1    row1            row1
1    row2                            row2    row2
2    row3    row3    row3    row3    row3    row3

import numpy as np

test = df.replace('', np.nan).stack().groupby(level=0).apply('#'.join)
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1',
 1: 'row2#row2#row2', 
 2: 'row3#row3#row3#row3#row3#row3'}
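
For anyone who wants to run this without a CSV file, here is a minimal self-contained sketch of the same idea. The DataFrame is built by hand to mirror the printed df above, and the explicit dropna() call is my addition: as far as I know, newer pandas releases change how stack treats missing values, so dropping them explicitly avoids depending on the default. Treat that as an assumption about your pandas version rather than a requirement.

```python
import numpy as np
import pandas as pd

# Hand-built frame mirroring the printed df above; '' marks an empty cell.
df = pd.DataFrame({
    "header1": ["row1", "row2", "row3"],
    "header2": ["row1", "", "row3"],
    "header3": ["row1", "", "row3"],
    "header4": ["row1", "", "row3"],
    "header5": ["", "row2", "row3"],
    "header6": ["row1", "row2", "row3"],
})

result = (
    df.replace("", np.nan)   # turn empty strings into missing values
      .stack()               # one long Series indexed by (row, column)
      .dropna()              # drop missing values explicitly (version-independent)
      .groupby(level=0)      # regroup by the original row label
      .agg("#".join)         # join the surviving values of each row
)
print(result.to_dict())
```

On the sample frame this gives the same three strings as shown above.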

Or use:

test = df.replace('', np.nan).apply(lambda x: '#'.join(x.dropna()), axis=1)
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1', 
 1: 'row2#row2#row2', 
 2: 'row3#row3#row3#row3#row3#row3'}
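
A related option, sketched here under assumptions: if you leave out na_filter=False when reading the file, read_csv already turns empty cells into NaN, so the replace step is not needed. The in-memory CSV text, the comma delimiter and the three-column layout are made up for illustration. Keep in mind that the default NA filter also treats strings such as "NA" or "null" as missing, which may or may not be what you want.

```python
import io

import pandas as pd

# Hypothetical comma-separated sample standing in for test.csv.
csv_text = "header1,header2,header3\nrow1,,row1\nrow2,row2,\n"

# Default na_filter=True: empty cells are read as NaN; dtype=str keeps the rest as text.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Drop the missing cells of each row, then join what is left.
joined = df.apply(lambda row: "#".join(row.dropna()), axis=1).to_dict()
print(joined)
```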

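Or collapse repeated separators with a regex, then strip any # left at the start or end when the first or last column is empty (note that this also alters values that themselves contain #):

test = df.apply('#'.join, axis=1).str.replace('[#]+','#', regex=True).str.strip('#')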
print(test.to_dict())
{0: 'row1#row1#row1#row1#row1', 
 1: 'row2#row2#row2', 
 2: 'row3#row3#row3#row3#row3#row3'}
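
One thing to watch with the regex approach: collapsing runs of # does not by itself remove a separator at the very start or end of the string, which appears when the first or last column is empty. The small sketch below uses a made-up three-column frame to show the difference between collapsing only and collapsing plus stripping.

```python
import pandas as pd

# Made-up frame with an empty first, middle and last cell, one per row.
df = pd.DataFrame({
    "header1": ["", "row2", "row3"],
    "header2": ["row1", "", "row3"],
    "header3": ["row1", "row2", ""],
})

joined = df.apply(lambda row: "#".join(row), axis=1)
collapsed = joined.str.replace("#+", "#", regex=True)  # merge repeated separators
stripped = collapsed.str.strip("#")                    # remove leading/trailing ones

print(collapsed.tolist())
print(stripped.tolist())
```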

Thanks to @Corralien for another solution:

df.apply(lambda x: '#'.join(i for i in x if i != ''), axis=1).to_dict()
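
If the data may contain both empty strings and real missing values (for example when the file is read with the default NA filter), the same generator idea can be wrapped in a small helper. The name join_nonempty and the sample frame are hypothetical, just a sketch of one way to do it:

```python
import pandas as pd


def join_nonempty(row, sep="#"):
    """Join the values of one row, skipping NaN/None and empty strings."""
    return sep.join(str(v) for v in row if pd.notna(v) and v != "")


# Made-up frame mixing '' and None as "empty" markers.
df = pd.DataFrame({
    "a": ["x", "", None],
    "b": ["y", "y", "y"],
    "c": ["", "z", "z"],
})

result = df.apply(join_nonempty, axis=1).to_dict()
print(result)
```

Because it filters inside a plain Python generator, this version never depends on how stack or replace handle missing values.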

edited Oct 7, 2021 at 6:12

answered Oct 7, 2021 at 5:57

1 Comment

Corralien Over a year ago

Or df.apply(lambda x: '#'.join(i for i in x if i != ''), axis=1).to_dict()
