Combine multiple rows of different strings into one using pandas

Question

I have a dataframe with multiple rows. Is there any way to combine them into one row? The rows in question are marked in yellow. I want to combine all of them into a single row, ignoring the empty rows. See the output section of the attached image ("Problem.jpg").

Problem

I want my output to look like this.

Output

I cannot find the logic for this problem. Any ideas?

I tried this code, but it is not working.

import pandas as pd
all_dfs_1 = pd.read_csv("Test.csv",header=None)
all_dfs_1.groupby(0)[1].apply(' '.join).reset_index()

Attached file: Test.csv

Bernardo Trindade · Accepted Answer · 2021-08-18 14:53:20Z

If you move the "Drilling good ground all shift" to the leftmost column so that your file looks like:

Drilling good ground all shift
2 x Gyro Surveys
Mixing muds to condition the hole
Driller travelled home for shift change at end of shift

Equipment onsite=

then I believe you need ','.join(array) combined with replace(',,', ',') to get rid of the empty lines, as shown below:

>>> import numpy as np
>>> data = np.loadtxt('Test.csv')
>>> data
array(['Drilling good ground all shift', '2 x Gyro Surveys',
       'Mixing muds to condition the hole',
       'Driller travelled home for shift change at end of shift', '',
       'Equipment onsite='], dtype='<U55')
>>> ','.join(data).replace(',,', ',')
'Drilling good ground all shift,2 x Gyro Surveys,Mixing muds to condition the hole,Driller travelled home for shift change at end of shift,Equipment onsite='
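One caveat with the replace(',,', ',') step: str.replace makes a single left-to-right pass, so a run of two or more empty lines still leaves a doubled comma behind. A minimal sketch of a more forgiving variant, assuming the lines are already in a plain Python list (the `lines` list below is made-up sample data with an extra blank entry added for illustration), is to filter out the empty strings before joining:

```python
# Sample data mirroring the array above, with two consecutive empty entries
lines = [
    "Drilling good ground all shift",
    "2 x Gyro Surveys",
    "",
    "",
    "Equipment onsite=",
]

# replace() does one pass, so two consecutive empty entries still leave ',,' behind
naive = ",".join(lines).replace(",,", ",")

# Dropping empty (or whitespace-only) entries before joining avoids the problem
joined = ",".join(s for s in lines if s.strip())

print(naive)
print(joined)
```

With a single empty line, as in Test.csv, both versions should give the same string; the filtered version only matters when blank lines come in runs.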

If you don't want to change Test.csv by hand, you can do so with Pandas, convert it to an array, and proceed as above:

>>> import pandas as pd
>>> all_dfs_1 = pd.read_csv(r"Test.csv", header=None)
>>> all_dfs_1
                                                  0                               1   2   3   4   ...  7   8   9   10  11
0                        Comments & Equip. Transfers  Drilling good ground all shift NaN NaN NaN  ... NaN NaN NaN NaN NaN
1                                   2 x Gyro Surveys                             NaN NaN NaN NaN  ... NaN NaN NaN NaN NaN
2                  Mixing muds to condition the hole                             NaN NaN NaN NaN  ... NaN NaN NaN NaN NaN
3  Driller travelled home for shift change at end...                             NaN NaN NaN NaN  ... NaN NaN NaN NaN NaN
4                                                NaN                             NaN NaN NaN NaN  ... NaN NaN NaN NaN NaN
5                                  Equipment onsite=                             NaN NaN NaN NaN  ... NaN NaN NaN NaN NaN

[6 rows x 12 columns]
>>> all_dfs_1.iloc[0, 0] = all_dfs_1.iloc[0, 1]
>>> all_dfs_1[0]
0                       Drilling good ground all shift
1                                     2 x Gyro Surveys
2                    Mixing muds to condition the hole
3    Driller travelled home for shift change at end...
4                                                  NaN
5                                    Equipment onsite=
Name: 0, dtype: object
>>> data = all_dfs_1[0].values
>>> data
array(['Drilling good ground all shift', '2 x Gyro Surveys',
       'Mixing muds to condition the hole',
       'Driller travelled home for shift change at end of shift', '',
       'Equipment onsite='], dtype='<U55')
>>> ','.join(data).replace(',,', ',')
'Drilling good ground all shift,2 x Gyro Surveys,Mixing muds to condition the hole,Driller travelled home for shift change at end of shift,Equipment onsite='

This statement is not working: ','.join(data).replace(',,', ','). Error: TypeError: sequence item 4: expected str instance, float found
Weird. Do the outputs of >>> all_dfs_1 and >>> data look the way mine did?
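The TypeError in the comment is consistent with how pandas reads the file: an empty cell comes back as NaN, which is a float, and str.join accepts only strings. A minimal sketch of one way around it, assuming column 0 holds the text as in the answer above (the Series here is built by hand as a stand-in for all_dfs_1[0]), is to drop the missing values before joining:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for all_dfs_1[0]; the NaN plays the role of the empty row
col = pd.Series(
    [
        "Drilling good ground all shift",
        "2 x Gyro Surveys",
        "Mixing muds to condition the hole",
        "Driller travelled home for shift change at end of shift",
        np.nan,
        "Equipment onsite=",
    ]
)

try:
    ",".join(col.values)  # fails: item 4 is a float (NaN), not a str
except TypeError as exc:
    error_message = str(exc)

# dropna() removes the NaN entry, so only strings reach join()
combined = ",".join(col.dropna())
print(combined)
```

Because the NaN is removed before joining, the replace(',,', ',') step is no longer needed in this variant.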

mitoRibo · Accepted Answer · 2021-08-18 14:31:06Z

Not sure if this is exactly what you want, but it returns a Series (effectively a single row of a dataframe) with all those values joined together:

import pandas as pd
import io

#"reads in" the csv file from a string so it can be tested without the file
all_dfs_1 = pd.read_csv(
    io.StringIO(
"""
Comments & Equip. Transfers
2 x Gyro Surveys
Mixing muds to condition the hole
Driller travelled home for shift change at end of shift

Equipment onsite=
"""
    ), 
    header=None
)

#you'll want to do this instead since you have the file
#all_dfs_1 = pd.read_csv("Test.csv",header=None)


single_row = all_dfs_1.apply(lambda v: ','.join(v))
print(single_row)

The output is

0    Comments & Equip. Transfers,2 x Gyro Surveys,M...
dtype: object

If you just wanted a string you could also do:

','.join(all_dfs_1[0].values)
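Both one-liners rely on every cell being a string. If the real Test.csv yields NaN for blank cells, as the frame printed in the other answer suggests, a hedged variant is to drop missing values per column inside the lambda. The two-column frame below is a hypothetical stand-in for the file's layout, not the actual data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv("Test.csv", header=None)
df = pd.DataFrame(
    {
        0: ["Comments & Equip. Transfers", "2 x Gyro Surveys", np.nan, "Equipment onsite="],
        1: ["Drilling good ground all shift", np.nan, np.nan, np.nan],
    }
)

# Join each column's non-missing cells; astype(str) guards against non-string values
single_row = df.apply(lambda col: ",".join(col.dropna().astype(str)))
print(single_row)
```

If a one-row DataFrame is preferred over a Series, single_row.to_frame().T should give that shape.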
