I have a dataframe with multiple rows. Is there any way to combine them into a single row? The rows in question are highlighted in yellow in the attached image ("Problem.jpg"). I want to combine all of them into one row, ignoring the empty rows. See the output section of the attached image.

Problem

I want my output to look like this.

Output

I cannot work out the logic for this problem. Any ideas?

I tried this code, but it is not working.

import pandas as pd
all_dfs_1 = pd.read_csv("Test.csv",header=None)
all_dfs_1.groupby(0)[1].apply(' '.join).reset_index()

Attached file: Test.csv

2 Answers

If you move "Drilling good ground all shift" to the leftmost column so that your file looks like this:

Drilling good ground all shift
2 x Gyro Surveys
Mixing muds to condition the hole
Driller travelled home for shift change at end of shift

Equipment onsite=

then I believe you need ','.join(array) combined with replace(',,', ',') to get rid of the empty lines, as shown below:

>>> import numpy as np
>>> data = np.loadtxt('Test.csv')
>>> data
array(['Drilling good ground all shift', '2 x Gyro Surveys',
       'Mixing muds to condition the hole',
       'Driller travelled home for shift change at end of shift', '',
       'Equipment onsite='], dtype='<U55')
>>> ','.join(data).replace(',,', ',')
'Drilling good ground all shift,2 x Gyro Surveys,Mixing muds to condition the hole,Driller travelled home for shift change at end of shift,Equipment onsite='
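One caveat with the replace(',,', ',') step: a single pass only collapses one doubled comma at a time, so two or more consecutive empty entries would still leave a doubled comma behind. A minimal sketch of an alternative, assuming the values are plain strings and using a hypothetical list (not the contents of Test.csv), is to filter out the empty strings before joining:

```python
# Hypothetical list with two consecutive empty entries (not taken from Test.csv).
items = ["Drilling good ground all shift", "", "", "Equipment onsite="]

# A single replace pass turns ",,," into ",,", so a doubled comma survives.
naive = ",".join(items).replace(",,", ",")

# Dropping the empty strings before joining avoids the issue entirely.
filtered = ",".join(item for item in items if item)

print(naive)
print(filtered)
```

The generator expression keeps only non-empty strings, so the result does not depend on how many blank entries appear in a row.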

If you don't want to change Test.csv by hand, you can do so with Pandas, convert it to an array, and proceed as above:

>>> import pandas as pd
>>> all_dfs_1 = pd.read_csv(r"Test.csv", header=None)
>>> all_dfs_1
                                                  0                               1   2   3   4   ...  7   8   9   10  11
0                        Comments & Equip. Transfers  Drilling good ground all shift NaN NaN NaN  ... NaN NaN NaN NaN NaN
1                                   2 x Gyro Surveys                             NaN NaN NaN NaN  ... NaN NaN NaN NaN NaN
2                  Mixing muds to condition the hole                             NaN NaN NaN NaN  ... NaN NaN NaN NaN NaN
3  Driller travelled home for shift change at end...                             NaN NaN NaN NaN  ... NaN NaN NaN NaN NaN
4                                                NaN                             NaN NaN NaN NaN  ... NaN NaN NaN NaN NaN
5                                  Equipment onsite=                             NaN NaN NaN NaN  ... NaN NaN NaN NaN NaN

[6 rows x 12 columns]
>>> all_dfs_1.iloc[0, 0] = all_dfs_1.iloc[0, 1]
>>> all_dfs_1[0]
0                       Drilling good ground all shift
1                                     2 x Gyro Surveys
2                    Mixing muds to condition the hole
3    Driller travelled home for shift change at end...
4                                                  NaN
5                                    Equipment onsite=
Name: 0, dtype: object
>>> data = all_dfs_1[0].values
>>> data
array(['Drilling good ground all shift', '2 x Gyro Surveys',
       'Mixing muds to condition the hole',
       'Driller travelled home for shift change at end of shift', '',
       'Equipment onsite='], dtype='<U55')
>>> ','.join(data).replace(',,', ',')
'Drilling good ground all shift,2 x Gyro Surveys,Mixing muds to condition the hole,Driller travelled home for shift change at end of shift,Equipment onsite='

2 Comments

The statement ','.join(data).replace(',,', ',') is not working. Error: TypeError: sequence item 4: expected str instance, float found
Weird. Do the outputs of >>> all_dfs_1 and >>> data look the way mine did?
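The TypeError reported in the first comment is consistent with the NaN in row 4 of column 0: pandas represents the missing cell as a float, and str.join only accepts strings. A minimal sketch of one way around it, assuming the file is read as in the answer above, is to drop the missing cells before joining. The inline csv_text below is a hypothetical two-column stand-in for Test.csv (the real file has 12 columns), built from the values shown in the answer:

```python
import io
import pandas as pd

# Hypothetical stand-in for Test.csv. The fifth line holds only a delimiter,
# so it is read as a row of NaN rather than skipped as a blank line.
csv_text = (
    "Comments & Equip. Transfers,Drilling good ground all shift\n"
    "2 x Gyro Surveys,\n"
    "Mixing muds to condition the hole,\n"
    "Driller travelled home for shift change at end of shift,\n"
    ",\n"
    "Equipment onsite=,\n"
)
all_dfs_1 = pd.read_csv(io.StringIO(csv_text), header=None)

# Move the first comment into column 0, as in the answer above.
all_dfs_1.iloc[0, 0] = all_dfs_1.iloc[0, 1]

# Drop the NaN cells (floats) so that every joined item is a string.
joined = ",".join(all_dfs_1[0].dropna())
print(joined)
```

Because the missing row is removed before joining, the replace(',,', ',') step is no longer needed.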

I'm not sure this is exactly what you want. It returns a Series with one entry per column of the dataframe, where each entry is all of that column's values joined together:

import pandas as pd
import io

# "Reads in" the CSV from a string so the example can be tested without the file
all_dfs_1 = pd.read_csv(
    io.StringIO(
"""
Comments & Equip. Transfers
2 x Gyro Surveys
Mixing muds to condition the hole
Driller travelled home for shift change at end of shift

Equipment onsite=
"""
    ), 
    header=None
)

# You'll want to do this instead, since you have the file:
# all_dfs_1 = pd.read_csv("Test.csv", header=None)


single_row = all_dfs_1.apply(lambda v: ','.join(v))
print(single_row)

The output is:

0    Comments & Equip. Transfers,2 x Gyro Surveys,M...
dtype: object

If you just wanted a string you could also do:

','.join(all_dfs_1[0].values)
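Note that with a file laid out like the one shown in the other answer, column 0 contains a NaN, and joining it directly would be expected to raise a TypeError. As a rough sketch of one way to build a single-row dataframe while skipping missing cells, the example below flattens the frame row by row and keeps only the non-missing values. The frame df is a hypothetical two-column reconstruction of the data shown in this thread, and treating the first cell as a label and joining the comments with ", " are both assumptions about the desired output:

```python
import pandas as pd

# Hypothetical two-column reconstruction of the layout shown in this thread.
df = pd.DataFrame({
    0: [
        "Comments & Equip. Transfers",
        "2 x Gyro Surveys",
        "Mixing muds to condition the hole",
        "Driller travelled home for shift change at end of shift",
        None,
        "Equipment onsite=",
    ],
    1: ["Drilling good ground all shift", None, None, None, None, None],
})

# Flatten row by row (row-major order) and keep only the non-missing cells.
cells = [cell for cell in df.to_numpy().ravel() if pd.notna(cell)]

# Assumption: the first cell is a label and the remaining cells are comments.
label, comments = cells[0], cells[1:]
one_row = pd.DataFrame([[label, ", ".join(comments)]])
print(one_row)
```

Filtering with pd.notna before joining means fully empty rows are ignored without any string post-processing.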
