Python Pandas filtering and creating new dataframe

Question

I am filtering a list for those records that contain a keyword in one column. The overall list, outputs, is given as:

outputs = 
sent_name   Name    Lat Lng type
    Abbey Road Station, London, UK  Abbey Road, London E15, UK  51.53193    0.00376 [u'transit_station', u'point_of_interest', u'establishment']
    Abbey Wood Station, London, UK  Abbey Wood, London SE2, UK  51.49106    0.12142 [u'transit_station', u'point_of_interest', u'establishment']

I search output[3] for the string 'station' and append the records where it is found to an empty list, results:

results = []

for output in outputs:
    if "station" in output[3]:
        results.append(output)

I wish to use Pandas for future analysis but do not know how to recreate a DataFrame after filtering these results.

OD = pd.read_csv('./results.csv', header=0)

where results.csv is again:

sent_name   Name    Lat Lng type
Abbey Road Station, London, UK  Abbey Road, London E15, UK  51.53193    0.00376 [u'transit_station', u'point_of_interest', u'establishment']
Abbey Wood Station, London, UK  Abbey Wood, London SE2, UK  51.49106    0.12142 [u'transit_station', u'point_of_interest', u'establishment']

Using iterrows, I am able to iterate over the rows in the pandas DataFrame and pick out those where 'station' appears in the type column.

    for index, row in OD.iterrows():
        if "station" in row['type']:

However, I have not been able to create a new DataFrame from this. My ultimate aim is to create a new csv (that only contains records that feature 'station' in the type column) using the .to_csv function in Pandas.

I have tried to create a new DataFrame with the appropriate index names, then filter as above and append the matching rows to the new DataFrame:

OD_filtered = pd.DataFrame(index=['sent_name','Name','Lat', 'Lng', 'type'])

for index, row in OD.iterrows():
    if "station" in row['type']:
        OD_filtered.append([row['sent_name'], row['Name'], row['Lat'], row['Lng'], row['type']])

pprint(OD_filtered)

However, this fails to write to the DataFrame, which remains empty. When I print(OD_filtered) it gives:

Empty DataFrame
Columns: []
Index: [sent_name, Name, Lat, Lng, type]

Your read_csv code shouldn't work as your csv has multiple commas, but aside from that you should be able to do new_df = OD[OD.apply(lambda x: 'station' in x['type'], axis=1)], I think.
– EdChum, Commented Sep 3, 2015 at 9:34
Very elegant. I missed the OD.apply method. Please put that as an answer and I can mark it correct.
– LearningSlowly, Commented Sep 3, 2015 at 9:36

EdChum · Accepted Answer · 2015-09-03 09:38:57Z

You can create a boolean mask by calling apply on the 'type' column, then index the DataFrame with that mask to create your new df:

In [37]:
import io
import pandas as pd
t="""sent_name;Name;Lat;Lng;type
Abbey Road Station, London, UK;Abbey Road, London E15, UK;51.53193;0.00376;[u'transit_station', u'point_of_interest', u'establishment']
Abbey Wood Station, London, UK;Abbey Wood, London SE2, UK;51.49106;0.12142;[u'transit_station', u'point_of_interest', u'establishment']"""
df = pd.read_csv(io.StringIO(t), sep=';')
df

Out[37]:
                        sent_name                        Name       Lat  \
0  Abbey Road Station, London, UK  Abbey Road, London E15, UK  51.53193   
1  Abbey Wood Station, London, UK  Abbey Wood, London SE2, UK  51.49106   

       Lng                                               type  
0  0.00376  [u'transit_station', u'point_of_interest', u'e...  
1  0.12142  [u'transit_station', u'point_of_interest', u'e...  

In [39]:    
# filter the df
df[df['type'].apply(lambda x: 'station' in x)]

Out[39]:
                        sent_name                        Name       Lat  \
0  Abbey Road Station, London, UK  Abbey Road, London E15, UK  51.53193   
1  Abbey Wood Station, London, UK  Abbey Wood, London SE2, UK  51.49106   

       Lng                                               type  
0  0.00376  [u'transit_station', u'point_of_interest', u'e...  
1  0.12142  [u'transit_station', u'point_of_interest', u'e...
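
Because both sample rows contain 'station', the filter above returns everything, which hides what the mask is doing. The following is a minimal sketch that looks at the mask on its own. The second row (a park) is a hypothetical record added purely for illustration and is not from the question's data. It also shows the vectorised `str.contains` as a possible alternative to `apply`; that works here because the 'type' column is read from the CSV as a plain string, so both approaches are doing a substring test.

```python
import io
import pandas as pd

# Second row is hypothetical, added only so the mask has a False entry.
t = """sent_name;Name;Lat;Lng;type
Abbey Road Station, London, UK;Abbey Road, London E15, UK;51.53193;0.00376;[u'transit_station', u'point_of_interest', u'establishment']
Some Park, London, UK;Some Park, London, UK;51.50000;0.10000;[u'park', u'establishment']"""
df = pd.read_csv(io.StringIO(t), sep=';')

# The mask is a boolean Series aligned with df's index.
mask = df['type'].apply(lambda x: 'station' in x)

# Alternative: vectorised substring test; na=False treats missing values as non-matches.
alt = df['type'].str.contains('station', regex=False, na=False)

# Indexing with the mask keeps only the rows where it is True.
filtered = df[mask]
print(mask.tolist())
print(filtered['sent_name'].tolist())
```

One difference worth keeping in mind: if the column had missing values, the `apply` version would raise on them, whereas `str.contains(..., na=False)` would simply exclude those rows.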

So in your case the following should work:

new_df = OD[OD['type'].apply(lambda x: 'station' in x)]
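
Since the stated goal is a new CSV containing only the matching records, here is a rough end-to-end sketch of filtering and then writing with `to_csv`. It assumes the semicolon-separated layout used above (the fields themselves contain commas), and the output file name `stations.csv` and the temporary directory are placeholders, not anything from the question.

```python
import io
import os
import tempfile
import pandas as pd

t = """sent_name;Name;Lat;Lng;type
Abbey Road Station, London, UK;Abbey Road, London E15, UK;51.53193;0.00376;[u'transit_station', u'point_of_interest', u'establishment']
Abbey Wood Station, London, UK;Abbey Wood, London SE2, UK;51.49106;0.12142;[u'transit_station', u'point_of_interest', u'establishment']"""
OD = pd.read_csv(io.StringIO(t), sep=';')

# Boolean-mask filter from the answer; the result is already a new DataFrame.
new_df = OD[OD['type'].apply(lambda x: 'station' in x)]

# Placeholder output location; replace with a real path in practice.
out_path = os.path.join(tempfile.mkdtemp(), 'stations.csv')

# index=False avoids writing the row index as an extra column.
new_df.to_csv(out_path, sep=';', index=False)

# Read the file back to confirm the columns and rows survived the round trip.
check = pd.read_csv(out_path, sep=';')
print(check.shape)
```

No loop or manual appending is needed: indexing with the mask returns the filtered DataFrame directly.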
