I am filtering a list for those records that contain a keyword in one column. The overall list, outputs, is given as:
outputs =
sent_name Name Lat Lng type
Abbey Road Station, London, UK Abbey Road, London E15, UK 51.53193 0.00376 [u'transit_station', u'point_of_interest', u'establishment']
Abbey Wood Station, London, UK Abbey Wood, London SE2, UK 51.49106 0.12142 [u'transit_station', u'point_of_interest', u'establishment']
I search output[3] for the string 'station' and then append the records where it is found to an empty list, results, as follows:
results = []
for output in outputs:
    if "station" in output[3]:
        results.append(output)
I wish to use Pandas for future analysis but do not know how to recreate a DataFrame after filtering these results.
OD = pd.read_csv('./results.csv', header=0)
Here, results.csv again contains:
sent_name Name Lat Lng type
Abbey Road Station, London, UK Abbey Road, London E15, UK 51.53193 0.00376 [u'transit_station', u'point_of_interest', u'establishment']
Abbey Wood Station, London, UK Abbey Wood, London SE2, UK 51.49106 0.12142 [u'transit_station', u'point_of_interest', u'establishment']
Using iterrows, I am able to iterate over the rows in the pandas dataframe and filter out those where 'station' exists in the type column.
for index, row in OD.iterrows():
    if "station" in row['type']:
        pass  # a matching row is reached here
However, I have not been able to create a new DataFrame from this. My ultimate aim is to create a new csv (that only contains records that feature 'station' in the type column) using the .to_csv function in Pandas.
I have tried to create a new DataFrame with appropriate index names, then filter as above and append the matching rows to the new DataFrame:
OD_filtered = pd.DataFrame(index=['sent_name','Name','Lat', 'Lng', 'type'])
for index, row in OD.iterrows():
    if "station" in row['type']:
        OD_filtered.append([row['sent_name'], row['Name'], row['Lat'], row['Lng'], row['type']])
pprint(OD_filtered)
However, this fails to write to the DataFrame, which remains empty. When I print(OD_filtered) it gives:
Empty DataFrame
Columns: []
Index: [sent_name, Name, Lat, Lng, type]
Your read_csv code shouldn't work as your csv has multiple commas, but aside from that you should be able to do
new_df = OD[OD.apply(lambda x: 'station' in x['type'], axis=1)]
I think.
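For what it's worth, here is a minimal sketch of the same idea using a boolean mask built with str.contains instead of apply, followed by to_csv. It assumes the type column holds plain strings (as it would after read_csv), builds the frame inline rather than reading results.csv, and adds one made-up non-station row so the filter has something to drop. The output path is hypothetical. The loop in the question likely left OD_filtered empty because DataFrame.append returned a new object rather than changing the frame in place (and it was removed in pandas 2.0), so masking sidesteps it entirely.

```python
import os
import tempfile

import pandas as pd

# Two rows from the question plus one made-up non-station row
# (hypothetical, added only so the filter has something to drop).
OD = pd.DataFrame(
    [
        ["Abbey Road Station, London, UK", "Abbey Road, London E15, UK",
         51.53193, 0.00376,
         "[u'transit_station', u'point_of_interest', u'establishment']"],
        ["Abbey Wood Station, London, UK", "Abbey Wood, London SE2, UK",
         51.49106, 0.12142,
         "[u'transit_station', u'point_of_interest', u'establishment']"],
        ["Example Park, London, UK", "Example Park, London, UK",
         51.5, 0.0,
         "[u'park', u'point_of_interest', u'establishment']"],
    ],
    columns=["sent_name", "Name", "Lat", "Lng", "type"],
)

# Boolean mask: True where the 'type' string contains the literal 'station'.
mask = OD["type"].str.contains("station", regex=False)

# Indexing with the mask returns a new DataFrame; OD itself is unchanged.
OD_filtered = OD[mask]

# Write only the matching rows. The file name and location are assumptions.
out_path = os.path.join(tempfile.gettempdir(), "stations_only.csv")
OD_filtered.to_csv(out_path, index=False)
```

If the type column contained missing values, str.contains would return NaN for those rows, so passing na=False may be needed in that case.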