Delete rows of a pandas data frame having string values in python 3.4.1

Question

I have read a CSV file with 8 columns using pandas read_csv. Each column may contain int, string, or float values. I want to remove the rows that contain string values and return a data frame with only numeric values in it. A sample of the CSV is attached below.
I tried running the following code:

import pandas as pd
import numpy as np  
df = pd.read_csv('new200_with_errors.csv',dtype={'Geo_Level_1' : int,'Geo_Level_2' : int,'Geo_Level_3' : int,'Product_Level_1' : int,'Product_Level_2' : int,'Product_Level_3' : int,'Total_Sale' : float})
print(df)

but I get the following error:

TypeError: unorderable types: NoneType() > int()

I am running Python 3.4.1. Here is the sample CSV.

Geo_L_1,Geo_L_2,Geo_L_3,Pro_L_1,Pro_L_2,Pro_L_3,Date,Sale
1, 2, 3, 129, 1, 5193316745, 1/1/2012, 9
1 ,2, 3, 129, 1, 5193316745, 1/1/2013,  
1, 2, 3, 129, 1, 5193316745, , 8
1, 2, 3, 129, NA, 5193316745, 1/10/2012, 10
1, 2, 3, 129, 1, 5193316745, 1/10/2013, 4
1, 2, 3, ghj, 1, 5193316745, 1/10/2014, 6
1, 2, 3, 129, 1, 5193316745, 1/11/2012, 4
1, 2, 3, 129, 1, ghgj, 1/11/2013, 2
1, 2, 3, 129, 1, 5193316745, 1/11/2014, 6
1, 2, 3, 129, 1, 5193316745, 1/12/2012, ghgj
1, 2, 3, 129, 1, 5193316745, 1/12/2013, 5

You'll have to post the raw data of your complete df. You have to clean up the CSV either before or after reading it into pandas.
– EdChum, Commented Oct 27, 2014 at 8:17
The sample data has these errors in the aforesaid columns, which is why I gave only those, but the sample data consists of all 8 columns. @fredtantini
– sayak_ghosh90, Commented Oct 27, 2014 at 8:50
I have just edited the main sample CSV file, i.e. the complete df. @EdChum
– sayak_ghosh90, Commented Oct 27, 2014 at 9:04

EdChum · Accepted Answer · 2014-10-27 10:10:27Z

The way I would approach this is to try to convert the columns to int using a user-defined function with a try/except that handles the case where a value cannot be coerced into an int; those values get set to NaN. Then drop the row where the date is empty. For some reason the empty value actually has a length of 1 when I tested this with your data; it may work for you using a length of 0.

In [42]:
# simple function to try to convert the type, returns NaN if the value cannot be coerced
# (uses numpy imported as np, as in the question)
def func(x):
    try:
        return int(x)
    except ValueError:
        return np.nan
# assign multiple columns 
df['Pro_L_1'], df['Pro_L_3'], df['Sale'] = df['Pro_L_1'].apply(func), df['Pro_L_3'].apply(func), df['Sale'].apply(func)
# drop the 'empty' date row, take a copy() so we don't get a warning
df = df.loc[df['Date'].str.len() > 1].copy()
# convert the string to a datetime, if we didn't drop the row it would set the empty row to today's date
df['Date']= pd.to_datetime(df['Date'])
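# now convert all the dtypes that are numeric to a numeric dtype
# (convert_objects was later deprecated and removed from pandas;
# newer versions use pd.to_numeric(..., errors='coerce') per column instead)
df = df.convert_objects(convert_numeric=True)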
# check the dtypes
df.dtypes

Out[42]:
Geo_L_1             int64
Geo_L_2             int64
Geo_L_3             int64
Pro_L_1           float64
Pro_L_2           float64
Pro_L_3           float64
Date       datetime64[ns]
Sale              float64
dtype: object
In [43]:
# display the current situation
df
Out[43]:
    Geo_L_1  Geo_L_2  Geo_L_3  Pro_L_1  Pro_L_2     Pro_L_3       Date  Sale
0         1        2        3      129        1  5193316745 2012-01-01     9
1         1        2        3      129        1  5193316745 2013-01-01   NaN
3         1        2        3      129      NaN  5193316745 2012-01-10    10
4         1        2        3      129        1  5193316745 2013-01-10     4
5         1        2        3      NaN        1  5193316745 2014-01-10     6
6         1        2        3      129        1  5193316745 2012-01-11     4
7         1        2        3      129        1         NaN 2013-01-11     2
8         1        2        3      129        1  5193316745 2014-01-11     6
9         1        2        3      129        1  5193316745 2012-01-12   NaN
10        1        2        3      129        1  5193316745 2013-01-12     5
In [44]:
# drop the rows
df.dropna()
Out[44]:
    Geo_L_1  Geo_L_2  Geo_L_3  Pro_L_1  Pro_L_2     Pro_L_3       Date  Sale
0         1        2        3      129        1  5193316745 2012-01-01     9
4         1        2        3      129        1  5193316745 2013-01-10     4
6         1        2        3      129        1  5193316745 2012-01-11     4
8         1        2        3      129        1  5193316745 2014-01-11     6
10        1        2        3      129        1  5193316745 2013-01-12     5

dropna() returns a new data frame rather than modifying df in place, so for the last line assign the result back: df = df.dropna()
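
On newer pandas versions the same idea (coerce anything non-numeric to NaN, then drop those rows) can be sketched with pd.to_numeric and pd.to_datetime. This is only a minimal sketch under some assumptions, not what was run above: it uses a handful of rows from the sample in the question, assumes the dates are month/day/year (which matches the output shown above), and the names csv_text, raw, clean, numeric_cols and int_cols are just made up for the example.

```python
import io
import pandas as pd

# A few rows taken from the sample CSV in the question (illustrative subset).
csv_text = """Geo_L_1,Geo_L_2,Geo_L_3,Pro_L_1,Pro_L_2,Pro_L_3,Date,Sale
1, 2, 3, 129, 1, 5193316745, 1/1/2012, 9
1, 2, 3, 129, 1, 5193316745, , 8
1, 2, 3, 129, NA, 5193316745, 1/10/2012, 10
1, 2, 3, ghj, 1, 5193316745, 1/10/2014, 6
1, 2, 3, 129, 1, 5193316745, 1/12/2012, ghgj
1, 2, 3, 129, 1, 5193316745, 1/12/2013, 5
"""

# Read every column as text so that a bad value cannot make read_csv fail.
raw = pd.read_csv(io.StringIO(csv_text), dtype=str, skipinitialspace=True)

numeric_cols = [c for c in raw.columns if c != "Date"]

clean = raw.copy()
for col in numeric_cols:
    # Anything that is not a number becomes NaN instead of raising.
    clean[col] = pd.to_numeric(clean[col].str.strip(), errors="coerce")

# Assumption: dates are month/day/year; unparseable or empty dates become NaT.
clean["Date"] = pd.to_datetime(
    clean["Date"].str.strip(), format="%m/%d/%Y", errors="coerce"
)

# Drop every row that had a bad or missing value in any column.
clean = clean.dropna()

# With the NaN rows gone, the id-like columns can be cast back to integers.
int_cols = [c for c in numeric_cols if c != "Sale"]
clean = clean.astype({c: "int64" for c in int_cols})

print(clean)
print(clean.dtypes)
```

Reading with dtype=str first avoids the error from forcing int/float dtypes in read_csv, because the conversion only happens afterwards, where errors="coerce" can turn the bad cells into NaN.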
