I am importing a CSV file using pandas.
The CSV column headers are Year, Model, Trim, Result.
The values coming in from the CSV file are as follows:
Year | Model | Trim | Result
2012 | Camry | SR5 | 1
2014 | Tacoma | SR5 | 1
2014 | Camry | XLE | 0
etc..
There are 2500+ rows in the data set containing over 200 unique models.
All values are then converted to numerical values for analysis purposes.
Here the inputs are the first 3 columns of the CSV file and the output is the fourth (Result) column.
Here is my script:
import pandas as pd
import numpy as np
c1 = []
c2 = []
c3 = []
input = []
output = []
# read in the csv file containing 4 columns
df = pd.read_csv('success.csv')
df.convert_objects(convert_numeric=True)
df.fillna(0, inplace=True)
# convert string values to numerical values
def handle_non_numerical_data(df):
    columns = df.columns.values
    for column in columns:
        text_digit_vals = {}
        def convert_to_int(val):
            return text_digit_vals[val]
        if df[column].dtype != np.int64 and df[column].dtype != np.float64:
            column_contents = df[column].values.tolist()
            unique_elements = set(column_contents)
            x = 0
            for unique in unique_elements:
                if unique not in text_digit_vals:
                    text_digit_vals[unique] = x
                    x += 1
            df[column] = list(map(convert_to_int, df[column]))
    return df
df = handle_non_numerical_data(df)
# extract each column to insert into input array later
c1.append(df['Year'])
c2.append(df['Model'])
c3.append(df['Trim'])
# create input array containing the first 3 columns of the csv file
input = np.column_stack((c1, c2, c3))
output.append(df['Result'])
This works fine, except that append only accepts one value. Should I use extend instead, since it seems it would attach the values to the end of the array?
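For reference, here is a minimal sketch of how `list.append` and `list.extend` differ on a plain Python list. The `years` list is a hypothetical stand-in for one column of the data, not the actual contents of the file.

```python
# Hypothetical sample values standing in for the 'Year' column.
years = [2012, 2014, 2014]

appended = []
appended.append(years)   # adds the whole list as ONE element

extended = []
extended.extend(years)   # adds each value as its own element

print(appended)  # a list containing a single inner list
print(extended)  # a flat list of the three values
```

Neither call produces rows of (Year, Model, Trim) by itself; both only control whether a column ends up nested or flat.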
UPDATE
Essentially all of this works; my problem is creating the input array. I would like the array to consist of 3 columns: Year, Model, Trim.
input = ([['Year'], ['Model'], ['Trim']],[['Year'], ['Model'], ['Trim']]...)
I can only seem to add one value on top of the other rather than having them in sequence.
What I get now:
input = ([['Year'], ['Year'], ['Year']].., [['Model'], ['Model'], ['Model']]..[['Trim'], ['Trim'], ['Trim']]...)
pd.read_csv is not acceptable. I suspect that whatever you are trying to accomplish can be done in a much more straightforward manner. handle_non_numerical_data is probably not the best way to convert your values to integers; that can be handled much more easily and efficiently using built-in pandas/numpy functions. Also, it is not clear why you are putting all the columns in a list instead of using df.values. I will repeat: I suspect that whatever you are trying to accomplish can be done in a much more straightforward manner.
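As a rough illustration of what the comment is pointing at, the sketch below encodes the text columns with `pd.factorize` and takes the input array directly from the DataFrame. This is one possible approach, not necessarily what the commenter had in mind. The inline DataFrame is a hypothetical stand-in for `success.csv`, built from the three sample rows in the question, and the names `inputs` and `outputs` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for success.csv, using the sample rows from the question.
df = pd.DataFrame({
    "Year": [2012, 2014, 2014],
    "Model": ["Camry", "Tacoma", "Camry"],
    "Trim": ["SR5", "SR5", "XLE"],
    "Result": [1, 1, 0],
})

# Replace each non-numeric column with integer codes.
# pd.factorize assigns codes in order of first appearance.
for column in df.columns:
    if not pd.api.types.is_numeric_dtype(df[column]):
        codes, _ = pd.factorize(df[column])
        df[column] = codes

# One row per record, one column per feature: shape (n_rows, 3).
inputs = df[["Year", "Model", "Trim"]].to_numpy()
outputs = df["Result"].to_numpy()

print(inputs)
print(outputs)
```

Selecting the three columns at once keeps each row together as (Year, Model, Trim), so no per-column lists or manual stacking are needed.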