I am given a CSV file containing numbers ranging from 800 to 3000. The problem is that numbers of 1,000 or more contain a comma, e.g. 1,227, 1,074 or 2,403. When I try to calculate their mean, variance or standard deviation using scipy or numpy, I get the error: ValueError: could not convert string to float: '1,227'. How can I convert them to numbers so that I can do calculations on them? The CSV file must not be changed, as it is a read-only file.
2 Answers
Thanks, guys! I fixed it by using the replace function. hpaulj's link was useful.
my_string = [val[2] for val in csvtext]
my_string = [x.replace(',', '') for x in my_string]
my_float = [float(i) for i in my_string]
In this code, the first line pulls the third column (index 2) out of the CSV rows already loaded into csvtext, the second line removes the commas, and the third line converts each string to a float that can be used in calculations. So there is no need to edit the file or create a new one; a simple list manipulation does the job.
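A self-contained version of the same idea, as a sketch under a few assumptions: the numbers sit in the third column (index 2, as in val[2] above), the file has a header row, and the thousands values are quoted in the file (otherwise a comma-delimited reader would split "1,227" into two fields). The inline SAMPLE string and the helper name read_column are hypothetical stand-ins; with the real file you would pass open("data.csv", newline="") instead, where "data.csv" is a placeholder name. Opening the file for reading leaves it untouched.

```python
import csv
import io
import statistics

# Hypothetical stand-in for the read-only CSV file (header row + quoted numbers).
SAMPLE = 'id,name,value\n1,a,"1,227"\n2,b,"1,074"\n3,c,"2,403"\n4,d,800\n'


def read_column(f, index=2):
    """Return column `index` of a CSV stream as floats, dropping thousands commas."""
    reader = csv.reader(f)  # handles the quotes around "1,227"
    next(reader)            # skip the header row (assumed to exist)
    return [float(row[index].replace(",", "")) for row in reader]


values = read_column(io.StringIO(SAMPLE))
print(values)                        # [1227.0, 1074.0, 2403.0, 800.0]
print(statistics.mean(values))       # 1376.0
print(statistics.pvariance(values))  # population variance
print(statistics.pstdev(values))     # population standard deviation
```

statistics.pvariance and statistics.pstdev treat the data as the whole population; statistics.variance and statistics.stdev give the sample versions (divisor n - 1).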
This is really a locale issue, but a simple solution is to call replace on the string first:
a = '1,274'
float(a.replace(',','')) # 1274.0
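Since the question mentions numpy, here is a sketch that applies the same replace trick to a whole array at once with np.char.replace and then computes the statistics. The sample values are illustrative only. Note that ndarray.var and ndarray.std default to ddof=0 (population); pass ddof=1 for the sample versions.

```python
import numpy as np

# Illustrative strings as they might come out of the CSV file.
raw = np.array(["1,227", "1,074", "2,403", "800"])

# Strip the thousands separator element-wise, then parse as float64.
values = np.char.replace(raw, ",", "").astype(float)

print(values.mean())       # 1376.0
print(values.var())        # population variance (ddof=0, numpy's default)
print(values.var(ddof=1))  # sample variance
print(values.std(ddof=1))  # sample standard deviation
```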
Another way is to read the CSV file with pandas. Its read_csv function has a thousands argument (e.g. thousands=',') that removes the separator while parsing.
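The pandas route might look like this, assuming the same quoted layout as before and a column named value (both hypothetical). With thousands=",", read_csv drops the separator inside numeric fields, so the column arrives as numbers rather than strings. Be aware that pandas' Series.var and Series.std default to ddof=1 (sample), unlike numpy's ddof=0.

```python
import io

import pandas as pd

# Hypothetical stand-in for the read-only file; with the real file, pass its path instead.
SAMPLE = 'id,name,value\n1,a,"1,227"\n2,b,"1,074"\n3,c,"2,403"\n4,d,800\n'

# thousands="," tells the parser to ignore the comma inside numeric fields.
df = pd.read_csv(io.StringIO(SAMPLE), thousands=",")
col = df["value"]

print(col.mean())  # arithmetic mean
print(col.var())   # sample variance (ddof=1 is the pandas default)
print(col.std())   # sample standard deviation
```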
If you know something about the locale the data was written in, then it's probably best to use the locale.atof() function.
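A sketch of the locale.atof() route, assuming the data follows US-English conventions (',' for thousands, '.' for decimals). The locale names tried below are common spellings but are assumptions; which ones exist depends on the operating system, so the sketch falls back to replace when none is installed. Because setlocale changes process-wide state, the helper restores the previous LC_NUMERIC setting afterwards; parse_with_locale is a hypothetical name.

```python
import locale

RAW = ["1,227", "1,074", "2,403"]


def parse_with_locale(strings, candidates=("en_US.UTF-8", "en_US.utf8", "en_US")):
    """Parse locale-formatted numbers with locale.atof.

    Temporarily switches LC_NUMERIC to a US-English locale (assumed to match
    the data). Returns None if none of the candidate locales is installed.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    for name in candidates:
        try:
            locale.setlocale(locale.LC_NUMERIC, name)
        except locale.Error:
            continue  # this locale is not installed; try the next spelling
        try:
            return [locale.atof(s) for s in strings]
        finally:
            locale.setlocale(locale.LC_NUMERIC, previous)  # restore global state
    return None


values = parse_with_locale(RAW)
if values is None:
    # Fallback when no US-English locale is available on this machine.
    values = [float(s.replace(",", "")) for s in RAW]
print(values)  # [1227.0, 1074.0, 2403.0]
```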
Comments
If pandas is available, then I'd use that.
… csv into an array. stackoverflow.com/questions/6633523/… has replace and locale solutions.
tr -d ',' < originalFile.csv > noCommas.csv?