0

I am given a CSV file containing numbers ranging from 800 to 3000. The problem is that numbers greater than one thousand contain a comma, e.g. 1,227 or 1,074 or 2,403. When I try to calculate their mean, variance, or standard deviation using scipy or numpy, I get the error: ValueError: could not convert string to float: '1,227'. How can I convert them to numbers so that I can do calculations on them? The CSV file must not be changed, as it is a read-only file.

4
  • You haven't shown any code. There are loads of ways to do this, depending on your actual approach when reading the CSV. Commented Oct 7, 2017 at 18:35
  • This isn't a formatting issue but rather a reading issue - how to load a csv into an array. stackoverflow.com/questions/6633523/… has replace and locale solutions. Commented Oct 7, 2017 at 19:21
  • How about writing a new version of the file without commas? tr -d ',' < originalFile.csv > noCommas.csv? Commented Oct 7, 2017 at 21:14
  • my_string=[val[2] for val in csvfile] my_float=[float(my_string.replace(',', '')) for i in my_string)] This is what I am trying to do. my_string is a list of strings, e.g. numbers with commas, and I am trying to convert it to my_float using replace. Since my_string is a list of strings rather than a single string, this code does not work. Commented Oct 7, 2017 at 23:04

2 Answers

1

Thanks, guys! I fixed it by using the replace function. hpaulj's link was useful.

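my_string = [val[2] for val in csvtext]  # field at index 2 of each row, still text such as '1,227'
my_string = [x.replace(',', '') for x in my_string]  # drop the thousands separators
my_float = [float(i) for i in my_string]  # the cleaned strings now convert without error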

In this code, the 1st line loads the list of CSV strings into my_string, the 2nd line removes the commas, and the 3rd line produces numbers that are ready for calculation. So there is no need to edit the file or create a new one; a simple list manipulation does the job.
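
For anyone who wants to try this end to end, here is a minimal sketch of the same idea. It assumes a header row and the numbers at index 2, as in the snippet above; the sample data and the names sample, values, mean, and stdev are made up for illustration. It uses the standard-library statistics module so it runs without extra packages, but numpy.mean and numpy.std should accept the same list of floats.

```python
import csv
import io
import statistics

# Hypothetical sample standing in for the read-only file. Fields that contain
# a thousands comma have to be quoted, otherwise the comma would split them.
sample = io.StringIO(
    'id,name,count\n1,a,"1,227"\n2,b,"1,074"\n3,c,"2,403"\n4,d,950\n'
)

reader = csv.reader(sample)
next(reader)  # skip the header row (assumed to exist)

# Strip the thousands separators, then convert each field to float
values = [float(row[2].replace(",", "")) for row in reader]

mean = statistics.mean(values)
stdev = statistics.pstdev(values)  # population std dev, same convention as numpy.std's default
```

The file is only read, never written, so this works on a read-only CSV.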


0

This really is a locale issue, but a simple solution is to call replace on the string first:

a = '1,274'
float(a.replace(',',''))  # 1274.0

Another way is to use pandas to read the csv file. Its read_csv function has a thousands argument.
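
A short sketch of the pandas route, assuming pandas is installed. The sample data and the column name count are hypothetical; with a real file you would pass its path instead of the StringIO object.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the read-only file.
sample = io.StringIO(
    'id,name,count\n1,a,"1,227"\n2,b,"1,074"\n3,c,"2,403"\n4,d,950\n'
)

# thousands="," tells the parser that commas inside numeric fields are separators
df = pd.read_csv(sample, thousands=",")

counts = df["count"]   # parsed as numbers rather than strings
mean = counts.mean()
```

Because the conversion happens while parsing, the column is numeric straight away and methods such as mean(), var(), and std() can be called on it directly.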

If you know which locale the data was written in, it is probably best to use the locale.atof() function.
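
A rough sketch of the locale approach. The helper name parse_localized and the locale name en_US.UTF-8 are assumptions for illustration; the named locale has to be installed on the system, so the sketch falls back to plain replacement when it is not.

```python
import locale

def parse_localized(text, locale_name="en_US.UTF-8"):
    """Parse a number string using the separators of the given locale.

    Hypothetical helper: falls back to stripping commas if the
    requested locale is not available on this machine.
    """
    previous = locale.setlocale(locale.LC_NUMERIC)  # remember the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.atof(text)
    except locale.Error:
        # Locale not installed: treat "," as the thousands separator by hand
        return float(text.replace(",", ""))
    finally:
        locale.setlocale(locale.LC_NUMERIC, previous)  # restore the old setting

number = parse_localized("1,227")
```

Note that locale.setlocale changes process-wide state and is not thread-safe, which is one reason the replace approach is often simpler for a one-off script.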

4 Comments

Not if you use numpy to read in the CSV, or even the base CSV module. You need clarification from OP to hope to answer this.
I agree. The question isn't very clear. However, the ValueError message does indicate that he is dealing with numbers as strings.
Then don't shoot for an answer. Ask for clarification first. Rep gain is secondary to providing something that's useful.
I found an old SO question that gives essentially these two answers. But if pandas is available, then I'd use that.
