How can I parse strings using the same logic pandas uses when reading a CSV, so that casting "False" to bool gives False? I have text values entered by users that I need to insert into a DataFrame, and they should be cast automatically to the dtype of the target column using this logic. The example below attempts to insert a value into a boolean column, but the result is wrong.

import pandas as pd

x = pd.DataFrame([{'id': 0, 'flag': True},
                  {'id': 1, 'flag': False},
                  {'id': 2, 'flag': True}])

text = "False"
value = x['flag'].dtype.type(text)  # Want this to be False not True
x.loc[0, 'flag'] = value

3 Answers

1

Use json.loads() and then convert the flag column back to its previous dtype. This works for "False", "false", "1", "0", etc.

previous_type = x.flag.dtype
x.loc[0, 'flag'] = json.loads(text.lower())
x.flag = x.flag.astype(previous_type)

Complete code:

import pandas as pd
import json

x = pd.DataFrame([{'id': 0, 'flag': True},
                  {'id': 1, 'flag': False},
                  {'id': 2, 'flag': True}])
text = "False"
previous_type = x.flag.dtype
x.loc[0, 'flag'] = json.loads(text.lower())
x.flag = x.flag.astype(previous_type)
print(x)

   id   flag
0   0  False
1   1  False
2   2   True
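
The same idea can be wrapped in a small helper that casts a single parsed value to the column's dtype before assignment, which covers the "0" case raised in the comments. This is only a sketch: parse_for_column is a hypothetical name, not a pandas function, and text that is not valid JSON after lower-casing (for example "abc") will raise an error.

```python
import json

import pandas as pd


def parse_for_column(text, column):
    """Hypothetical helper: parse user text as JSON, then cast to the column dtype."""
    parsed = json.loads(text.strip().lower())  # "False" -> False, "0" -> 0
    return column.dtype.type(parsed)           # e.g. 0 becomes False for a bool column


x = pd.DataFrame({'id': [0, 1, 2], 'flag': [True, False, True]})
x.loc[0, 'flag'] = parse_for_column("0", x['flag'])
print(x['flag'].dtype, x.loc[0, 'flag'])
```

Casting the single value first means the column keeps its dtype, so the astype call afterwards should not be needed in this variant.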

4 Comments

This has the same problem as the snippet in my question: the value in the first row ends up being True, not False.
I've edited my answer. Please check it now.
Thanks! I like the JSON idea. I considered eval, but it isn't as lenient as the logic behind read_csv: capitalization matters, so "false" would fail, and there are similar caveats for other dtypes. One minor issue you might want to fix in the JSON example: you still need to convert to the correct dtype after json.loads. For example, if the user text is "0", json.loads returns 0, which still needs to be cast to False.
Yes, in that case it needs to be converted back to its previous type.
0

Here is a workaround that works, although it may not perform very well:

from io import StringIO
import pandas as pd

text = "False"       # the user's input
column_dtype = bool  # dtype of the column being inserted into
value = pd.read_csv(StringIO(text), dtype=column_dtype, header=None).values[0][0]
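
For reuse across columns, the workaround could be wrapped in a function that takes the dtype from the target column. This is a minimal sketch: parse_like_read_csv is a hypothetical name, and it assumes the text is a single non-empty value without commas or line breaks, since read_csv would otherwise see several cells or no data at all.

```python
from io import StringIO

import pandas as pd


def parse_like_read_csv(text, dtype):
    """Hypothetical helper: parse one value by reading it as a one-cell CSV."""
    frame = pd.read_csv(StringIO(text), dtype=dtype, header=None)
    return frame.iloc[0, 0]


x = pd.DataFrame({'id': [0, 1, 2], 'flag': [True, False, True]})

# Each value is parsed with the dtype of the column it is inserted into.
x.loc[0, 'flag'] = parse_like_read_csv("false", x['flag'].dtype)
x.loc[1, 'id'] = parse_like_read_csv("42", x['id'].dtype)
print(x)
```

Because the parsing is delegated to read_csv, the lower-case "false" is accepted here, which is the leniency the question asks for.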

0

Remember that in Python, bool() of any non-empty string is True:

>>> bool("faa")
True
>>> bool("True")
True
>>> bool("False")
True
>>> bool("")
False

So in your case,

import pandas as pd

x = pd.DataFrame([{'id': 0, 'flag': True},
                  {'id': 1, 'flag': False},
                  {'id': 2, 'flag': True}])

text = bool("")  # only the empty string converts to False
value = x['flag'].dtype.type(text)
print(value)  # False
x.loc[0, 'flag'] = value

should work.

Another solution might be:

import pandas as pd

x = pd.DataFrame([{'id': 0, 'flag': True},
                  {'id': 1, 'flag': False},
                  {'id': 2, 'flag': True}])

text = "False"
value = x['flag'].dtype.type(eval(text))  # eval("False") yields the Python literal False
print(value)  # False
x.loc[0, 'flag'] = value
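
Since the text comes from users, a more cautious variant of the eval approach is ast.literal_eval from the standard library, which only accepts Python literals and does not execute arbitrary expressions. This is a sketch of a swapped-in technique, not what the answer above uses. It shares eval's strictness: "False" parses, but a lower-case "false" is rejected with an error.

```python
import ast

import pandas as pd

x = pd.DataFrame([{'id': 0, 'flag': True},
                  {'id': 1, 'flag': False},
                  {'id': 2, 'flag': True}])

text = "False"
literal = ast.literal_eval(text)       # parses literals only, no code execution
value = x['flag'].dtype.type(literal)  # cast to the column's scalar type
x.loc[0, 'flag'] = value
print(x)
```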

3 Comments

Changing the value of text is not a viable solution for me, because the text is user input. Users enter values into a GUI that displays a DataFrame as a spreadsheet, like Excel, and the DataFrame needs to be updated with that value. Having the user enter False by submitting a blank cell would be bad UX. It needs to behave as if they were modifying a CSV, which is why I'm looking for a way to apply the logic behind read_csv.
OK. I have another solution, though it may not be to your liking: if you use eval(text), you get your desired output. I've edited my answer.
Also, I'm not sure whether you can use converters to define the types of the user input.
