7

I'm reading a file into python 2.4 that's structured like this:

field1: 7
field2: "Hello, world!"
field3: 6.2

The idea is to parse it into a dictionary that takes fieldfoo as the key and whatever comes after the colon as the value.

I want to convert whatever is after the colon to its "actual" data type, that is, '7' should be converted to an int, "Hello, world!" to a string, etc. The only data types that need to be parsed are ints, floats and strings. Is there a function in the python standard library that would allow one to make this conversion easily?

The only things this should be used to parse were written by me, so (at least in this case) safety is not an issue.

9 Answers

6

First parse your input into a list of pairs like fieldN: some_string. You can do this easily with the re module, or probably even more simply by slicing left and right of the index line.strip().find(': '). Then use a literal eval on the value some_string:

>>> import ast
>>> ast.literal_eval('6.2')
6.2
>>> type(_)
<type 'float'>
>>> ast.literal_eval('"Hello, world!"')
'Hello, world!'
>>> type(_)
<type 'str'>
>>> ast.literal_eval('7')
7
>>> type(_)
<type 'int'>
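Putting the two steps together, here is a minimal sketch (the parse_fields name and splitting on ': ' are assumptions based on the sample input in the question):

```python
import ast

def parse_fields(text):
    # Hypothetical helper: split each "fieldN: value" line on the first
    # ': ' and let ast.literal_eval recover ints, floats and quoted strings.
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, raw = line.partition(': ')
        result[key] = ast.literal_eval(raw)
    return result

sample = 'field1: 7\nfield2: "Hello, world!"\nfield3: 6.2'
parse_fields(sample)
# {'field1': 7, 'field2': 'Hello, world!', 'field3': 6.2}
```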

7 Comments

The version of python I'm using doesn't have the ast module.
@MikeSamuel obviously the input must be preprocessed into fieldn: string pairs first, but that part is trivial. @julio.alegria _ is a handy shortcut for the last returned value in the interactive interpreter. @Dan ..erm.. now you tell me ;) upgrade python? is there a reason why you need to use such an old version?
@Mike Samuel: Safety isn't an issue for me. I don't need to parse anything that I haven't written myself with another program. +1 on your comment for pointing it out, though.
mail.python.org/pipermail/python-list/2009-September/… here someone backported literal_eval to 2.4, but it all sounds a bit hacky to me. i would prefer to upgrade python than use that, personally.
@wim: I figured out I could just use eval(). See answer below, and thanks for pointing me in the right direction.
4

You can use yaml to parse the literals, which is better than ast in that it does not throw an error if strings are not wrapped in an extra pair of apostrophes or quotation marks.

>>> import yaml
>>> yaml.safe_load('7')
7
>>> yaml.safe_load('Hello')
'Hello'
>>> yaml.safe_load('7.5')
7.5

Comments

2

You can attempt to convert it to an int first using the built-in function int(). If the string cannot be interpreted as an int, a ValueError exception is raised. You can then attempt to convert it to a float using float(). If this also fails, just return the initial string.

def interpret(val):
    try:
        return int(val)
    except ValueError:
        try:
            return float(val)
        except ValueError:
            return val
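A usage sketch applying interpret() to the question's file format (the function is repeated here so the example is self-contained; splitting on ': ' is an assumption based on the sample input). Note that, unlike the literal_eval approach, strings need no surrounding quotes:

```python
def interpret(val):
    # Try int first, then float, then fall back to the raw string.
    try:
        return int(val)
    except ValueError:
        try:
            return float(val)
        except ValueError:
            return val

lines = ['field1: 7', 'field2: Hello, world!', 'field3: 6.2']
parsed = dict((k, interpret(v)) for k, v in
              (line.split(': ', 1) for line in lines))
# {'field1': 7, 'field2': 'Hello, world!', 'field3': 6.2}
```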

Comments

1

For older Python versions, like the one in the question, the eval function can be used; to reduce its evilness, pass a dict as the second argument to serve as the global namespace, which blocks calls to the built-in functions.

>>> [eval(i, {"__builtins__":None}) for i in ['6.2', '"Hello, world!"', '7']]
[6.2, 'Hello, world!', 7]

1 Comment

It raises "SyntaxError: unexpected EOF while parsing" when applied to alphanumeric values, instead of interpreting them as a string.
1

Since the "only data types that need to be parsed are int, float and str", maybe something like this will work for you:

entries = {'field1': '7', 'field2': "Hello, world!", 'field3': '6.2'}

for k,v in entries.items():
    if v.isdecimal():
        conv = int(v)
    else:
        try:
            conv = float(v)
        except ValueError:
            conv = v
    entries[k] = conv

print(entries)
# {'field2': 'Hello, world!', 'field3': 6.2, 'field1': 7}
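One caveat: str.isdecimal() returns False for negative numbers such as '-7', so they would fall through to the float branch. A small variant (to_number is a hypothetical name) that handles the sign:

```python
def to_number(v):
    # str.isdecimal() rejects a leading minus sign, so check the
    # digits after an optional '-' before converting with int().
    body = v[1:] if v.startswith('-') else v
    if body.isdecimal():
        return int(v)
    try:
        return float(v)
    except ValueError:
        return v
```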

Comments

1

There is the strconv lib.

In [22]: import strconv
/home/tworec/.local/lib/python2.7/site-packages/strconv.py:200: UserWarning: python-dateutil is not installed. As of version 0.5, this will be a hard dependency of strconv fordatetime parsing. Without it, only a limited set of datetime formats are supported without timezones.
  warnings.warn('python-dateutil is not installed. As of version 0.5, '

In [23]: strconv.convert('1.2')
Out[23]: 1.2

In [24]: type(strconv.convert('1.2'))
Out[24]: float

In [25]: type(strconv.convert('12'))
Out[25]: int

In [26]: type(strconv.convert('true'))
Out[26]: bool

In [27]: type(strconv.convert('tRue'))
Out[27]: bool

In [28]: type(strconv.convert('12 Jan'))
Out[28]: str

In [29]: type(strconv.convert('12 Jan 2018'))
Out[29]: str

In [30]: type(strconv.convert('2018-01-01'))
Out[30]: datetime.date

1 Comment

Actually, it does not handle unicode strings, see github.com/bruth/strconv/issues/2
0

Hope this helps to do what you are trying to do:

#!/usr/bin/python

a = {'field1': 7}
b = {'field2': "Hello, world!"}
c = {'field3': 6.2}

temp1 = type(a['field1'])
temp2 = type(b['field2'])
temp3 = type(c['field3'])

print temp1
print temp2
print temp3

2 Comments

I don't want to get the types of objects in a dictionary, I want to convert strings in a dictionary that are annotated as python types to the types they represent.
Can you post example input and output? That will be easier to understand.
0

Thanks to wim for helping me figure out what I needed to search for to figure this out.

One can just use eval():

>>> a=eval("7")
>>> b=eval("3")
>>> a+b
10
>>> b=eval("7.2")
>>> a=eval("3.5")
>>> a+b
10.699999999999999
>>> a=eval('"Hello, "')
>>> b=eval('"world!"')
>>> a+b
'Hello, world!'

3 Comments

Great! Now make sure you don't import os in your source, to avoid evaluating values like os.system("rm *"). And that's not the only way. So this method works, but it's not recommended.
It's evil and insecure, but this entire script is a quick and dirty fix that should (ideally) be thrown away in a few months.
I had a Q&D awk script that I wrote in 1989 implementing a very crude commercial order processor “until the app we wait is ready” that was still being used up to 1996 that I know of, and a Q&D 1995 QBasic army service chores assigner (whatever you might understand of it :) that was still used in 2007 (albeit modified by others to no end, I presume), so I'm certain “quick&dirty” programs are as quick but lots more dirtier than people usually think they are.
0

I put together this function to help with the type inference of lists.

from typing import List

import numpy as np

def infer_dtypes(values: List, sample_size: int = 300, stop_after: int = 300):
    """
    Infers the data type by randomly sampling from a list. Values are explicitly converted to string before checking.

    Args:
        values (list): A list to infer data types from.
        sample_size (int, optional): The number of values to sample from the list. Entire list will be sampled if set to None. Defaults to 300.
        stop_after (int, optional): The maximum number of non-empty values needed for the test. Equal to sample_size if set to None. Defaults to 300.

    Returns:
        str: The inferred data type ('int', 'float', 'bool', 'str', 'mixed', 'empty').
    """
    found = 0
    non_empty_count = 0

    sample_size = sample_size if sample_size is not None else len(values)
    stop_after = stop_after if stop_after is not None else sample_size

    for v in np.random.choice(values, sample_size):
        v = str(v)
        if v != '':
            non_empty_count += 1
            if non_empty_count > stop_after:
                break
            try:
                int(v)
                found |= 1
            except ValueError:
                try:
                    float(v)
                    found |= 2
                except ValueError:
                    if v.lower() in ['true', 'false']:
                        found |= 4
                    else:
                        found |= 8


    # Check if the data is mixed
    if bin(found).count('1') > 1:
        return 'mixed'

    if found & 8:
        return 'str'
    elif found & 4:
        return 'bool'
    elif found & 2:
        return 'float'
    elif found & 1:
        return 'int'
    else:
        return 'empty'

Produces:

infer_dtypes(['', '', '1', '2', '3', '4', '5'])  # int
infer_dtypes(['', '', '1.0', '2.0', '', '3.0', '4.4', '5.0'])  # float
infer_dtypes(['', '', 'True', 'False', '', '', 'False', 'True'])  # bool
infer_dtypes(['', '', 'never', 'gonna', '', '', 'give', ''])  # str
infer_dtypes(['', '', 'never', '', '5', 'True', '5.2', ''])  # mixed
infer_dtypes(['', '', '', '', '', '', '', ''])  # empty

Rationale, feel free to skip this:

I wrote this function because currently Pandas' df.convert_dtypes, df.infer_objects and pd.to_numeric don't work nicely if you have columns with empty strings. This could be solved (source 1, source 2) if a DataFrame has columns of uniform data types; for example, if we know it only has floats, we could replace '' with np.nan and then infer. However, for a DataFrame with mixed column types (strings, floats, ints), replacing '' with np.nan wouldn't work. This function helps solve that issue by running:

values = np.where(pd.isnull(df.T.values), '', df.T.values)
for l in values:
    infer_dtypes(l)

See this GitHub Gist for a full example. Hope it helps!

Comments
