0

I have a dataset and one of the columns contains integers for some rows and strings for other rows. The column type is object.

e.g:

Index     Column of interest
1         21678849
2         37464859
3         barbara
4         28394821
5         francis

I can't force the column to change type using .astype('str'), and I am unable to use .isstring, .isdigit, or .isinstance. I've tried looking at solutions for converting objects to strings, but these don't seem to work.

I've also tried:

[True if x.isin([1,2,3,4,5,6,7,8,9,0]) else False for x in df['column_of_interest']]

But that just gives me: AttributeError: 'str' object has no attribute 'isin'

Anyone have any other ideas of how I can manage this?

Ideally I would like to create a third column that categorises whether the row is an int or a str.

3
  • You can use pd.to_numeric(df['Column of interest'], errors='coerce'), which will force your strings into nulls. Commented Oct 8, 2020 at 14:56
  • So this could also work. I would then just need to add another step to make the column that identifies whether something is int or Null. Commented Oct 8, 2020 at 15:48
  • You could chain it into one: df['DataType'] = np.where(pd.to_numeric(df['Column of interest'], errors='coerce').isnull(), 'Text', 'Number'). There are lots of ways to do this; pandas has built-in datatypes you could always leverage too. Commented Oct 8, 2020 at 19:36
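For anyone who wants to try the to_numeric suggestion from the comments, here is a minimal sketch. The sample frame is made up from the values in the question and stores everything as strings, which is an assumption about the real data. Note that pd.to_numeric also accepts decimals such as "3.5", so "Number" here means "parses as a number", not strictly "integer".

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; values are stored as strings (assumption).
df = pd.DataFrame(
    {"Column of interest": ["21678849", "37464859", "barbara", "28394821", "francis"]}
)

# Anything that cannot be parsed as a number becomes NaN.
numeric = pd.to_numeric(df["Column of interest"], errors="coerce")

# Label each row depending on whether the conversion produced NaN.
df["DataType"] = np.where(numeric.isnull(), "Text", "Number")

print(df)
```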

4 Answers 4

2

You can try isinstance:

[isinstance(x, int) for x in df['column_of_interest']]

3 Comments

Thanks. Unfortunately, all the values come out as False. It seems the dtype is stuck in some limbo.
@Jameson do df.to_dict() and paste the output instead of your data. It's hard to tell which is text which is not.
Sorry the reason I couldn't share the actual df is because there was a lot of personal information on it.
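A likely explanation for the all-False result (an assumption, since the real data was never shared) is that every value in the column is stored as a str, including the ones that look like numbers. That is typically what you get when pandas reads a column mixing digits and text from a CSV file. A quick way to check is to look at the Python type of each value. The small Series below is invented for illustration:

```python
import pandas as pd

# Hypothetical column: one real int, one name, one number stored as text.
mixed = pd.Series([21678849, "barbara", "37464859"], dtype=object)

# Name of the Python type held in each cell.
type_names = mixed.map(lambda v: type(v).__name__)
print(type_names.tolist())

# isinstance only flags the value that is actually an int object.
int_flags = [isinstance(v, int) for v in mixed]
print(int_flags)
```

If every entry reports str, isinstance(x, int) can never be True, and the values need to be parsed (for example with int(x) or pd.to_numeric) instead of type-checked.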
1

Okay, this works and I tested it:

import pandas as pd

#----------------------------------------
# Prepare the data in df.
#----------------------------------------

from io import StringIO

TESTDATA = StringIO("""Index;column_of_interest
1;21678849
2;37464859
3;barbara
4;28394821
5;francis""")

df = pd.read_csv(TESTDATA, sep=";")

#----------------------------------------
# The actual code to solve the problem.
#----------------------------------------

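def is_integer(x):
    # True if x can be parsed as an integer, e.g. "21678849"; False for "barbara".
    try:
        int(x)
        return True
    except (ValueError, TypeError):  # TypeError covers missing values such as None
        return False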

print([is_integer(x) for x in df['column_of_interest']])

Output is

[True, True, False, True, False]

Of course some of the code doesn't apply to you, but I wanted a full working example which I (and others) could actually test. I assume you can pick out what you need from it.

The code to test for integerness was taken from https://stackoverflow.com/a/1267145/1629102.

And finally code that adds the data as a new column:

import pandas as pd

#----------------------------------------
# Prepare the data in df.
#----------------------------------------

from io import StringIO

TESTDATA = StringIO("""Index;column_of_interest
1;21678849
2;37464859
3;barbara
4;28394821
5;francis""")

df = pd.read_csv(TESTDATA, sep=";")

#----------------------------------------
# The actual code to solve the problem.
#----------------------------------------

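def is_integer(x):
    # True if x can be parsed as an integer, e.g. "21678849"; False for "barbara".
    try:
        int(x)
        return True
    except (ValueError, TypeError):  # TypeError covers missing values such as None
        return False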

is_integer_list = [is_integer(x) for x in df['column_of_interest']]

df["Is_integer"] = is_integer_list

print(df)

with this output:

   Index column_of_interest  Is_integer
0      1           21678849        True
1      2           37464859        True
2      3            barbara       False
3      4           28394821        True
4      5            francis       False

1 Comment

You're welcome, Jameson! And thanks for prompting me to learn a little bit about pandas! 🙂 I heard it mentioned many times, but this was my first time actually trying it.
0

Try this:

[True if x in [1,2,3,4,5,6,7,8,9,0] else False for x in df['column_of_interest']]

2 Comments

Ok, so the code ran. Thank you! Unfortunately, the outputs are all False values.
That code does not work. It tests whether x is a single-digit number, i.e. between 0 and 9. x.is_integer() likely works better, cf. my answer.
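To make the point in the comment above concrete, here is a small plain-Python sketch with made-up values. The membership test compares each whole value against the ten single digits, so a long number stored as text never matches, and even the string "7" is not equal to the integer 7:

```python
digits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]

# Illustrative values: two strings from the question, an int 7, and the string "7".
values = ["21678849", "barbara", 7, "7"]

# Only a value equal to one of the ten integers is "in" the list.
flags = [x in digits for x in values]
print(flags)  # [False, False, True, False]
```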
0

I admit I don't know pandas, but from reading about it I boldly suggest using

x.is_integer()

instead of

x.isin([1,2,3,4,5,6,7,8,9,0])

So the code would be

[x.is_integer() for x in df['column_of_interest']]

5 Comments

Thanks for the offer, but it doesn't work. The error is "AttributeError: 'str' object has no attribute 'is_integer'".
Ah, x is a string (type str). Then you could check whether it is an integer with bool(re.match(r"^\d+$", x)) and import re earlier in the code.
Well, maybe I should just back out, as I don't know enough about pandas and shouldn't suggest too much that turns out not to work!
Okay, I didn't give up after all. Instead I read further about pandas and came up with an answer that I actually tested and actually works.
Hahahah I love the enthusiasm!
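The regex idea from the comments can also be applied to the whole column at once. This is only a sketch: the column name follows the question, the "kind" column name is made up, and Series.str.fullmatch needs pandas 1.1 or later. The pattern \d+ accepts only unsigned digit strings, so "-5" or "3.0" would be labelled "str".

```python
import numpy as np
import pandas as pd

# Sample data from the question, stored as strings (assumption).
df = pd.DataFrame(
    {"column_of_interest": ["21678849", "37464859", "barbara", "28394821", "francis"]}
)

# Cast to str first so real ints in the column are handled too,
# then test whether the entire value consists of digits.
is_int = df["column_of_interest"].astype(str).str.fullmatch(r"\d+")

# Hypothetical new column that categorises each row as the question asks.
df["kind"] = np.where(is_int, "int", "str")

print(df)
```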
