Python Pandas convert column data type

Question

I know a question like this has been asked a zillion times, but so far I have not been able to find an answer to this question.

I have joined two .csv files with Pandas, and now I would like to add some more columns to the joined data, with values calculated from the data that is already available.

However, I keep getting this error:

"The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

Now that obviously seems to be a problem with the data type of my column (which is all integers), but I have not found a (working) way to set that column as integers.

Here is my code:

import pandas

def nscap(ns):
    if ns <= 13:
        x = ns
    elif ns > 13:
        x = 13
    return x

df_1 = pandas.read_csv("a.csv", sep=';', names=["DWD_ID", "NS"], header=0)
df_2 = pandas.read_csv("b.csv", sep=';', names=["VEG", "DWD_ID"], header=0)
df_joined = pandas.merge(df_1, df_2, on="DWD_ID")
df_joined["NS_Cap"] = nscap(df_joined["NS"])

If I set

df_joined["NS_Cap"] = nscap(20)

the code works fine.

I have tried functions like .astype(int) and pandas.to_numeric(), but unless I had the syntax wrong, they didn't work for me.

Thanks in advance!

Hi, welcome to Stack Overflow. IIUYC, you want to apply nscap to the NS column to get NS_Cap, am I correct?
– WGS, Commented Sep 26, 2016 at 8:38
You're after df_joined['NS_Cap'] = df_joined['NS'].clip_upper(13), see: pandas.pydata.org/pandas-docs/stable/generated/… The error here is that you're trying to compare an array using an operator that understands only scalar values. If you did df_joined['NS'].apply(nscap), it should work.
– EdChum, Commented Sep 26, 2016 at 8:39
That worked like a charm! Thank you so much! I had never come across that syntax before!
– Kai, Commented Sep 26, 2016 at 8:50
Hi, a new question requires a new post. If any of the answers satisfy/ies your original question, kindly upvote/accept it. Just leave a link here to your new post/question. Thanks!
– WGS, Commented Sep 26, 2016 at 10:52

WGS · Accepted Answer · 2016-09-26 08:46:20Z

1

As with @EdChum's comment, you need to use clip(upper=13) or clip_upper(13). One other option which can help you in the long run with instances like this is to use apply with a lambda function. This is a really nifty all-around method.

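import pandas as pd
import numpy as np

# 5x4 frame of random integers in the range [5, 18)
df = pd.DataFrame(np.random.randint(5, 18, size=(5, 4)), columns=list('ABCD'))
# Cap a single value at 13
nscap = lambda x: min(x, 13)

print(df.head())
print('-' * 20)

# Apply the cap to every element of column D
df['NSCAP'] = df['D'].apply(nscap)

print(df.head())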

Result:

Take note of the last 2 lines of the second dataframe.

Hope this helps.
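
For comparison, here is a minimal sketch of the vectorized route on a frame shaped like the one in the question. The sample values are made up for illustration, and clip(upper=13) is used rather than clip_upper(13) because clip_upper was deprecated in later pandas releases and removed in pandas 1.0.

```python
import pandas as pd

# Hypothetical stand-in for the merged frame from the question.
df_joined = pd.DataFrame({"DWD_ID": [1, 2, 3, 4], "NS": [5, 13, 14, 20]})

# Vectorized: cap every value in NS at 13 in a single call.
df_joined["NS_Cap"] = df_joined["NS"].clip(upper=13)

# Element-wise alternative; same values, but usually slower on large columns.
capped_with_apply = df_joined["NS"].apply(lambda x: min(x, 13))

print(df_joined)
```

Both routes should give the same capped column; the vectorized call is usually the better default when the column is large.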



3 Comments

Ami Tavory Over a year ago

Good answer. Note that apply + lambda is usually much slower than the vectorized functions, so clip_upper is probably better.

Kai Over a year ago

I tried both options and yes they both seem much faster than what I was trying before. So that clip_upper(xy) does not only stop counting onwards from that number but also sets every number that is higher to also that number?

WGS Over a year ago

@Khaled Exactly. You are technically setting an upper bound and clipping whatever comes after that.

Ami Tavory · Accepted Answer · 2016-09-26 08:50:51Z

(Your code is missing a parenthesis at the end of nscap(df_joined["NS"].)

As @EdChum and @TheLaughingMan write, clip_upper is what you want here. This answer just addresses the direct reason for the error you're getting.

In the function

def nscap(ns):
    if ns <= 13:
        x = ns
    elif ns > 13:
        x = 13
    return x

effectively, ns <= 13 operates on a whole column (a pandas Series backed by a numpy.ndarray). When you compare such an array to a scalar, broadcasting takes place, and the result is an array in which each element indicates whether the comparison held for the corresponding entry.

So

if ns <= 13:

translates to something like

if numpy.array([True, False, True, True]):

and there is no single answer to whether this is true or not. That's the error you're getting: you need to specify whether you mean that all entries are true, that at least one entry is true, and so on.
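
A small sketch that reproduces this, using a made-up Series in place of df_joined["NS"] (the values are an assumption, chosen to match the boolean array above). It shows the element-wise result of the comparison, the ValueError raised when that result is used as an if condition, and how .all() and .any() state the intent explicitly.

```python
import pandas as pd

# Hypothetical values standing in for df_joined["NS"].
ns = pd.Series([5, 20, 13, 7])

# The comparison is applied element by element and yields a boolean Series.
mask = ns <= 13
print(mask.tolist())

# Using that Series as an `if` condition raises the error from the question.
try:
    if mask:
        pass
    raised = False
except ValueError as err:
    raised = True
    print(err)

# Stating the intent explicitly reduces the Series to a single truth value.
print(mask.all())  # are all entries <= 13?
print(mask.any())  # is at least one entry <= 13?
```

Neither .all() nor .any() is what the original nscap needs, though: the goal there is a per-element cap, which is why clip or apply is the appropriate fix.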
