Apply function to all columns of data frame python

Question

I have two dfs

xx

AVERAGE_CALL_DURATION	AVERAGE_DURATION	CHANGE_OF_DETAILS
267	298	0
421	609.33	0.33
330	334	0
240.5	666.5	0
628	713	0

and

NoC_c

AVERAGE_CALL_DURATION	AVERAGE_DURATION	CHANGE_OF_DETAILS
-5.93	-4.95	0.90
593.50	595.70	1.00

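I want to return 1 where a value in xx falls within the range given in NoC_c for the column with the same name (row 0 of NoC_c is the lower bound, row 1 the upper bound); otherwise the original value should be kept.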

I can do this for one column

def check_between_ranges(xx, NoC_c):
    ranges = NoC_c['AVERAGE_CALL_DURATION']
    
    if (xx['AVERAGE_CALL_DURATION'] >= ranges.iloc[0]) and (xx['AVERAGE_CALL_DURATION'] <= ranges.iloc[1]):
        return 1
    return xx['AVERAGE_CALL_DURATION']

xx['AVERAGE_CALL_DURATION2'] = xx.apply(lambda x: check_between_ranges(x, NoC_c), axis=1)

However, I need to avoid specifying the column name manually, as the actual DataFrames contain many more columns.

I have tried

a = NoC_c.columns

def check_between_ranges(xx, NoC_c):
    ranges = NoC_c[a]
    
    if (xx[a] >= ranges.iloc[0]) & (xx[a] <= ranges.iloc[1]):
        return 1

xx.apply(lambda x: check_between_ranges(x, NoC_c[a]), axis=1)

However, I get the error ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried the solutions listed here, but they were unsuccessful.

I also read this to address the specific error, but it didn't help with my issue.

Any help would be appreciated.

Traceback (most recent call last):
  File "<ipython-input-11-2affca771555>", line 10, in <module>
    xx.apply(lambda x: check_between_ranges(x, NoC_c[a]), axis=1)
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\frame.py", line 7552, in apply
    return op.get_result()
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\apply.py", line 185, in get_result
    return self.apply_standard()
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\apply.py", line 276, in apply_standard
    results, res_index = self.apply_series_generator()
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\apply.py", line 305, in apply_series_generator
    results[i] = self.f(v)
  File "<ipython-input-11-2affca771555>", line 10, in <lambda>
    xx.apply(lambda x: check_between_ranges(x, NoC_c[a]), axis=1)
  File "<ipython-input-11-2affca771555>", line 6, in check_between_ranges
    if (xx[a] >= ranges.iloc[0]) & (xx[a] <= ranges.iloc[1]):
  File "C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py", line 1330, in __nonzero__
    f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Edit: Many thanks to @jch for the solution. I'm re-posting it here because I had to modify some of the syntax for it to work with my datasets:

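import pandas as pd

def check_between_ranges(x):
    # x is one row of xx; x.index holds the column names
    v = []
    
    for c in x.index:
        # .iloc picks NoC_c's first (lower) and second (upper) row by position
        if (x[c] >= NoC_c.iloc[0][c]) and (x[c] <= NoC_c.iloc[1][c]):
            v += [1]
        else:
            v += [x[c]]
            
    return pd.Series(v, index=x.index)


xx.apply(check_between_ranges, axis=1)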
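Because each column is checked independently, the same rule can also be applied one column at a time instead of row by row. A minimal sketch, assuming (as in the question) that row 0 of NoC_c is the lower bound, row 1 the upper bound, and that both bounds are inclusive; the helper name replace_in_range is hypothetical. Looping over the intersection of column names means extra columns in xx that NoC_c lacks are simply left alone.

```python
import pandas as pd

def replace_in_range(df, bounds):
    """Hypothetical helper: copy of df with in-range values replaced by 1.

    Assumes bounds has two rows (lower, upper) selected by position,
    so its index labels do not matter.
    """
    out = df.copy()
    for c in df.columns.intersection(bounds.columns):
        lo, hi = bounds[c].iloc[0], bounds[c].iloc[1]
        # Series.between is inclusive on both ends by default
        out[c] = df[c].mask(df[c].between(lo, hi), 1)
    return out

# Subset of the question's data
xx = pd.DataFrame({
    "AVERAGE_CALL_DURATION": [267, 421, 628],
    "AVERAGE_DURATION": [298, 609.33, 713],
})
NoC_c = pd.DataFrame({
    "AVERAGE_CALL_DURATION": [-5.93, 593.50],
    "AVERAGE_DURATION": [-4.95, 595.70],
})

result = replace_in_range(xx, NoC_c)
print(result)
```

The original xx is not modified, since the helper works on a copy.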

That error is raised when a single value (True or False) is expected, but you're actually passing an object with multiple values, such as a Series of Boolean values. Can you add the full traceback so we can see which line raises the error?
– Yuca, Commented Aug 9, 2022 at 15:54
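A small reproduction of what the comment describes, using two hypothetical values taken from the question's data: comparing a row against a row of bounds is element-wise, so the result is a Boolean Series, and an if statement cannot turn that into one truth value.

```python
import pandas as pd

row = pd.Series({"AVERAGE_CALL_DURATION": 267.0, "AVERAGE_DURATION": 298.0})
lower = pd.Series({"AVERAGE_CALL_DURATION": -5.93, "AVERAGE_DURATION": -4.95})

mask = row >= lower  # element-wise: a Boolean Series, not a single bool
print(mask)

error_message = None
try:
    if mask:  # `if` needs exactly one truth value, so pandas refuses to guess
        pass
except ValueError as exc:
    error_message = str(exc)

print("ValueError:", error_message)
```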
As the other answer on your post suggests, you have to reduce the Boolean Series to a single value with .all() or .any(), e.g. (xx[a] >= ranges.iloc[0]).all() or (xx[a] >= ranges.iloc[0]).any(), depending on your logic. That should answer the original question. If other issues come up after you solve that, you should ask a new question, since the original one is solved.
– Yuca, Commented Aug 9, 2022 at 17:55
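To illustrate the .all() / .any() reduction mentioned in the comment, here is a sketch on one hypothetical row from the question's data. Note that a reduced value answers a single yes/no question per row, so on its own it cannot produce a per-column replacement.

```python
import pandas as pd

row = pd.Series({"AVERAGE_CALL_DURATION": 421.0, "AVERAGE_DURATION": 609.33})
lower = pd.Series({"AVERAGE_CALL_DURATION": -5.93, "AVERAGE_DURATION": -4.95})
upper = pd.Series({"AVERAGE_CALL_DURATION": 593.50, "AVERAGE_DURATION": 595.70})

in_range = (row >= lower) & (row <= upper)  # one bool per column

every_column_in_range = bool(in_range.all())  # True only if every column passes
some_column_in_range = bool(in_range.any())   # True if at least one column passes
print(in_range.tolist(), every_column_in_range, some_column_in_range)
```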

sitting_duck · Accepted Answer · 2022-08-09 18:29:02Z


Would this work for you?

Comparison Function

def check_between_ranges(x):
    v = []
    
    for c in x.index:
        if (x[c] >= NoC_c.at[0,c]) & (x[c] <= NoC_c.at[1,c]):
            v += [1]
        else:
            v += [x[c]]
            
    return pd.Series(v, index=x.index)

Execution

xx.apply(check_between_ranges, axis=1)

Result

   AVERAGE_CALL_DURATION  AVERAGE_DURATION  CHANGE_OF_DETAILS
0                    1.0              1.00               0.00
1                    1.0            609.33               0.33
2                    1.0              1.00               0.00
3                    1.0            666.50               0.00
4                  628.0            713.00               0.00
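The same result can also be computed without apply by comparing the whole frame against each bound row at once. A minimal sketch, assuming xx and NoC_c have the same column names and NoC_c holds the lower bound in its first row and the upper bound in its second. DataFrame.ge and DataFrame.le align a Series argument with the columns by default, and DataFrame.mask puts the replacement value wherever the condition is True.

```python
import pandas as pd

# Data from the question
xx = pd.DataFrame({
    "AVERAGE_CALL_DURATION": [267, 421, 330, 240.5, 628],
    "AVERAGE_DURATION": [298, 609.33, 334, 666.5, 713],
    "CHANGE_OF_DETAILS": [0, 0.33, 0, 0, 0],
})
NoC_c = pd.DataFrame({
    "AVERAGE_CALL_DURATION": [-5.93, 593.50],
    "AVERAGE_DURATION": [-4.95, 595.70],
    "CHANGE_OF_DETAILS": [0.90, 1.00],
})

lower = NoC_c.iloc[0]  # first row by position, indexed by column name
upper = NoC_c.iloc[1]  # second row by position

# Compare every cell with its own column's bounds (aligned on column labels)
in_range = xx.ge(lower) & xx.le(upper)

# Put 1 where the value is in range, keep the original value elsewhere
result = xx.mask(in_range, 1)
print(result)
```

This should reproduce the Result table above, and because the bounds are selected with iloc it does not depend on NoC_c's index labels.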


bgun Over a year ago

I was getting an error and had to modify the .at on line 5. Otherwise, a working solution. Many thanks!
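A likely reason for that error is that .at is label-based: NoC_c.at[0, c] looks for a row labelled 0, which only exists if NoC_c has the default 0, 1 index. .iloc is position-based, which is why NoC_c.iloc[0][c] in the question's edit works. A sketch with a hypothetical bounds frame whose index labels are 3 and 7 (as could happen after filtering a larger frame):

```python
import pandas as pd

# Hypothetical bounds frame whose index is NOT 0 and 1
NoC_c = pd.DataFrame(
    {"AVERAGE_CALL_DURATION": [-5.93, 593.50]},
    index=[3, 7],
)
c = "AVERAGE_CALL_DURATION"

# Label-based lookup: there is no row labelled 0, so this raises KeyError
try:
    NoC_c.at[0, c]
    at_failed = False
except KeyError:
    at_failed = True

# Position-based lookups work regardless of the index labels
lo = NoC_c.iloc[0][c]  # form used in the question's edit
hi = NoC_c[c].iloc[1]  # equivalent: select the column first, then the position
print(at_failed, lo, hi)
```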

Let's try · Accepted Answer · 2022-08-09 15:56:06Z


You almost have the solution. Try adding .all() (docs here):

def check_between_ranges(xx, NoC_c):
    ranges = NoC_c[a]
    
    if (xx[a] >= ranges.iloc[0]).all() & (xx[a] <= ranges.iloc[1]).all():
        return 1


bgun Over a year ago

When running the above, only a Series is returned. Instead, the output should keep the shape of the original df, with only the new imputation applied. Output: 4537 nan 7245 nan 2334 nan 7023 nan 2152 nan
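The shape difference described in this comment comes from what the applied function returns: with axis=1, DataFrame.apply builds a Series when the function returns one scalar per row, and a DataFrame when it returns a Series per row. A small illustration with a hypothetical two-column frame and hypothetical function names:

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 5.0], "b": [2.0, 9.0]})

def row_flag(row):
    # Returns ONE value per row, so apply(axis=1) produces a Series
    return 1 if (row <= 5).all() else 0

def row_replace(row):
    # Returns a Series per row, so apply(axis=1) produces a DataFrame
    return row.mask(row <= 5, 1)

flags = df.apply(row_flag, axis=1)
replaced = df.apply(row_replace, axis=1)
print(type(flags).__name__, flags.shape)
print(type(replaced).__name__, replaced.shape)
```

Returning a full row, as the other answer's function does with pd.Series(v, index=x.index), is what keeps the original shape.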
