pandas numpy : setting an array element with a sequence while math operation

Question

I have a df named df4; you can build it with the following code:

import numpy as np
import pandas as pd
from io import StringIO

df4s = """
contract    RB  BeginDate   ValIssueDate    EndDate Valindex0   48  46  47  49  50
2   A00118  46  19850100    19880901    99999999    50  1   2   3   7   7
3   A00118  47  19000100    19880901    19831231    47  1   2   3   7   7
5   A00118  47  19850100    19880901    99999999    50  1   2   3   7   7
6   A00253  48  19000100    19820101    19811231    47  1   2   3   7   7
7   A00253  48  19820100    19820101    19841299    47  1   2   3   7   7
8   A00253  48  19850100    19820101    99999999    50  1   2   3   7   7
9   A00253  50  19000100    19820101    19781231    47  1   2   3   7   7
10  A00253  50  19790100    19820101    19841299    47  1   2   3   7   7
11  A00253  50  19850100    19820101    99999999    50  1   2   3   7   7

"""

df4 = pd.read_csv(StringIO(df4s.strip()), sep=r'\s+',
                  dtype={"RB": int, "BeginDate": int, "EndDate": int,'ValIssueDate':int,'Valindex0':int})

The resulting df4 looks like this:

contract    RB  BeginDate   ValIssueDate    EndDate Valindex0   48  46  47  49  50
2   A00118  46  19850100    19880901    99999999    50  1   2   3   7   7
3   A00118  47  19000100    19880901    19831231    47  1   2   3   7   7
5   A00118  47  19850100    19880901    99999999    50  1   2   3   7   7
6   A00253  48  19000100    19820101    19811231    47  1   2   3   7   7
7   A00253  48  19820100    19820101    19841299    47  1   2   3   7   7
8   A00253  48  19850100    19820101    99999999    50  1   2   3   7   7
9   A00253  50  19000100    19820101    19781231    47  1   2   3   7   7
10  A00253  50  19790100    19820101    19841299    47  1   2   3   7   7
11  A00253  50  19850100    19820101    99999999    50  1   2   3   7   7

I'm trying to build a new column with the following logic; the value of the new column is based on the values of two existing columns:

def test(RB):
    n=1
    for i in np.arange(RB,50):
        n = n * df4[str(i)].values
    return  n


vfunc=np.vectorize(test)
df4['n']=vfunc(df4['RB'].values)

And then I received this error:

    res = array(outputs, copy=False, subok=True, dtype=otypes[0])

ValueError: setting an array element with a sequence.

df4[str(i)].values is an array, so your return value n (assuming RB is low enough that the loop runs) is an array like [6 6 6 6 6 6 6 6 6]. vectorize is attempting to assign this back into a 1D array. Are you looking to create a 2D array here?
– Henry Ecker ♦, Commented Aug 27, 2021 at 14:44
@HenryEcker, my answer shows that the error occurs in vectorize, not in the assignment to the dataframe column.
– hpaulj, Commented Aug 27, 2021 at 15:43

hpaulj · Accepted Answer · 2021-08-27 15:48:42Z

Reconstructing your dataframe (thanks for using the StringIO approach)

In [82]: df4['RB'].values
Out[82]: array([46, 47, 47, 48, 48, 48, 50, 50, 50])
In [83]: test(46)
Out[83]: array([42, 42, 42, 42, 42, 42, 42, 42, 42])
In [84]: test(50)
Out[84]: 1
In [85]: [test(i) for i in df4['RB'].values]
Out[85]: 
[array([42, 42, 42, 42, 42, 42, 42, 42, 42]),
 array([21, 21, 21, 21, 21, 21, 21, 21, 21]),
 array([21, 21, 21, 21, 21, 21, 21, 21, 21]),
 array([7, 7, 7, 7, 7, 7, 7, 7, 7]),
 array([7, 7, 7, 7, 7, 7, 7, 7, 7]),
 array([7, 7, 7, 7, 7, 7, 7, 7, 7]),
 1,
 1,
 1]
In [86]: vfunc=np.vectorize(test)
In [87]: vfunc(df4['RB'].values)
TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<ipython-input-87-8db8cd5dc5ab>", line 1, in <module>
    vfunc(df4['RB'].values)
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 2163, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "/usr/local/lib/python3.8/dist-packages/numpy/lib/function_base.py", line 2249, in _vectorize_call
    res = asanyarray(outputs, dtype=otypes[0])
ValueError: setting an array element with a sequence.

Note the full traceback: vectorize is having trouble creating the return array from this mix of arrays and plain integers. Based on a trial calculation with the first element, it 'guessed' that it should return an int dtype.
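A minimal reproduction of that failure, as a sketch: mixed() is a hypothetical stand-in for test() that returns an array for k < 50 and the int 1 otherwise. Without otypes, vectorize calls it once on the first element, infers an integer dtype from the returned array, and then cannot pack arrays into integer slots. The question's NumPy reports this as a ValueError; the sketch also catches TypeError only as a hedge against version differences.

```python
import numpy as np

def mixed(k):
    # Hypothetical stand-in for test(): an array for k < 50, the int 1 otherwise
    return np.full(3, k) if k < 50 else 1

vf = np.vectorize(mixed)  # no otypes: the output dtype is guessed from mixed(46)
error = None
try:
    vf(np.array([46, 47, 50]))
except (ValueError, TypeError) as exc:
    # The question's NumPy raises ValueError: setting an array element with a sequence.
    error = exc
print(repr(error))
```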

If we tell it to return an object dtype array, we get:

In [88]: vfunc=np.vectorize(test, otypes=['object'])
In [89]: vfunc(df4['RB'].values)
Out[89]: 
array([array([42, 42, 42, 42, 42, 42, 42, 42, 42]),
       array([21, 21, 21, 21, 21, 21, 21, 21, 21]),
       array([21, 21, 21, 21, 21, 21, 21, 21, 21]),
       array([7, 7, 7, 7, 7, 7, 7, 7, 7]),
       array([7, 7, 7, 7, 7, 7, 7, 7, 7]),
       array([7, 7, 7, 7, 7, 7, 7, 7, 7]), 1, 1, 1], dtype=object)
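The same fix in isolation, as a sketch with the hypothetical stand-in mixed() in place of test(): passing otypes=[object] skips the dtype guess, so vectorize returns a 1D object array whose elements are whatever the function returned (arrays or ints).

```python
import numpy as np

def mixed(k):
    # Hypothetical stand-in for test(): an array for k < 50, the int 1 otherwise
    return np.full(3, k) if k < 50 else 1

# otypes=[object] tells vectorize not to guess a numeric output dtype
vf = np.vectorize(mixed, otypes=[object])
out = vf(np.array([46, 47, 50]))
print(out.dtype, out.shape)
```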

We can assign that to a df column:

In [90]: df4['n']=_
In [91]: df4
Out[91]: 
   contract  RB  BeginDate  ...  49  50                                     n
2    A00118  46   19850100  ...   7   7  [42, 42, 42, 42, 42, 42, 42, 42, 42]
3    A00118  47   19000100  ...   7   7  [21, 21, 21, 21, 21, 21, 21, 21, 21]
5    A00118  47   19850100  ...   7   7  [21, 21, 21, 21, 21, 21, 21, 21, 21]
6    A00253  48   19000100  ...   7   7           [7, 7, 7, 7, 7, 7, 7, 7, 7]
7    A00253  48   19820100  ...   7   7           [7, 7, 7, 7, 7, 7, 7, 7, 7]
8    A00253  48   19850100  ...   7   7           [7, 7, 7, 7, 7, 7, 7, 7, 7]
9    A00253  50   19000100  ...   7   7                                     1
10   A00253  50   19790100  ...   7   7                                     1
11   A00253  50   19850100  ...   7   7                                     1

We could just as well assign the Out[85] list:

df4['n']=Out[85]

The timing is about the same:

In [94]: timeit vfunc(df4['RB'].values)
211 µs ± 5.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [95]: timeit [test(i) for i in df4['RB'].values]
217 µs ± 6.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Usually vectorize is slower than the equivalent list comprehension, but here test itself is slow enough that the iteration method doesn't make much difference. Remember (reread the docs if necessary) that vectorize is not a performance tool: it does not 'compile' your function or make it run faster.
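If speed matters and the goal is really one number per row, the Python-level loop can be dropped entirely. This is a sketch under an assumption the question does not confirm: that each row should multiply its own values in columns str(RB) through '49'. The names labels, vals and mask are illustrative. Entries outside a row's range are replaced by 1 before a row-wise product. With the sample data every row has the same factor values, so the results coincide with the entries of test()'s arrays.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Cut-down copy of df4 (assumption: only RB and the numbered columns are kept)
data = """
RB  48  46  47  49  50
46  1   2   3   7   7
47  1   2   3   7   7
48  1   2   3   7   7
50  1   2   3   7   7
"""
df = pd.read_csv(StringIO(data.strip()), sep=r"\s+")

# Numeric labels of the factor columns, taken from the column names themselves
labels = np.array([int(c) for c in df.columns if c.isdigit()])
vals = df[[str(c) for c in labels]].to_numpy()  # shape (rows, factor columns)

# Row r uses column c only when RB[r] <= c < 50; other entries become 1
rb = df["RB"].to_numpy()[:, None]
mask = (labels[None, :] >= rb) & (labels[None, :] < 50)
df["n"] = np.where(mask, vals, 1).prod(axis=1)
print(df[["RB", "n"]])
```

Because the mask and product are whole-array operations, the work no longer scales with one Python call per row.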

An alternative that always returns an object dtype array is np.frompyfunc:

In [96]: vfunc=np.frompyfunc(test,1,1)
In [97]: vfunc(df4['RB'].values)
Out[97]: 
array([array([42, 42, 42, 42, 42, 42, 42, 42, 42]),
       array([21, 21, 21, 21, 21, 21, 21, 21, 21]),
       array([21, 21, 21, 21, 21, 21, 21, 21, 21]),
       array([7, 7, 7, 7, 7, 7, 7, 7, 7]),
       array([7, 7, 7, 7, 7, 7, 7, 7, 7]),
       array([7, 7, 7, 7, 7, 7, 7, 7, 7]), 1, 1, 1], dtype=object)
In [98]: timeit vfunc(df4['RB'].values)
202 µs ± 6.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
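A quick sketch, again using the hypothetical stand-in mixed(), checking that np.frompyfunc(func, 1, 1) and np.vectorize(func, otypes=[object]) produce the same object array element by element.

```python
import numpy as np

def mixed(k):
    # Hypothetical stand-in for test(): an array for k < 50, the int 1 otherwise
    return np.full(3, k) if k < 50 else 1

ks = np.array([46, 47, 50])
via_frompyfunc = np.frompyfunc(mixed, 1, 1)(ks)  # frompyfunc always returns object dtype
via_vectorize = np.vectorize(mixed, otypes=[object])(ks)
print(via_frompyfunc.dtype, via_vectorize.dtype)
```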
