DataFrame. TypeError: 'numpy.int64' inside a FOR loop in Data

Question

The goal is to write a function that de-cumulates the columns of a table. For example:

data_matrix = {'x0': [0, 2, 0],
               'x1': [2, 3, 1],
               'x3': [4, 4, 1],
               'x4': [6, 5, 1]}

data_mtrice = pd.DataFrame(data_matrix)
data_matrice = data_mtrice.T

That produces:

data_matrice.head()

Out[30]:
    0  1  2
x0  0  2  0
x1  2  3  1
x3  4  4  1
x4  6  5  1

Each column is a cumulative sum down the rows. For example, in column 0: 0 + 2 = 2, then 0 + 2 + 2 = 4, then 0 + 2 + 2 + 2 = 6. I am looking for a function to de-cumulate these values. I tried to write:

import numpy as np
import pandas as pd
def decumule(tableau):

    decu_table = np.zeros(tableau.shape)
    for ligne, element in enumerate(tableau.iloc()):
        print("ligne = ",ligne)
        for colon, elem in enumerate(tableau.iloc[ligne]):
            if ligne > 0:
                print("colonn",colon)
                decu_table.iloc[[ligne, colon]] = tableau.iloc[[ligne, colon]] - tableau.iloc[[ligne - 1, colon]]
            else:
                 decu_table.iloc[[ligne, colon]] = tableau.iloc[[ligne, colon]]
    return decu_table

tentative = data_matrice.apply(lambda tableau: decumule(tableau))

That produces:

for colon, elem in enumerate(tableau.iloc[ligne]):

TypeError: 'numpy.int64' object is not iterable

Do you have any idea what is going wrong?

Regards, Atapalou

Matteo Zanoni · Accepted Answer · 2022-03-01 14:39:24Z

There are three problems with your code.

The first is with tentative = data_matrice.apply(lambda tableau: decumule(tableau)). Your function decumule expects a DataFrame as input, but apply calls the function on one column at a time (axis=0 is the default), so it only receives a Series. tableau.iloc[ligne] is then a single numpy.int64 value, which is why enumerate raises "object is not iterable". This is easily fixed by calling the function directly on the DataFrame: tentative = decumule(data_matrix).
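To see the first problem in isolation, here is a minimal sketch that records what apply hands to the callback. The list name `received` and the variables `first_column` and `scalar` are only illustrative and do not come from the question.

```python
import pandas as pd

data_matrice = pd.DataFrame(
    {"x0": [0, 2, 0], "x1": [2, 3, 1], "x3": [4, 4, 1], "x4": [6, 5, 1]}
).T

# Record the type of object that apply passes to the callback (illustrative helper list).
received = []
data_matrice.apply(lambda tableau: received.append(type(tableau).__name__))
print(set(received))

# Each call gets a Series, so .iloc[position] on it is a single scalar, not a row.
first_column = data_matrice[0]
scalar = first_column.iloc[0]
print(type(scalar).__name__)
```

Looping over that scalar with enumerate is what triggers the TypeError from the question.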

The second problem is with the indexing in .iloc: df.iloc[[1, 2]] returns the full rows 1 and 2. To get the single element at position (1, 2), you need df.iloc[1, 2].
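The difference between the two indexing forms can be checked on a small frame. This is only a sketch; the frame `df` below reuses the values from the question and the names `rows` and `cell` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {0: [0, 2, 4, 6], 1: [2, 3, 4, 5], 2: [0, 1, 1, 1]},
    index=["x0", "x1", "x3", "x4"],
)

rows = df.iloc[[1, 2]]  # a list inside the brackets: rows 1 and 2, all columns
cell = df.iloc[1, 2]    # two positions: the single value at row 1, column 2

print(rows.shape)  # (2, 3)
print(cell)        # 1
```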

The third (and simplest) is that decu_table is created as a numpy ndarray, which does not support .iloc. To fix this, just convert it to a pandas DataFrame.

Fixing all three problems you get:
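As a quick illustration of the third point, the sketch below shows that a plain ndarray has no .iloc attribute, while wrapping the same zeros in a DataFrame makes positional assignment work. The small frame `tableau` and the names `as_array` and `as_frame` are made up for the example.

```python
import numpy as np
import pandas as pd

tableau = pd.DataFrame([[0, 2, 0], [2, 3, 1]], index=["x0", "x1"])

# A numpy array is indexed directly (as_array[1, 2]) and has no .iloc accessor.
as_array = np.zeros(tableau.shape)
print(hasattr(as_array, "iloc"))  # False

# Wrapping the zeros in a DataFrame keeps the labels and enables .iloc assignment.
as_frame = pd.DataFrame(
    np.zeros(tableau.shape), columns=tableau.columns, index=tableau.index
)
as_frame.iloc[1, 2] = 1.0
print(as_frame)
```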

import numpy as np
import pandas as pd

data_matrix = {"x0": [0, 2, 0], "x1": [2, 3, 1], "x3": [4, 4, 1], "x4": [6, 5, 1]}

data_matrix = pd.DataFrame(data_matrix)
data_matrix = data_matrix.T


def decumule(tableau):
    decu_table = pd.DataFrame(
        np.zeros(tableau.shape), columns=tableau.columns, index=tableau.index
    )
    for ligne, element in enumerate(tableau.iloc()):
        print("ligne = ", ligne)
        for colon, elem in enumerate(tableau.iloc[ligne]):
            if ligne > 0:
                print("colonn", colon)
                decu_table.iloc[ligne, colon] = (
                    tableau.iloc[ligne, colon] - tableau.iloc[ligne - 1, colon]
                )
            else:
                decu_table.iloc[ligne, colon] = tableau.iloc[ligne, colon]
    return decu_table


tentative = decumule(data_matrix)

That produces:

      0    1    2
x0  0.0  2.0  0.0
x1  2.0  1.0  1.0
x3  2.0  1.0  0.0
x4  2.0  1.0  0.0

As an additional note, you are performing the calculations one cell at a time. This can be simplified by computing one row at a time, like this:

def decumule(tableau):
    decu_table = pd.DataFrame(
        np.zeros(tableau.shape), columns=tableau.columns, index=tableau.index
    )
    for ligne, element in enumerate(tableau.iloc()):
        print("ligne = ", ligne)
        if ligne > 0:
            decu_table.iloc[ligne] = tableau.iloc[ligne] - tableau.iloc[ligne - 1]
        else:
            decu_table.iloc[ligne] = tableau.iloc[ligne]
    return decu_table

Which gives the same result as the previous version but is faster.

Cameron Riddell · Accepted Answer · 2022-03-08 17:29:30Z


De-cumulation is commonly referred to as an array diff operation. pandas can perform diffs on its base objects as well, so you can get your desired result like so:

print(
    data_matrice.diff()
)
      0    1    2
x0  NaN  NaN  NaN
x1  2.0  1.0  1.0
x3  2.0  1.0  0.0
x4  2.0  1.0  0.0

Then, if you want those NaN values in the top row to match the original values from data_matrice, you can add a call to .fillna:

print(
    data_matrice.diff().fillna(data_matrice.iloc[0])
)
      0    1    2
x0  0.0  2.0  0.0
x1  2.0  1.0  1.0
x3  2.0  1.0  0.0
x4  2.0  1.0  0.0
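One way to sanity-check the result is to re-accumulate it: since the original table holds cumulative sums, applying cumsum to the de-cumulated frame should give back the input. A minimal sketch, where the names `decumulated` and `restored` are illustrative:

```python
import pandas as pd

data_matrice = pd.DataFrame(
    {"x0": [0, 2, 0], "x1": [2, 3, 1], "x3": [4, 4, 1], "x4": [6, 5, 1]}
).T

# diff() leaves NaN in the first row; fill it with the first row of the original.
decumulated = data_matrice.diff().fillna(data_matrice.iloc[0])

# Re-accumulating down the rows should reproduce the original table.
restored = decumulated.cumsum()
print((restored == data_matrice).all().all())  # True
```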


Yserbius · Accepted Answer · 2022-03-08 21:53:16Z


By making a detour through numpy, another solution is:

def decumule(tableau):
    decu_table = pd.DataFrame(
        np.zeros(tableau.shape), columns=tableau.columns, index=tableau.index
    )
    for ligne, element in enumerate(tableau.iloc()):
        print("ligne = ", ligne)
        if ligne > 0:
            decu_table.iloc[ligne] = tableau.iloc[ligne] - tableau.iloc[ligne - 1]
        else:
            decu_table.iloc[ligne] = tableau.iloc[ligne]
    return pd.DataFrame(decu_table)
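The function above still loops over rows in pandas. For comparison, a version that does the subtraction in numpy could look like the sketch below. It is not the code from this answer: the name `decumule_numpy` is hypothetical, and it assumes a numpy version whose np.diff accepts the prepend argument (1.16 or later).

```python
import numpy as np
import pandas as pd


def decumule_numpy(tableau):
    """Hypothetical variant: de-cumulate the rows with np.diff instead of a loop."""
    values = tableau.to_numpy()
    # Prepending a row of zeros makes the first difference equal to the first row itself.
    diffs = np.diff(values, axis=0, prepend=0)
    # Rebuild a DataFrame with the original row and column labels.
    return pd.DataFrame(diffs, index=tableau.index, columns=tableau.columns)


data_matrice = pd.DataFrame(
    {"x0": [0, 2, 0], "x1": [2, 3, 1], "x3": [4, 4, 1], "x4": [6, 5, 1]}
).T
result = decumule_numpy(data_matrice)
print(result)
```

Unlike diff().fillna(...), this keeps the integer dtype because no NaN values are ever created.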

edited Mar 8, 2022 at 21:53 by Yserbius

answered Mar 8, 2022 at 17:18 by Atapalou
