0

The goal is to write a function that de-cumulates the columns of a table. For example:

data_matrix = {'x0': [0, 2, 0],
               'x1': [2, 3, 1],
               'x3': [4, 4, 1],
               'x4': [6, 5, 1]}

data_mtrice = pd.DataFrame(data_matrix)
data_matrice = data_mtrice.T

That produces:

data_matrice.head()

Out[30]:
    0  1  2
x0  0  2  0
x1  2  3  1
x3  4  4  1
x4  6  5  1

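Each row is the cumulative sum of the rows above it, column by column. For example, in column 0: 0+2 = 2, then 0+2+2 = 4, then 0+2+2+2 = 6. I am looking for a function that de-cumulates the table, that is, recovers the individual increments. I tried to write: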

import pandas as pd
def decumule(tableau):

    decu_table = np.zeros(tableau.shape)
    for ligne, element in enumerate(tableau.iloc()):
        print("ligne = ",ligne)
        for colon, elem in enumerate(tableau.iloc[ligne]):
            if ligne > 0:
                print("colonn",colon)
                decu_table.iloc[[ligne, colon]] = tableau.iloc[[ligne, colon]] - tableau.iloc[[ligne - 1, colon]]
            else:
                 decu_table.iloc[[ligne, colon]] = tableau.iloc[[ligne, colon]]
    return decu_table

tentative = data_matrice.apply(lambda tableau: decumule(tableau))

That produces:

for colon, elem in enumerate(tableau.iloc[ligne]):

TypeError: 'numpy.int64' object is not iterable

Do you have any idea what is going wrong?

Regards, Atapalou

3 Answers

2

There are three problems with your code.

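The first is with tentative = data_matrice.apply(lambda tableau: decumule(tableau)). Your function decumule expects a DataFrame as input, but apply calls the function once per column by default (axis=0), so it only receives a single Series. Indexing that Series with tableau.iloc[ligne] returns a scalar, which is why enumerate raises TypeError: 'numpy.int64' object is not iterable. This is easily fixed by calling the function on the whole DataFrame: tentative = decumule(data_matrice) (the code below names the DataFrame data_matrix).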

The second problem is the indexing with .iloc. Calling df.iloc[[1, 2]] returns the full rows at positions 1 and 2. To get the single element at position (1, 2), you need df.iloc[1, 2] instead.

The third (and simplest) is that decu_table is created as a NumPy ndarray, which does not support .iloc. To fix this, convert it to a pandas DataFrame.
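The three pitfalls above can be reproduced in isolation. The following is a minimal sketch using the question's data; the names received, two_rows, one_cell, and has_iloc are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

data_matrice = pd.DataFrame(
    {"x0": [0, 2, 0], "x1": [2, 3, 1], "x3": [4, 4, 1], "x4": [6, 5, 1]}
).T

# 1. apply() hands the function one Series per column, never the whole DataFrame.
#    Indexing such a Series by position returns a scalar, which cannot be iterated.
received = data_matrice.apply(lambda tableau: type(tableau).__name__)

# 2. A list inside .iloc selects whole rows; two separate integers select one cell.
two_rows = data_matrice.iloc[[1, 2]]  # rows at positions 1 and 2
one_cell = data_matrice.iloc[1, 2]    # value at row position 1, column position 2

# 3. A NumPy array has no .iloc accessor.
has_iloc = hasattr(np.zeros(data_matrice.shape), "iloc")
```

Here received holds the string "Series" for every column, two_rows is a 2x3 DataFrame, one_cell is a single number, and has_iloc is False.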

Fixing all three problems gives:

import numpy as np
import pandas as pd

data_matrix = {"x0": [0, 2, 0], "x1": [2, 3, 1], "x3": [4, 4, 1], "x4": [6, 5, 1]}

data_matrix = pd.DataFrame(data_matrix)
data_matrix = data_matrix.T


def decumule(tableau):
    # Start from a zero-filled DataFrame with the same labels as the input.
    decu_table = pd.DataFrame(
        np.zeros(tableau.shape), columns=tableau.columns, index=tableau.index
    )
    n_lignes, n_colonnes = tableau.shape
    for ligne in range(n_lignes):
        for colon in range(n_colonnes):
            if ligne > 0:
                # Subtracting the row above undoes the cumulative sum.
                decu_table.iloc[ligne, colon] = (
                    tableau.iloc[ligne, colon] - tableau.iloc[ligne - 1, colon]
                )
            else:
                # The first row has no predecessor, so it is copied as-is.
                decu_table.iloc[ligne, colon] = tableau.iloc[ligne, colon]
    return decu_table


tentative = decumule(data_matrix)

That produces:

      0    1    2
x0  0.0  2.0  0.0
x1  2.0  1.0  1.0
x3  2.0  1.0  0.0
x4  2.0  1.0  0.0

As an additional note, you are performing the calculation one cell at a time. This can be simplified by computing one row at a time, like this:

def decumule(tableau):
    # Start from a zero-filled DataFrame with the same labels as the input.
    decu_table = pd.DataFrame(
        np.zeros(tableau.shape), columns=tableau.columns, index=tableau.index
    )
    for ligne in range(len(tableau)):
        if ligne > 0:
            # Whole-row subtraction: current row minus the row above.
            decu_table.iloc[ligne] = tableau.iloc[ligne] - tableau.iloc[ligne - 1]
        else:
            # The first row has no predecessor, so it is copied as-is.
            decu_table.iloc[ligne] = tableau.iloc[ligne]
    return decu_table

Which gives the same result as the previous version but is faster.

1

De-cumulation is commonly referred to as a diff operation on an array. pandas can perform diffs on its own objects, so you can get your desired result like so:

print(
    data_matrice.diff()
)
      0    1    2
x0  NaN  NaN  NaN
x1  2.0  1.0  1.0
x3  2.0  1.0  0.0
x4  2.0  1.0  0.0

Then, if you want the NaN values in the top row to match the original values from data_matrice, you can add a call to .fillna:

print(
    data_matrice.diff().fillna(data_matrice.iloc[0])
)
      0    1    2
x0  0.0  2.0  0.0
x1  2.0  1.0  1.0
x3  2.0  1.0  0.0
x4  2.0  1.0  0.0
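Two details may be worth checking here: diff() returns floats because of the intermediate NaN values, and a correct de-cumulation should give back the original table once it is accumulated again. The following sketch illustrates both under the question's data; the names decumulated and round_trip_ok are illustrative, and the cast to int64 assumes the original table holds integers.

```python
import pandas as pd

data_matrice = pd.DataFrame(
    {"x0": [0, 2, 0], "x1": [2, 3, 1], "x3": [4, 4, 1], "x4": [6, 5, 1]}
).T

# diff() leaves NaN in the first row; fill it with the original first row.
decumulated = data_matrice.diff().fillna(data_matrice.iloc[0])

# The NaN step forces a float dtype; cast back since the input was integer.
decumulated = decumulated.astype("int64")

# Sanity check: re-accumulating the increments must rebuild the original table.
round_trip_ok = decumulated.cumsum().equals(data_matrice)
```

If round_trip_ok is True, the increments are consistent with the cumulative table.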

0

By making a detour through numpy, another solution is:

def decumule(tableau):
    decu_table = pd.DataFrame(
        np.zeros(tableau.shape), columns=tableau.columns, index=tableau.index
    )
    for ligne, element in enumerate(tableau.iloc()):
        print("ligne = ", ligne)
        if ligne > 0:
            decu_table.iloc[ligne] = tableau.iloc[ligne] - tableau.iloc[ligne - 1]
        else:
            decu_table.iloc[ligne] = tableau.iloc[ligne]
    return pd.DataFrame(decu_table)
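The function above still loops over the rows with pandas indexing. If the intent is to let NumPy do the work, one possible sketch uses np.diff with its prepend argument (available in NumPy 1.16 and later). This is an assumption about what a NumPy route could look like, not the answer author's code, and the name decumule_numpy is hypothetical.

```python
import numpy as np
import pandas as pd


def decumule_numpy(tableau):
    # Differences between consecutive rows; prepending zeros keeps the
    # first row unchanged (first row minus zero).
    valeurs = np.diff(tableau.to_numpy(), axis=0, prepend=0)
    # Re-attach the original row and column labels.
    return pd.DataFrame(valeurs, index=tableau.index, columns=tableau.columns)


data_matrice = pd.DataFrame(
    {"x0": [0, 2, 0], "x1": [2, 3, 1], "x3": [4, 4, 1], "x4": [6, 5, 1]}
).T
resultat = decumule_numpy(data_matrice)
```

Unlike the diff().fillna() approach, this variant never introduces NaN, so an integer table stays integer.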
