I am working with a MultiIndex DataFrame and am struggling with several operations:

a) I would like to apply several operations to a list (element-wise) without using for loops

b) I would like to extract the index values of my DataFrame and compare them; first they have to be converted from object to int or float

c) I want to compare values within the DataFrame (without using for loops) and select values from either column depending on the value of that comparison

========================================================================

import pandas as pd
import numpy as np

idx = pd.IndexSlice
ix = pd.MultiIndex.from_product(
    [['2015', '2016', '2017', '2018'],
     ['2016', '2017', '2018', '2019', '2020'],
     ['A', 'B', 'C']],
    names=['SimulationStart', 'ProjectionPeriod', 'Group']
)

df = pd.DataFrame(np.random.randn(60, 1), index=ix, columns=['Origin'])
origin = df.loc[idx[:, :, :], 'Origin'].values

increase_over_base_percent = 0.3
increase_over_base_abs = 10
abs_level = 1
min_increase = 0.001

# Is there a way to do this comparison without using for loops?
# Applying max()/min() to the whole array raises:
# "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
change = pd.Series(np.nan, index=range(len(origin)))
for i, element in enumerate(origin):
    change[i] = max(
        min(element * (1 + increase_over_base_percent),
            element + increase_over_base_abs,
            abs_level),
        element + min_increase)

print(change)


# Write results to a new column in the DataFrame ('Change')
df.loc[idx[:, :, :], 'Change'] = change

# Add data on 'Group' level
group_qualifier = [0, 0, 1]

# Is there a way to apply the group_qualifier to the group level without having to slice each index?
# Note: the formula does not work yet (results are to be reported in a new column of the DataFrame)
df.loc[idx[:], 'GroupQA'] = group_qualifier

# This is the part I am struggling with most (my index values are objects, not integers or floats,
# and the comparison of values within the DataFrame does not work either)
# Create new column 'Selected'; use origin values for all combinations where
# projectionPeriod < simulationStart & group_qualifier value == 0;
# use change values for all other combinations
values = df.index.get_level_values
mask = (values('ProjectionPeriod') - values('SimulationStart')) <= 1
mask = mask * df.loc[idx[:], 'GroupQA'].values
selected = df.loc[mask]
df.loc[idx[:, :, :], 'Selected'] = selected
  • Regarding (a), I don't see the for loop you want to avoid. Commented Nov 4, 2016 at 10:17
  • @IanS - sorry for the confusion. I edited the code to reflect the for loop I was talking about. Commented Nov 4, 2016 at 12:18
  • Thanks for accepting my answer. Would you need help with the other items? Commented Nov 7, 2016 at 9:43
  • @IanS Thanks for asking. I solved most of the above questions with lots of trial and error :-) Commented Nov 7, 2016 at 16:02

1 Answer

A partial answer for a):

df['Change'] = pd.concat([
    pd.concat([
        df.loc[:, 'Origin'] * (1 + increase_over_base_percent),
        df.loc[:, 'Origin'] + increase_over_base_abs,
    ], axis=1).min(axis=1).clip(upper=abs_level),
    df.loc[:, 'Origin'] + min_increase
], axis=1).max(axis=1)

The idea is to use pandas' min and max functions directly on the Origin series (with a little twist, using clip for abs_level).
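
For comparison, the same element-wise min/max can be written with NumPy's `np.minimum` and `np.maximum` instead of `pd.concat`. This is a minimal sketch and not part of the original answer: the random data, the seed and the flat index are stand-ins, and the final check compares the result against the loop formula from the question.

```python
import numpy as np
import pandas as pd

increase_over_base_percent = 0.3
increase_over_base_abs = 10
abs_level = 1
min_increase = 0.001

# Stand-in data; the seed and the flat index are arbitrary choices for this sketch.
rng = np.random.default_rng(0)
origin = pd.Series(rng.standard_normal(60), name='Origin')
values = origin.to_numpy()

# Element-wise minimum of the three upper limits.
capped = np.minimum(
    np.minimum(values * (1 + increase_over_base_percent),
               values + increase_over_base_abs),
    abs_level,
)

# Element-wise maximum against the guaranteed minimum increase.
change = pd.Series(np.maximum(capped, values + min_increase),
                   index=origin.index, name='Change')

# Reference: the loop from the question, written as a comprehension.
expected = [
    max(min(x * (1 + increase_over_base_percent),
            x + increase_over_base_abs,
            abs_level),
        x + min_increase)
    for x in values
]
assert np.allclose(change.to_numpy(), expected)
```

Wrapping the result in a Series that reuses `origin.index` keeps it aligned with the source data, so it can be assigned to a DataFrame column in the same way.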

Since pandas operations keep the index, you can directly assign the result to a column.
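
The same alignment may help with (b) and (c). The sketch below is not from the original answer and rests on several assumptions: it follows the rule in the question's comment (use `Origin` where ProjectionPeriod < SimulationStart and the group qualifier is 0, otherwise `Change`), it assumes the qualifiers `[0, 0, 1]` belong to groups A, B and C in that order, and it uses a placeholder `Change` column. The name `qualifier_by_group` is hypothetical.

```python
import numpy as np
import pandas as pd

ix = pd.MultiIndex.from_product(
    [['2015', '2016', '2017', '2018'],
     ['2016', '2017', '2018', '2019', '2020'],
     ['A', 'B', 'C']],
    names=['SimulationStart', 'ProjectionPeriod', 'Group'],
)
rng = np.random.default_rng(0)  # arbitrary seed for the sketch
df = pd.DataFrame({'Origin': rng.standard_normal(60)}, index=ix)
df['Change'] = df['Origin'] + 0.001  # placeholder for the real 'Change' column

# (b) Convert the string-valued index levels to integers before comparing them.
start = df.index.get_level_values('SimulationStart').astype(int)
period = df.index.get_level_values('ProjectionPeriod').astype(int)

# Map one qualifier per group label onto every row (assumed mapping, see above).
qualifier_by_group = {'A': 0, 'B': 0, 'C': 1}
df['GroupQA'] = df.index.get_level_values('Group').map(qualifier_by_group)

# (c) Build a boolean mask and pick from either column without a loop.
use_origin = np.asarray(period < start) & (df['GroupQA'] == 0).to_numpy()
df['Selected'] = np.where(use_origin, df['Origin'], df['Change'])
```

`np.where` returns a plain array in row order, so assigning it as a new column needs no slicing of the index.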


Edit: If you prefer, you can use the combine approach explained at the end of this question.
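
The linked explanation is not reproduced here. Assuming it refers to `Series.combine`, which applies a two-argument function to matching elements of two Series, a minimal sketch could look like this (the random data and seed are stand-ins):

```python
import numpy as np
import pandas as pd

increase_over_base_percent = 0.3
increase_over_base_abs = 10
abs_level = 1
min_increase = 0.001

rng = np.random.default_rng(0)  # arbitrary seed for the sketch
origin = pd.Series(rng.standard_normal(60), name='Origin')

# Pairwise minimum of the two relative/absolute limits, then cap at abs_level.
capped = (origin * (1 + increase_over_base_percent)).combine(
    origin + increase_over_base_abs, min
).clip(upper=abs_level)

# Pairwise maximum against the guaranteed minimum increase.
change = capped.combine(origin + min_increase, max)
```

`combine` calls the Python function once per element, so it will likely be slower than the vectorized versions on large frames. It may still read more naturally when the pairwise function is something other than min or max.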
