Pandas eval with multi-index dataframes

Question

Consider a multi-index dataframe df:

A       bar                flux          
B       one     three       six     three
x  0.627915  0.507184  0.690787  1.166318
y  0.927342  0.788232  1.776677 -0.512259
z  1.000000  1.000000  1.000000  0.000000

I would like to use eval to subtract ('bar', 'one') from ('flux', 'six'). Does the eval syntax support this type of index?

I think there is support for a MultiIndex on the index but not on the columns; see github.com/pydata/pandas/pull/4164#issuecomment-24009601. A workaround/hack is to set the columns, do the query, then reset the columns (since this is usually a cheap operation).
– Andy Hayden, Commented Feb 10, 2015 at 1:44
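
The workaround described in this comment might look like the sketch below. The "_" separator and the resulting flat names bar_one and flux_six are assumptions made for illustration; the data is copied from the question.

```python
import pandas as pd

# Data copied from the question
columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "three"), ("flux", "six"), ("flux", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    [
        [0.627915, 0.507184, 0.690787, 1.166318],
        [0.927342, 0.788232, 1.776677, -0.512259],
        [1.000000, 1.000000, 1.000000, 0.000000],
    ],
    index=["x", "y", "z"],
    columns=columns,
)

# Keep a reference to the MultiIndex so it can be restored afterwards
original_columns = df.columns

# Set flat column names (the "_" separator is an assumption)
df.columns = ["_".join(col) for col in df.columns]
try:
    # eval now sees ordinary identifiers
    diff = df.eval("flux_six - bar_one")
finally:
    # Reset the MultiIndex columns even if eval raises
    df.columns = original_columns

print(diff)
```

The try/finally is a design choice rather than a requirement: it makes sure the frame gets its original columns back if the expression fails.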

Daniel Oliveira · Accepted Answer · 2023-03-07 00:43:49Z

For a 2 level MultiIndex, you can use:

f"`('{level1}', '{level2}')`"

So your example would be

df.eval("`('bar', 'one')` - `('flux', 'six')`")
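
To subtract ('bar', 'one') from ('flux', 'six') as the question asks, the quoting template can be wrapped in a small helper. This is a sketch that assumes a pandas version whose eval accepts backtick-quoted names containing parentheses, quotes, and commas; mi_name is a hypothetical helper, not part of pandas.

```python
import pandas as pd

def mi_name(level1, level2):
    # Hypothetical helper: backtick-quote the tuple's repr so eval can resolve it
    return f"`('{level1}', '{level2}')`"

# Data copied from the question
columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "three"), ("flux", "six"), ("flux", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    [
        [0.627915, 0.507184, 0.690787, 1.166318],
        [0.927342, 0.788232, 1.776677, -0.512259],
        [1.000000, 1.000000, 1.000000, 0.000000],
    ],
    index=["x", "y", "z"],
    columns=columns,
)

# Build the expression from the quoted column names and evaluate it
expression = f"{mi_name('flux', 'six')} - {mi_name('bar', 'one')}"
diff = df.eval(expression)
print(diff)
```

The quoted text has to match the tuple's string form exactly, including the single space after the comma.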

James Mnatzaganian · Accepted Answer · 2015-03-14 03:50:48Z

You can do this without using eval by using the equivalent standard Python notation:

df['bar']['one'] - df['flux']['six']

Take a look at this reference. Below is an example for you, based off the object in your question:

from pandas import DataFrame, MultiIndex

# Create the object
columns = [
    ('bar', 'one'),
    ('bar', 'three'),
    ('flux', 'six'),
    ('flux', 'three')
]
data    = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000]
]
index   = MultiIndex.from_tuples(columns, names=['A', 'B'])
df      = DataFrame(data, index=['x', 'y', 'z'], columns=index)

# Calculate the difference
sub = df['bar']['one'] - df['flux']['six']
print(sub)

# Assign that difference to a new column in the object
df['new', 'col'] = sub
print(df)

The corresponding result is:

A       bar                flux                 new
B       one     three       six     three       col
x  0.627915  0.507184  0.690787  1.166318 -0.062872
y  0.927342  0.788232  1.776677 -0.512259 -0.849335
z  1.000000  1.000000  1.000000  0.000000  0.000000
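
Chained indexing such as df['bar']['one'] performs two separate lookups. As a hedged alternative, a single tuple key selects the same column in one step; the sketch below reuses the data from the question and keeps the same operand order as this answer.

```python
import pandas as pd

# Data copied from the question
columns = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "three"), ("flux", "six"), ("flux", "three")],
    names=["A", "B"],
)
df = pd.DataFrame(
    [
        [0.627915, 0.507184, 0.690787, 1.166318],
        [0.927342, 0.788232, 1.776677, -0.512259],
        [1.000000, 1.000000, 1.000000, 0.000000],
    ],
    index=["x", "y", "z"],
    columns=columns,
)

# One lookup per column, using the full (level A, level B) tuple as the key
sub = df[("bar", "one")] - df[("flux", "six")]

# Assign the result to a new two-level column
df[("new", "col")] = sub
print(df)
```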

jcmatthews · Accepted Answer · 2021-10-20 09:49:15Z

Here's an example of a work-around that allows you to use tuple indexing in the DataFrame eval function. I know this is an old one, but I couldn't find a good answer to the original question.

from pandas import DataFrame, MultiIndex
import re

LEVEL_DELIMITER = "___"

def tuples_to_str(t):
    return LEVEL_DELIMITER.join(t)

def str_to_tuples(s):
    return tuple(s.split(LEVEL_DELIMITER))

def flatten_mi_var_expression(e):
    # Find match to multi-index variables and flatten
    tuple_re = r'\(.*?,.*?\)'
    for tuple_str in re.findall(tuple_re, e):
        e = e.replace(tuple_str, tuples_to_str(eval(tuple_str)))
    return e

# Create the object
columns = [
    ('bar', 'one'),
    ('bar', 'three'),
    ('flux', 'six'),
    ('flux', 'three')
]
data = [
    [0.627915, 0.507184, 0.690787, 1.166318],
    [0.927342, 0.788232, 1.776677, -0.512259],
    [1.000000, 1.000000, 1.000000, 0.000000]
]
index = MultiIndex.from_tuples(columns, names=['A', 'B'])
df = DataFrame(data, index=['x', 'y', 'z'], columns=index)

# Desired multi-index variable expression (using tuple indexes)
new_col = ('new', 'col')
mi_expression = f"{new_col} = {('flux', 'six')} + {('bar', 'one')}"

# Capture the original multi-index column object
mi_cols = df.columns

# Flatten the multi-index columns
df.columns = [LEVEL_DELIMITER.join(col) for col in df.columns.values]

# Convert multi-index variable expression to flattened indexing
flat_expression = flatten_mi_var_expression(mi_expression)

# Evaluate
df.eval(flat_expression, inplace=True)

# Append the new column to the original multi-index instance and assign to the DataFrame
df.columns = MultiIndex.from_tuples(mi_cols.tolist() + [new_col], names=mi_cols.names)

print(df)

This should provide the following.

A       bar                flux                 new
B       one     three       six     three       col
x  0.627915  0.507184  0.690787  1.166318  1.318702
y  0.927342  0.788232  1.776677 -0.512259  2.704019
z  1.000000  1.000000  1.000000  0.000000  2.000000

I'm not sure how safe it is to use Python's eval here (it isn't really needed), but this example seems to work.
