I'm working with a pandas DataFrame that represents a graph. The dataframe is indexed by a MultiIndex that indicates the node endpoints.

Setup:

import pandas as pd
import numpy as np
import itertools as it
edges = list(it.combinations([1, 2, 3, 4], 2))

# Define a dataframe to represent a graph
index = pd.MultiIndex.from_tuples(edges, names=['u', 'v'])
df = pd.DataFrame.from_dict({
    'edge_id': list(range(len(edges))),
    'edge_weight': np.random.RandomState(0).rand(len(edges)),
})
df.index = index
print(df)
## -- End pasted text --
     edge_id  edge_weight
u v                      
1 2        0       0.5488
  3        1       0.7152
  4        2       0.6028
2 3        3       0.5449
  4        4       0.4237
3 4        5       0.6459

I want to be able to index into the graph using an edge subset, which is why I've chosen to use a MultiIndex. I'm able to do this just fine as long as the input to df.loc is a list of tuples.

# Select subset of graph using list-of-tuple indexing
edge_subset1 = [edges[x] for x in [0, 3, 2]]
df.loc[edge_subset1]
## -- End pasted text --
     edge_id  edge_weight
u v                      
1 2        0       0.5488
2 3        3       0.5449
1 4        2       0.6028

However, when my list of edges is a numpy array (as it often is), or a list of lists, then I seem to be unable to use the df.loc property.

# Why can't I do this if `edge_subset2` is a numpy array?
edge_subset2 = np.array(edge_subset1)
df.loc[edge_subset2]
## -- End pasted text --
TypeError: unhashable type: 'numpy.ndarray'

It would be OK if I could just call arr.tolist(), but this results in a seemingly different error.

# Why can't I do this if `edge_subset2` is a numpy array?
# or if `edge_subset3` is a list-of-lists?
edge_subset3 = edge_subset2.tolist()
df.loc[edge_subset3]
## -- End pasted text --
TypeError: '[1, 2]' is an invalid key

It's a real pain to have to use list(map(tuple, arr.tolist())) every time I want to select a subset. It would be nice if there was another way to do this.

The main questions are:

  • Why can't I use a numpy array with .loc? Is it because under the hood a dictionary is being used to map the multi-index labels to positional indices?

  • Why does a list-of-lists give a different error? Maybe it's really the same problem, just caught a different way?

  • Is there another (ideally less-verbose) way to lookup a subset of a dataframe with a numpy array of multi-index labels that I'm unaware of?

  • Note that df.edge_id[edge_subset2] works - meaning this style of indexing is supported on a Series but not a DataFrame for some reason. Bizarrely, df.edge_id.loc[edge_subset2] fails too (for no reason, since it works without loc). I suggest submitting this to Pandas here: github.com/pandas-dev/pandas/issues Commented Jul 20, 2017 at 8:43

1 Answer

Dictionary keys must be hashable, and lists are not (they are mutable). That's basically why you can't use a list of lists to index into a MultiIndex.
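
As a plain-Python illustration of the hashability point (an analogy only; it does not show what pandas does internally), a tuple works as a dictionary key while a list raises a TypeError. The `lookup` dictionary here is made up for the example:

```python
# Hypothetical lookup table mapping edge labels to positions (illustration only).
lookup = {(1, 2): 0, (2, 3): 3}

# A tuple is hashable, so it can be used as a key.
tuple_result = lookup[(1, 2)]

# A list is mutable and unhashable, so the same lookup fails.
try:
    lookup[[1, 2]]
    list_error = None
except TypeError as exc:
    list_error = str(exc)

print(tuple_result)
print(list_error)
```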

To access multi-indexed data using loc, you need to convert your numpy array to a list of tuples, since tuples are immutable and therefore hashable. One way to do so is with map, as you mentioned.
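
A minimal sketch of that conversion, wrapped in a small helper so the map call does not have to be repeated. The name `as_edge_tuples` is hypothetical (not a pandas or numpy function), and the frame is a trimmed copy of the one in the question:

```python
import itertools as it

import numpy as np
import pandas as pd

# Rebuild a trimmed version of the question's graph frame.
edges = list(it.combinations([1, 2, 3, 4], 2))
index = pd.MultiIndex.from_tuples(edges, names=['u', 'v'])
df = pd.DataFrame({'edge_id': list(range(len(edges)))}, index=index)

edge_subset2 = np.array([(1, 2), (2, 3), (1, 4)])


def as_edge_tuples(arr):
    """Hypothetical helper: convert an (n, 2) array-like to a list of tuples."""
    return list(map(tuple, np.asarray(arr).tolist()))


# A list of tuples is accepted by .loc on a MultiIndex.
subset = df.loc[as_edge_tuples(edge_subset2)]
print(subset)
```

The helper also accepts a list of lists, since np.asarray handles both inputs.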

If you want to avoid using map and you're reading the edges from a CSV file, you could read them into a DataFrame and then call to_records with index=False. Another way is to create a MultiIndex from the ndarray, but you have to transpose the array before passing it, so that each row of the transposed array holds the labels for one level:

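import pandas as pd

# df and edge_subset2 are the objects defined in the question.
# Transpose so that each row of the array holds the labels for one index level.
df1 = df.loc[pd.MultiIndex.from_arrays(edge_subset2.T)]

print(df1)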

#outputs
          edge_id    edge_weight
------  ---------  -------------
(1, 2)          0       0.548814
(2, 3)          3       0.544883
(1, 4)          2       0.602763
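
The to_records route mentioned earlier could look roughly like the sketch below. The CSV text and the `u`/`v` column names are assumptions standing in for a real edge file, and the frame is a trimmed copy of the one in the question:

```python
import io
import itertools as it

import pandas as pd

# Rebuild a trimmed version of the question's graph frame.
edges = list(it.combinations([1, 2, 3, 4], 2))
index = pd.MultiIndex.from_tuples(edges, names=['u', 'v'])
df = pd.DataFrame({'edge_id': list(range(len(edges)))}, index=index)

# Hypothetical CSV contents standing in for a real edge file.
csv_text = "u,v\n1,2\n2,3\n1,4\n"
edge_df = pd.read_csv(io.StringIO(csv_text))

# to_records(index=False) leaves out the row index; tolist() yields plain tuples.
edge_tuples = edge_df.to_records(index=False).tolist()

subset = df.loc[edge_tuples]
print(subset)
```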

I found the "MultiIndex / advanced indexing" article in the pandas documentation very helpful.
