Use pandas.IntervalIndex as dataframe index

Question

Consider this dataset:

cat > example_file.txt <<EOL
bins|1,1|2,2|3,4|5,6|7,9|10,12|13,18|19,24|25,36|37,54|55,81
1,1||431|501|651|861|1081|1461|1711|2071|2371|2531
2,2|||261|401|631|871|1291|1611|2001|2291|2431
3,4|||121|271|551|1011|1191|1511|1901|2211|2351
5,6||||101|361|901|1251|1301|1691|2011|2221
7,9|||||181|461|841|1151|1511|1821|2061
10,12||||||161|591|931|1291|1621|1821
13,18|||||||351|691|1091|1401|1571
19,24||||||||301|861|1451|1201
25,36|||||||||371|851|961
37,54||||||||||371|621
55,81|||||||||||351
EOL

which I import:

import pandas
example = pandas.read_csv('example_file.txt', sep = '|', index_col = 0)

Now, the row and column names of this table are actually interval bounds. I can have pandas recognize them as such:

col_bin = pandas.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.columns])
row_bin = pandas.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.index])
example.columns = col_bin
example.index   = row_bin

Now, I would like to find the cell value corresponding to value pairs such as (11, 13).

For example, for (11, 13) the cell value is 591.

This is because the row index value (11) belongs to the interval (10,12) [this is the 6th row] and the col index value (13) belongs to the interval (13,18) [this is the 7th column] and the value at (row, column) (6, 7) in the table is 591.

jezrael · Accepted Answer · 2018-10-15 08:08:10Z

You need first add parameter closed='both' to IntervalIndex.from_tuples, because intevals are closed in both sides:

col_bin = pd.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.columns], closed='both')
row_bin = pd.IntervalIndex.from_tuples([tuple(list(map(int, x.split(',')))) for x in example.index], closed='both')
example.columns = col_bin
example.index   = row_bin
print (example)
         [1, 1]  [2, 2]  [3, 4]  [5, 6]  [7, 9]  [10, 12]  [13, 18]  \
[1, 1]       NaN   431.0   501.0   651.0   861.0    1081.0    1461.0   
[2, 2]       NaN     NaN   261.0   401.0   631.0     871.0    1291.0   
[3, 4]       NaN     NaN   121.0   271.0   551.0    1011.0    1191.0   
[5, 6]       NaN     NaN     NaN   101.0   361.0     901.0    1251.0   
[7, 9]       NaN     NaN     NaN     NaN   181.0     461.0     841.0   
[10, 12]     NaN     NaN     NaN     NaN     NaN     161.0     591.0   
[13, 18]     NaN     NaN     NaN     NaN     NaN       NaN     351.0   
[19, 24]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[25, 36]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[37, 54]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   
[55, 81]     NaN     NaN     NaN     NaN     NaN       NaN       NaN   

          [19, 24]  [25, 36]  [37, 54]  [55, 81]  
[1, 1]      1711.0    2071.0    2371.0      2531  
[2, 2]      1611.0    2001.0    2291.0      2431  
[3, 4]      1511.0    1901.0    2211.0      2351  
[5, 6]      1301.0    1691.0    2011.0      2221  
[7, 9]      1151.0    1511.0    1821.0      2061  
[10, 12]     931.0    1291.0    1621.0      1821  
[13, 18]     691.0    1091.0    1401.0      1571  
[19, 24]     301.0     861.0    1451.0      1201  
[25, 36]       NaN     371.0     851.0       961  
[37, 54]       NaN       NaN     371.0       621  
[55, 81]       NaN       NaN       NaN       351

And then get positions by IntervalIndex.get_loc and select by DataFrame.iat:

tup = (11, 13)
pos1 = example.index.get_loc(tup[0])
pos2 = example.columns.get_loc(tup[1])

print (example.iat[pos1, pos2])
591.0

Thanks a lot, I could feel I was not far but I was missing the last mile;)

Collectives™ on Stack Overflow

Use pandas.IntervalIndex as dataframe index

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related