I need to process geographic and statistical data from a large CSV file. It contains data for geographical administrative and geostatistical units. Municipality, location, basic geostatistical division and block form the hierarchical index levels.
I have to create a new column ['data2'] that holds, for every element, the block value divided by the max value of the data within its geo index. This should work for each index level, and only for index level values different from 0, because the 0 index level value accounts for other types of info not used in the calculation.
                   data1  data2
mun loc geo block
1   0   0   0         20  20
1   1   0   0         10  10
1   1   1   0         10  10
1   1   1   1          3  3/4
1   1   1   2          4  4/4
1   1   2   0         30  30
1   1   2   1          1  1/3
1   1   2   2          3  3/3
1   2   1   1         10  10/12
1   2   1   2         12  12/12
2   1   1   1        123  123/123
2   1   1   2          7  7/123
2   1   2   1          6  6/6
2   1   2   2          1  1/6
Any ideas? I have tried for loops, converting the indexes to columns with reset_index() and iterating over column and row values, but the computation is taking forever and I don't think that is the correct way to do this kind of operation.
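For reference, here is a minimal sketch of one loop-free way to get the data2 column from the sample above. It assumes the frame is indexed by the four levels (mun, loc, geo, block), that only rows with block != 0 take part in the max, and that rows with block == 0 simply keep their data1 value, as in the expected output. The names `is_block` and `group_max` are illustrative only.

```python
import pandas as pd

# Sample rows copied from the table above: (mun, loc, geo, block, data1).
rows = [
    (1, 0, 0, 0, 20), (1, 1, 0, 0, 10), (1, 1, 1, 0, 10),
    (1, 1, 1, 1, 3), (1, 1, 1, 2, 4), (1, 1, 2, 0, 30),
    (1, 1, 2, 1, 1), (1, 1, 2, 2, 3), (1, 2, 1, 1, 10),
    (1, 2, 1, 2, 12), (2, 1, 1, 1, 123), (2, 1, 1, 2, 7),
    (2, 1, 2, 1, 6), (2, 1, 2, 2, 1),
]
df = pd.DataFrame(rows, columns=["mun", "loc", "geo", "block", "data1"])
df = df.set_index(["mun", "loc", "geo", "block"])

# Assumption: a row is a "real" block only when its block level is non-zero.
is_block = df.index.get_level_values("block") != 0

# Hide the block == 0 rows (they become NaN) so they do not affect the max,
# then broadcast each (mun, loc, geo) group's max back onto its rows.
group_max = (
    df["data1"]
    .where(is_block)
    .groupby(level=["mun", "loc", "geo"])
    .transform("max")
)

# Block rows get data1 / group max; all other rows keep data1 unchanged.
df["data2"] = (df["data1"] / group_max).where(is_block, df["data1"])
print(df)
```

The grouped `transform` returns a result aligned with the original index, so no reset_index() or row iteration is needed. Running the same thing at a coarser level would presumably only require a different mask and a shorter list of levels in `groupby`.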
Also, how can I build boolean masks like the ones below, so that I can run my calculations at every index level?
mun loc geo block
1   0   0   0      False
1   1   0   0      False
1   1   1   0      True
1   1   1   1      False
1   1   1   2      False
1   1   2   0      True
1   1   2   1      False
1   1   2   2      False

mun loc geo block
1   0   0   0      False
1   1   0   0      True
1   1   1   0      False
1   1   1   1      False
1   1   1   2      False
1   2   0   0      True
1   2   2   0      False
1   2   2   1      False

mun loc geo block
1   0   0   0      True
1   1   0   0      False
1   1   1   0      False
1   1   1   1      False
1   1   1   2      False
2   0   0   0      True
2   1   1   0      False
2   1   2   1      False
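Reading the three masks above, each one seems to mark the rows where a given level is non-zero and every deeper level is 0. A minimal sketch under that assumption follows; the helper name `level_mask` is hypothetical, and the index is just the first eight rows of the sample.

```python
import pandas as pd

names = ["mun", "loc", "geo", "block"]
index = pd.MultiIndex.from_tuples(
    [
        (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1),
        (1, 1, 1, 2), (1, 1, 2, 0), (1, 1, 2, 1), (1, 1, 2, 2),
    ],
    names=names,
)

def level_mask(index, level):
    """True where `level` is non-zero and all deeper levels are 0."""
    level_names = list(index.names)
    pos = level_names.index(level)
    # Start from "this level is non-zero" ...
    mask = index.get_level_values(level) != 0
    # ... and require every deeper level to be exactly 0.
    for deeper in level_names[pos + 1:]:
        mask = mask & (index.get_level_values(deeper) == 0)
    return pd.Series(mask, index=index, name=level)

geo_mask = level_mask(index, "geo")
loc_mask = level_mask(index, "loc")
mun_mask = level_mask(index, "mun")
print(geo_mask)
```

Each mask is built from whole-column comparisons on the index levels, so it avoids Python-level iteration over rows and can be passed straight to `df.loc[...]` or `Series.where`.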
0? And in first row of df2 it is (0 / max(0,0,7.15,9.85)) ? And in second (0/ ???) ? Can you add numbers for second and third row in output? Thanks. I think it is a bit unclear. value,value.. give sample data e.g. 1,2,3,4,5 and then formulas with numbers are (1 / 4) for first row, then (2 / 2) ? Can you extend sample with numbers and some rows (if necessary) for clarity? Thank you.