I have a dataset of 50,000 farmers growing crops in several villages. I need to find out how many farmers hold land under the same survey number and what each farmer's total crop area under that survey number is [output image attached]
Here is my dummy dataset:
df
Name Village Survey_no Land_Area
0 Farmer_1 Village_1 26 0.33
1 Farmer_1 Village_1 26 0.40
2 Farmer_2 Village_1 26 0.30
3 Farmer_2 Village_1 26 0.40
4 Farmer_2 Village_1 26 0.50
5 Farmer_3 Village_1 26 0.52
6 Farmer_3 Village_1 26 0.40
7 Farmer_4 Village_1 151 0.23
8 Farmer_5 Village_1 151 0.25
9 Farmer_5 Village_1 151 0.10
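To make the dummy data reproducible, it can be rebuilt as a DataFrame. This is a minimal sketch; the values are copied from the printout above.

```python
import pandas as pd

# Rebuild the dummy dataset from the printout so the example can be run directly
df = pd.DataFrame({
    'Name': ['Farmer_1', 'Farmer_1', 'Farmer_2', 'Farmer_2', 'Farmer_2',
             'Farmer_3', 'Farmer_3', 'Farmer_4', 'Farmer_5', 'Farmer_5'],
    'Village': ['Village_1'] * 10,
    'Survey_no': [26, 26, 26, 26, 26, 26, 26, 151, 151, 151],
    'Land_Area': [0.33, 0.40, 0.30, 0.40, 0.50, 0.52, 0.40, 0.23, 0.25, 0.10],
})
print(df)
```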
The required output is shown in the attached image.
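The output image is not reproduced here, so the following is only a sketch of one plausible reading of it, assuming the goal is one row per farmer per survey number with that farmer's summed `Land_Area`, plus the number of distinct farmers on the survey number. The column name `Farmer_Count` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Farmer_1', 'Farmer_1', 'Farmer_2', 'Farmer_2', 'Farmer_2',
             'Farmer_3', 'Farmer_3', 'Farmer_4', 'Farmer_5', 'Farmer_5'],
    'Village': ['Village_1'] * 10,
    'Survey_no': [26, 26, 26, 26, 26, 26, 26, 151, 151, 151],
    'Land_Area': [0.33, 0.40, 0.30, 0.40, 0.50, 0.52, 0.40, 0.23, 0.25, 0.10],
})

# Total crop area per farmer on each survey number (one row per farmer and survey number)
per_farmer = (df.groupby(['Village', 'Survey_no', 'Name'], as_index=False)['Land_Area']
                .sum())

# Number of distinct farmers sharing each survey number, repeated on every row of that group
# (hypothetical column name 'Farmer_Count')
per_farmer['Farmer_Count'] = (per_farmer.groupby(['Village', 'Survey_no'])['Name']
                                        .transform('nunique'))
print(per_farmer)
```

Grouping by `Name` first is what collapses the repeated rows of the same farmer, so each farmer is counted once per survey number.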
Here is what I have so far:
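# Pivot each (Village, Survey_no) group into one wide row, numbering its rows 1, 2, 3, ...
df = (df.set_index(['Village', 'Survey_no',
                    df.groupby(['Village', 'Survey_no']).cumcount().add(1)])
        .unstack()
        .sort_index(axis=1, level=1))
# Flatten the MultiIndex columns into names like 'Land_Area-1', 'Name-1'
df.columns = ['{}-{}'.format(x, y) for x, y in df.columns]
df = df.reset_index()
df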
Village Survey_no Land_Area-1 ... Name-6 Land_Area-7 Name-7
0 Village_1 26 0.33 ... Farmer_3 0.4 Farmer_3
1 Village_1 151 0.23 ... NaN NaN NaN
The output is not what I need: it gives one column pair per row instead of per farmer, so I don't get each farmer's total area under the same survey number, or the number of farmers on that survey number.
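The attempt above numbers rows with `cumcount()`, so a farmer with several rows gets several columns (hence `Name-7` for only three farmers on survey number 26). A minimal sketch of a fix, assuming the wide layout of the attempt is wanted: sum `Land_Area` per farmer first, then number farmers rather than rows. The helper column `n` and the `Farmer_Count` column are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Farmer_1', 'Farmer_1', 'Farmer_2', 'Farmer_2', 'Farmer_2',
             'Farmer_3', 'Farmer_3', 'Farmer_4', 'Farmer_5', 'Farmer_5'],
    'Village': ['Village_1'] * 10,
    'Survey_no': [26, 26, 26, 26, 26, 26, 26, 151, 151, 151],
    'Land_Area': [0.33, 0.40, 0.30, 0.40, 0.50, 0.52, 0.40, 0.23, 0.25, 0.10],
})

# 1) Collapse repeated rows so each farmer appears once per survey number
per_farmer = df.groupby(['Village', 'Survey_no', 'Name'], as_index=False)['Land_Area'].sum()

# 2) Number farmers (not rows) within each survey number: 1, 2, 3, ...
per_farmer['n'] = per_farmer.groupby(['Village', 'Survey_no']).cumcount().add(1)

# 3) Spread farmers into columns, as in the original attempt
wide = (per_farmer.set_index(['Village', 'Survey_no', 'n'])[['Name', 'Land_Area']]
                  .unstack()
                  .sort_index(axis=1, level=1))
wide.columns = ['{}-{}'.format(x, y) for x, y in wide.columns]

# 4) Add how many farmers share each survey number (aligned on the Village/Survey_no index)
wide['Farmer_Count'] = per_farmer.groupby(['Village', 'Survey_no'])['Name'].nunique()
wide = wide.reset_index()
print(wide)
```

With the dummy data, survey number 26 gets three farmer column pairs and survey number 151 gets two, with `NaN` filling the third pair.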
