Suppose I have data like this:

Length    Width    Height 
100        140       100
120        150       110
140        160       120
160        170       130 
170        190       140
200        200       150
210        210       160
220        220       170

Now I want to know the distribution of the data in each column for a given increment. For example, I want to see the distribution of the Length column from 100 to 160 in increments of 30, with output like this:

Min   Max    Count  Percentage  Remaining values (outside the given range)
100   130     1       12.5         7
130   160     2       25           5 

How can I also draw a bar graph from it?

  • Look up "histogram". That is the common English term for the construct you are looking for, and also a numpy and matplotlib function. Commented Mar 11, 2022 at 14:04
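
As a rough sketch of what this comment points to, numpy.histogram can return the per-bin counts directly. The array below is the Length column from the question, and the variable names are only illustrative. Note that NumPy bins are left-closed, so 100 lands in the first bin; this differs from the expected output in the question, where the first bin holds a single value.

```python
import numpy as np

# Length column from the question
lengths = np.array([100, 120, 140, 160, 170, 200, 210, 220])

# Bin edges 100, 130, 160 (from 100 to 160 in steps of 30)
edges = np.arange(100, 160 + 1, 30)

# Bins are [left, right); only the last bin also includes its right edge
counts, _ = np.histogram(lengths, bins=edges)

percentages = 100 * counts / lengths.size
outside = lengths.size - counts.sum()  # values that fall outside 100..160

for low, high, count, pct in zip(edges[:-1], edges[1:], counts, percentages):
    print(low, high, count, pct)
print("outside range:", outside)
```

matplotlib's plt.hist accepts the same bins argument and draws the bars in one call.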

2 Answers

You can use pd.cut to achieve your goal:

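import numpy as np
import pandas as pd

# Bin edges 100, 130, 160; pd.cut builds right-closed intervals, so 100 itself is excluded
bins = np.arange(100, 160+1, 30)
out = df.groupby(pd.cut(df['Length'], bins))['Length'] \
        .agg(**{'Min': 'min', 'Max': 'max', 'Count': 'count',
                'Percentage': lambda x: 100 * x.size / len(df),
                'Remaining': lambda x: len(df) - x.size})
print(out)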

# Output
            Min  Max  Count  Percentage  Remaining
Length                                            
(100, 130]  120  120      1        12.5          7
(130, 160]  140  160      2        25.0          6
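
The expected output in the question shows 7 and then 5 in the last column, which reads like a running count of values not yet covered by any bin so far. If that is the intent (an assumption on my part), a minimal sketch is to subtract the cumulative count instead of the per-bin count. The DataFrame here is rebuilt from the question's Length column so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Length': [100, 120, 140, 160, 170, 200, 210, 220]})

# Right-closed intervals (100, 130] and (130, 160]; 100 itself is excluded
bins = np.arange(100, 160 + 1, 30)
binned = pd.cut(df['Length'], bins)

# Count per bin, ordered by bin (values outside every bin become NaN and are dropped)
counts = binned.value_counts().sort_index()

table = pd.DataFrame({
    'Count': counts,
    'Percentage': 100 * counts / len(df),
    # Running remainder: rows not covered by this bin or any earlier one
    'Remaining': len(df) - counts.cumsum(),
})
print(table)
```

For the sample data this gives Remaining values of 7 and 5, matching the table in the question.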

If I understand correctly, you could use pandas.cut:

(df.groupby(pd.cut(df['Length'], bins=[100,130,160]))
   ['Length'].agg(count='count')
   .assign(**{'Remaining value': lambda d: len(df)-d['count'],
              'Percentage': lambda d: d['count']/len(df)*100,
             })
)

output:

            count  Remaining value  Percentage
Length                                        
(100, 130]      1                7        12.5
(130, 160]      2                6        25.0
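
To draw a bar graph from these aggregated counts, as the question asks, one option is a plain matplotlib bar chart with one bar per interval. This is only a sketch: the DataFrame is rebuilt from the question's Length column, and the axis labels are my own choice.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Length': [100, 120, 140, 160, 170, 200, 210, 220]})

# Count of values per interval, same binning as above
counts = (df.groupby(pd.cut(df['Length'], bins=[100, 130, 160]), observed=False)
            ['Length'].count())

fig, ax = plt.subplots()
labels = [str(interval) for interval in counts.index]  # one label per bin
bars = ax.bar(labels, counts.to_numpy())
ax.set_xlabel('Length bin')
ax.set_ylabel('Count')
# Call plt.show() or fig.savefig(...) to display or save the figure
```

Swapping counts for the Percentage column would plot percentages instead.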

For graphing, many libraries can bin the data and draw the histogram for you.

Example with seaborn:

import seaborn as sns
sns.histplot(df, bins=[100,130,160,190,220])


or

sns.displot(df.melt(), x='value', col='variable',
            kind='hist', bins=[100,130,160,190,220])
