How can I build a summary table of a data frame in pandas by stacking the results of individual operations?

For example, I used the following code:

df=pd.DataFrame(wb)

# Get list with headers
header1 = list(df)
count=df.count()

NaNs=df.isnull().sum()
sum=df.sum(0)
mean=df.mean()
median=df.median()
min= df.min()
max= df.max()
standardeviation= df.std()
nints=df.dtypes

But I can only print them as individual results. I get something like this for each calculation:

Unnamed: 0                  60
region                      50
IV_bins                     60
N                           60
meanbase                    60
cash                        60
dtype: int64

Finally, I tried creating an empty list, summarytable=[], at the beginning and calling summarytable.append(count), and so on for each calculation, but I couldn't get it right. What I am looking for is a table like the one below, which I believe also involves transposing the calculations:

          A    B 
Count     100  98
NANs      5    7
Mean      10   12.5
Median    14   8
...
Nints     95   96
NStr      5    2

One last thing to take into account: for some calculations, like sum(), it doesn't make sense to include string columns, so when I print the results, some string columns don't show up. This is the result of print(sum) (notice that region doesn't appear):

Unnamed: 0                                                               1830
IV_bins                     [0,2.31e+06](2.31e+06,5.7e+06](5.7e+06,1.07e+0...
N                                                                     3680163
meanbase                                                              3.46248
cash                                                              9.00091e+09
  • sum=df.sum(0), min= df.min(), max= df.max() - you just destroyed three useful built-in functions. Commented Feb 19, 2018 at 23:24
  • You show us a lot of outputs but not the code that produces them. Please include it. Also, what exactly is your question? Commented Feb 19, 2018 at 23:25
  • What do you mean I destroyed them? Those outputs are from simple print(count) and print(sum) calls. What I am looking for is a summary table of all these functions, as in the example output I posted. Commented Feb 19, 2018 at 23:48
  • sum=df.sum(0) makes the built-in function sum() unavailable (same with the other two functions). Commented Feb 19, 2018 at 23:58
  • Have you tried df.describe() on your data? It will give you a statistical summary of all numeric columns in your data frame. Commented Feb 20, 2018 at 0:08

2 Answers

DataFrame.agg() may be useful here: it essentially lets you build a customized .describe() output. Here's an example to get you started:

import pandas as pd
import numpy as np

df = pd.DataFrame({ 'object': ['a', 'b', 'c'],
                    'numeric': [1, 2, 3],
                    'numeric2': [1.1, 2.5, 50.],
                    'categorical': pd.Categorical(['d','e','f'])
                  })


def nullcounts(ser):
    return ser.isnull().sum()


def custom_describe(frame, func=[nullcounts, 'sum', 'mean', 'median', 'max'],
                    numeric_only=True, **kwargs):
    if numeric_only:
        frame = frame.select_dtypes(include=np.number)
    return frame.agg(func, **kwargs)


custom_describe(df)

            numeric   numeric2
nullcounts      0.0   0.000000
sum             6.0  53.600000
mean            2.0  17.866667
median          2.0   2.500000
max             3.0  50.000000
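To get closer to the table in the question, which mixes statistics that apply to every column (count, NaNs) with ones that only apply to numeric columns, one option is to build two blocks and stack them with pd.concat. The following is only a sketch: the sample data, the row labels, and the NStr helper (a per-column count of values that are Python strings) are assumptions made for illustration, not part of the original question's data.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's data frame.
df = pd.DataFrame({
    "region": ["north", "south", None, "east"],
    "N": [10, 20, 30, 40],
    "cash": [1.5, None, 3.5, 4.5],
})

# Statistics that make sense for every column, whatever its dtype.
# Each Series is indexed by column name, so transposing puts one
# statistic per row.
general = pd.DataFrame({
    "Count": df.count(),
    "NaNs": df.isnull().sum(),
    "NStr": df.apply(lambda col: col.map(lambda v: isinstance(v, str)).sum()),
    "Dtype": df.dtypes.astype(str),
}).T

# Statistics that only make sense for numeric columns.
numeric = df.select_dtypes(include="number").agg(
    ["sum", "mean", "median", "min", "max", "std"]
)

# Stack the two blocks; non-numeric columns get NaN in the numeric rows.
summary = pd.concat([general, numeric]).reindex(columns=df.columns)
print(summary)
```

Because the result mixes strings, integers, and floats, its columns end up with object dtype, which is fine for display but less convenient for further arithmetic. Note also that the variables avoid names such as sum, min, and max, so the built-in functions stay available.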

2 Comments

If you want to use the quantile function instead of the median, for example to get the 99th percentile, how can you pass the q argument in the function?
@Michael follow what's done for nullcounts() here: def quantile(ser): return ser.quantile(0.99). Then replace 'median' with quantile in the func list.
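A small sketch of that wrapper approach, reusing the numeric columns from the answer's example. The helper name q99 and the fixed level 0.99 are illustrative assumptions; the function name becomes the row label in the result.

```python
import pandas as pd

df = pd.DataFrame({"numeric": [1, 2, 3], "numeric2": [1.1, 2.5, 50.0]})


def q99(ser):
    """Return the 99th percentile of a Series (q is fixed at 0.99)."""
    return ser.quantile(0.99)


# Mix a built-in aggregation name with the custom function.
result = df.agg(["mean", q99])
print(result)
```

If several quantile levels are needed, one such named function per level keeps the row labels distinct.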

There seems to be a library that does exactly that: pandas-summary. For each column, it gives you the count, min, max, std, mean, variance, count of all values, count of unique values, missing values, column type, and much more.
