Data Profiling using python

Question

I have a data frame as below :

 member_id  |   loan_amnt   |  Age   | Marital_status
 AK219      |    49539.09   |  34    |  Married 
 AK314      |    1022454.00 |  37    |  NA
 BN204      |    75422.00   |  34    |  Single

I want to create an output file in the below format

 Columns       | Null Values | Duplicate |
 member_id     |  N          |   N       |
 loan_amnt     |  N          |   N       |
 Age           |  N          |   Y       |
 Marital Status|  Y          |   N       |

I know about one python package called PandasProfiling but I want build this in the above manner so that I can enhance my code with respect to the data sets.

@Ruturaj- I ran python package PandasProfiling ,it gave me the details about Null values , duplicate values, Maximum , min values. But I want to build this by my own. I need to enhance this further. — Aditya Sharma
– Aditya Sharma, Commented May 2, 2019 at 6:12

Jeril · Accepted Answer · 2019-05-02 06:13:55Z

2

Use something like:

m=df.apply(lambda x: x.duplicated())
n=df.isna()
df_new=(pd.concat([pd.Series(n.any(),name='Null_Values'),pd.Series(m.any(),name='Duplicates')],axis=1)
                     .replace({True:'Y',False:'N'}))

edited May 2, 2019 at 6:13

Jeril

8,6316 gold badges60 silver badges74 bronze badges

answered May 2, 2019 at 6:05

anky

75.3k11 gold badges46 silver badges76 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Rajat Jain · Accepted Answer · 2019-05-02 06:14:29Z

0

Here is python one-liner:

pd.concat([df.isnull().any() , df.apply(lambda x: x.count() != x.nunique())], 1).replace({True: "Y", False: "N"})

answered May 2, 2019 at 6:14

Rajat Jain

2,0422 gold badges17 silver badges23 bronze badges

Comments

Amitesh Verma · Accepted Answer · 2019-07-14 05:06:29Z

0

Actually the Pandas_Profiling gives you multiple options where you can figure out if there are repetitive values.

answered Jul 14, 2019 at 5:06

Amitesh Verma

112 bronze badges

Collectives™ on Stack Overflow

Data Profiling using python

3 Answers 3

Comments

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related