I am converting my DataFrame to a pivot table. Here is my DataFrame:

+----+-----+-----+------+------+
|    | A   | B   | C    | D    |
|----+-----+-----+------+------|
|  0 | a   | OK  | one  | col1 |
|  1 | b   | OK  | two  | col1 |
|  2 | c   | OK  | two  | col2 |
|  3 | d   | OK  | Four | NaN  |
|  4 | e   | OK  | Five | NaN  |
|  5 | f   | OK  | Six  | NaN  |
|  6 | g   | NaN | NaN  | Col3 |
|  7 | h   | NaN | NaN  | Col4 |
|  8 | i   | NaN | NaN  | Col5 |
+----+-----+-----+------+------+

This is what I'm doing:

pivot_data = df.pivot(index='C', columns = 'D', values = 'B')

This is my output.

+------+-----+------+------+------+------+------+
|      | NaN | Col1 | col2 | col3 | col4 | col5 |
|------+-----+------+------+------+------+------|
| NaN  | NaN | NaN  | NaN  | NaN  | NaN  | NaN  |
| four | OK  | NaN  | NaN  | NaN  | NaN  | NaN  |
| six  | OK  | NaN  | NaN  | NaN  | NaN  | NaN  |
| one  | NaN | OK   | NaN  | NaN  | NaN  | NaN  |
| two  | NaN | OK   | OK   | NaN  | NaN  | NaN  |
| five | OK  | NaN  | NaN  | NaN  | NaN  | NaN  |
+------+-----+------+------+------+------+------+

When I use pivot_table instead of pivot, the rows and columns whose values are all NaN are left out of the result. It is important for me to keep all of those rows and columns.

How can I achieve the desired output below?

+------+------+------+------+------+------+
|      | Col1 | col2 | col3 | col4 | col5 |
|------+------+------+------+------+------|
| four | NaN  | NaN  | NaN  | NaN  | NaN  |
| six  | NaN  | NaN  | NaN  | NaN  | NaN  |
| one  | OK   | NaN  | NaN  | NaN  | NaN  |
| two  | OK   | OK   | NaN  | NaN  | NaN  |
| five | NaN  | NaN  | NaN  | NaN  | NaN  |
+------+------+------+------+------+------+

Thank you.

Update:

Here is an updated data set that raises ValueError: Index contains duplicate entries, cannot reshape.

+----+------+-----+-------+-----------+
|    | A    | B   | C     | D         |
|----+------+-----+-------+-----------|
|  0 | 3957 | OK  | One   | TM-009.4  |
|  1 | 3957 | OK  | two   | TM-009.4  |
|  2 | 4147 | OK  | three | CERT008   |
|  3 | 3816 | OK  | four  | FITEYE-04 |
|  4 | 3955 | OK  | five  | TM-009.2  |
|  5 | 4147 | OK  | six   | CERT008   |
|  6 | 4147 | OK  | seven | CERT008   |
|  7 | 3807 | OK  | seven | EMT-038.4 |
|  8 | nan  | OK  | eight | nan       |
|  9 | nan  | OK  | nine  | nan       |
| 10 | nan  | OK  | ten   | nan       |
| 11 | nan  | OK  | 11    | nan       |
| 12 | nan  | OK  | 12    | nan       |
| 13 | nan  | OK  | 13    | nan       |
| 14 | nan  | OK  | 14    | nan       |
| 15 | nan  | OK  | 14    | nan       |
| 16 | 3814 | nan | nan   | FITEYE-02 |
| 17 | 3819 | nan | nan   | FITEYE-08 |
| 18 | 3884 | nan | nan   | TG-000.8  |
| 19 | 4087 | nan | nan   | TM-042.1  |
+----+------+-----+-------+-----------+

1 Answer

You were almost there. After pivot, we just need to clear the axis names with rename_axis and then use drop to remove the row and column that are not required.

Code

# Replace NaN labels with a placeholder so the extra row and column can be dropped by name
df[['C', 'D']] = df[['C', 'D']].fillna('NA')
(df.pivot(index='C', columns='D', values='B')
   .rename_axis(index=None, columns=None)
   .drop(columns='NA', index='NA'))

Output

        col1    col2    col3    col4    col5
five    NaN     NaN     NaN     NaN     NaN
four    NaN     NaN     NaN     NaN     NaN
one     OK      NaN     NaN     NaN     NaN
six     NaN     NaN     NaN     NaN     NaN
two     OK      OK      NaN     NaN     NaN
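
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. The sample frame is typed in by hand from the question's first table, with labels lowercased to match the output above and column A left out because the pivot does not use it. The `SENTINEL` name is only illustrative, and it assumes that 'NA' never occurs as a real label in C or D.

```python
import numpy as np
import pandas as pd

# Hand-typed sample mirroring the question's first table
# (labels lowercased, column A omitted: both are simplifications).
df = pd.DataFrame({
    "B": ["OK"] * 6 + [np.nan] * 3,
    "C": ["one", "two", "two", "four", "five", "six", np.nan, np.nan, np.nan],
    "D": ["col1", "col1", "col2", np.nan, np.nan, np.nan, "col3", "col4", "col5"],
})

SENTINEL = "NA"  # assumption: this string is not a real label in C or D

# Give missing labels a name so they can be dropped explicitly afterwards.
df[["C", "D"]] = df[["C", "D"]].fillna(SENTINEL)

pivoted = (
    df.pivot(index="C", columns="D", values="B")
    .rename_axis(index=None, columns=None)    # remove the 'C' / 'D' axis names
    .drop(columns=SENTINEL, index=SENTINEL)   # remove the placeholder row and column
)
print(pivoted)
```

Rows such as four, five and six stay in the result with all-NaN values, because only the placeholder label is removed and nothing is filtered on content.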

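UPDATE: pivot raises this error when the same combination of index and column labels occurs more than once. In the updated data, column C contains repeated values (for example, 14 appears twice, both times with NaN in D). Since the NaN labels are dropped from the index anyway, we can either drop the duplicates or drop the NaN rows completely up front. I have dropped duplicates on C in the solution below; you can also drop the NaN rows completely, depending on your requirements.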
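
One way to see which rows trigger the error is to look for repeated (C, D) combinations before pivoting. The sketch below uses only a handful of rows copied from the updated table in the question; the variable names `sample` and `clashes` are illustrative.

```python
import numpy as np
import pandas as pd

# A few rows copied from the updated table (original rows 6, 7, 14, 15, 16, 17).
sample = pd.DataFrame({
    "B": ["OK", "OK", "OK", "OK", np.nan, np.nan],
    "C": ["seven", "seven", "14", "14", np.nan, np.nan],
    "D": ["CERT008", "EMT-038.4", np.nan, np.nan, "FITEYE-02", "FITEYE-08"],
})

# Same placeholder as in the answer, so missing labels compare as ordinary values.
sample[["C", "D"]] = sample[["C", "D"]].fillna("NA")

# keep=False marks every member of a repeated (C, D) pair, not just the later ones.
clashes = sample[sample.duplicated(subset=["C", "D"], keep=False)]
print(clashes)
```

In this sample only the two rows with C equal to 14 are flagged. The two "seven" rows are not, because they differ in D and therefore land in different cells of the pivot.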

Code

df[['C','D']] = df[['C','D']].fillna('NA')
df = df.drop_duplicates(['C'])
df.pivot(index = 'C', columns = 'D', values='B').rename_axis(index=None, columns=None).drop(columns='NA', index='NA')

Output

        CERT008  FITEYE-02  FITEYE-04  TM-009.2  TM-009.4
11      NaN      NaN        NaN        NaN       NaN
12      NaN      NaN        NaN        NaN       NaN
13      NaN      NaN        NaN        NaN       NaN
14      NaN      NaN        NaN        NaN       NaN
One     NaN      NaN        NaN        NaN       OK
eight   NaN      NaN        NaN        NaN       NaN
five    NaN      NaN        NaN        OK        NaN
four    NaN      NaN        OK         NaN       NaN
nine    NaN      NaN        NaN        NaN       NaN
seven   OK       NaN        NaN        NaN       NaN
six     OK       NaN        NaN        NaN       NaN
ten     NaN      NaN        NaN        NaN       NaN
three   OK       NaN        NaN        NaN       NaN
two     NaN      NaN        NaN        NaN       OK
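
Note that drop_duplicates(['C']) keeps only the first row for each value of C, which is why labels such as EMT-038.4 and FITEYE-08 do not appear as columns in the output above. If every label from C and D should survive, one possible variation is to de-duplicate on the (C, D) pair instead. This is a sketch under that assumption about the requirement, not part of the original solution. The frame is typed in from the updated table, with column A omitted and all C labels stored as strings.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hand-typed copy of the updated table (column A omitted, C labels as strings).
df = pd.DataFrame({
    "B": ["OK"] * 16 + [nan] * 4,
    "C": ["One", "two", "three", "four", "five", "six", "seven", "seven",
          "eight", "nine", "ten", "11", "12", "13", "14", "14",
          nan, nan, nan, nan],
    "D": ["TM-009.4", "TM-009.4", "CERT008", "FITEYE-04", "TM-009.2",
          "CERT008", "CERT008", "EMT-038.4"]
         + [nan] * 8
         + ["FITEYE-02", "FITEYE-08", "TG-000.8", "TM-042.1"],
})

df[["C", "D"]] = df[["C", "D"]].fillna("NA")

# Only exact (C, D) repeats are removed, so each remaining pair maps to one cell.
deduped = df.drop_duplicates(["C", "D"])

pivoted = (
    deduped.pivot(index="C", columns="D", values="B")
    .rename_axis(index=None, columns=None)
    .drop(columns="NA", index="NA")
)
print(pivoted)
```

With this variation, "seven" keeps both of its entries (CERT008 and EMT-038.4), and the D labels that occur only with a missing C remain as all-NaN columns. If two rows share a (C, D) pair but differ in B, the first one is kept, which may or may not be what you want.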

5 Comments

Thank you for your response. It really helped me. I have a large data set, say 500 * 500, and in that case I get the error below. ValueError: Index contains duplicate entries, cannot reshape. Unfortunately I cannot post the data here. Can you please help me resolve that error?
I suppose the error occurs because of repeated NaNs in the columns or index.
This error occurs when there are duplicates in the index. One option to solve it is to call the reset_index() method on the DataFrame. This creates a new index with a sequence starting from 0 and converts the existing index to a column.
I cannot resolve it even after adding reset_index() to my DataFrame. I have updated the question with a data set that gives the error. Can you please check and help me with that?
No worries, let me take a look.
