How to create dataframe in pandas that contains Null values

Question

I am trying to create the dataframe below, which deliberately lacks some information: type should be empty for one record.

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', NaN, 'blue', 'blue', 'yellow']}

df = pd.DataFrame (df, columns = ['id', 'created_at','type', 'converted_tf'])

It works perfectly fine when I put in all the values, but I keep getting errors with NaN, Null, Na, etc.

Any idea what I have to put?

Use None instead.
– CodeIt, Commented Mar 3, 2020 at 14:10

Yepp, None was exactly what I was looking for. Thanks!
– LeroyFromBerlin, Commented Mar 3, 2020 at 14:29

CodeIt · Accepted Answer · 2020-03-03 14:25:46Z

NaN, Null, and Na are not defined names in Python, so writing them unquoted raises a NameError.

Use Python's None object to represent the absence of a value.

import pandas as pd

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', None, 'blue', 'blue', 'yellow']}

df = pd.DataFrame (df, columns = ['id', 'created_at','type', 'converted_tf'])

If you try to print the df, you'll get the following output:

   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02    None          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN

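So, you may now think that NaN and None are different. Pandas treats both as missing values: None is kept as-is in an object column such as type, while converted_tf, which has no data in the dictionary, is filled with NaN as the placeholder. The pandas user guide on working with missing data covers this in more detail.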
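To check that pandas really does treat both markers as missing, here is a minimal sketch using isna(). The name data and the shortened column list are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Shortened version of the data above; 'converted_tf' has no values at all
data = {'id': [1, 2, 3, 4, 5],
        'type': ['red', None, 'blue', 'blue', 'yellow']}
df = pd.DataFrame(data, columns=['id', 'type', 'converted_tf'])

# isna() flags None and NaN alike
missing = df.isna()
print(missing['type'].tolist())       # [False, True, False, False, False]
print(missing['converted_tf'].all())  # True
```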

Now let's try the fillna function:

df.fillna('')  # filling None or NaN values with empty string

You can see that both NaN and None were replaced by an empty string.

   id  created_at    type converted_tf
0   1  2020-02-01     red
1   2  2020-02-02
2   3  2020-02-02    blue
3   4  2020-02-02    blue
4   5  2020-02-03  yellow
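One detail worth noting is that fillna returns a new DataFrame rather than changing df, so the result has to be assigned if you want to keep it. A small sketch, assuming a reduced two-row frame and the illustrative name filled:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'type': ['red', None]})

# fillna returns a new DataFrame; keep the result by assigning it
filled = df.fillna('')

print(filled['type'].tolist())   # ['red', '']
print(df['type'].isna().sum())   # 1, the original still holds the missing value
```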

jezrael · Accepted Answer · 2020-03-03 14:16:58Z

Use np.nan if you need a missing value:

import numpy as np
import pandas as pd

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', np.nan, 'blue', 'blue', 'yellow']}

Or float('NaN'), which works too:

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', float('NaN'), 'blue', 'blue', 'yellow']}

df = pd.DataFrame (df, columns = ['id', 'created_at','type', 'converted_tf'])
print (df)
   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02     NaN          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN

Or use None; most of the time it behaves the same as np.nan when processing data in pandas:

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', None, 'blue', 'blue', 'yellow']}

df = pd.DataFrame (df, columns = ['id', 'created_at','type', 'converted_tf'])
print (df)
   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02    None          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN
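As a rough illustration of how None and NaN end up equivalent in practice, the sketch below builds two small Series (the names nums and labels are made up for this example). In a numeric Series, None is converted to NaN on construction; in a text Series, isna() still reports it as missing.

```python
import pandas as pd

# In numeric data, None is converted to NaN and the dtype becomes float64
nums = pd.Series([1.5, None, 3.0])
print(nums.dtype)             # float64
print(nums.isna().tolist())   # [False, True, False]

# In text data, the missing entry is still detected by isna()
labels = pd.Series(['red', None, 'blue'])
print(labels.isna().tolist())  # [False, True, False]
```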
