I'm trying to create the dataframe below, which deliberately lacks one piece of information: type should be empty for one record.

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', NaN, 'blue', 'blue', 'yellow']}

df = pd.DataFrame (df, columns = ['id', 'created_at','type', 'converted_tf'])

It works perfectly fine when I fill in all the values, but I keep getting errors with NaN, Null, Na, etc.

Any idea what I have to put?

  • Use None instead. Commented Mar 3, 2020 at 14:10
  • Yepp, None was exactly what I was looking for. Thanks! Commented Mar 3, 2020 at 14:29

2 Answers

The bare names NaN, Null, and Na are not defined in Python, so using them raises a NameError. They do not represent an absence of value.
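To see why the bare names fail, here is a minimal sketch in plain Python. It assumes nothing has been imported under the name NaN; the variable names are illustrative only.

```python
import math

# A bare NaN is just an undefined identifier in plain Python,
# so evaluating it raises NameError.
try:
    value = NaN  # noqa: F821 (intentionally undefined)
except NameError as exc:
    error_message = str(exc)

print(error_message)

# Two spellings of "missing" that Python does define:
missing_none = None
missing_nan = float('nan')
print(missing_none is None, math.isnan(missing_nan))
```

The same NameError is what you get when NaN appears inside the dict literal from the question.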


Use Python's None object to represent the absence of a value.

import pandas as pd

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', None, 'blue', 'blue', 'yellow']}

# 'converted_tf' is not a key in the dict, so pandas fills that column with NaN
df = pd.DataFrame(df, columns=['id', 'created_at', 'type', 'converted_tf'])

If you try to print the df, you'll get the following output:

   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02    None          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN

So, you may now think that NaN and None are different. In practice pandas treats both as missing values: NaN is its default placeholder, and methods such as isna and fillna handle None the same way. Read more about this here.
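A quick way to check this is to ask pandas which cells it considers missing. This is a sketch using the same data as above; the names data, frame, missing_mask and missing_per_column are my own and not part of the original example.

```python
import pandas as pd

data = {'id': [1, 2, 3, 4, 5],
        'created_at': ['2020-02-01', '2020-02-02', '2020-02-02',
                       '2020-02-02', '2020-02-03'],
        'type': ['red', None, 'blue', 'blue', 'yellow']}

frame = pd.DataFrame(data, columns=['id', 'created_at', 'type', 'converted_tf'])

# isna() flags both the None in 'type' and the NaN values in 'converted_tf'
missing_mask = frame.isna()
missing_per_column = missing_mask.sum()
print(missing_per_column)
```

The count for type should be 1 and the count for converted_tf should be 5, which shows that the None cell and the NaN cells are counted as missing alike.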

Now let's try the fillna function:

df.fillna('')  # filling None or NaN values with empty string

You can see that both NaN and None got replaced by an empty string.

   id  created_at    type converted_tf
0   1  2020-02-01     red
1   2  2020-02-02
2   3  2020-02-02    blue
3   4  2020-02-02    blue
4   5  2020-02-03  yellow
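One thing worth knowing: fillna returns a new DataFrame rather than changing the original, and it also accepts a dict if you only want to fill certain columns. A small sketch, with a shortened frame and the placeholder 'unknown' chosen purely for illustration:

```python
import pandas as pd

frame = pd.DataFrame({'id': [1, 2, 3],
                      'type': ['red', None, 'blue']},
                     columns=['id', 'type', 'converted_tf'])

# Fill only the 'type' column; the result is a new DataFrame
filled = frame.fillna({'type': 'unknown'})

print(filled)
print(frame['type'].isna().sum())  # the original frame is left untouched
```

Assign the result back (df = df.fillna(...)) if you want to keep the filled version.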

Use np.nan if you need a missing value:

import numpy as np
import pandas as pd

# np.nan is the spelling that works in every NumPy version (the np.NaN alias was removed in NumPy 2.0)
df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', np.nan, 'blue', 'blue', 'yellow']}

float('NaN') also works:

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', float('NaN'), 'blue', 'blue', 'yellow']}

df = pd.DataFrame(df, columns=['id', 'created_at', 'type', 'converted_tf'])
print(df)
   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02     NaN          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN

Or use None; most of the time it behaves the same as np.nan when processing data in pandas:

df = {'id': [1, 2, 3, 4, 5],
      'created_at': ['2020-02-01', '2020-02-02', '2020-02-02', '2020-02-02', '2020-02-03'],
      'type': ['red', None, 'blue', 'blue', 'yellow']}

df = pd.DataFrame(df, columns=['id', 'created_at', 'type', 'converted_tf'])
print(df)
   id  created_at    type converted_tf
0   1  2020-02-01     red          NaN
1   2  2020-02-02    None          NaN
2   3  2020-02-02    blue          NaN
3   4  2020-02-02    blue          NaN
4   5  2020-02-03  yellow          NaN
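To illustrate where None and np.nan line up, here is a rough sketch with small Series. The names numeric, with_none and with_nan are made up for this example, and dtype=object is passed explicitly so the result does not depend on the default string handling of your pandas version.

```python
import numpy as np
import pandas as pd

# In a numeric column, None is converted to NaN and the dtype is float64
numeric = pd.Series([1.0, None, 3.0])
print(numeric.dtype)

# In an object column, isna() flags None and np.nan the same way
with_none = pd.Series(['red', None, 'blue'], dtype=object)
with_nan = pd.Series(['red', np.nan, 'blue'], dtype=object)
print(with_none.isna().tolist())
print(with_nan.isna().tolist())
```

So for detecting, dropping or filling missing values it should not matter which of the two you put into the dict.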
