Python Pandas - How to check a value in DataFrame

Question

How do I find a missing line in the dataframe and add a new one?

The DataFrame df

    federalState    hasParking  Size
0   A               False       154
1   A               True        531
2   B               False       191
3   B               True        725
4   C               True        54
5   D               False       100
6   D               True        656

For federalState C, the row with hasParking False is missing.

The final result should look like this

    federalState    hasParking  Size
0   A               False       154
1   A               True        531
2   B               False       191
3   B               True        725
4   C               False       89
5   C               True        54
6   D               False       100
7   D               True        656

My code for adding the new line

df.loc[-1] = ['C', 'False' , 89]  # adding a row
df.index = df.index + 1  # shifting index
df = df.sort_values(by=['federalState'])  # sorting by federalState

But how do I find out that the line is missing? My if-statement does not work

if ((df['federalState']=='C') and (df['hasParking']=='True')).any():

'True' is not the value you should be comparing to

– Paul H, Commented Feb 11, 2018 at 17:31

jezrael · Accepted Answer · 2018-02-11 17:40:00Z

3

To chain conditions element-wise, use & instead of and. If hasParking is a boolean column, the == True comparison can be omitted.

There is a difference between True (a boolean) and 'True' (a string). Since the column is boolean, I think you need to remove the quotes.

if ((df['federalState']=='C') & (df['hasParking'])).any():
# same as
# if ((df['federalState']=='C') & (df['hasParking'] == True)).any():
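A minimal sketch of the difference, assuming the sample data from the question is rebuilt by hand. The helper names (and_raised, has_c_true, has_c_false, string_match) are only for illustration:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# `and` asks for the truth value of a whole Series, which pandas refuses.
try:
    (df["federalState"] == "C") and (df["hasParking"] == True)
    and_raised = False
except ValueError:
    and_raised = True

# `&` combines the two boolean Series row by row.
has_c_true = ((df["federalState"] == "C") & df["hasParking"]).any()
has_c_false = ((df["federalState"] == "C") & ~df["hasParking"]).any()

# Comparing a boolean column to the string 'True' matches no row.
string_match = (df["hasParking"] == "True").any()

print(and_raised, has_c_true, has_c_false, string_match)
```

Here has_c_false is the check that reveals the missing row: no row has federalState C together with hasParking False.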

For adding the row, you can call reset_index(drop=True) after sorting to get a default index:

df.loc[-1] = ['C', False , 89]  # adding a row
df = df.sort_values(by=['federalState']).reset_index(drop=True)
print (df)
  federalState  hasParking  Size
0            A       False   154
1            A        True   531
2            B       False   191
3            B        True   725
4            C        True    54
5            C       False    89
6            D       False   100
7            D        True   656

print (df.dtypes)
federalState    object
hasParking        bool
Size             int64
dtype: object
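Note that the output above lists C True before C False, because the sort uses only federalState. A possible variation, not part of the original answer, is to append the row with pd.concat and sort by both columns so the order matches the desired result. The name new_row is an assumption for this sketch:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# The row to add, as a one-row DataFrame (the value 89 comes from the question).
new_row = pd.DataFrame({"federalState": ["C"], "hasParking": [False], "Size": [89]})

# Append, sort by state and then by hasParking, and rebuild a default index.
df = (
    pd.concat([df, new_row], ignore_index=True)
    .sort_values(by=["federalState", "hasParking"])
    .reset_index(drop=True)
)

print(df)
print(df.dtypes)
```

Sorting on both columns makes the order within each state deterministic (False before True).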

To find the missing combinations, use:

df1 = df.set_index(['federalState','hasParking'])['Size'].unstack().unstack().reset_index(name='val')
print (df1)
   hasParking federalState    val
0       False            A  154.0
1       False            B  191.0
2       False            C    NaN
3       False            D  100.0
4        True            A  531.0
5        True            B  725.0
6        True            C   54.0
7        True            D  656.0

a = df1.loc[df1['val'].isnull(), ['federalState','hasParking']]
print (a)
  federalState  hasParking
2            C       False
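As an alternative sketch (not the method above), the missing combinations can also be found by comparing the existing (federalState, hasParking) pairs against the full product of both columns, and then inserted in one step. The fill_sizes mapping is hypothetical; it just supplies the Size 89 from the question for the missing pair:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

indexed = df.set_index(["federalState", "hasParking"])

# Every combination that should exist: each state with False and with True.
full_index = pd.MultiIndex.from_product(
    [df["federalState"].unique(), [False, True]],
    names=["federalState", "hasParking"],
)

# Combinations that are expected but not present in the data.
missing = full_index.difference(indexed.index)

# Hypothetical Size values for the missing pairs (89 is taken from the question).
fill_sizes = {("C", False): 89}
new_rows = pd.DataFrame(
    {"Size": [fill_sizes[key] for key in missing]}, index=missing
)

# Add the new rows, sort by both index levels, and restore ordinary columns.
completed = pd.concat([indexed, new_rows]).sort_index().reset_index()

print(list(missing))
print(completed)
```

Because the rows are concatenated rather than filled after an unstack, Size stays an integer column instead of becoming float.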

edited Feb 11, 2018 at 17:40

answered Feb 11, 2018 at 17:24

2 Comments

justintime Over a year ago

Thank you for your help. How do I insert the new line now?

justintime Over a year ago

I solved it with df_new = df1.fillna(0). Many thanks for the help

Tai · Accepted Answer · 2018-02-11 17:42:10Z

1

IIUC, you want to check, for each label in the "federalState" column, whether any hasParking values are missing.

To find the groups that do not contain every value, first group by federalState and then count the distinct values in the hasParking column with nunique().

df.groupby("federalState")["hasParking"].nunique()
federalState
A    2
B    2
C    1
D    2
Name: hasParking, dtype: int64

To check existence of a particular element in a group, you can try

df.groupby("federalState")["hasParking"].apply(lambda g: g.isin([False]).any())

federalState
A     True
B     True
C    False    # does not contain False
D     True
Name: hasParking, dtype: bool
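Building on the two checks above, here is a rough sketch that turns them into a list of incomplete states and the values each one lacks. It assumes the only valid hasParking values are False and True; the names expected, incomplete_states and missing_by_state are illustrative:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# Assumption: every state should have both values.
expected = {False, True}

# States with fewer distinct hasParking values than expected have a gap.
counts = df.groupby("federalState")["hasParking"].nunique()
incomplete_states = counts[counts < len(expected)].index.tolist()

# For each state, collect which expected values never occur.
present = df.groupby("federalState")["hasParking"].apply(set)
missing_by_state = {
    state: sorted(expected - values)
    for state, values in present.items()
    if expected - values
}

print(incomplete_states)
print(missing_by_state)
```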

edited Feb 11, 2018 at 17:42

answered Feb 11, 2018 at 17:35
