1

How do I find a missing line in the dataframe and add a new one?

The DataFrame df

    federalState    hasParking  Size
0   A               False       154
1   A               True        531
2   B               False       191
3   B               True        725
4   C               True        54
5   D               False       100
6   D               True        656

For federalState C, the row with hasParking == False is missing.

The final result should look like this

    federalState    hasParking  Size
0   A               False       154
1   A               True        531
2   B               False       191
3   B               True        725
4   C               False       89
5   C               True        54
6   D               False       100
7   D               True        656

My code for adding the new line

df.loc[-1] = ['C', 'False' , 89]  # adding a row
df.index = df.index + 1  # shifting index
df = df.sort_values(by=['federalState'])  # sorting by federalState

But how do I find out that the line is missing? My if-statement does not work

if ((df['federalState']=='C') and (df['hasParking']=='True')).any():
1
  • 'True' is not the value you should be comparing to. (Commented Feb 11, 2018 at 17:31)

2 Answers

3

To chain conditions, use & instead of and. If hasParking is a boolean column, the == True comparison can be omitted.

There is a difference between True (a boolean) and 'True' (a string). Since the column is boolean, I think you need to remove the quotes.

if ((df['federalState']=='C') & (df['hasParking'])).any():
#same as
#if ((df['federalState']=='C') & (df['hasParking'] == True)).any():
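To make the difference concrete, here is a minimal sketch that rebuilds the question's DataFrame and compares the approaches. The variable names (and_error, has_c_with_parking, and so on) are only illustrative and do not appear in the question.

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# `and` asks Python for the truth value of a whole Series, which pandas refuses.
try:
    (df["federalState"] == "C") and (df["hasParking"] == True)
    and_error = None
except ValueError as exc:
    and_error = exc

# `&` combines the two boolean masks element by element.
has_c_with_parking = ((df["federalState"] == "C") & df["hasParking"]).any()
has_c_without_parking = ((df["federalState"] == "C") & ~df["hasParking"]).any()

# Comparing the boolean column with the string 'True' matches no row.
string_matches = (df["hasParking"] == "True").sum()
```

Here has_c_without_parking is the check the question is after: it is False exactly because the (C, False) row is missing.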

As for the first part: after sorting, you can call reset_index(drop=True) to restore the default index, so shifting the index manually is not needed:

df.loc[-1] = ['C', False , 89]  # adding a row
df = df.sort_values(by=['federalState']).reset_index(drop=True)
print (df)
  federalState  hasParking  Size
0            A       False   154
1            A        True   531
2            B       False   191
3            B        True   725
4            C        True    54
5            C       False    89
6            D       False   100
7            D        True   656

print (df.dtypes)
federalState    object
hasParking        bool
Size             int64
dtype: object
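Note that sorting by federalState alone leaves the order of the two C rows unspecified. If the result should match the expected output in the question (False before True within each state), one option is to sort by both columns. The sketch below also uses pd.concat instead of df.loc[-1]. That is a swapped-in alternative, not what the answer above does, and the name new_row is only illustrative.

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# The row to add; False is the boolean, not the string 'False'.
new_row = pd.DataFrame({"federalState": ["C"], "hasParking": [False], "Size": [89]})

result = (
    pd.concat([df, new_row], ignore_index=True)
    # Sort by state first, then by hasParking, so False comes before True.
    .sort_values(by=["federalState", "hasParking"])
    .reset_index(drop=True)
)
print(result)
```

Because the new row carries a real boolean, the hasParking column keeps its bool dtype after the concat.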

To find the missing combinations, use:

df1 = df.set_index(['federalState','hasParking'])['Size'].unstack().unstack().reset_index(name='val')
print (df1)
   hasParking federalState    val
0       False            A  154.0
1       False            B  191.0
2       False            C    NaN
3       False            D  100.0
4        True            A  531.0
5        True            B  725.0
6        True            C   54.0
7        True            D  656.0

a = df1.loc[df1['val'].isnull(), ['federalState','hasParking']]
print (a)
  federalState  hasParking
2            C       False
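If the missing rows should also be inserted, one possible approach (a sketch, not part of the method above) is to build the full grid of combinations with MultiIndex.from_product and reindex against it. The names full_index, indexed, missing and completed are illustrative. The fill value 89 is taken from the question's expected result and is an assumption; in practice you would choose your own default.

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# Every (federalState, hasParking) pair that should exist.
full_index = pd.MultiIndex.from_product(
    [df["federalState"].unique(), [False, True]],
    names=["federalState", "hasParking"],
)

indexed = df.set_index(["federalState", "hasParking"])

# Pairs that are in the full grid but not in the data.
missing = full_index.difference(indexed.index)

# Reindexing adds the missing pairs with NaN in Size; fill them with a
# placeholder (89 here, an assumed value) and restore the integer dtype.
completed = (
    indexed.reindex(full_index)
    .fillna({"Size": 89})
    .astype({"Size": "int64"})
    .reset_index()
)
print(missing.tolist())
print(completed)
```

This assumes each (federalState, hasParking) pair occurs at most once, since reindex requires a unique index.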

2 Comments

Thank you for your help. How do I insert the new line now?
I solved it with df_new = df1.fillna(0). Many thanks for the help
1

IIUC, you want to check, within each label of the "federalState" column, whether some values are missing.

To find the groups that do not contain every unique value, you can first group by federalState and then count the unique elements of the hasParking column with nunique().

df.groupby("federalState")["hasParking"].nunique()
federalState
A    2
B    2
C    1
D    2
Name: hasParking, dtype: int64

To check existence of a particular element in a group, you can try

df.groupby("federalState")["hasParking"].apply(lambda g: g.isin([False]).any())

federalState
A     True
B     True
C    False    # does not contain False
D     True
Name: hasParking, dtype: bool
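Building on the two checks above, here is a rough sketch of how to turn them into a list of incomplete states and of the values each one lacks. It assumes hasParking can only be False or True; the names counts, incomplete_states, expected and missing_by_state are illustrative.

```python
import pandas as pd

# The DataFrame from the question, rebuilt by hand.
df = pd.DataFrame({
    "federalState": ["A", "A", "B", "B", "C", "D", "D"],
    "hasParking": [False, True, False, True, True, False, True],
    "Size": [154, 531, 191, 725, 54, 100, 656],
})

# States that have fewer than two distinct hasParking values.
counts = df.groupby("federalState")["hasParking"].nunique()
incomplete_states = counts[counts < 2].index.tolist()

# For each state, the hasParking values that do not occur (assumed domain: False/True).
expected = {False, True}
missing_by_state = {
    state: sorted(expected - set(values))
    for state, values in df.groupby("federalState")["hasParking"]
    if expected - set(values)
}
print(incomplete_states)
print(missing_by_state)
```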
