3

I am trying to find a way to look for the missing data in dataframe based on the data in my list. Each interface must have this 5 sub-interfaces.

sub_interface_list = ['1030', '1035', '1039', '1050', '1059']

df = pd.DataFrame({'Device': ['DeviceA', 'DeviceA', 'DeviceA', 'DeviceA', 'DeviceA', 'DeviceA', 'DeviceA', 'DeviceA', 'DeviceA'], 'Interface': ['Eth-Trunk100', 'Eth-Trunk100', 'Eth-Trunk100', 'Eth-Trunk100', 'Eth-Trunk100', 'Eth-Trunk101', 'Eth-Trunk101', 'Eth-Trunk101', 'Eth-Trunk101'], 'Sub_interface': ['1030', '1035', '1039', '1050', '1059', '1030', '1039', '1050', '1059']})

The dataframe looks like this

Device  Interface   Sub_interface
DeviceA Eth-Trunk100    1030
DeviceA Eth-Trunk100    1035
DeviceA Eth-Trunk100    1039
DeviceA Eth-Trunk100    1050
DeviceA Eth-Trunk100    1059
DeviceA Eth-Trunk101    1030
DeviceA Eth-Trunk101    1039
DeviceA Eth-Trunk101    1050
DeviceA Eth-Trunk101    1059

From the list we can see that Eth-Trunk101 is missing the sub_interface of 1035, and I would like to insert the 1035 into the last row of each interface. I know using dataframe.iterrows() and search for the missing element is easy, but is there any way in pandas that can be used without using the for loop?

** This is a test data set, my data is much bigger and using iteration will be very time consuming.

2
  • How is the data input into the df? It may be a better option to insert this prior to creating the df? Commented Mar 9, 2021 at 2:51
  • The data input is from another source, so I cannot assure the data is always correct and accurate. Commented Mar 9, 2021 at 3:04

2 Answers 2

4

You could use the complete function from pyjanitor to expose the missing values:

df.complete(['Interface', 'Sub_interface'])

      Interface Sub_interface   Device
0  Eth-Trunk100          1030  DeviceA
1  Eth-Trunk100          1035  DeviceA
2  Eth-Trunk100          1039  DeviceA
3  Eth-Trunk100          1050  DeviceA
4  Eth-Trunk100          1059  DeviceA
5  Eth-Trunk101          1030  DeviceA
6  Eth-Trunk101          1035      NaN
7  Eth-Trunk101          1039  DeviceA
8  Eth-Trunk101          1050  DeviceA
9  Eth-Trunk101          1059  DeviceA

You can use ffill to fill the null value:

df.complete(['Interface', 'Sub_interface']).ffill()

If you want to stick solely within Pandas (pyjanitor is a collection of convenient wrappers around Pandas), the solution below works well:

Create unique index of interface and sub_interface:

interface = pd.MultiIndex.from_product([df.Interface.unique(), 
                                        df.Sub_interface.unique()])

In [456]: interface
Out[456]: 
MultiIndex([('Eth-Trunk100', '1030'),
            ('Eth-Trunk100', '1035'),
            ('Eth-Trunk100', '1039'),
            ('Eth-Trunk100', '1050'),
            ('Eth-Trunk100', '1059'),
            ('Eth-Trunk101', '1030'),
            ('Eth-Trunk101', '1035'),
            ('Eth-Trunk101', '1039'),
            ('Eth-Trunk101', '1050'),
            ('Eth-Trunk101', '1059')],
           )

Set interface and sub_interface as index, reindex with interface and reset_index:

  df.set_index(['Interface', 'Sub_interface']).reindex(interface).reset_index()

 
      Interface Sub_interface   Device
0  Eth-Trunk100          1030  DeviceA
1  Eth-Trunk100          1035  DeviceA
2  Eth-Trunk100          1039  DeviceA
3  Eth-Trunk100          1050  DeviceA
4  Eth-Trunk100          1059  DeviceA
5  Eth-Trunk101          1030  DeviceA
6  Eth-Trunk101          1035      NaN
7  Eth-Trunk101          1039  DeviceA
8  Eth-Trunk101          1050  DeviceA
9  Eth-Trunk101          1059  DeviceA

Reindexing here works because the combination of interface and sub_interface is unique; if it is not unique, then a merge on outer is a better step; complete takes care of these checks in the background.

Also be wary of setting index with nulls; the Pandas docs suggests to avoid it - Although, so far with reindexing I have not noticed any issues.

You could also use unstack/stack, since the index is unique:

df.set_index(['Interface', 'Sub_interface']).unstack().stack(dropna = False).reset_index()

     Interface Sub_interface   Device
0  Eth-Trunk100          1030  DeviceA
1  Eth-Trunk100          1035  DeviceA
2  Eth-Trunk100          1039  DeviceA
3  Eth-Trunk100          1050  DeviceA
4  Eth-Trunk100          1059  DeviceA
5  Eth-Trunk101          1030  DeviceA
6  Eth-Trunk101          1035      NaN
7  Eth-Trunk101          1039  DeviceA
8  Eth-Trunk101          1050  DeviceA
9  Eth-Trunk101          1059  DeviceA
Sign up to request clarification or add additional context in comments.

Comments

1

One way is to pivot then stack:

(df.assign(dummy=1)
   .pivot_table(index=['Device','Interface'], columns='Sub_interface', 
                values='dummy', fill_value=1)
   .reindex(sub_interface_list, fill_value=1, axis=1)
   .stack().reset_index(name='dummy')
   .drop('dummy', axis=1)
)

Output:

    Device     Interface Sub_interface
0  DeviceA  Eth-Trunk100          1030
1  DeviceA  Eth-Trunk100          1035
2  DeviceA  Eth-Trunk100          1039
3  DeviceA  Eth-Trunk100          1050
4  DeviceA  Eth-Trunk100          1059
5  DeviceA  Eth-Trunk101          1030
6  DeviceA  Eth-Trunk101          1035
7  DeviceA  Eth-Trunk101          1039
8  DeviceA  Eth-Trunk101          1050
9  DeviceA  Eth-Trunk101          1059

2 Comments

it has a name param, check the latest pandas doc. pandas.pydata.org/pandas-docs/stable/reference/api/…
@QuangHoang, I think you are missing the code for sub_interface_list

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.