
I currently do this in Excel, but I was wondering whether anyone knows a way to find and insert missing sequence numbers in Python.

Say I have a dataframe:

import pandas as pd

data = {'Sequence':  [1, 2, 4, 6, 7, 9, 10],
        'Value': ["x", "x", "x", "x", "x", "x", "x"]
        }

df = pd.DataFrame(data, columns=['Sequence', 'Value'])

Now I want to find the missing sequence numbers in the 'Sequence' column, insert rows for them, and leave the 'Value' column blank in those rows, so that I get the following output:

    print(df)

   Sequence Value
0         1     x
1         2     x
2         3
3         4     x
4         5
5         6     x
6         7     x
7         8
8         9     x
9        10     x

Even better would be a solution that also lets you define the start and end of the sequence, for example when the sequence starts at 3 but you want it to run from 1 to 12. A solution for only the first part would already help a lot. Thanks in advance!

3 Answers


You can set_index and reindex using a range that runs from the first to the last Sequence value (these are the min and max here, because the column is sorted):

(df.set_index('Sequence')
   .reindex(range(df.Sequence.iat[0],df.Sequence.iat[-1]+1), fill_value='')
   .reset_index())

   Sequence Value
0         1     x
1         2     x
2         3      
3         4     x
4         5      
5         6     x
6         7     x
7         8      
8         9     x
9        10     x
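
For the second part of the question (choosing the start and end yourself), the same reindex idea should carry over. Here is a minimal sketch, assuming the Sequence values are unique; the names start, end and filled and the sample data starting at 3 are only for illustration:

```python
import pandas as pd

# Sample data that starts at 3, as in the question's follow-up example.
df = pd.DataFrame({
    "Sequence": [3, 4, 6, 7, 9, 10],
    "Value": ["x"] * 6,
})

# Illustrative bounds (assumed): the sequence should run from 1 to 12.
start, end = 1, 12

filled = (
    df.set_index("Sequence")
      # Rows for numbers not present in the index get '' as their Value.
      .reindex(range(start, end + 1), fill_value="")
      # Make sure the index keeps its name before turning it back into a column.
      .rename_axis("Sequence")
      .reset_index()
)
print(filled)
```

Note that reindex raises an error if Sequence contains duplicates, so this only fits data where each number appears at most once.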

Or do it by merging DataFrames:

    import pandas as pd

    seq = [1, 2, 4, 6, 7, 9, 10]
    dfs0 = pd.DataFrame.from_dict({'Sequence': seq, 'Value': ['x'] * len(seq)})
    # Build the full range, outer-merge the data onto it, then blank out the NaNs
    dfseq = (pd.DataFrame.from_dict({'Sequence': range(min(seq), max(seq) + 1)})
               .merge(dfs0, on='Sequence', how='outer')
               .fillna(''))
    print(dfseq)


   Sequence Value
0         1     x
1         2     x
2         3      
3         4     x
4         5      
5         6     x
6         7     x
7         8      
8         9     x
9        10     x
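
If you also want to know which numbers were missing, a merge can report that too. This is a rough sketch using merge's indicator option; the names full, merged and missing are just illustrative:

```python
import pandas as pd

seq = [1, 2, 4, 6, 7, 9, 10]
dfs0 = pd.DataFrame({"Sequence": seq, "Value": ["x"] * len(seq)})
full = pd.DataFrame({"Sequence": range(min(seq), max(seq) + 1)})

# indicator=True adds a "_merge" column saying which side each row came from.
merged = full.merge(dfs0, on="Sequence", how="left", indicator=True)

# "left_only" rows exist only in the full range, so those numbers were missing.
missing = merged.loc[merged["_merge"] == "left_only", "Sequence"].tolist()
print(missing)  # [3, 5, 8]
```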


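You can try this:

import numpy as np
import pandas as pd

Sequence = [1, 2, 4, 6, 7, 9, 10]
df = pd.DataFrame(np.arange(1, 12), columns=["Sequence"])  # 1 through 11
# Set 'Value' only on rows whose Sequence is in the list; other rows stay NaN
df.loc[df.Sequence.isin(Sequence), 'Value'] = 'x'
df = df.fillna('')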

First you create your DataFrame with the given range of values you want it to have for sequence. Then you set 'Value' to 'x' for the rows where 'Sequence' is in your Sequence list. Finally you fill the missing values with ''.
