I am looking to insert a row into a dataframe between two existing rows based on certain criteria.

For example, my data frame:

import pandas as pd
df = pd.DataFrame({'Col1':['A','B','D','E'],'Col2':['B', 'C', 'E', 'F'], 'Col3':['1', '1', '1', '1']})

Which looks like:

    Col1    Col2    Col3
  0 A       B       1
  1 B       C       1
  2 D       E       1
  3 E       F       1

I want to be able to insert a new row between Index 1 and Index 2 given the condition:

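n = 0
while n < len(df) - 1:  # stop one row early so that n + 1 stays in range
    # .ix was removed from pandas; .iloc does the positional lookup
    if df['Col2'].iloc[n] != df['Col1'].iloc[n + 1]:
        pass  # something, something, insert row
    n += 1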

My desired output table would look like:

    Col1    Col2    Col3
  0 A       B       1
  1 B       C       1
  2 C       D       1
  3 D       E       1
  4 E       F       1

I am struggling with conditionally inserting rows based on values in the previous and following records. I ultimately want to perform the above exercise on my real-world data, which would involve multiple conditions and preserving the values of more than one column (in this example it was Col3, but in my real data it would be multiple columns).

  • It may be easier to insert columns instead of rows. Maybe you can transpose the dataframe first, insert the new data as a new column, and re-transpose to get your original table back. Just a guess. Commented Oct 17, 2016 at 17:15
  • What determines the content of the new row? Is it going to be "fixing" the sequence as in the toy example? Commented Oct 17, 2016 at 17:37
  • @Tammo Heeren, I'll give that a shot and see if that's beneficial. @ASGM, the content of the new row would be that Col1 takes the value of Col2 from the previous row and Col2 takes the value of Col1 from the following row, while all other columns take their values from the previous row. A good example is my desired output table, where the new row's Col1 and Col2 are C and D (Col2 of the previous row and Col1 of the following row) and its Col3 is 1 (Col3 of the previous row). Let me know if that makes sense. Commented Oct 17, 2016 at 17:40

1 Answer

UPDATE: memory saving method - first set a new index with a gap for a new row:

In [30]: df
Out[30]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    D    E    1
3    E    F    1

if we want to insert a new row between indexes 1 and 2, we split the index at position 2:

In [31]: idxs = [df.index[:2], df.index[2:]]  # split at position 2 (np.split(df.index, 2) would instead cut into two equal halves)

set a new index (with gap at position 2):

In [32]: df.set_index(idxs[0].union(idxs[1] + 1), inplace=True)

In [33]: df
Out[33]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
3    D    E    1
4    E    F    1

insert new row with index 2:

In [34]: df.loc[2] = ['X','X',2]

In [35]: df
Out[35]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
3    D    E    1
4    E    F    1
2    X    X    2

sort index:

In [38]: df.sort_index(inplace=True)

In [39]: df
Out[39]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    X    X    2
3    D    E    1
4    E    F    1
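
The steps above can be wrapped in a small helper. This is only a sketch: the name insert_row is hypothetical, and it assumes the frame has a default 0..n-1 integer index. It builds the shifted labels with NumPy instead of splitting the index, which is a swapped-in shortcut that leaves the same gap.

```python
import numpy as np
import pandas as pd

def insert_row(df, pos, values):
    """Return a copy of df with `values` inserted as a new row at position `pos`.

    Hypothetical helper; assumes df has a default 0..n-1 integer index.
    """
    out = df.copy()
    # Shift every label at or after `pos` up by one, leaving a gap at `pos`.
    labels = np.arange(len(out))
    labels[pos:] += 1
    out.index = labels
    # Fill the gap (the row is appended at the end), then restore the order.
    out.loc[pos] = values
    return out.sort_index()

df = pd.DataFrame({'Col1': ['A', 'B', 'D', 'E'],
                   'Col2': ['B', 'C', 'E', 'F'],
                   'Col3': ['1', '1', '1', '1']})
result = insert_row(df, 2, ['X', 'X', '2'])
print(result)
```

The original frame is left untouched because the helper works on a copy; drop the copy if memory is the main concern.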

PS: you can also insert a whole DataFrame instead of a single row, using pd.concat([df, new_df]) (or df.append(new_df) in pandas versions before 2.0, where append still exists):

In [42]: df
Out[42]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    D    E    1
3    E    F    1

In [43]: idxs = [df.index[:2], df.index[2:]]  # split at position 2

In [45]: new_df = pd.DataFrame([['X', 'X', 10], ['Y','Y',11]], columns=df.columns)

In [49]: new_df.index += idxs[1].min()

In [51]: new_df
Out[51]:
  Col1 Col2  Col3
2    X    X    10
3    Y    Y    11

In [52]: df = df.set_index(idxs[0].union(idxs[1]+len(new_df)))

In [53]: df
Out[53]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
4    D    E    1
5    E    F    1

In [54]: df = pd.concat([df, new_df])  # df.append(new_df) was removed in pandas 2.0

In [55]: df
Out[55]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
4    D    E    1
5    E    F    1
2    X    X   10
3    Y    Y   11

In [56]: df.sort_index(inplace=True)

In [57]: df
Out[57]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    X    X   10
3    Y    Y   11
4    D    E    1
5    E    F    1
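
The multi-row variant can be sketched the same way. The helper name insert_rows is hypothetical, and the sketch assumes both frames share the same columns and that df has a default 0..n-1 integer index. It uses pd.concat, since df.append is not available in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

def insert_rows(df, pos, new_df):
    """Return a new frame with the rows of new_df inserted at position `pos`.

    Hypothetical helper; assumes df has a default 0..n-1 integer index.
    """
    n = len(new_df)
    shifted = df.copy()
    # Open a gap of n labels starting at `pos`.
    labels = np.arange(len(df))
    labels[pos:] += n
    shifted.index = labels
    # Label the new rows so that they fall exactly into the gap.
    block = new_df.copy()
    block.index = np.arange(pos, pos + n)
    return pd.concat([shifted, block]).sort_index()

df = pd.DataFrame({'Col1': ['A', 'B', 'D', 'E'],
                   'Col2': ['B', 'C', 'E', 'F'],
                   'Col3': ['1', '1', '1', '1']})
new_df = pd.DataFrame([['X', 'X', 10], ['Y', 'Y', 11]], columns=df.columns)
result = insert_rows(df, 2, new_df)
print(result)
```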

OLD answer:

One way (among many) to achieve this is to split your DF and concatenate the pieces together with the new DF in the desired order:

Original DF:

In [12]: df
Out[12]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    D    E    1
3    E    F    1

let's split it into two parts ([0:1], [2:end]):

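In [13]: dfs = [df.iloc[:2], df.iloc[2:]]  # plain slicing; np.split(df, [2]) relies on DataFrame.swapaxes, which newer pandas deprecates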

In [14]: dfs
Out[14]:
[  Col1 Col2 Col3
 0    A    B    1
 1    B    C    1,   Col1 Col2 Col3
 2    D    E    1
 3    E    F    1]

now we can concatenate it together with a new DF in desired order:

In [15]: pd.concat([dfs[0], pd.DataFrame([['C','D', 1]], columns=df.columns), dfs[1]], ignore_index=True)
Out[15]:
  Col1 Col2 Col3
0    A    B    1
1    B    C    1
2    C    D    1
3    D    E    1
4    E    F    1
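
To tie this back to the condition in the question (insert a row wherever Col2 does not match Col1 of the following row), here is a rough sketch. It assumes a default integer index and that each break in the chain needs exactly one filler row, built as described in the comments: Col1 from the previous row's Col2, Col2 from the following row's Col1, everything else copied from the previous row. The half-step index labels are just a trick of this sketch, not something pandas requires.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'B', 'D', 'E'],
                   'Col2': ['B', 'C', 'E', 'F'],
                   'Col3': ['1', '1', '1', '1']})

# Col1 of the following row, aligned with the current row (missing on the last row).
next_col1 = df['Col1'].shift(-1)

# A gap exists where there is a following row and its Col1 differs from this row's Col2.
gap = next_col1.notna() & (df['Col2'] != next_col1)

# Start each filler row as a copy of the previous row, so every other column
# (here Col3) carries over, then overwrite Col1 and Col2.
fill = df[gap].copy()
fill['Col1'] = df.loc[gap, 'Col2']
fill['Col2'] = next_col1[gap]

# Label each filler halfway between its neighbours so that sorting puts it in place.
fill.index = fill.index + 0.5

result = pd.concat([df, fill]).sort_index().reset_index(drop=True)
print(result)
```

Because the mask is computed for the whole frame at once, no explicit while loop is needed, and further conditions can be combined into gap with & and |.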