Pandas dataframe append rows based on each sequential row

Question

I've got a dataframe with three columns. Each row needs to be copied and altered twice, based on the value in a specific column of that row, while the values in the other columns stay the same.

I've managed to make the dataframe, as follows:

import pandas as pd

df = pd.DataFrame({'Value': list(range(3)), 'Value2': list(range(3)), 'Value3':['A','B','C']})

idx = df['Value'].index

# construct dataframe to append
df_extra1 = df.loc[idx].copy()
df_extra2 = df.loc[idx].copy()
df_extra3 = df.loc[idx].copy()
df_extra4 = df.loc[idx].copy()


# shift 'Value' and 'Value2' by +0.1 and -0.1
df_extra1['Value'] = df_extra1['Value'] + 0.1
df_extra2['Value'] = df_extra2['Value'] - 0.1
df_extra3['Value2'] = df_extra3['Value2'] + 0.1
df_extra4['Value2'] = df_extra4['Value2'] - 0.1

# append to original
# (note: DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
res1 = df.append(df_extra1)
res2 = res1.append(df_extra2)
res3 = res2.append(df_extra3)
res4 = res3.append(df_extra4)

This is what the result is and should look like:

   Value  Value2 Value3
0    0.0     0.0      A
1    1.0     1.0      B
2    2.0     2.0      C
0    0.1     0.0      A
1    1.1     1.0      B
2    2.1     2.0      C
0   -0.1     0.0      A
1    0.9     1.0      B
2    1.9     2.0      C
0    0.0     0.1      A
1    1.0     1.1      B
2    2.0     2.1      C
0    0.0    -0.1      A
1    1.0     0.9      B
2    2.0     1.9      C

Is there any way to speed this up or make it more concise?

Directly accessing individual cells in pandas will be glacial because of the overhead of fancy indexing. Exporting the column to a NumPy array, applying the operation to that array, and then reassigning it will be orders of magnitude faster.
– Alex Huszagh, Commented Aug 23, 2019 at 21:49
If you can apply uniform logic in the function (that is, it depends only on the values), you can use df.apply, which should be quite performant, but that does not look to be the case here: pandas.pydata.org/pandas-docs/stable/reference/api/…
– Alex Huszagh, Commented Aug 23, 2019 at 21:51
For future readers: see cs95's answer (stackoverflow.com/questions/16476924/…) and also (engineering.upside.com/…).
– angrymantis, Commented Aug 23, 2019 at 22:46
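
As a rough illustration of the first comment's suggestion (export the columns to a NumPy array, operate on the array, then assign it back), a sketch for this example might look like the following. The names `blocks`, `values`, and `block` are made up for illustration and do not come from the post, and no speed-up has been measured here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Value': list(range(3)), 'Value2': list(range(3)), 'Value3': ['A', 'B', 'C']})

to_alter = ['Value', 'Value2']
constants = [0.1, -0.1]

n = len(df)
# One block for the original rows plus one per (column, constant) pair.
blocks = 1 + len(to_alter) * len(constants)

# Repeat the whole frame once per block; the index repeats 0..n-1 as in the question.
result = pd.concat([df] * blocks)

# Work on a plain float array holding only the columns to alter.
values = np.tile(df[to_alter].to_numpy(dtype=float), (blocks, 1))

# Block 0 stays untouched; each later block shifts one column by one constant.
block = 1
for j in range(len(to_alter)):
    for const in constants:
        values[block * n:(block + 1) * n, j] += const
        block += 1

# Assign the altered arrays back, one column at a time.
for j, col in enumerate(to_alter):
    result[col] = values[:, j]
```

The row order matches the output shown in the question: the original rows first, then `Value` shifted by +0.1 and -0.1, then `Value2` shifted by +0.1 and -0.1.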

dan_g · Accepted Answer · 2019-08-23 22:22:52Z

It's not entirely clear what you're trying to do, but based on the example you provide you could simplify this by iterating over the product of the columns you're trying to update and the updates you're trying to apply:

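import pandas as pd
from itertools import product

df = pd.DataFrame({'Value': list(range(3)), 'Value2': list(range(3)), 'Value3':['A','B','C']})

to_alter = ['Value', 'Value2']   # columns to update
constants = [0.1, -0.1]          # offsets to apply to each of those columns

# start with the original frame, then add one altered copy per (column, offset) pair
dfs = [df]
for col, const in product(to_alter, constants):
    t = df.copy()
    t[col] += const
    dfs.append(t)   # list.append, not DataFrame.append

# concatenate everything in a single call
result = pd.concat(dfs)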

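Each call to DataFrame.append returns a new dataframe that copies all the data accumulated so far, so chaining four appends copies the earlier rows again at every step. That is not ideal, especially since you're already creating four copies at the start. Collecting the pieces in a list and calling pd.concat once avoids the repeated copying.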
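
For a more compact variant of the same idea, one possible sketch builds each altered copy with `DataFrame.assign` inside a list comprehension and still concatenates only once. The name `shifted` is illustrative and not from the original answer.

```python
import pandas as pd
from itertools import product

df = pd.DataFrame({'Value': list(range(3)), 'Value2': list(range(3)), 'Value3': ['A', 'B', 'C']})

to_alter = ['Value', 'Value2']
constants = [0.1, -0.1]

# assign(**{col: ...}) returns a new frame with that column replaced
# and leaves df itself untouched, so no explicit copy() is needed.
shifted = [df.assign(**{col: df[col] + const})
           for col, const in product(to_alter, constants)]

# Concatenate the original and all altered copies in a single call.
result = pd.concat([df, *shifted])
```

This keeps the repeated index labels 0, 1, 2 from the question's expected output. If a fresh 0..14 index is preferred, `pd.concat([df, *shifted], ignore_index=True)` should give that instead.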

edited Aug 23, 2019 at 22:22

answered Aug 23, 2019 at 22:17
