I've got a DataFrame with three columns. Each row needs to be copied and altered twice per numeric column (once by +0.1 and once by -0.1), based on the value in that specific row and column. The values in the other columns need to stay the same.

I've managed to build the result as follows:

import pandas as pd

df = pd.DataFrame({'Value': list(range(3)), 'Value2': list(range(3)), 'Value3':['A','B','C']})

idx = df['Value'].index

# construct dataframes to append (one copy per alteration)
df_extra1 = df.loc[idx].copy()
df_extra2 = df.loc[idx].copy()
df_extra3 = df.loc[idx].copy()
df_extra4 = df.loc[idx].copy()

# shift Value and Value2 by +0.1 and -0.1
df_extra1['Value'] = df_extra1['Value'] + 0.1
df_extra2['Value'] = df_extra2['Value'] - 0.1
df_extra3['Value2'] = df_extra3['Value2'] + 0.1
df_extra4['Value2'] = df_extra4['Value2'] - 0.1

# append to original (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
res1 = pd.concat([df, df_extra1])
res2 = pd.concat([res1, df_extra2])
res3 = pd.concat([res2, df_extra3])
res4 = pd.concat([res3, df_extra4])

This is the result, which is exactly what it should look like:

   Value  Value2 Value3
0    0.0     0.0      A
1    1.0     1.0      B
2    2.0     2.0      C
0    0.1     0.0      A
1    1.1     1.0      B
2    2.1     2.0      C
0   -0.1     0.0      A
1    0.9     1.0      B
2    1.9     2.0      C
0    0.0     0.1      A
1    1.0     1.1      B
2    2.0     2.1      C
0    0.0    -0.1      A
1    1.0     0.9      B
2    2.0     1.9      C 

Is there any way to speed this up or make it more concise?

  • Directly accessing individual cells in pandas will be glacial because of the overhead of fancy indexing. Exporting the column to a NumPy array, applying the operation to each row of that array, and then reassigning the array will be orders of magnitude faster. Commented Aug 23, 2019 at 21:49
  • If the function applies uniform logic (i.e. it depends only on the values), you can use df.apply, which should be quite performant, but that does not seem to be the case here: pandas.pydata.org/pandas-docs/stable/reference/api/… Commented Aug 23, 2019 at 21:51
  • For future readers: see cs95's answer (stackoverflow.com/questions/16476924/…), also (engineering.upside.com/…). Commented Aug 23, 2019 at 22:46
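Along the lines of the first comment, here is a minimal sketch that builds the whole result from NumPy arrays instead of copying DataFrames in a loop. It assumes the question's column names and the +0.1/-0.1 offsets; the names to_alter, constants, blocks and per_block are illustrative, and the row order is chosen to match the expected output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Value': list(range(3)), 'Value2': list(range(3)), 'Value3': ['A', 'B', 'C']})

to_alter = ['Value', 'Value2']   # numeric columns to shift (from the question)
constants = [0.1, -0.1]          # offsets applied to each of those columns

n = len(df)
blocks = 1 + len(to_alter) * len(constants)   # original block + one block per (column, offset)

# Tile every column's NumPy array so the original rows appear `blocks` times in sequence.
out = {col: np.tile(df[col].to_numpy(), blocks) for col in df.columns}

# For each altered column, build one offset per block and repeat it n times,
# so only that column's own blocks are shifted and every other block gets +0.0.
for j, col in enumerate(to_alter):
    per_block = np.zeros(blocks)
    start = 1 + j * len(constants)
    per_block[start:start + len(constants)] = constants
    out[col] = out[col] + np.repeat(per_block, n)

# Reuse the original index labels (0, 1, 2, 0, 1, 2, ...) as in the expected output.
result = pd.DataFrame(out, index=np.tile(df.index.to_numpy(), blocks))
print(result)
```

Every column is produced by a few whole-array operations, so the cost does not grow with repeated DataFrame copies; the trade-off is that the block layout has to be computed by hand.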

1 Answer

It's not entirely clear what you're trying to do, but based on the example you provide, you could simplify this by iterating over the product of the columns you want to update and the offsets you want to apply:

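import pandas as pd
from itertools import product

df = pd.DataFrame({'Value': list(range(3)), 'Value2': list(range(3)), 'Value3':['A','B','C']})

to_alter = ['Value', 'Value2']   # columns to shift
constants = [0.1, -0.1]          # offsets to apply to each column

dfs = [df]                       # start with the unaltered rows
for col, const in product(to_alter, constants):
    t = df.copy()                # fresh copy of the original for each (column, offset) pair
    t[col] += const              # shift only this column; the others stay the same
    dfs.append(t)                # list.append, not DataFrame.append

result = pd.concat(dfs)          # concatenate once at the end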

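By appending one frame at a time, you copy the growing DataFrame on every call, which is not ideal, especially since you're already creating copies at the start. Collecting the pieces in a list and calling pd.concat once copies each piece only once.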
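A more compact variant of the same approach, sketched under the same assumptions (the question's column names and offsets), builds each altered copy with DataFrame.assign inside a list comprehension and still concatenates only once:

```python
import pandas as pd
from itertools import product

df = pd.DataFrame({'Value': list(range(3)), 'Value2': list(range(3)), 'Value3': ['A', 'B', 'C']})

to_alter = ['Value', 'Value2']   # columns to shift
constants = [0.1, -0.1]          # offsets to apply to each column

# DataFrame.assign returns a new frame with the given column replaced and leaves df untouched,
# so no explicit copy() is needed; all pieces are collected first and concatenated once.
pieces = [df] + [df.assign(**{col: df[col] + const})
                 for col, const in product(to_alter, constants)]
result = pd.concat(pieces)
```

Passing ignore_index=True to pd.concat gives a fresh 0 to 14 index instead of the repeated 0, 1, 2 labels shown in the expected output.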
