I am trying to dynamically append a single value, generated on each iteration of a loop, to a DataFrame.

import numpy as np
import pandas as pd

global results_df
results_df = pd.DataFrame()

avg = 109
std_dev = 12

# Loop through many simulations
for i in range(1000):
    # Choose random inputs
    rev_sim = np.random.normal(avg, std_dev, 1).round(0)  # Rounding to 0 decimals

    # Build the dataframe based on the inputs
    df_res = pd.DataFrame(data={'REV_SIM': rev_sim})
    results_df.append(df_res)

But my results_df is empty.

  • Could you please add your expected output? Commented Apr 1, 2019 at 16:52

2 Answers

3

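You did not assign the result back. DataFrame.append returns a new DataFrame and leaves results_df unchanged, so the result has to be reassigned on every iteration: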

for i in range(1000):
    # Choose random inputs
    rev_sim = np.random.normal(avg, std_dev, 1).round(0)  # Rounding to 0 decimals

    # Build the dataframe based on the inputs
    df_res = pd.DataFrame(data={'REV_SIM': rev_sim})
    results_df = results_df.append(df_res)  # assign it back
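
Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the loop above raises an AttributeError. A minimal sketch of the same idea using pd.concat, assuming the avg and std_dev values from the question (the list name frames is only illustrative, it is not in the original code):

```python
import numpy as np
import pandas as pd

avg = 109
std_dev = 12

frames = []  # illustrative name: holds one single-row DataFrame per simulation
for i in range(1000):
    # Choose random inputs
    rev_sim = np.random.normal(avg, std_dev, 1).round(0)

    # Build the single-row dataframe and keep it for later
    frames.append(pd.DataFrame(data={'REV_SIM': rev_sim}))

# Concatenate once at the end; ignore_index=True renumbers the rows 0..999
results_df = pd.concat(frames, ignore_index=True)
```

Concatenating once after the loop also avoids copying the growing DataFrame on every iteration.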

2

Why not generate all the values in a single call, without a loop?

import pandas as pd
import numpy as np

avg = 109
std_dev = 12

N  = 1000
rev_sim = np.random.normal(avg, std_dev, N).round(0)
df = pd.DataFrame({'REV_SIM':rev_sim})

UPDATE:

Timing

Wen-Ben's solution

%%timeit -n10
global results_df
results_df=pd.DataFrame()

for i in range(1000):    
    # Choose random inputs 
    rev_sim = np.random.normal(avg, std_dev, 1).round(0)#Rounding to 0 decimals

    # Build the dataframe based on the inputs
    df_res = pd.DataFrame(data={'REV_SIM': rev_sim})
    results_df=results_df.append(df_res)# assign it back 

1.08 s ± 36.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

My solution

%%timeit -n10
N  = 1000
rev_sim = np.random.normal(avg, std_dev, N).round(0)
result_df = pd.DataFrame({'REV_SIM':rev_sim})

748 µs ± 153 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

If you really need to generate the entries in a loop, it is better to collect them in a list first and then build the DataFrame from that list in one step:

%%timeit -n10
rev_sim = [np.random.normal(avg, std_dev, 1).round(0) for i in range(1000)]
result_df = pd.DataFrame({'REV_SIM':rev_sim})

6.55 ms ± 888 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The last version is about 8.8x slower than the one I proposed (6.55 ms vs. 748 µs), while Wen-Ben's solution is ~1444x slower (1.08 s vs. 748 µs).

pandas can get really slow when a DataFrame is grown row by row in a loop.
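
One caveat about the list-based version above, as far as I can tell: each np.random.normal(avg, std_dev, 1) call returns a one-element array, so the list holds 1000 small arrays rather than plain numbers. A sketch of a variant that collects scalars instead, assuming the same avg and std_dev (the name values is only illustrative):

```python
import numpy as np
import pandas as pd

avg = 109
std_dev = 12

values = []  # illustrative name: one rounded scalar per simulation
for i in range(1000):
    # With no size argument, np.random.normal returns a single scalar draw
    draw = np.random.normal(avg, std_dev)
    values.append(np.round(draw))

# Build the DataFrame once, from the complete list
result_df = pd.DataFrame({'REV_SIM': values})
```

This should give a plain float column, which is usually easier to work with afterwards than a column of one-element arrays.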
