Fastest way to add rows to existing pandas dataframe

Question

I'm currently trying to create a new csv based on an existing csv.

I can't find a faster way to set the values of a new dataframe based on the values of an existing one.

import pandas
import sys
import numpy
import time

# path to file as argument
path = sys.argv[1]
df = pandas.read_csv(path, sep = "\t")

# only care about lines with response_time
df = df[pandas.notnull(df['response_time'])]

# new empty dataframe
new_df = pandas.DataFrame(index = df["datetime"])    

# new_df needs to have datetime as its index
# and columns named after a combination
# of the "name" and "type" columns from the previous dataframe
# (there are only 10 different combinations),
# with response_time as values, so there will be lots of
# blank cells, but I don't care
for i, row in df.iterrows():
    start = time.time()
    new_df.set_value(row["datetime"], row["name"] + "-" + row["type"], row["response_time"])
    print(i, time.time() - start)

Original dataframe is:

                     datetime           name   type  response_time
0  2018-12-18T00:00:00.500829    HSS_ANDROID  audio        0.02430
1  2018-12-18T00:00:00.509108    HSS_ANDROID  video        0.02537
2  2018-12-18T00:00:01.816758       HSS_TEST  audio        0.03958
3  2018-12-18T00:00:01.819865       HSS_TEST  video        0.03596
4  2018-12-18T00:00:01.825054  HSS_ANDROID_2  audio        0.02590
5  2018-12-18T00:00:01.842974  HSS_ANDROID_2  video        0.03643
6  2018-12-18T00:00:02.492477    HSS_ANDROID  audio        0.01575
7  2018-12-18T00:00:02.509231    HSS_ANDROID  video        0.02870
8  2018-12-18T00:00:03.788196       HSS_TEST  audio        0.01666
9  2018-12-18T00:00:03.807682       HSS_TEST  video        0.02975

new_df will look like this:

It takes 7 ms per loop iteration.

It takes an eternity (about 47 minutes at that rate) to process a DataFrame of (only?) 400,000 rows. How can I make it faster?

You can use loc. Sorry, I didn't have time to type a nice example for you. Here is some documentation though: pandas.pydata.org/pandas-docs/stable/…
– Jacobr365, Commented Dec 20, 2018 at 14:28
Give some input data as text, not as a picture. It is possible that using pivot could be a solution for what you are trying to do.
– Ben.T, Commented Dec 20, 2018 at 14:30
We need to see the original dataframe (df). Can you do df.head(10).to_dict() and paste your output as text?
– It_is_Chris, Commented Dec 20, 2018 at 14:35

Ben.T · Accepted Answer · 2018-12-20 14:57:41Z

Indeed, pivot does what you are looking for:

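import pandas as pd
# positional form from the pandas version in use when this was answered;
# recent releases expect keyword arguments (index=..., columns=..., values=...)
new_df = pd.pivot(df.datetime, df.name + '-' + df.type, df.response_time)
print(new_df.head())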
                           HSS_ANDROID-audio  HSS_ANDROID-video  \
datetime                                                           
2018-12-18T00:00:00.500829             0.0243                NaN   
2018-12-18T00:00:00.509108                NaN            0.02537   
2018-12-18T00:00:01.816758                NaN                NaN   
2018-12-18T00:00:01.819865                NaN                NaN   
2018-12-18T00:00:01.825054                NaN                NaN   

                            HSS_ANDROID_2-audio  HSS_ANDROID_2-video  \
datetime                                                               
2018-12-18T00:00:00.500829                  NaN                  NaN   
2018-12-18T00:00:00.509108                  NaN                  NaN   
2018-12-18T00:00:01.816758                  NaN                  NaN   
2018-12-18T00:00:01.819865                  NaN                  NaN   
2018-12-18T00:00:01.825054               0.0259                  NaN   

                            HSS_TEST-audio  HSS_TEST-video  
datetime                                                    
2018-12-18T00:00:00.500829             NaN             NaN  
2018-12-18T00:00:00.509108             NaN             NaN  
2018-12-18T00:00:01.816758         0.03958             NaN  
2018-12-18T00:00:01.819865             NaN         0.03596  
2018-12-18T00:00:01.825054             NaN             NaN

To avoid NaN, you can chain fillna with any value you want, such as:

new_df = pd.pivot(df.datetime, df.name +'-'+df.type, df.response_time).fillna(0)
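For readers on a recent pandas release, here is a minimal sketch of the same reshaping using the keyword form of DataFrame.pivot. The sample rows are copied from the question; the helper column name "column" is made up for illustration and is not part of the original answer.

```python
import pandas as pd

# A few rows copied from the question's dataframe
df = pd.DataFrame({
    "datetime": ["2018-12-18T00:00:00.500829", "2018-12-18T00:00:00.509108",
                 "2018-12-18T00:00:01.816758", "2018-12-18T00:00:01.819865"],
    "name": ["HSS_ANDROID", "HSS_ANDROID", "HSS_TEST", "HSS_TEST"],
    "type": ["audio", "video", "audio", "video"],
    "response_time": [0.02430, 0.02537, 0.03958, 0.03596],
})

# "column" is a hypothetical helper column holding the combined "name-type" label
new_df = (
    df.assign(column=df["name"] + "-" + df["type"])
      .pivot(index="datetime", columns="column", values="response_time")
)
new_df.columns.name = None  # drop the helper name from the column header
print(new_df)
```

Note that pivot raises a ValueError if the same datetime and label pair occurs more than once; in that case pivot_table with an aggregation function may be a better fit.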

It_is_Chris · Accepted Answer · 2018-12-20 15:13:20Z


You can also use unstack; it is just another option:

new = df.set_index(['type','name', 'datetime']).unstack([0,1])
new.columns = ['{}-{}'.format(z,y) for x,y,z, in new.columns]

Using f-strings will be a little faster than format:

new.columns = [f'{z}-{y}' for x,y,z, in new.columns]
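To try the unstack approach on its own, here is a small self-contained sketch using a few rows from the question. It unstacks by level name rather than by position, and the loop variable names are chosen for illustration; otherwise it follows the same idea as above.

```python
import pandas as pd

# A few rows copied from the question's dataframe
df = pd.DataFrame({
    "datetime": ["2018-12-18T00:00:00.500829", "2018-12-18T00:00:00.509108",
                 "2018-12-18T00:00:01.816758", "2018-12-18T00:00:01.819865"],
    "name": ["HSS_ANDROID", "HSS_ANDROID", "HSS_TEST", "HSS_TEST"],
    "type": ["audio", "video", "audio", "video"],
    "response_time": [0.02430, 0.02537, 0.03958, 0.03596],
})

# Move the three key columns into the index, then turn "type" and "name" into columns
new = df.set_index(["type", "name", "datetime"]).unstack(["type", "name"])

# Each column label is now a (value column, type, name) tuple; flatten it to "name-type"
new.columns = [f"{name}-{kind}" for _, kind, name in new.columns]
print(new)
```

The result has datetime as its index and one column per name and type combination, with NaN wherever a combination has no value at that timestamp.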

edited Dec 20, 2018 at 15:13

answered Dec 20, 2018 at 15:07
