I'm trying to create a new CSV based on an existing one, and I can't find a fast way to set values in a DataFrame based on values from an existing DataFrame.
import pandas
import sys
import numpy
import time
# path to file as argument
path = sys.argv[1]
df = pandas.read_csv(path, sep = "\t")
# only care about lines with response_time
df = df[pandas.notnull(df['response_time'])]
# new empty dataframe
new_df = pandas.DataFrame(index = df["datetime"])
# new_df needs to have datetime as index
# and columns based on a combination
# of 2 columns name from previous dataframe
# (there are only 10 differents combinations)
# and response_time as values, so there will be lots of
# blank cells but I don't care
for i, row in df.iterrows():
    start = time.time()
    new_df.set_value(row["datetime"], row["name"] + "-" + row["type"], row["response_time"])
    print(i, time.time() - start)
Original dataframe is:
datetime name type response_time
0 2018-12-18T00:00:00.500829 HSS_ANDROID audio 0.02430
1 2018-12-18T00:00:00.509108 HSS_ANDROID video 0.02537
2 2018-12-18T00:00:01.816758 HSS_TEST audio 0.03958
3 2018-12-18T00:00:01.819865 HSS_TEST video 0.03596
4 2018-12-18T00:00:01.825054 HSS_ANDROID_2 audio 0.02590
5 2018-12-18T00:00:01.842974 HSS_ANDROID_2 video 0.03643
6 2018-12-18T00:00:02.492477 HSS_ANDROID audio 0.01575
7 2018-12-18T00:00:02.509231 HSS_ANDROID video 0.02870
8 2018-12-18T00:00:03.788196 HSS_TEST audio 0.01666
9 2018-12-18T00:00:03.807682 HSS_TEST video 0.02975
new_df will look like this (one column per name-type combination, NaN where there is no measurement):
                            HSS_ANDROID-audio  HSS_ANDROID-video  HSS_TEST-audio  HSS_TEST-video  HSS_ANDROID_2-audio  HSS_ANDROID_2-video
datetime
2018-12-18T00:00:00.500829            0.02430                NaN             NaN             NaN                  NaN                  NaN
2018-12-18T00:00:00.509108                NaN            0.02537             NaN             NaN                  NaN                  NaN
2018-12-18T00:00:01.816758                NaN                NaN         0.03958             NaN                  NaN                  NaN
2018-12-18T00:00:01.819865                NaN                NaN             NaN         0.03596                  NaN                  NaN
2018-12-18T00:00:01.825054                NaN                NaN             NaN             NaN              0.02590                  NaN
2018-12-18T00:00:01.842974                NaN                NaN             NaN             NaN                  NaN              0.03643
2018-12-18T00:00:02.492477            0.01575                NaN             NaN             NaN                  NaN                  NaN
2018-12-18T00:00:02.509231                NaN            0.02870             NaN             NaN                  NaN                  NaN
2018-12-18T00:00:03.788196                NaN                NaN         0.01666             NaN                  NaN                  NaN
2018-12-18T00:00:03.807682                NaN                NaN             NaN         0.02975                  NaN                  NaN
Each loop iteration takes about 7 ms, so processing a (only?) 400,000-row DataFrame takes an eternity. How can I make it faster?
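As a side note, `set_value` is deprecated and removed in newer pandas; `.loc` is a supported replacement for single-cell assignment with enlargement. A minimal sketch of that drop-in substitution, on a small sample shaped like the data above (any Python-level per-row loop will still be slow, this only fixes the deprecation):

```python
import pandas as pd

# Small sample in the shape of the question's DataFrame.
df = pd.DataFrame({
    "datetime": ["2018-12-18T00:00:00.500829", "2018-12-18T00:00:00.509108"],
    "name": ["HSS_ANDROID", "HSS_ANDROID"],
    "type": ["audio", "video"],
    "response_time": [0.02430, 0.02537],
})

new_df = pd.DataFrame(index=df["datetime"])

for _, row in df.iterrows():
    # .loc supports setting with enlargement, i.e. it creates the
    # row/column labels if they do not exist yet (like set_value did).
    new_df.loc[row["datetime"], row["name"] + "-" + row["type"]] = row["response_time"]
```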

pivot could be a solution for what you are trying to do.
Can you run df.head(10).to_dict() and paste the output as text?
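Following the pivot suggestion, a minimal sketch (using the column names from the question, on a small sample of the data) that builds the wide table in one vectorized call instead of a per-row loop:

```python
import pandas as pd

# Small sample in the shape of the question's DataFrame.
df = pd.DataFrame({
    "datetime": ["2018-12-18T00:00:00.500829", "2018-12-18T00:00:00.509108",
                 "2018-12-18T00:00:01.816758"],
    "name": ["HSS_ANDROID", "HSS_ANDROID", "HSS_TEST"],
    "type": ["audio", "video", "audio"],
    "response_time": [0.02430, 0.02537, 0.03958],
})

# Only keep rows that actually have a response_time.
df = df[df["response_time"].notnull()]

# Build the combined column label once, vectorized.
df["name_type"] = df["name"] + "-" + df["type"]

# One pivot call replaces the whole set_value loop:
# datetime becomes the index, each name-type combination a column,
# response_time the cell values, NaN everywhere else.
new_df = df.pivot(index="datetime", columns="name_type", values="response_time")
print(new_df)
```

Note that `pivot` requires each (datetime, name_type) pair to be unique; if the real data has duplicates, `pivot_table` with an aggregation function would be needed instead.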