Loop optimization in Python using loc command

Question

I have a section of Python code:

# Main loop: takes the value recorded on each row and sorts it into the
# corresponding column by matching the row's 'Name' against the newly
# generated column names.
listed_names = list(df_cv)  # list of column names to reference later
variable = listed_names[3:]  # skip the first three columns (Time, Name, Value)
for i in df_cv.index:  # for each row index in the DataFrame
    for m in variable:  # for each generated variable column
        if df_cv.loc[i, 'Name'] == m:  # if this row's Name matches the column name...
            df_cv.loc[i, m] = df_cv.loc[i, 'Value']  # ...copy the row's Value into that column

Basically, it takes an n-row table of Time/Name/Value triples and sorts it into a pandas DataFrame with n rows and one column per unique Name. For example, it turns this:

Time   Name    Value
1      Color   Red
2      Age     6
3      Temp    25
4      Age     1

Into this:

Time   Color   Age    Temp
1      Red     
2              6
3                     25
4              1
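For anyone who wants to reproduce the slow version, here is a minimal self-contained sketch of the loop above run on the sample data. It assumes the generated columns are one per unique Name, pre-filled with empty strings; the question does not show how those columns are created, so that setup step is an assumption.

```python
import pandas as pd

# Sample data from the question: one observation per row.
df_cv = pd.DataFrame({
    "Time": [1, 2, 3, 4],
    "Name": ["Color", "Age", "Temp", "Age"],
    "Value": ["Red", 6, 25, 1],
})

# Assumption: one generated column per unique Name, initialised to "".
# dtype=object so the column can hold both strings and numbers.
for name in df_cv["Name"].unique():
    df_cv[name] = pd.Series("", index=df_cv.index, dtype=object)

# The question's nested loop: n rows x k name columns, one scalar .loc per pair.
variable = list(df_cv)[3:]  # skip Time, Name, Value
for i in df_cv.index:
    for m in variable:
        if df_cv.loc[i, "Name"] == m:
            df_cv.loc[i, m] = df_cv.loc[i, "Value"]

print(df_cv[["Time"] + variable])
```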

My code takes a very long time to run, and I wanted to know if there is a better way to set up my loops. I come from a MATLAB background, so the Python style (i.e. not using rows/columns for everything) is still alien to me.

How can I make this section of code run faster?

DSM · Accepted Answer · 2016-06-22 16:39:24Z


Instead of looping, think of it as a pivot operation. Assuming that Time is a column and not the index (if it is, call reset_index() first):

In [96]: df
Out[96]: 
   Time   Name Value
0     1  Color   Red
1     2    Age     6
2     3   Temp    25
3     4    Age     1

In [97]: df.pivot(index="Time", columns="Name", values="Value")
Out[97]: 
Name   Age Color  Temp
Time                  
1     None   Red  None
2        6  None  None
3     None  None    25
4        1  None  None

In [98]: df.pivot(index="Time", columns="Name", values="Value").fillna("")
Out[98]: 
Name Age Color Temp
Time               
1          Red     
2      6           
3                25
4      1
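A self-contained version of the same pivot, runnable outside IPython, might look like the sketch below. The DataFrame construction just recreates the sample table; the last lines show the reset_index() route for the case where Time is already the index.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": [1, 2, 3, 4],
    "Name": ["Color", "Age", "Temp", "Age"],
    "Value": ["Red", 6, 25, 1],
})

# One reshape replaces the whole nested loop: Time becomes the index,
# each unique Name becomes a column, and Value fills the cells.
wide = df.pivot(index="Time", columns="Name", values="Value")

# Missing (Time, Name) combinations are empty; blank them for display.
shown = wide.fillna("")
print(shown)

# If Time is already the index, move it back into a column before pivoting.
df_indexed = df.set_index("Time")
wide_from_index = df_indexed.reset_index().pivot(
    index="Time", columns="Name", values="Value"
)
```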

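Because the reshape is done in one vectorized call instead of one scalar .loc lookup per (row, column) pair, this should be much faster than the nested loop on real datasets, and it is simpler to boot.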
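If you would rather keep the layout the question's loop produces (Time, Name and Value, plus one column per name) instead of pivot's wide table, one possible middle ground, not part of the answer above, is a single vectorized Series.where assignment per unique name. That replaces n × k scalar .loc calls with k column operations. Unlike pivot, which raises a ValueError when the same (Time, Name) pair appears more than once, it also doesn't require unique pairs. A minimal sketch on the sample data:

```python
import pandas as pd

df_cv = pd.DataFrame({
    "Time": [1, 2, 3, 4],
    "Name": ["Color", "Age", "Temp", "Age"],
    "Value": ["Red", 6, 25, 1],
})

# One vectorized assignment per unique Name: keep Value where the row's
# Name matches the column, otherwise fill with an empty string.
for name in df_cv["Name"].unique():
    df_cv[name] = df_cv["Value"].where(df_cv["Name"] == name, other="")

print(df_cv)
```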

