Iterating over pandas dataframes is painfully slow and inefficient, and you have reached the right conclusion to want to change it. There are multiple alternatives to looping, which is not exactly the same procedure but can accomplish almost anything that you'd want to do using a for loop.
My favorite approach is writing a custom function and then using .apply() to apply the function to your dataframe, either row-wise (axis=0) or column-wise (axis=1). Here is a simple example on a pandas series:
import pandas as pd
import numpy as np
data= np.random.rand(5,1)
df = pd.DataFrame(data=data)
def multiply(x):
return x*2
df['multiply'] = df.apply(multiply, axis=1)
print(df)
You can go further and use the same method on a wider dataframe, but you have to use lambda function in your apply method:
data= np.random.rand(5,3)
df = pd.DataFrame(data=data)
df
Output:
0 1 2
0 0.652011 0.358764 0.993731
1 0.690814 0.390272 0.627235
2 0.282919 0.237765 0.417414
3 0.123283 0.823570 0.992152
4 0.700824 0.604650 0.147833
def multiply(x1, x2, x3):
return x1*2 + x2*2 + x3*2
df['multiply'] = df.apply(lambda row: multiply(row[0], row[1], row[2]), axis=1)
df
Output:
0 1 2 multiply
0 0.652011 0.358764 0.993731 4.009011
1 0.690814 0.390272 0.627235 3.416642
2 0.282919 0.237765 0.417414 1.876196
3 0.123283 0.823570 0.992152 3.878009
4 0.700824 0.604650 0.147833 2.906615
Regarding the second part of your question, you can use pd.merge() if you need to merge on one or multiple id columns, or pd.join() and pd.concat() otherwise.