I am trying to merge three files (about 3 GB, 200 KB, and 200 KB) with pandas. My machine has 32 GB of RAM, but the merge still ends with a MemoryError. Is there any way to avoid this? My merge code is below:
import pandas as pd

product = pd.read_csv("../data/process_product.csv", header=0)
product["bandID"] = pd.factorize(product.Band)[0]
# drop(columns=...) rather than the positional axis argument, which is removed in pandas 2.x
product = product.drop(columns=["Band", "Info"])
town_state = pd.read_csv("../data/town_state.csv", header=0)
dummies = pd.get_dummies(town_state.State)
town_state = pd.concat([town_state, dummies], axis=1)
town_state["townID"] = pd.factorize(town_state.Town)[0]
town_state = town_state.drop(columns=["State", "Town"])
train = pd.read_csv("../data/train.csv", header=0)
result = pd.merge(train, town_state, on="Agencia_ID", how='left')
result = pd.merge(result, product, on="Producto_ID", how='left')
result.to_csv("../data/train_data.csv")
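Would a chunked merge along these lines avoid the problem? This is a minimal sketch, not tested on the real files: the tiny in-memory tables below are hypothetical stand-ins for my data, and the column names (`Demanda`, `townID`, `bandID`) are just examples. The idea is to stream the big file with `read_csv(chunksize=...)`, merge each chunk against the two small lookup tables, and append to the output file so the full result never sits in memory at once:

```python
import io
import pandas as pd

# Hypothetical stand-ins for train.csv and the two small processed tables.
train_csv = io.StringIO(
    "Agencia_ID,Producto_ID,Demanda\n"
    "1,10,5\n"
    "2,11,3\n"
    "1,11,7\n"
)
town_state = pd.DataFrame({"Agencia_ID": [1, 2], "townID": [0, 1]})
product = pd.DataFrame({"Producto_ID": [10, 11], "bandID": [0, 1]})

# Stream the large file in chunks, merge each chunk against the small tables,
# and append to the output CSV instead of building one giant DataFrame.
first = True
for chunk in pd.read_csv(train_csv, chunksize=2):
    chunk = pd.merge(chunk, town_state, on="Agencia_ID", how="left")
    chunk = pd.merge(chunk, product, on="Producto_ID", how="left")
    chunk.to_csv("train_data.csv",
                 mode="w" if first else "a",
                 header=first,   # write the header only once
                 index=False)
    first = False
```

I assume the chunk size would need to be much larger (e.g. a few hundred thousand rows) for the real 3 GB file, and that passing explicit small dtypes to `read_csv` would also cut memory, but I am not sure this is the right approach.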