
I have two really large dataframes that I'd like to merge, but can't because my computer's memory cannot handle it. Instead, I would like to split one of the dataframes into smaller dataframes by group (of which there are 195 groups, so 195 dataframes), and then join each of those 195 dataframes to the other large dataframe.

So far I have tried groupby:

split_data = list(bigdata1.groupby("GROUP"))

which results in a list of 195 (group name, dataframe) tuples — one per group.
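For reference, a minimal sketch with toy data (column names match the question) showing what that `list(groupby(...))` call actually produces — each element is a `(group name, sub-dataframe)` tuple, not a bare dataframe:

```python
import pandas as pd

# Toy stand-in for the large dataframe in the question.
bigdata1 = pd.DataFrame({"GROUP": ["A", "A", "B"], "PERSON_ID": [1, 2, 3]})

# list(groupby(...)) yields (group name, sub-dataframe) tuples.
split_data = list(bigdata1.groupby("GROUP"))

print(len(split_data))   # 2 groups in this toy example
print(split_data[0][0])  # 'A'  (the group name)
print(type(split_data[0][1]))  # <class 'pandas.core.frame.DataFrame'>
```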

I would now like to know how to apply the joining function defined below to each of those dataframes, keeping each result separate so it can be called for subsequent manipulation (e.g. analysis). I am brand new to Python, so any assistance would be greatly appreciated. Thanks in advance.

def joining_function(df):
    pd.merge(df, bigdata2, on = 'PERSON_ID', how = 'left')

1 Answer

First, your function needs to return the result:

def joining_function(df):
    return pd.merge(df, bigdata2, on = 'PERSON_ID', how = 'left')

Then, a better way to store all the resulting dataframes is in a dictionary, using the group name as the key. Apply the function as you build it:

res = {gr: joining_function(dfg) for gr, dfg in bigdata1.groupby("GROUP")}
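Putting the two pieces together, a self-contained sketch (the toy data and column values are illustrative; only the column names `GROUP` and `PERSON_ID` come from the question) showing that each entry in the dictionary is an ordinary merged dataframe you can pull out by group name for later analysis:

```python
import pandas as pd

# Toy stand-ins for the two large dataframes (contents are illustrative).
bigdata1 = pd.DataFrame({
    "GROUP": ["A", "A", "B"],
    "PERSON_ID": [1, 2, 3],
    "x": [10, 20, 30],
})
bigdata2 = pd.DataFrame({
    "PERSON_ID": [1, 2, 3],
    "y": [100, 200, 300],
})

def joining_function(df):
    # The return statement is the fix described above.
    return pd.merge(df, bigdata2, on="PERSON_ID", how="left")

# Merge each group separately; results are keyed by group name.
res = {gr: joining_function(dfg) for gr, dfg in bigdata1.groupby("GROUP")}

# Each value is an ordinary dataframe, ready for per-group analysis.
print(sorted(res))             # ['A', 'B']
print(res["A"]["y"].tolist())  # [100, 200]
```

Because only one group's sub-dataframe is merged at a time, the peak memory use is roughly one group plus `bigdata2`, rather than the full cross of both large frames at once.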