I have written a Python function that takes a data frame as one of its arguments. Below is a simplified version of the function:

import itertools

def cat_update(df_to_update, df_source, cat_lst, con_lst):
    # Copy each (category, country) cell from the source frame into the target.
    for cat, con in itertools.product(cat_lst, con_lst):
        df_to_update.at[cat, con] = df_source.at[cat, con]

Below is how I am calling this function:

cat_update(df_templete1, raw_source, cat_lst, con_lst)

Now I need to scale my code to handle multiple source data frames (raw_source).

How do I pass a variable here, so that I can switch the source data frame as required instead of hard-coding a specific one?

I tried assigning the data frame's name to the variable as follows:

raw_source = 'df_source_1'

But in this case the value is passed as a string, not as a data frame, so the function cannot use it as expected. In short, I need to get from a str to the pandas.core.frame.DataFrame it names.

More information: I call the above function inside a for loop:

for n in range(len(df_config)):
    cat_lst = df_config.at[n,'category'].split(",")
    con_lst = df_config.at[n,'country'].split(",")
    raw_source = df_config.at[n,'Raw source']
    energy_source = df_config.at[n,'Energy source']

Hence the source data frame is picked automatically from user input, which is saved in df_config.

  • Could you please also add raw_cat_update method? Commented Oct 6, 2021 at 6:18
  • I updated the call function name. It was a typo. Commented Oct 6, 2021 at 6:20
  • Don't pass the variable name -- that has no meaning at all at runtime. Instead, just pass the dataframe itself. Use a dict to map a name to a dataframe if you need to, but that seems unlikely in this case. Commented Oct 6, 2021 at 6:22
  • I have multiple source data frames. If I use the above, I will have to write the same call multiple times, like: cat_update(df_templete1, df_source_1, cat_lst, con_lst); cat_update(df_templete1, df_source_2, cat_lst, con_lst). Also, which source is used is conditional, so I was trying to write a generic function. Commented Oct 6, 2021 at 6:23

1 Answer

Create a dictionary like this: {"data_frame_name" : data_frame}, so that you can access each data frame by its name. Assume we have some data, data_src_1, like below:

import pandas as pd

data_src_1 = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df_source_1 = pd.DataFrame(data_src_1)
raw_sources = {"df_source_1": df_source_1}    # You can have other data frames here

Pass the name of the data frame you want as df_source to the cat_update method, and edit the method like this:

import itertools

raw_sources = {"df_source_1": df_source_1}  # ... plus any other data frames

def cat_update(df_to_update, df_source, cat_lst, con_lst):
    # df_source is now the name (a str) of a data frame stored in raw_sources.
    for cat, con in itertools.product(cat_lst, con_lst):
        df_to_update.at[cat, con] = raw_sources[df_source].at[cat, con]

However, you could just pass the data frame itself, such as df_source_1, to the method. The advantage of the snippet above is that it keeps all the data frames together in one dictionary (raw_sources).
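
As a rough sketch of the second option (passing the data frame itself), the name lookup can happen once at the call site inside the df_config loop, which leaves cat_update as it was in the question. The sample frames, the "coal"/"gas" categories and the "US"/"DE" columns are made-up values for illustration; only cat_update, raw_sources, df_config and its column names come from the post. The try block is left out here because its except clause was not shown.

```python
import itertools

import pandas as pd


def cat_update(df_to_update, df_source, cat_lst, con_lst):
    # Copy each (category, country) cell from the source frame into the target.
    for cat, con in itertools.product(cat_lst, con_lst):
        df_to_update.at[cat, con] = df_source.at[cat, con]


# Hypothetical sample data: rows are categories, columns are countries.
df_source_1 = pd.DataFrame({"US": [1.0, 2.0], "DE": [3.0, 4.0]}, index=["coal", "gas"])
df_source_2 = pd.DataFrame({"US": [10.0, 20.0], "DE": [30.0, 40.0]}, index=["coal", "gas"])
df_templete1 = pd.DataFrame(0.0, index=["coal", "gas"], columns=["US", "DE"])

# Map the names stored in df_config to the actual data frames.
raw_sources = {"df_source_1": df_source_1, "df_source_2": df_source_2}

# Hypothetical config: one row per update, naming the source to use.
df_config = pd.DataFrame(
    {
        "category": ["coal", "gas"],
        "country": ["US,DE", "DE"],
        "Raw source": ["df_source_1", "df_source_2"],
    }
)

for n in range(len(df_config)):
    cat_lst = df_config.at[n, "category"].split(",")
    con_lst = df_config.at[n, "country"].split(",")
    # Resolve the configured name (a str) to the DataFrame it refers to.
    df_source = raw_sources[df_config.at[n, "Raw source"]]
    cat_update(df_templete1, df_source, cat_lst, con_lst)
```

With this layout, a name in df_config that is missing from raw_sources raises a KeyError at the lookup, so a typo in the config shows up before any cells are copied for that row.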
