Say I have a df:

import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})

Note that the columns of interest are those whose names contain "C.1_" followed by a letter.

I have built a list of the selected columns: col_list = ['A.C.1_v', 'C.C.1_f']

Objective: create several dataframes as follows (in this illustration only 2 dfs are built, but there could be many more in practice).

The first df

  1. Is named according to the convention "df_AC1_v"
  2. Is composed of the values of column A.C.1_v and the values of columns D and E

So, for df_AC1_v we would have the following output: output 1 without iteration

The second df

  1. Is named according to the convention "df_CC1_f"
  2. Is composed of the values of column C.C.1_f and the values of columns D and E

So, for df_CC1_f, we would have the following output: Output2 without iteration

My goal is to do this iteratively, but what I have attempted so far does not work.

Here is the code I have written. It fails in the for loop and I do not understand why. First, I extract the column names and build a list as follows:

col_list = list(df)
list_c1 = list(filter(lambda x:'.C.1' in x, col_list))
list_c1 = [str(r) for r in list_c1]

in:  list_c1
out: ['A.C.1_v', 'C.C.1_f']

Second, I split each name around '.C.1':

list_c1_bis = []
for element in list_c1:
    stock = element.split('.C.1')
    list_c1_bis.append(stock)

in:  list_c1_bis
out: [['A', '_v'], ['C', '_f']]

Up to this point, everything works. The problem is in the code below:

for line in list_c1_bis:
    name1 ='df'+'_'+line[0]+'C1'+line[1]
    vars()[name1] =  df[[list_c1[0],'D','E']]

My outputs are as follows:

in:  df_AC1_v ==> OK, correct
out: output1

in:  df_CC1_f ==> Wrong: it has taken the wrong column, A.C.1_v, instead of the expected C.C.1_f
out: output2

Your suggestions are welcome !

Thanks a lot for your time and help, that will be truly appreciated

nb : please feel free to modify the first steps that work if you think you have a better solution

Kindest regards

  • And the column B? Commented Mar 10, 2022 at 10:09
  • Please provide the expected output as code/text Commented Mar 10, 2022 at 13:25

2 Answers

I strongly discourage you from creating variables dynamically with vars, locals or globals. Prefer a dictionary.

Try

for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]

Update

If f-strings are not available (Python < 3.6), replace locals()[f"df_{name}"] with locals()["df_{}".format(name)].

Output:

>>> df_AC1_v
   A.C.1_v  D  E
0        1  e  7
1        2  f  8
2        3  g  9

>>> df_CC1_f
   C.C.1_f  D  E
0        4  e  7
1        5  f  8
2        6  g  9

Alternative with dictionary:

dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]

Output:

>>> dfs['AC1_v']
   A.C.1_v  D  E
0        1  e  7
1        2  f  8
2        3  g  9

>>> dfs['CC1_f']
   C.C.1_f  D  E
0        4  e  7
1        5  f  8
2        6  g  9
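
As for why the loop in the question picks the wrong column: it always selects list_c1[0], so every iteration gets A.C.1_v regardless of the name being built. A minimal sketch of a fix that keeps the question's split-based naming, assuming a dictionary called frames (a name chosen here purely for illustration) rather than dynamic variables:

```python
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'],
                   'E': [7, 8, 9]})

# Columns whose name contains '.C.1', as in the question
list_c1 = [c for c in df.columns if '.C.1' in c]

frames = {}  # hypothetical container name, not from the original post
for col in list_c1:
    # 'A.C.1_v' -> prefix 'A', suffix '_v'
    prefix, suffix = col.split('.C.1')
    name = 'df_' + prefix + 'C1' + suffix
    # Select the column of the current iteration, not list_c1[0]
    frames[name] = df[[col, 'D', 'E']]
```

Looping over the column names themselves, instead of over the pre-split pieces, keeps the name and the selected column in step, so frames['df_CC1_f'] holds C.C.1_f together with D and E.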

5 Comments

Maybe show the dictionary alternative as main answer as encouragement to use it ;)
The OP already uses vars(). I think he knows what he's doing :)
Still, this is a very bad practice to fight ;)
Hi Corralien, I have replied, so perhaps you can have a look. Kindest regards
Thanks Corralien, it works perfectly. Kindest regards

Hi Corralien, and first let me thank you for your prompt reply, which is truly appreciated.

I have tried the first snippet:

for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]

But I get the following error:

  File "", line 3
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]
                        ^
SyntaxError: invalid syntax

I have also tried the second snippet, which stores the results in a dictionary:

dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]

It runs without error, but when I check whether the dataframes exist with

in: df_AC1_v

I get the following error: NameError: name 'df_AC1_v' is not defined

I understand that to get the df, I need to write: dfs['AC1_v']

The second solution is acceptable, but I would prefer the first solution if it worked.

Kindest regards

1 Comment

Do you use a version of Python prior to 3.6?
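
One way to check, sketched under the assumption that the SyntaxError above comes from the f-string (f-strings are only parsed by Python 3.6 and later). The snippet prints the interpreter version, builds the dictionary with str.format so it does not depend on f-strings, and then binds plain variable names explicitly for the two frames known in this example. The "df_" key prefix is a choice made here for illustration.

```python
import sys
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'],
                   'E': [7, 8, 9]})

# (major, minor) of the running interpreter; f-strings need (3, 6) or later
print(sys.version_info[:2])

dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    # str.format works on older interpreters where f"..." is a SyntaxError
    key = "df_{}".format(col.replace('.', ''))
    dfs[key] = df[[col, 'D', 'E']]

# Explicit assignment gives ordinary variables without touching locals()
df_AC1_v = dfs['df_AC1_v']
df_CC1_f = dfs['df_CC1_f']
```

With many frames, iterating over dfs.items() is usually simpler than creating one variable per frame.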
