Create Dataframes iteratively from columns of another Dataframe

Question

Say I have a df:

import pandas as pd
df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})

Note that the columns of interest are those whose names include "C.1_letter".

I have built a list of the selected columns: col_list = ['A.C.1_v', 'C.C.1_f']

Objective: create several dataframes as follows (in this illustration only 2 dfs are built, but there could be many more in practice).

The first df

Is named using the following convention: "df_AC1_v"
Is composed of the values of column A.C.1_v and the values of columns D and E

So, for df_AC1_v, we would have the following output: (image: output 1 without iteration)

The second df

Is named using the following convention: "df_CC1_f"
Is composed of the values of column C.C.1_f and the values of columns D and E

So, for df_CC1_f, we would have the following output: (image: output 2 without iteration)

My goal is to do this iteratively, but so far what I have attempted does not work.

Here is the code I have written. It fails in the for loop and I do not understand why. First, I extract the column list and filter it as follows:

col_list = list(df)
list_c1 = list(filter(lambda x:'.C.1' in x, col_list))
list_c1 = [str(r) for r in list_c1]

in: list_c1 out:['A.C.1_v', 'C.C.1_f']

Second, I split each name around '.C.1':

list_c1_bis = []
for element in list_c1:
    stock = element.split('.C.1')
    list_c1_bis.append(stock)

in : list_c1_bis out:[['A', '_v'], ['C', '_f']]

Up to this point, everything works. The bug is in the code below:

for line in list_c1_bis:
    name1 ='df'+'_'+line[0]+'C1'+line[1]
    vars()[name1] =  df[[list_c1[0],'D','E']]

My outputs are as follows:

in: df_AC1_v ==> OK, correct. out: output 1

in: df_CC1_f ==> Wrong: it has taken column A.C.1_v instead of the expected C.C.1_f. out: output 2

Your suggestions are welcome!

Thanks a lot for your time and help; it will be truly appreciated.

NB: please feel free to modify the first steps that work if you think you have a better solution.

Kindest regards

And the column B?

– Corralien, Commented Mar 10, 2022 at 10:09
Please provide the expected output as code/text

– rpanai, Commented Mar 10, 2022 at 13:25

Corralien · Accepted Answer · 2022-03-10 13:23:29Z

2

I strongly discourage you from creating variables dynamically with vars, locals or globals. Prefer a dictionary.

Try

for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]

Update

If f-strings are not available (Python < 3.6), replace locals()[f"df_{name}"] with locals()["df_{}".format(name)].
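If it is unclear which interpreter is running the code, a quick check might look like the sketch below. It only uses sys.version_info from the standard library; the variable names supports_fstrings and var_name are illustrative and not part of the original post.

```python
import sys

# f-strings were added in Python 3.6; older interpreters raise SyntaxError on them.
supports_fstrings = sys.version_info >= (3, 6)

name = "AC1_v"
# str.format builds the same string and works on older interpreters too.
var_name = "df_{}".format(name)

print(sys.version_info[:3], supports_fstrings, var_name)
```

If supports_fstrings is False, the str.format form is the one to use.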

Output:

>>> df_AC1_v
   A.C.1_v  D  E
0        1  e  7
1        2  f  8
2        3  g  9

>>> df_CC1_f
   C.C.1_f  D  E
0        4  e  7
1        5  f  8
2        6  g  9
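A side note on why dynamically created variables are fragile: writing to locals() only produces a usable name at module level, where locals() is the same mapping as globals(). Inside a function it does not create a local variable. A minimal sketch, assuming a made-up function build_inside_function and a made-up key df_demo:

```python
def build_inside_function():
    # Hypothetical helper: tries to create a variable through locals().
    locals()["df_demo"] = [1, 2, 3]
    try:
        # df_demo is resolved as a global name here, and no such global exists.
        return df_demo
    except NameError:
        return "df_demo is not defined"

result = build_inside_function()
print(result)
```

So the loop above behaves as shown when run at the top level of a script or notebook cell, but the same loop moved into a function would not define df_AC1_v or df_CC1_f.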

Alternative with dictionary:

dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]

Output:

>>> dfs['AC1_v']
   A.C.1_v  D  E
0        1  e  7
1        2  f  8
2        3  g  9

>>> dfs['CC1_f']
   C.C.1_f  D  E
0        4  e  7
1        5  f  8
2        6  g  9
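As for why the loop in the question always returned A.C.1_v: the line df[[list_c1[0], 'D', 'E']] indexes list_c1 with a constant 0, so every iteration selects the first matching column regardless of the name being built. A minimal sketch that keeps the asker's substring filter but uses the loop variable for the column (the names selected and frames are illustrative, not from the original post):

```python
import pandas as pd

df = pd.DataFrame({'A.C.1_v': [1, 2, 3], 'B': ['a', 'b', 'c'],
                   'C.C.1_f': [4, 5, 6], 'D': ['e', 'f', 'g'], 'E': [7, 8, 9]})

# Same filter as in the question: keep columns whose name contains '.C.1'.
selected = [col for col in df.columns if '.C.1' in col]

frames = {}
for col in selected:
    # Use the loop variable itself, not selected[0], so each frame gets its own column.
    frames['df_' + col.replace('.', '')] = df[[col, 'D', 'E']]

# The dictionary can then be iterated like any other collection.
for name, frame in frames.items():
    print(name, list(frame.columns))
```

With the sample data this builds two entries, df_AC1_v and df_CC1_f, each holding its own selected column plus D and E.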

edited Mar 10, 2022 at 13:23

answered Mar 10, 2022 at 10:20

Corralien


5 Comments

mozway Over a year ago

Maybe show the dictionary alternative as main answer as encouragement to use it ;)

Corralien Over a year ago

The OP already uses vars(). I think he knows what he's doing :)

mozway Over a year ago

Still, this is a very bad practice to fight ;)

Laurent Salé Over a year ago

Hi Corralien, I have replied, so perhaps you can have a look. Kindest regards

Laurent Salé Over a year ago

Thanks Corralien, it works perfectly. Kindest regards

Laurent Salé · Answer · 2022-03-10 10:59:58Z

0

Hi Corralien, first let me thank you for your prompt reply, which is truly appreciated.

I have tried the first code:

for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    locals()[f"df_{name}"] = df[[col, 'D', 'E']]

But I get the following error:

File "", line 3 locals()[f"df_{name}"] = df[[col, 'D', 'E']] ^ SyntaxError: invalid syntax

I have also tried the second proposed code, which stores the result in a dictionary:

dfs = {}
for col in df.columns[df.columns.str.contains(r'[A-Z]\.[0-9]_[a-z]')]:
    name = col.replace('.', '')
    dfs[name] = df[[col, 'D', 'E']]

It runs without error, but when I check whether the dataframes exist with in: df_AC1_v

I get the following error: NameError: name 'df_AC1_v' is not defined

I understand that to get the df, I need to write: dfs['AC1_v']

The second solution is acceptable, but I would prefer the first one if it worked.

Kindest regards

edited Mar 10, 2022 at 10:59

answered Mar 10, 2022 at 10:45

Laurent Salé


1 Comment

Corralien Over a year ago

Do you use a version of Python prior to 3.6?
