Create dataframe mapping a list of arrays

Question

I have two dataframes, one with the input info and one with the output:

df_input:
index col1 col2
 0    'A'  'B'
 1    'B'  'H'
 2    'C'  'D'

df_output:
index vectors
 0    [[D, 0.5],[E, 0.3]]
 1    [[A, 0.3]]
 2    [[B, 0.8],[C, 0.5],[H, 0.2]]

The output is an array of arrays, and the number of inner arrays varies from row to row.

What I need is to map by index and put every vector in its own row, like this:

df:
index col1 col2 val1 val2
 0    'A'  'B'  'D'  0.5
 1    'A'  'B'  'E'  0.3
 2    'B'  'H'  'A'  0.3
 3    'C'  'D'  'B'  0.8
 4    'C'  'D'  'C'  0.5
 5    'C'  'D'  'H'  0.2

The dataframe is very large, so I'm trying to avoid a loop if possible.

Thank you in advance.

I have to agree with @WeNYoBen. Are the vectors in df_output lists or strings?
– Erfan, Commented Jun 3, 2019 at 19:32
If they are lists, how are they separated by commas without being inside another list? For example, the first row should look like: [[D, 0.5],[E, 0.3]]
– Erfan, Commented Jun 3, 2019 at 19:49
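
The question raised in the comments (lists or strings?) can be checked directly. The following is a minimal sketch, not part of the original thread; the frame `df_str` and its contents are hypothetical, and it assumes that any string cells are valid Python literals with the labels quoted.

```python
import ast
import pandas as pd

# Hypothetical frame in which each cell is the *string* form of a list of lists.
df_str = pd.DataFrame({'vectors': ["[['D', 0.5], ['E', 0.3]]",
                                   "[['A', 0.3]]"]})

# Collect the Python types actually stored in the column (str vs. list).
kinds = set(df_str['vectors'].map(type))

# If the cells are strings holding valid Python literals, parse them into lists.
parsed = df_str['vectors'].map(ast.literal_eval)
```

If the labels are unquoted in the stored strings (as they appear in the printed frame), `ast.literal_eval` will not parse them and a custom parser would be needed.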

Scott Boston · Accepted Answer · 2019-06-03 20:21:31Z

Where:

import pandas as pd

# Each cell holds a list of [label, value] pairs; the lists vary in length.
input_vectors = pd.DataFrame({'vectors':[[['D', .5],['E',.3]],
                                         [['A',.3]],
                                         [['B',.8],['C',.5],['H',.2]]]})
input_vectors

Output:

                          vectors
0            [[D, 0.5], [E, 0.3]]
1                      [[A, 0.3]]
2  [[B, 0.8], [C, 0.5], [H, 0.2]]

and

df_input

Output:

   index col1 col2
0      0    A    B
1      1    B    H
2      2    C    D

Use:

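# Build one small frame per row of input_vectors, repeating that row's index
# label once per pair, then join df_input on the shared index.
pd.concat([pd.DataFrame(x, index=[i]*len(x))
           for i, x in input_vectors.itertuples()]).join(df_input)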

Output:

   0    1  index col1 col2
0  D  0.5      0    A    B
0  E  0.3      0    A    B
1  A  0.3      1    B    H
2  B  0.8      2    C    D
2  C  0.5      2    C    D
2  H  0.2      2    C    D
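
Since the question asks to avoid a loop, a variant using `Series.explode` may be worth trying. This is a sketch of a different technique, not the method used in this answer; it assumes pandas 0.25 or newer (where `explode` is available), that every row has at least one pair, and it uses the column names `val1` and `val2` from the question.

```python
import pandas as pd

df_input = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': ['B', 'H', 'D']})
df_output = pd.DataFrame({'vectors': [[['D', 0.5], ['E', 0.3]],
                                      [['A', 0.3]],
                                      [['B', 0.8], ['C', 0.5], ['H', 0.2]]]})

# One row per inner [label, value] pair; the original row index is repeated.
exploded = df_output['vectors'].explode()

# Split each pair into two columns, keeping the repeated index for the join.
pairs = pd.DataFrame(exploded.tolist(), index=exploded.index,
                     columns=['val1', 'val2'])

# Attach the matching input row to every pair, reorder columns, renumber rows.
result = (pairs.join(df_input)[['col1', 'col2', 'val1', 'val2']]
               .reset_index(drop=True))
```

Here `result` has the layout requested in the question, with `val2` kept as floats. Whether it is faster than the `concat` version on a large frame would need to be measured.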

edited Jun 3, 2019 at 20:21

answered Jun 3, 2019 at 20:08


jose_bacoy · Accepted Answer · 2019-06-03 21:06:15Z

Split the list of lists into rows using the stack function. Then, for each row in the vectors column, convert the pair to a string and use split to create two columns, val1 and val2. Use concat to attach the index column to those two columns, then join the result to df_input on index. Finally, drop the index column, since it is not needed in the final output.

import pandas as pd
my_dict = {'index':[0,1,2], 'col1':['A','B','C'], 'col2':['B','H','D']}
df_input = pd.DataFrame(my_dict)
my_dict = {'index':[0,1,2],'vectors':[[['D', 0.5],['E', 0.3]],[['A', 0.3]],[['B', 0.8],['C', 0.5],['H', 0.2]]]}
df_output = pd.DataFrame(my_dict)

# Spread each list of pairs across columns, then stack so each pair gets its own row.
df_output = df_output.vectors.apply(pd.Series).stack().rename('vectors')
df_output = df_output.to_frame().reset_index(1, drop=True).reset_index()
# Round-trip each pair through a comma-separated string to split it into two columns.
# Both columns hold strings after this step.
df_tmp = df_output.vectors.apply(lambda x: ','.join(map(str, x))).str.split(',', expand=True)
df_tmp.columns = ['val1','val2']
# Attach the index column, join to df_input on it, then drop it.
df_tmp = pd.concat([df_tmp, df_output['index']], axis=1, sort=False)
df_tmp = df_input.join(df_tmp.set_index('index'), on='index')
df_tmp.reset_index(drop=True).drop(columns=['index'])

Result:

  col1 col2 val1 val2
0    A    B    D  0.5
1    A    B    E  0.3
2    B    H    A  0.3
3    C    D    B  0.8
4    C    D    C  0.5
5    C    D    H  0.2
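
One side effect of the string round-trip above is that the numeric column comes back as text. The sketch below is an illustration added here rather than part of the answer; the small series `pairs` is a hypothetical stand-in for the stacked vectors column. It shows two possible ways to end up with floats: converting after the split, or building the columns from the lists directly.

```python
import pandas as pd

# Hypothetical stand-in for the stacked 'vectors' column: one pair per row.
pairs = pd.Series([['D', 0.5], ['E', 0.3], ['A', 0.3]])

# String round-trip, as in the answer above: both columns come back as text.
as_text = pairs.apply(lambda x: ','.join(map(str, x))).str.split(',', expand=True)
as_text.columns = ['val1', 'val2']

# Option 1: restore the numeric column explicitly.
as_text['val2'] = pd.to_numeric(as_text['val2'])

# Option 2: build the frame from the lists directly, which keeps floats as floats.
direct = pd.DataFrame(pairs.tolist(), columns=['val1', 'val2'])
```

The direct construction also avoids a problem with the split step if a label ever contains a comma.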
