I have two dataframes, one with the input info and one with the output:

df_input:
index col1 col2
 0    'A'  'B'
 1    'B'  'H'
 2    'C'  'D'

df_output:
index vectors
 0    [[D, 0.5],[E, 0.3]]
 1    [[A, 0.3]]
 2    [[B, 0.8],[C, 0.5],[H, 0.2]]

The output is an array of arrays, with a variable number of entries per row.

What I need is to match the rows by index and put each vector in its own row, like this:

df:
index col1 col2 val1 val2
 0    'A'  'B'  'D'  0.5
 1    'A'  'B'  'E'  0.3
 2    'B'  'H'  'A'  0.3
 3    'C'  'D'  'B'  0.8
 4    'C'  'D'  'C'  0.5
 5    'C'  'D'  'H'  0.2

The df is very large, so I'm trying to avoid a loop if possible.

Thank you in advance.

  • Please show us what you have tried so far. Thanks. Commented Jun 3, 2019 at 19:16
  • I am not sure whether your vectors column holds lists or strings. Commented Jun 3, 2019 at 19:24
  • I have to agree with @WeNYoBen. Are the vectors in df_output lists or strings? Commented Jun 3, 2019 at 19:32
  • They are lists, @WeNYoBen. Commented Jun 3, 2019 at 19:37
  • If they are lists, how are they separated by a comma without being inside another list? For example, the first row should look like: [[D, 0.5],[E, 0.3]] Commented Jun 3, 2019 at 19:49

2 Answers

Where:

import pandas as pd

input_vectors = pd.DataFrame({'vectors':[[['D', .5],['E',.3]],
                                         [['A',.3]],
                                         [['B',.8],['C',.5],['H',.2]]]})
input_vectors

Output:

                          vectors
0            [[D, 0.5], [E, 0.3]]
1                      [[A, 0.3]]
2  [[B, 0.8], [C, 0.5], [H, 0.2]]

and

df_input

Output:

   index col1 col2
0      0    A    B
1      1    B    H
2      2    C    D

Use:

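# Build one small frame per row of vectors, repeating that row's index label,
# then join the stacked result to df_input on the index.
pd.concat([pd.DataFrame(x, index=[i] * len(x))
           for i, x in input_vectors.itertuples()]).join(df_input)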

Output:

   0    1  index col1 col2
0  D  0.5      0    A    B
0  E  0.3      0    A    B
1  A  0.3      1    B    H
2  B  0.8      2    C    D
2  C  0.5      2    C    D
2  H  0.2      2    C    D
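
If you are on pandas 0.25 or later, Series.explode may be a way to avoid the Python-level loop altogether. This is only a minimal sketch, assuming every row holds at least one [label, value] pair and that df_input uses the default 0..n-1 index. The column names val1 and val2 come from the question's desired output; `pairs` and `result` are just illustrative names.

```python
import pandas as pd

df_input = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': ['B', 'H', 'D']})
input_vectors = pd.DataFrame({'vectors': [[['D', .5], ['E', .3]],
                                          [['A', .3]],
                                          [['B', .8], ['C', .5], ['H', .2]]]})

# One row per inner [label, value] pair; the original index label is repeated.
exploded = input_vectors['vectors'].explode()

# Turn each pair into two columns, keeping the repeated index for the join.
pairs = pd.DataFrame(exploded.tolist(), index=exploded.index,
                     columns=['val1', 'val2'])

# Align on the index, then renumber the rows 0..n-1.
result = df_input.join(pairs).reset_index(drop=True)
print(result)
```

Because the pairs are never converted to strings, val2 should stay numeric here. Rows with an empty list would explode to NaN and would need to be dropped before the tolist() step.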

Split the list of lists into rows using the stack function. Then, for each row in the vectors column, convert the pair to a string and use split to create the two value columns. Use concat to attach the index column, join the result to df_input on index, and drop the index column since it is not needed in the final output.

import pandas as pd
my_dict = {'index':[0,1,2], 'col1':['A','B','C'], 'col2':['B','H','D']}
df_input = pd.DataFrame(my_dict)
my_dict = {'index':[0,1,2],'vectors':[[['D', 0.5],['E', 0.3]],[['A', 0.3]],[['B', 0.8],['C', 0.5],['H', 0.2]]]}
df_output = pd.DataFrame(my_dict)

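# Spread each list of pairs across columns, then stack so every pair gets its own row.
df_output = df_output.vectors.apply(pd.Series).stack().rename('vectors')
# Drop the inner stack level and turn the original row label into an 'index' column.
df_output = df_output.to_frame().reset_index(1, drop=True).reset_index()
# Join each pair into "label,value" text and split it into two (string) columns.
df_tmp = df_output.vectors.apply(lambda x: ','.join(map(str, x))).str.split(',', expand=True)
df_tmp.columns = ['val1','val2']
# Attach the 'index' column, then join onto df_input by that key.
df_tmp = pd.concat([df_tmp, df_output['index']], axis=1, sort=False)
df_tmp = df_input.join(df_tmp.set_index('index'), on='index')
# Renumber the rows and drop the helper key.
df_tmp.reset_index(drop=True).drop(columns=['index'])

Result:

  col1 col2 val1 val2
0    A    B    D  0.5
1    A    B    E  0.3
2    A    B    A  0.3
3    C    D    B  0.8
4    C    D    C  0.5
5    C    D    H  0.2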
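
One thing to watch with the string join and split above: the val2 column comes back as text rather than numbers, and a label that itself contains a comma would be split in the wrong place. A small sketch of converting the column back, assuming every value is a numeric string; the frame below is a hypothetical stand-in for df_tmp after the split, not the real data:

```python
import pandas as pd

# Hypothetical stand-in for the frame produced by the string split (values are text).
df_tmp = pd.DataFrame({'val1': ['D', 'E', 'A'], 'val2': ['0.5', '0.3', '0.3']})

# Convert the text column to floats; non-numeric text raises an error by default.
df_tmp['val2'] = pd.to_numeric(df_tmp['val2'])
print(df_tmp.dtypes)
```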
