I have a DataFrame that contains an array column and a string column:

| string_col  | array_col            |
|-------------|----------------------|
| fruits      | ['apple', 'banaana'] |
| flowers     | ['rose', 'sunflower']|
| animals     | ['lion', 'tiger']    |

I want to repeat the string_col value once for each element in array_col, so that the output DataFrame looks like this:

| string_col  | array_col            | new_col              |
|-------------|----------------------|----------------------|
| fruits      | ['apple', 'banaana'] |['fruits', 'fruits']  |
| flowers     | ['rose', 'sunflower']|['flowers', 'flowers']|
| animals     | ['lion', 'tiger']    |['animals', 'animals']|

2 Answers

Use a list comprehension to repeat each string by the length of the list in the same row:

df['new_col'] = [[a] * len(b) for a, b in zip(df['string_col'], df['array_col'])]
print(df)
  string_col          array_col             new_col
0     fruits   [apple, banaana]    [fruits, fruits]
1    flowers  [rose, sunflower]  [flowers, flowers]
2    animals      [lion, tiger]  [animals, animals]
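For anyone who wants to reproduce this from scratch, here is a minimal, self-contained sketch. The DataFrame construction is an assumption based on the table in the question (the question does not show how the frame was built), and the loop variable names `label` and `items` are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question (construction is assumed).
df = pd.DataFrame({
    "string_col": ["fruits", "flowers", "animals"],
    "array_col": [["apple", "banaana"], ["rose", "sunflower"], ["lion", "tiger"]],
})

# For each row, build a list holding the label once per element of the row's list.
df["new_col"] = [
    [label] * len(items)
    for label, items in zip(df["string_col"], df["array_col"])
]

# Each new list should be exactly as long as the list it was built from.
lengths_match = (df["new_col"].str.len() == df["array_col"].str.len()).all()
print(df)
```

Nothing here depends on the lists having equal lengths, since `len(items)` is evaluated separately for every row.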

If the data is small and performance is not important, use DataFrame.apply:

df['new_col'] = df.apply(lambda x: [x['string_col']] * len(x['array_col']), axis=1)

#3k rows
df = pd.concat([df] * 1000, ignore_index=True)


In [311]: %timeit df['new_col'] = [[a] * len(b) for a, b in zip(df['string_col'], df['array_col'])]
1.94 ms ± 97.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [312]: %timeit df['new_col'] = df.apply(lambda x: [x['string_col']] * len(x['array_col']) , axis=1)
40.4 ms ± 3.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [313]: %timeit df['new_col']=df[['string_col']].agg(list, axis=1)*df['array_col'].str.len()
132 ms ± 6.91 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
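The `%timeit` lines above need IPython. As a rough sketch, a similar comparison can be run as a plain script with the standard `timeit` module. The helper names `with_comprehension` and `with_apply` are hypothetical, the frame construction is assumed from the question, and the absolute numbers will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

# Example frame from the question (construction is assumed), scaled to 3k rows.
base = pd.DataFrame({
    "string_col": ["fruits", "flowers", "animals"],
    "array_col": [["apple", "banaana"], ["rose", "sunflower"], ["lion", "tiger"]],
})
df = pd.concat([base] * 1000, ignore_index=True)


def with_comprehension(frame):
    # Plain Python loop over the two columns in parallel.
    return [[a] * len(b) for a, b in zip(frame["string_col"], frame["array_col"])]


def with_apply(frame):
    # Row-wise apply; each row is materialised as a Series first.
    return frame.apply(lambda row: [row["string_col"]] * len(row["array_col"]), axis=1)


# Confirm both approaches give the same lists before comparing their speed.
same_result = with_comprehension(df) == with_apply(df).tolist()

# Best average time per call, in seconds, over 3 repeats of 5 calls each.
timings = {
    fn.__name__: min(timeit.repeat(lambda: fn(df), number=5, repeat=3)) / 5
    for fn in (with_comprehension, with_apply)
}
print(same_result, timings)
```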

1 Comment

Elegant solution

A vectorized-style solution: wrap each string_col value in a one-element list, then multiply that list by the length of the list in array_col:

>>> df['new_col'] = df[['string_col']].agg(list, axis=1) * df['array_col'].str.len()

  string_col          array_col             new_col
0     fruits   [apple, banaana]    [fruits, fruits]
1    flowers  [rose, sunflower]  [flowers, flowers]
2    animals      [lion, tiger]  [animals, animals]
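To make the one-liner easier to follow, here is a sketch that splits it into its intermediate steps. The names `wrapped` and `lengths` are illustrative only, and the frame construction is assumed from the question.

```python
import pandas as pd

# Example frame from the question (construction is assumed).
df = pd.DataFrame({
    "string_col": ["fruits", "flowers", "animals"],
    "array_col": [["apple", "banaana"], ["rose", "sunflower"], ["lion", "tiger"]],
})

# Step 1: turn each row of the one-column frame into a list, e.g. "fruits" -> ["fruits"].
wrapped = df[["string_col"]].agg(list, axis=1)

# Step 2: count the elements in each list; .str.len() works on list values too.
lengths = df["array_col"].str.len()

# Step 3: multiply row by row, which is Python's list * int repetition.
df["new_col"] = wrapped * lengths
print(df)
```

Because both columns hold Python objects, the multiplication in step 3 still runs element by element rather than in a compiled numeric loop, which is consistent with the timings reported in the other answer.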
