Appending Pandas DataFrame column based on another column

Question

I have a Pandas DataFrame that looks like this:

| Index | Value        |
|-------|--------------|
| 1     | [1, 12, 123] |
| 2     | [12, 123, 1] |
| 3     | [123, 12, 1] |

and I want to append a third column that holds, for each row, the lengths of the list elements:

| Index | Value        | Expected_value |
|-------|--------------|----------------|
| 1     | [1, 12, 123] | [1, 2, 3]      |
| 2     | [12, 123, 1] | [2, 3, 1]      |
| 3     | [123, 12, 1] | [3, 2, 1]      |

I've tried using a Python lambda function with map, roughly like this:

dataframe["Expected_value"] = dataframe.Value.map(lambda x: len(str(x)))

but instead of a list I got a single integer per row:

| Index | Value        | Expected_value |
|-------|--------------|----------------|
| 1     | [1, 12, 123] | 6              |
| 2     | [12, 123, 1] | 6              |
| 3     | [123, 12, 1] | 6              |
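The single integer comes from calling len on the string form of the whole list instead of on each element. A minimal sketch, assuming the sample data from the question (the name df is illustrative):

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({"Index": [1, 2, 3],
                   "Value": [[1, 12, 123], [12, 123, 1], [123, 12, 1]]})

# str(x) turns the whole list into one string such as "[1, 12, 123]",
# so len() counts every character, including brackets, commas and spaces
whole = df["Value"].map(lambda x: len(str(x)))
print(whole.tolist())  # [12, 12, 12]

# Applying len(str(...)) to each element instead gives one length per item
per_item = df["Value"].map(lambda x: [len(str(y)) for y in x])
print(per_item.tolist())  # [[1, 2, 3], [2, 3, 1], [3, 2, 1]]
```

So the real output of the original lambda is 12 for every row of this sample, not a per-element list.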

Is it just me, or is the 6 in Expected_value not correct? My mind is blown.
– Lante Dellarovere, Commented Apr 13, 2019 at 19:29
Somehow I got an integer; I hadn't checked whether it was the sum of the lengths or just the length of each list in the Value column.
– pkolawa, Commented Apr 13, 2019 at 19:32
Ah, you inserted the values by hand; I thought it was the exact map output.
– Lante Dellarovere, Commented Apr 13, 2019 at 19:34

jezrael · Accepted Answer · 2019-04-13 19:22:05Z

3

You can use a list comprehension inside map:

dataframe["Expected_value"] = dataframe.Value.map(lambda x: [len(str(y)) for y in x])

Or a nested list comprehension:

dataframe["Expected_value"] = [[len(str(y)) for y in x] for x in dataframe.Value]
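Both forms build the same column. A self-contained sketch, assuming the column is named Value as in the question; the result is assigned as a Series so it aligns on the index:

```python
import pandas as pd

dataframe = pd.DataFrame({"Index": [1, 2, 3],
                          "Value": [[1, 12, 123], [12, 123, 1], [123, 12, 1]]})

# Option 1: Series.map with a per-row list comprehension
via_map = dataframe.Value.map(lambda x: [len(str(y)) for y in x])

# Option 2: nested list comprehension over the column
via_comp = [[len(str(y)) for y in x] for x in dataframe.Value]

dataframe["Expected_value"] = via_map
print(dataframe)
```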

Alternatively, you can get the number of digits of each integer with math.log10 (positive integers only):

import math
dataframe["Expected_value"] = [[int(math.log10(y))+1 for y in x] for x in dataframe.Value]

print(dataframe)
   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]
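The math.log10 version breaks on 0 and on negative numbers (math.log10 raises ValueError for both), and for very large integers floating-point rounding can make the result off by one. A hedged sketch of a guarded helper; digit_count is a hypothetical name, not part of any library:

```python
import math

def digit_count(n: int) -> int:
    """Number of decimal digits in an integer, ignoring the sign (hypothetical helper)."""
    n = abs(n)          # math.log10 rejects negative input
    if n == 0:
        return 1        # math.log10(0) would raise ValueError
    return int(math.log10(n)) + 1

values = [[1, 12, 123], [0, -45, 1000]]
print([[digit_count(y) for y in x] for x in values])  # [[1, 2, 3], [1, 2, 4]]
```

If exact results for huge values matter, len(str(abs(n))) avoids floating point entirely.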

edited Apr 13, 2019 at 19:22

answered Apr 13, 2019 at 19:15

jezrael



2 Comments

ResidentSleeper Over a year ago

Could you suggest how to read a table like this from the clipboard? sep='\|+' does not work.

jezrael Over a year ago

@ResidentSleeper - Unfortunately I have to copy it to a text file and change the formatting; not a very nice way...
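One way to skip the manual reformatting is to parse the pipe-delimited text directly. A minimal sketch, assuming the table has been pasted into a string (or read from the text file) and that no cell contains a literal |; ast.literal_eval turns the bracketed text back into Python lists:

```python
import ast
import pandas as pd

text = """| Index | Value        |
|-------|--------------|
| 1     | [1, 12, 123] |
| 2     | [12, 123, 1] |
| 3     | [123, 12, 1] |"""

# Trim the outer pipes, split on the inner ones, then strip the padding
rows = [[cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines() if line.strip()]
header, body = rows[0], rows[2:]   # rows[1] is the |---|---| separator line

df = pd.DataFrame(body, columns=header)
df["Index"] = df["Index"].astype(int)
df["Value"] = df["Value"].map(ast.literal_eval)  # "[1, 12, 123]" -> [1, 12, 123]
print(df)
```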

Erfan · Accepted Answer · 2019-04-13 19:51:19Z

1

Use a list comprehension:

[[len(str(y)) for y in x] for x in df['Value'].tolist()]
# [[1, 2, 3], [2, 3, 1], [3, 2, 1]]

df['Expected_value'] = [[len(str(y)) for y in x] for x in df['Value'].tolist()]
df

   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]

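If you need to handle missing data (for example NaN in place of a list), wrap the comprehension in a function that catches the TypeError: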

import numpy as np

def foo(x):
    # Iterating over NaN (a float) or None raises TypeError
    try:
        return [len(str(y)) for y in x]
    except TypeError:
        return np.nan

df['Expected_value'] = [foo(x) for x in df['Value'].tolist()]
df

   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]
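The sample output contains no missing rows, so the except branch never runs there. A small sketch with an extra NaN row (the fourth row is illustrative, not from the question) shows foo returning NaN for it:

```python
import numpy as np
import pandas as pd

def foo(x):
    # Iterating over a float NaN (or None) raises TypeError
    try:
        return [len(str(y)) for y in x]
    except TypeError:
        return np.nan

df = pd.DataFrame({"Index": [1, 2, 3, 4],
                   "Value": [[1, 12, 123], [12, 123, 1], [123, 12, 1], np.nan]})
df["Expected_value"] = [foo(x) for x in df["Value"].tolist()]
print(df)
```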

A plain list comprehension is probably the fastest option when dealing with object-dtype data such as lists. More reading at For loops with pandas - When should I care?.
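To check the performance claim on your own data, a rough timing sketch with timeit. The 30,000-row frame is made up for illustration, and absolute numbers depend on the machine and pandas version:

```python
import timeit
import pandas as pd

# Synthetic data: the question's three rows repeated 10,000 times
df = pd.DataFrame({"Value": [[1, 12, 123], [12, 123, 1], [123, 12, 1]] * 10_000})

def with_comprehension():
    return [[len(str(y)) for y in x] for x in df["Value"].tolist()]

def with_map():
    return df["Value"].map(lambda x: [len(str(y)) for y in x]).tolist()

for fn in (with_comprehension, with_map):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```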

Another solution with pd.DataFrame, applymap and agg (since pandas 2.1, applymap is deprecated in favour of DataFrame.map):

pd.DataFrame(df['Value'].tolist()).astype(str).applymap(len).agg(list, axis=1)

0    [1, 2, 3]
1    [2, 3, 1]
2    [3, 2, 1]
dtype: object
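The pd.DataFrame route assumes every list has the same length: shorter lists are padded with NaN, padded integer columns become floats, and astype(str) then counts characters of strings such as "nan" or "123.0". A sketch of a different technique, Series.explode (available since pandas 0.25), which avoids the wide frame; it assumes no list is empty, because explode turns an empty list into a single NaN:

```python
import pandas as pd

# Third row is deliberately shorter to show ragged lists are fine
df = pd.DataFrame({"Value": [[1, 12, 123], [12, 123, 1], [123, 12]]})

lengths = (
    df["Value"]
    .explode()            # one row per list element, original index repeated
    .astype(str)          # 12 -> "12"
    .str.len()            # characters per element
    .groupby(level=0)     # regroup by the original row label
    .agg(list)            # collect back into one list per row
)
df["Expected_value"] = lengths
print(df)
```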

edited Apr 13, 2019 at 19:51

Erfan


answered Apr 13, 2019 at 19:16

cs95


3 Comments

Erfan Over a year ago

Like the function to catch NaN!

cs95 Over a year ago

Thanks @Erfan! I blame my Mac for autocorrecting. "groupby" also used to be corrected to "groupie"; it was quite irritating until I learned how to fix it.

Erfan Over a year ago

You're welcome. Yes, I can imagine; I had the same problem.
