I have a pandas DataFrame that looks like this:

| Index | Value        |
|-------|--------------|
| 1     | [1, 12, 123] |
| 2     | [12, 123, 1] |
| 3     | [123, 12, 1] |

and I want to add a third column containing, for each list, the number of digits of each of its elements:

| Index | Value        | Expected_value |
|-------|--------------|----------------|
| 1     | [1, 12, 123] | [1, 2, 3]      |
| 2     | [12, 123, 1] | [2, 3, 1]      |
| 3     | [123, 12, 1] | [3, 2, 1]      |

I tried using a Python lambda function with map, roughly like this:

dataframe["Expected_value"] = dataframe.Value.map(lambda x: len(str(x)))

but instead of a list I got the sum of those lengths:

| Index | Value        | Expected_value |
|-------|--------------|----------------|
| 1     | [1, 12, 123] | 6              |
| 2     | [12, 123, 1] | 6              |
| 3     | [123, 12, 1] | 6              |
3
  • Is it just me, or is that 6 in Expected_value incorrect? My mind is blown. Commented Apr 13, 2019 at 19:29
  • Somehow I got an integer. I hadn't checked whether it was the sum of the lengths or just the length of each list in the Value column. Commented Apr 13, 2019 at 19:32
  • Ah, you inserted the values by hand; I thought it was the exact map output. Commented Apr 13, 2019 at 19:34

2 Answers

3

You can use a list comprehension with map:

dataframe["Expected_value"] = dataframe.Value.map(lambda x: [len(str(y)) for y in x])

Or nested list comprehension:

dataframe["Expected_value"] = [[len(str(y)) for y in x] for x in dataframe.Value]
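
To see why the original attempt returned a single number per row, note that map passes each whole list to the function, so the length is taken of the list's string form rather than of its elements. A minimal sketch, assuming the sample data from the question (the names `wrong` and `right` are only illustrative):

```python
import pandas as pd

# Sample data from the question
dataframe = pd.DataFrame({
    "Index": [1, 2, 3],
    "Value": [[1, 12, 123], [12, 123, 1], [123, 12, 1]],
})

# The lambda receives the whole list, so this measures the string "[1, 12, 123]"
wrong = dataframe["Value"].map(lambda x: len(str(x)))

# Iterating over the list measures each element separately
right = dataframe["Value"].map(lambda x: [len(str(y)) for y in x])

print(wrong.tolist())
print(right.tolist())
```

With this data, `wrong` holds one integer per row (the character count of the printed list, brackets and commas included), while `right` holds one list of digit counts per row.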

Alternatively, you can compute the number of digits of positive integers with math.log10:

import math
dataframe["Expected_value"] = [[int(math.log10(y))+1 for y in x] for x in dataframe.Value]

print(dataframe)
   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]
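
The math.log10 variant assumes strictly positive integers: math.log10 raises ValueError for 0 and for negative inputs, and len(str(y)) counts the minus sign of a negative number as a character. If such values can occur, a small helper could be used instead. This is only a sketch: `digit_count` is a hypothetical name, and treating 0 as one digit and ignoring the sign are assumptions about the desired result.

```python
def digit_count(n: int) -> int:
    """Number of decimal digits in an integer, ignoring any minus sign."""
    return len(str(abs(n)))


# Nested lists standing in for the Value column
values = [[0, -12, 123], [12, 123, 1]]
lengths = [[digit_count(y) for y in x] for x in values]
print(lengths)  # [[1, 2, 3], [2, 3, 1]]
```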

2 Comments

Can you please suggest how to read a table like this from the clipboard? sep='\|+' does not work.
@ResidentSleeper - Unfortunately, I have to copy it to a text file and change the formatting, which is not a very nice way...
1

Use a list comprehension:

[[len(str(y)) for y in x] for x in df['Value'].tolist()]
# [[1, 2, 3], [2, 3, 1], [3, 2, 1]]

df['Expected_value'] = [[len(str(y)) for y in x] for x in df['Value'].tolist()]
df

   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]

If you need to handle missing data, wrap the comprehension in a function:

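import numpy as np

def foo(x):
    try:
        # Count the characters of each element's string form
        return [len(str(y)) for y in x]
    except TypeError:
        # Missing values (None, NaN) are not iterable
        return np.nan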

df['Expected_value'] = [foo(x) for x in df['Value'].tolist()]
df

   Index         Value Expected_value
0      1  [1, 12, 123]      [1, 2, 3]
1      2  [12, 123, 1]      [2, 3, 1]
2      3  [123, 12, 1]      [3, 2, 1]
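
The output above uses the original data, which has no missing rows. As a hedged illustration of what the function does when a row is missing, here is a self-contained sketch; the frame with a NaN in the middle row is made up for this example:

```python
import numpy as np
import pandas as pd


def foo(x):
    try:
        return [len(str(y)) for y in x]
    except TypeError:
        # A float NaN (or None) is not iterable, so it ends up here
        return np.nan


# Hypothetical data: the second row is missing
df = pd.DataFrame({"Value": [[1, 12, 123], np.nan, [123, 12, 1]]})
df["Expected_value"] = [foo(x) for x in df["Value"].tolist()]

result = df["Expected_value"].tolist()
print(result)
```

The missing row stays NaN in the new column, and the other rows get their lists of lengths. Note that a string in the Value column would not raise TypeError, since strings are iterable, so it would be processed character by character.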

The list comprehension is probably the fastest option when dealing with object-dtype data. Further reading: "For loops with pandas - When should I care?".


Another solution with pd.DataFrame, applymap and agg:

pd.DataFrame(df['Value'].tolist()).astype(str).applymap(len).agg(list, axis=1)

0    [1, 2, 3]
1    [2, 3, 1]
2    [3, 2, 1]
dtype: object
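
In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map. The sketch below picks whichever method is available. It assumes the sample data from the question and that every list has the same length: shorter lists would be padded with missing values when expanded into columns, which would distort the counts. The name `elementwise` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Index": [1, 2, 3],
    "Value": [[1, 12, 123], [12, 123, 1], [123, 12, 1]],
})

# One column per list position, with every cell converted to a string
wide = pd.DataFrame(df["Value"].tolist()).astype(str)

# DataFrame.map replaces applymap from pandas 2.1 onwards
elementwise = wide.map if hasattr(pd.DataFrame, "map") else wide.applymap

# Collect each row of lengths back into a list
lengths = elementwise(len).agg(list, axis=1)
print(lengths.tolist())
```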

3 Comments

Like the function to catch NaN!
Thanks @Erfan! I blame my mac for autocorrecting. "groupby" also used to be corrected to "groupie", it was quite irritating until I learned how to fix it.
You're welcome. Yes, I can imagine; I had the same problem.
