3

I have a dataframe df like this, where both columns have dtype object.

    +-----+--------------------+--------------------+
    |  id |         col1       |         col2       |
    +-----+--------------------+--------------------+
    |   1 |  0,1,4,0,1         |  1,2,4,0,0         |
    +-----+--------------------+--------------------+

I convert them into a list like this

test = df["col1"]+','+df["col2"]
test.tolist()

This produces the following result, a SINGLE STRING element in a list:

['0,1,4,0,1,1,2,4,0,0']

However, I want them as a list of integers like this

[0,1,4,0,1,1,2,4,0,0] 

Any suggestions? Just FYI, the columns are really huge in my original dataset so performance might be an issue too.

2
  • Do you have control over how those columns were created in the first place? If performance might be an issue that's the place to spend the effort. Commented Apr 16, 2020 at 20:29
  • They come from a CSV file. Commented Apr 16, 2020 at 20:40
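Since the data comes from a CSV file, one option along the lines of the comment above is to parse the comma-separated cells while loading. This is only a sketch: the inline CSV text and the `parse_ints` helper are made up for illustration, and it assumes the cells are quoted in the file so the inner commas survive. It moves the parsing to load time; whether it is actually faster on a large file would need measuring.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; the cells are quoted
# so that the commas inside them are not treated as column separators.
csv_text = 'id,col1,col2\n1,"0,1,4,0,1","1,2,4,0,0"\n'


def parse_ints(cell):
    """Turn a cell such as '0,1,4' into [0, 1, 4]."""
    return [int(s) for s in cell.split(",")]


# converters maps a column name to a function applied to each raw cell string.
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"col1": parse_ints, "col2": parse_ints},
)

# Each cell is now a Python list, so the two can simply be concatenated.
row = df.loc[0, "col1"] + df.loc[0, "col2"]
print(row)
```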

3 Answers

5

I think you want:

(df['col1'] + ',' + df['col2']).apply(lambda row: [int(s) for s in row.split(',')])

Output:

0    [0, 1, 4, 0, 1, 1, 2, 4, 0, 0]
dtype: object

4 Comments

How can I have it as a pure list instead of an object as the output goes to other functions to be processed?
does your dataframe have just one row?
Good question. I do have multiple rows, but as a preprocessing step I already make sure that only one row comes in at a time. It would be better if the solution could handle multiple rows :-)
This solution gives you a series of lists, each for a row. So if you want to pass each list to your function, you can do a loop for r in series: your_func(r), where series is the output above.
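To make the discussion above concrete, here is a minimal sketch of getting plain Python lists out of that Series. The two-row sample frame and the `row_to_ints` helper are assumptions added for illustration; the second row is invented to show the multi-row case.

```python
from itertools import chain

import pandas as pd

# Sample data; the second row is invented to show the multi-row case.
df = pd.DataFrame(
    {
        "id": [1, 2],
        "col1": ["0,1,4,0,1", "7,8"],
        "col2": ["1,2,4,0,0", "9"],
    }
)


def row_to_ints(text):
    """Split a comma-separated string and convert each piece to int."""
    return [int(s) for s in text.split(",")]


per_row = (df["col1"] + "," + df["col2"]).apply(row_to_ints)

first_row = per_row.iloc[0]             # plain list for a single row
rows = per_row.tolist()                 # list of lists, one per row
flat = list(chain.from_iterable(rows))  # one flat list across all rows

print(first_row)
print(flat)
```

`per_row.iloc[0]` covers the one-row-at-a-time case, while `rows` or `flat` can be passed on when several rows arrive together.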
3

Another method, using str.split and explode:

arr = df.set_index('id').stack().str.split(',').explode().astype(int).values

print(arr)
[0 1 4 0 1 1 2 4 0 0]

4 Comments

I ran into this error: AttributeError: 'Series' object has no attribute 'explode'
@AbuShoeb What version of pandas are you on? It was added in 0.25.
I see. Mine is 0.23.4
@AbuShoeb if you can upgrade then do that otherwise you can use stack again after str.split with arg expand=True
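For pandas versions without explode, the fallback suggested in the last comment might look roughly like this. It is a sketch on the question's sample frame, not something tested on 0.23.4 specifically. It assumes every cell holds the same number of values; with uneven lengths the split leaves missing cells that would need dropping before the cast to int.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({"id": [1], "col1": ["0,1,4,0,1"], "col2": ["1,2,4,0,0"]})

arr = (
    df.set_index("id")
    .stack()                       # one string per (id, column) pair
    .str.split(",", expand=True)   # one column per comma-separated piece
    .stack()                       # back to a single long Series
    .astype(int)
    .values
)

print(arr)
```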
0

You can do it with map:

 test = str(df["col1"]+','+df["col2"])
 list(map(int, test.split(','))) 

3 Comments

As I mentioned in another answer, would it be possible to have the final result as a list instead of a map object?
Typecast the entire thing to list. Updated
I think I tried it before and tried again now and ended up with ValueError: invalid literal for int() with base 10: '0 1'
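The ValueError most likely comes from calling str() on the whole Series: that returns its printed form, including the index label and the dtype line, so the first piece after splitting is not a clean number. A minimal sketch of one way around it, assuming a single row as in the question, is to pull the string out of the Series before splitting:

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({"id": [1], "col1": ["0,1,4,0,1"], "col2": ["1,2,4,0,0"]})

joined = df["col1"] + "," + df["col2"]

# str(joined) is the printed Series (index label, padding, dtype line),
# so its first comma-separated piece is not a bare integer.
first_piece = str(joined).split(",")[0]

# Take the actual string value of the first row instead.
text = joined.iloc[0]
result = list(map(int, text.split(",")))

print(result)
```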
