0

My DataFrame df looks like this:

Row_ID Codes
=============
1      A123,B456,C678
2      X359,C678,F23
3      J3,D24,J36,K994

I want to put all the Codes into lists, one list per row, something like this:

['A123', 'B456', 'C678'],['X359', 'C678', 'F23'], ['J3', 'D24', 'J36', 'K994']

I tried this:

# an empty list
CodeList = [] 
for i in df['Codes']: 
  CodeList.append(list(i)) 

But what I get is this:

['A','1','2','3','B'....

How can I do it the right way as mentioned above?

  • Remove list(i) from the line CodeList.append(list(i)). Just keep CodeList.append(i). Commented Nov 24, 2019 at 5:03
  • Variable names should follow the lower_case_with_underscores style, not camelCase or anything else. Commented Nov 24, 2019 at 8:59

5 Answers

1
import pandas as pd


data = {"Codes": ["A123, B456, C678", "X359, C678, F23", "J3, D24, J36, K994"]}
df = pd.DataFrame(data)

result = [a.split(", ") for a in df["Codes"]]
print(result)

output

[['A123', 'B456', 'C678'], ['X359', 'C678', 'F23'], ['J3', 'D24', 'J36', 'K994']]

4 Comments

The docs advise against using .values.
Also, is it just the way it was copied into the post or is that output list different from the one in OP’s post?
@AlexanderCécile good point. I updated my result by removing values. So instead I'm just iterating over series values. Thank you.
Yeah I'm quite certain that this output is incorrect. OP's desired output is [['A123', 'B456', 'C678'],['X359', 'C678', 'F23'], ['J3', 'D24', 'J36', 'K994']].
1

Try splitting each string on the comma:

CodeList.append(i.split(',')) 
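
To see why the original loop produced single characters, here is a minimal sketch using plain Python strings in place of the DataFrame column (the names rows, as_chars and as_codes are made up for illustration). list() iterates over a string character by character, while str.split(',') cuts it at each comma:

```python
# Plain strings standing in for the values of df['Codes'] (illustrative data
# copied from the question).
rows = ["A123,B456,C678", "X359,C678,F23", "J3,D24,J36,K994"]

# list() on a string yields one element per character, commas included.
as_chars = [list(row) for row in rows]

# str.split(',') yields one element per comma-separated code.
as_codes = [row.split(",") for row in rows]

print(as_chars[0][:5])
print(as_codes)
```

Iterating over df['Codes'] yields the same strings, so the same split call applies inside the original for loop.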

4 Comments

Why the extra [ ]?
@AlexanderCécile I think he wants array of arrays
I just tested it, the [ ] are unnecessary.
Ah ok, I'm thinking of += instead of append. thanks @AlexanderCécile
0

It seems like many of the other answers here might just be plain wrong. (Edit: Currently, they all are)

This code does work:

import pandas as pd

data = {'Codes': ['A123,B456,C678', 'X359,C678,F23', 'J3,D24,J36,K994']}
df = pd.DataFrame(data)

codes_list = df['Codes'].str.split(',').tolist()

codes_list looks like:

[['A123', 'B456', 'C678'], ['X359', 'C678', 'F23'], ['J3', 'D24', 'J36', 'K994']]

Note that this solution is idiomatic pandas; explicit Python loops over a Series are best avoided when a vectorized string method does the job.
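
The answers in this thread differ on whether the separator is ',' or ', '. If the real data mixes both, one possible variation (a sketch only, assuming the codes themselves never contain whitespace) is to strip whitespace before splitting. The sample data below is invented to show both separator styles:

```python
import pandas as pd

# Invented sample: some separators have a trailing space, some do not.
data = {"Codes": ["A123, B456,C678", "X359,C678, F23", "J3,D24,J36,K994"]}
df = pd.DataFrame(data)

# Assumption: codes contain no internal whitespace, so removing all
# whitespace is safe. Then split on the bare comma and convert to a list.
codes_list = (
    df["Codes"]
    .str.replace(r"\s+", "", regex=True)
    .str.split(",")
    .tolist()
)

print(codes_list)
```

If the codes can contain spaces, this would merge them, so splitting first and stripping each piece would be the safer choice.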

Comments

0
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(3, 2), columns=list('AB'))
print(df.head())
print(df.values.tolist())

output:

[[-0.2645782053241853, 0.5022937587041725], [1.624868960959602, 0.5086915380333786], [1.3593608874498997, 0.7077939622903995]]

5 Comments

The docs advise against using .values.
@AlexanderCécile alright, it's usually good to have more than one solution.
I said nothing to the contrary. We shouldn’t promote bad practices “because they’re different”, however, especially when there exists a good alternative. Also, I’m pretty sure that code doesn’t even produce the right result.
@AlexanderCécile Okay
Yeah, no, I’m quite certain that this isn’t correct. Your code is literally just converting that DataFrame to a list.
0

Just replace list(i) with i.split(',') in the line CodeList.append(list(i)):

CodeList = [] 
for i in df['Codes']: 
   CodeList.append(i.split(','))


2 Comments

Pretty sure this is incorrect. In any case, it should probably just be written df['Codes'].tolist().
I just checked, this is indeed incorrect. The output of CodeList will be ['A123, B456, C678', 'X359, C678, F23', 'J3, D24, J36, K994'].
