Split string values of pandas.DataFrame column to array

Question

I ran a SQL query against PostgreSQL and loaded the result into a pandas.DataFrame. Every row looks like '8B1LP1D', where the letters ('B', 'LP', etc.) are delimiters. With this approach:

#formula is a pd.DataFrame with 1 column
for x in formula:
    print(re.split('B|LP|D|E|OS|DN',x))

the output looks fine:

['8', '1', '1']
...
['5', '3', '2']
#etc

But I need to collect the results in a list:

def move_parts(a):
    split = []
    for x in a:
        split.append(re.split('B|LP|D|E|OS|DN',x))
move_parts(formula)

and this raises an error:

/usr/lib/python3.7/re.py in split(pattern, string, maxsplit, flags)
    211     and the remainder of the string is returned as the final element
    212     of the list."""
--> 213     return _compile(pattern, flags).split(string, maxsplit)
    214 
    215 def findall(pattern, string, flags=0):

TypeError: expected string or bytes-like object

What is wrong, and how can I save all the split values to an array?

It would be better if you created an example. I guess replacing B and LP with a pipe or comma wouldn't work, as you may delete data that you need.
– Umar.H, Commented Jan 8, 2020 at 11:49

Clusks · Accepted Answer · 2020-01-08 12:06:36Z

The error is not caused by appending to a list; it comes from the values passed to re.split. The only way I was able to reproduce the error was when formula was a pandas.DataFrame. When I set formula to a flat list or a pandas.Series, it all works fine. Is it possible that in your code formula was first a list (or a pandas.Series) and was later changed to a pandas.DataFrame? The fix could be as simple as referring to the actual name of the column you want to process in the pandas.DataFrame. Let's presume it is called 'request_results'; then we change the code to the below and it should run:

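import re

def move_parts(a):
    split = []
    for x in a:
        # x is one string value from the selected column
        split.append(re.split('B|LP|D|E|OS|DN', x))
    return split  # return the collected lists so the caller can use them

result = move_parts(formula['request_results'].astype(str))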

Note that I've also added .astype(str) at the end. The other possibility is that some of the items in the list are not of type str. The error says that the second parameter of re.split() expects a str (or a bytes object, but I won't go into that) and is getting something else instead, possibly something like None or a float.
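Both causes described above can be checked with a minimal sketch. The sample values are made up, and 'request_results' is the same assumed column name as above. Iterating over a DataFrame yields its column labels rather than its cell values, and re.split raises TypeError for any input that is not str or bytes:

```python
import re
import pandas as pd

# Hypothetical sample data; 'request_results' is an assumed column name.
formula = pd.DataFrame({"request_results": ["8B1LP1", "5E3DN2"]})

# The longer delimiter 'DN' is listed before 'D' so that it can match.
PATTERN = "B|LP|DN|D|E|OS"

# Iterating over the DataFrame itself gives column labels, not row values.
labels = list(formula)

# re.split accepts only str or bytes; anything else raises TypeError.
try:
    re.split(PATTERN, None)
    raised = False
except TypeError:
    raised = True

# Selecting the column and casting to str gives re.split what it expects.
parts = [re.split(PATTERN, x) for x in formula["request_results"].astype(str)]

print(labels)
print(raised)
print(parts)
```

One caveat with .astype(str): missing values are turned into text as well, so it may be worth dropping or filling them first if the column can contain any.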

Stef · Accepted Answer · 2020-01-08 12:06:19Z

If formula is a pd.DataFrame with 1 column, as you said, your first expression gives the same error. Use pandas split instead:

df = pd.DataFrame({'col1': ['8B1LP1','5E3DN2']})
df.iloc[:,0].str.split('B|LP|DN|E|OS|D',expand=True).values.tolist()

Output:

[['8', '1', '1'], ['5', '3', '2']]
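If the rows do not all split into the same number of parts, expand=True pads the shorter rows with missing values. A possible variation, sketched here with made-up data, is to drop expand=True and call tolist() on the resulting Series, which keeps one list per row at its natural length:

```python
import pandas as pd

# Hypothetical data: the second row has only two parts.
df = pd.DataFrame({"col1": ["8B1LP1", "5E3"]})
pattern = "B|LP|DN|E|OS|D"

# Without expand=True each cell holds its own list, so rows may differ in length.
ragged = [list(row) for row in df["col1"].str.split(pattern)]

# With expand=True every row gets the same number of columns (padded if needed).
padded = df["col1"].str.split(pattern, expand=True).values.tolist()

print(ragged)
print(padded)
```

Which form is preferable depends on whether the downstream code expects a rectangular array or a list of variable-length lists.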

PS: you should re-order your delimiters (as shown in my example): the longer 'DN' must come before the single 'D', otherwise 'DN' will never match.
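The effect of the delimiter order can be seen with the standard re module alone. This is a small sketch using one of the sample strings from above; regex alternation tries the alternatives left to right, so a leading 'D' wins before 'DN' is ever considered:

```python
import re

sample = "5E3DN2"

# 'D' comes before 'DN': only the 'D' is consumed and the 'N' is left behind.
short_first = re.split("B|LP|D|E|OS|DN", sample)

# 'DN' comes before 'D': the whole two-letter delimiter is consumed.
long_first = re.split("B|LP|DN|E|OS|D", sample)

print(short_first)  # ['5', '3', 'N2']
print(long_first)   # ['5', '3', '2']
```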

