1

I ran an SQL query against PostgreSQL and loaded the result into a pandas.DataFrame. Every row looks like '8B1LP1D', where the letters ('B', 'LP', etc.) are delimiters. With this approach:

#formula is a pd.DataFrame with 1 column
for x in formula:
    print(re.split('B|LP|D|E|OS|DN',x))

the output looks fine:

['8', '1', '1']
...
['5', '3', '2']
#etc

But I need to collect the results in a list:

def move_parts(a):
    split = []
    for x in a:
        split.append(re.split('B|LP|D|E|OS|DN',x))
move_parts(formula)

and this raises an error:

/usr/lib/python3.7/re.py in split(pattern, string, maxsplit, flags)
    211     and the remainder of the string is returned as the final element
    212     of the list."""
--> 213     return _compile(pattern, flags).split(string, maxsplit)
    214 
    215 def findall(pattern, string, flags=0):

TypeError: expected string or bytes-like object

What is wrong, and how can I save all the split values to a list?

2
  • Can you give an example of what result you want to have? Commented Jan 8, 2020 at 11:43
  • It would be better if you created an example. I guess replacing B and LP with a pipe or comma wouldn't work, as you might delete data that you need. Commented Jan 8, 2020 at 11:49

2 Answers

1

The error is not caused by appending to a list; it comes from the value passed to re.split. The only way I could reproduce it was with formula being a pandas.DataFrame. When formula is a flat list or a pandas.Series, everything works. Is it possible that formula was a list (or a pandas.Series) at first and later became a pandas.DataFrame? The fix could be as simple as referring to the column you want to process. Let's presume it is called 'request_results'; then the code below should run:

import re

def move_parts(a):
    split = []
    for x in a:
        split.append(re.split('B|LP|D|E|OS|DN', x))
    return split  # return the collected lists so the caller can use them

parts = move_parts(formula['request_results'].astype(str))

Note that I've also added .astype(str) at the end. The other possibility is that some of the items are not of type str. The error says that the second parameter of re.split() expects a str (or a bytes object, but I won't go into that) and is getting something else instead, possibly None or a float.
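To make the two possible causes above concrete, here is a minimal sketch. The data is made up for illustration, and the column name 'request_results' is only the placeholder assumed in this answer, not something taken from the original query. It shows that iterating over a DataFrame yields column labels rather than cell values, and that re.split rejects a non-string value such as None:

```python
import re
import pandas as pd

# Hypothetical data; 'request_results' is the assumed column name from this answer.
formula = pd.DataFrame({'request_results': ['8B1LP1', '5E3DN2', None]})

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [x for x in formula]

# Iterating over a single column (a Series) yields the cell values.
values = [x for x in formula['request_results']]

def is_splittable(value):
    """Return True if re.split accepts the value, False if it raises TypeError."""
    try:
        re.split('B|LP|DN|E|OS|D', value)
        return True
    except TypeError:
        return False

print(labels)                               # ['request_results']
print([is_splittable(v) for v in values])   # [True, True, False]
```

If the only column label happened to be a non-string (for example the default integer label 0), the loop over the DataFrame itself would hit the same TypeError.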

1

If formula is a pd.DataFrame with 1 column, as you said, your first expression gives the same error. Use the pandas split instead:

import pandas as pd

df = pd.DataFrame({'col1': ['8B1LP1', '5E3DN2']})
df.iloc[:, 0].str.split('B|LP|DN|E|OS|D', expand=True).values.tolist()

Output:

[['8', '1', '1'], ['5', '3', '2']]

PS: you should reorder your delimiters (as shown in my example): the longer 'DN' must come before the single 'D', otherwise 'DN' will never match.
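The effect of the delimiter order can be checked with plain re.split. This is a small sketch using one of the sample strings from my example; regex alternation tries the alternatives from left to right, so a lone 'D' listed first wins over 'DN':

```python
import re

sample = '5E3DN2'

# 'D' listed before 'DN': 'D' matches first, so the 'N' is left in the data.
short_first = re.split('B|LP|D|E|OS|DN', sample)

# 'DN' listed before 'D': the two-letter delimiter is consumed as a whole.
long_first = re.split('B|LP|DN|E|OS|D', sample)

print(short_first)  # ['5', '3', 'N2']
print(long_first)   # ['5', '3', '2']
```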
