I've been searching around for a while now, but I can't seem to find the answer to this small problem.
I have this code, which is supposed to split each string in the column into groups of three words:
import pandas as pd
import numpy as np
df1 = {
    'State': [
        'Arizona AZ asdf hello abc',
        'Georgia GG asdfg hello def',
        'Newyork NY asdfg hello ghi',
        'Indiana IN asdfg hello jkl',
        'Florida FL ASDFG hello mno',
    ]
}
df1 = pd.DataFrame(df1, columns=['State'])
df1
def splitTextToTriplet(df):
    text = df['State'].str.split()
    n = 3
    grouped_words = [' '.join(str(text[i:i+n]) for i in range(0,len(text),n))]
    return grouped_words
splitTextToTriplet(df1)
Currently, the output is:
['0 [Arizona, AZ, asdf, hello, abc]\n1 [Georgia, GG, asdfg, hello, def]\nName: State, dtype: object 2 [Newyork, NY, asdfg, hello, ghi]\n3 [Indiana, IN, asdfg, hello, jkl]\nName: State, dtype: object 4 [Florida, FL, ASDFG, hello, mno]\nName: State, dtype: object']
But I actually expect a DataFrame with 5 rows and one column, like this:
['Arizona AZ asdf', 'hello abc']
['Georgia GG asdfg', 'hello def']
['Newyork NY asdfg', 'hello ghi']
['Indiana IN asdfg', 'hello jkl']
['Florida FL ASDFG', 'hello mno']
How can I change the code so that it produces the expected output?
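For reference, here is a minimal sketch of the behavior I am after, not necessarily the best way to do it. In my code above, `df['State'].str.split()` returns a Series, so `text[i:i+n]` slices rows rather than words, and `str(...)` then produces the Series repr seen in the output. The sketch assumes the chunking should happen per row; `split_into_chunks` is a hypothetical helper name I made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'State': [
        'Arizona AZ asdf hello abc',
        'Georgia GG asdfg hello def',
        'Newyork NY asdfg hello ghi',
        'Indiana IN asdfg hello jkl',
        'Florida FL ASDFG hello mno',
    ]
})

def split_into_chunks(text, n=3):
    """Hypothetical helper: split one string into groups of n words."""
    words = text.split()
    # Step through the word list n items at a time and rejoin each group.
    return [' '.join(words[i:i + n]) for i in range(0, len(words), n)]

# Apply the helper to each row, so every cell holds its own list of chunks.
result = df1['State'].apply(split_into_chunks).to_frame()
print(result)
```

Here `result` keeps a single `State` column with one list per row, which is the shape shown in the expected output.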