
I have a DataFrame with a single column, 'data', which contains words separated by spaces. I want to split each row on whitespace so that every word ends up in its own row. I tried the following code, but it does not work:

import numpy as np
import pandas as pd
from itertools import chain
def chainer(s):
    return list(chain.from_iterable(s.str.split(r'\s+')))
lengths = df['data'].str.split(r'\s+').map(len)
df_m = pd.DataFrame({"data": np.repeat(df["data"], lengths)})

Example DataFrame:

words = ["a b c d e", "b m g f e", "c", "w"]
dff = pd.DataFrame({"data": words})


        data
0  a b c d e
1  b m g f e
2          c
3          w
  • Can you post a sample DataFrame for people to tinker with? Commented Jul 28, 2019 at 20:34
  • I just did. Please take a look. Commented Jul 28, 2019 at 20:40
  • Thanks. What do you want the final DataFrame to look like? Is dff not what you want? Commented Jul 28, 2019 at 20:44
  • dff is my current DataFrame; the data in each row should be split by spaces. There was a comma in the example, but it should be a space. I am correcting it. Commented Jul 28, 2019 at 20:46

2 Answers


Are you looking for something like this?

import pandas as pd

df = pd.DataFrame()
df['text'] = ['word1 word2 word3', 'hey there hello word', 'stackoverflow is amazing']

Input:

                       text
0         word1 word2 word3
1      hey there hello word
2  stackoverflow is amazing

Do:

# split each row on whitespace into columns, then stack the columns into one long Series
x = df['text'].str.split(expand=True).stack().values
new_df = pd.DataFrame()
new_df['words'] = x.tolist()

Output:

           words
0          word1
1          word2
2          word3
3            hey
4          there
5          hello
6           word
7  stackoverflow
8             is
9        amazing
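
On pandas 0.25 or later, Series.explode should give the same result in fewer steps. This is only a minimal sketch reusing the 'text' column from above; the 'words' column name and the reset_index step are choices made for this illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"text": ["word1 word2 word3", "hey there hello word", "stackoverflow is amazing"]}
)

# str.split() with no arguments splits on any run of whitespace;
# explode() then gives each list element its own row.
new_df = (
    df["text"]
    .str.split()
    .explode()
    .reset_index(drop=True)  # drop the repeated source-row labels
    .to_frame("words")
)
print(new_df)
```

Leaving out reset_index(drop=True) keeps the original row label on every word, which can help if you need to trace a word back to its source row.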

2 Comments

Haha, thanks buddy! Feel free to post other solutions though, always good to know other workarounds :)
Thanks, that was it ;)

Below is my attempt.

import pandas as pd

words = ['oneword', 'word1 word2 word3', 'hey there hello word', 'stackoverflow is amazing']
# split each string on whitespace and flatten the resulting lists into one list
flat_list = [item for sublist in words for item in sublist.split()]
# put flat_list into a DataFrame
df = pd.DataFrame({"data": flat_list})
print(df)

             data
0         oneword
1           word1
2           word2
3           word3
4             hey
5           there
6           hello
7            word
8   stackoverflow
9              is
10        amazing
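
If it matters which original row each word came from (the np.repeat call in the question suggests it might), the same flattening idea can carry the row label along. A rough sketch using the question's sample frame dff; the 'source_row' column name is hypothetical and only there for illustration:

```python
import pandas as pd

words = ["a b c d e", "b m g f e", "c", "w"]
dff = pd.DataFrame({"data": words})

# Build (row label, word) pairs so each word remembers its source row.
pairs = [
    (idx, word)
    for idx, text in dff["data"].items()
    for word in text.split()
]
df_m = pd.DataFrame(pairs, columns=["source_row", "data"])
print(df_m)
```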
