I have a pandas DataFrame that was created by reading a table from a PDF with tabula. The PDF isn't parsed perfectly, so several table columns end up merged into a single column of the resulting DataFrame. The complication is that one of those PDF columns contains text, which is sometimes one word and sometimes two. Example:

            Col_1  Col_2
0       Hello X Y      A
1 Hello world Q R      B
2          Hi S T      C

I would like to split Col_1 into 3 columns. I'm not sure how to do this, given that the first new column would sometimes consist of one word, as in the case of Rows 0 & 2, and sometimes consist of two words, as in the case of Row 1.

I have tried splitting the strings of Col_1 with df['Col_1'].str.split(' ', 4, expand=True), but this splits from the beginning of the string (from the left), whereas I need the splitting to start from the right.

1 Answer

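You can use str.rsplit. From the documentation:

Splits string around given separator/delimiter, starting from the right.

Limiting the number of splits to 2 separates only the last two tokens and leaves everything to their left together, whether that is one word or two. In pandas 2.0 and later, n has to be passed as a keyword argument:

df['Col_1'].str.rsplit(' ', n=2, expand=True)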

Output:

             0  1  2
0        Hello  X  Y
1  Hello world  Q  R
2           Hi  S  T
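
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the example in the question, and it assumes the tokens are separated by single spaces, which may not hold for real tabula output.

```python
import pandas as pd

# Sample data rebuilt from the question (assumes single-space separators)
df = pd.DataFrame({
    "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
    "Col_2": ["A", "B", "C"],
})

# Split at most twice, counting separators from the right,
# so the leading text stays together whether it is one word or two
parts = df["Col_1"].str.rsplit(" ", n=2, expand=True)
print(parts)
```

If the extracted strings contain irregular whitespace, omitting the separator (so that rsplit splits on any run of whitespace) may be worth trying instead.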

To join the new columns back onto the original DataFrame:

df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_').join(df)

Output:

        nCol_0 nCol_1 nCol_2            Col_1 Col_2
0        Hello      X      Y        Hello X Y     A
1  Hello world      Q      R  Hello world Q R     B
2           Hi      S      T           Hi S T     C
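
If you would rather replace Col_1 than keep it alongside the new columns, one possible variation is sketched below. The column names text, code_1 and code_2 are made up for illustration, and the sample data is again rebuilt from the question.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
    "Col_2": ["A", "B", "C"],
})

# Split from the right into exactly three pieces
parts = df["Col_1"].str.rsplit(" ", n=2, expand=True)

# Hypothetical column names, chosen only for this example
parts.columns = ["text", "code_1", "code_2"]

# Drop the original merged column and attach the split columns in its place
result = parts.join(df.drop(columns="Col_1"))
print(result)
```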