I have a DataFrame with a column keyboardInfo containing strings as follows

keyboardInfo
[alphabet de_CH:swiss 1080x667 text actionNone isMultiLine]
[alphabet de_CH:swiss 1080x667 text actionNone isMultiLine]
[alphabetAutomaticShifted de_CH:swiss 1080x667 text actionNone isMultiLine]
[alphabet de_CH:swiss 720x440 text actionNone isMultiLine]

The DataFrame is pretty big (around 5'000'000 rows). The third entry of each string (after the second whitespace) is always the width x height. The strings can contain different numbers of elements overall, so the position has to be counted from the left. I would like to add two columns to the DataFrame containing the width and height as integers. The result should look as follows (the DataFrame also contains other columns, which I dropped here):

keyboardInfo                                                                    width    height
[alphabet de_CH:swiss 1080x667 text actionNone isMultiLine]                     1080     667
[alphabet de_CH:swiss 1080x667 text actionNone isMultiLine]                     1080     667
[alphabetAutomaticShifted de_CH:swiss 1080x667 text actionNone isMultiLine]     1080     667
[alphabet de_CH:swiss 720x440 text actionNone isMultiLine]                      720      440

How can this be done efficiently?

  • Are there more numbers in the text besides the width and the height? Commented Sep 27, 2020 at 23:42
  • @Erfan Most likely not but it is safer to split according to white spaces I guess. Commented Sep 27, 2020 at 23:47

1 Answer

Option 1: Use str.split. First take the third whitespace-separated element, then split it on the x and convert the result to integers:

df[['width', 'height']] = df['keyboardInfo'].str.split().str[2].str.split('x', expand=True).astype(int)

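Option 2: Use str.extractall to get all the numbers, then unstack them into columns and convert to integers. This assumes the width and height are the only numbers in each string:

df[['width', 'height']] = df['keyboardInfo'].str.extractall(r"(\d+)").unstack().astype(int)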

Output:


                                        keyboardInfo width height
0  [alphabet de_CH:swiss 1080x667 text actionNone...  1080    667
1  [alphabet de_CH:swiss 1080x667 text actionNone...  1080    667
2  [alphabetAutomaticShifted de_CH:swiss 1080x667...  1080    667
3  [alphabet de_CH:swiss 720x440 text actionNone ...   720    440
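
As a possible third variant (not one of the two options above), str.extract with a single pattern for the "<digits>x<digits>" token may also work. This is a minimal sketch, assuming every row contains exactly one such token; the sample rows are copied from the question, and the group names width and height are chosen here so that they become the column names:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "keyboardInfo": [
        "[alphabet de_CH:swiss 1080x667 text actionNone isMultiLine]",
        "[alphabet de_CH:swiss 1080x667 text actionNone isMultiLine]",
        "[alphabetAutomaticShifted de_CH:swiss 1080x667 text actionNone isMultiLine]",
        "[alphabet de_CH:swiss 720x440 text actionNone isMultiLine]",
    ]
})

# Match the first "<digits>x<digits>" token in each string.
# Named groups become the column names of the returned DataFrame.
# Assumption: every row contains such a token; otherwise the row is NaN
# and the integer conversion below raises an error.
dims = df["keyboardInfo"].str.extract(r"(?P<width>\d+)x(?P<height>\d+)")

# str.extract returns strings, so convert to integers as the question asks.
df[["width", "height"]] = dims.astype(int)

print(df[["width", "height"]])
```

Unlike Option 2, this does not break if other numbers appear elsewhere in the string, but unlike Option 1 it does not rely on the token being the third element. Which of the three is fastest on 5'000'000 rows would have to be measured.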