I have a column named "KL" with values such as:

sem_0405M4209F2057_1.000
sem_A_0103M5836F4798_1.000

Now I want to extract the four digits after "M" and the four digits after "F", but I can't get df["KL"].str.extract to work.

The positions of M and F vary, so a fixed slice such as [9:13] won't work for the whole column.

3 Answers

If you want to use str.extract, here's how:

>>> df['KL'].str.extract(r'M(?P<M>[0-9]{4})F(?P<F>[0-9]{4})')
      M     F
0  4209  2057
1  5836  4798

Here, M(?P<M>[0-9]{4}) matches the character 'M' and then captures 4 digits following it (the [0-9]{4} part). This is put in the column M (specified with ?P<M> inside the capturing group). The same thing is done for F.
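As a minimal, self-contained sketch of the same idea, the snippet below rebuilds the example column and joins the extracted groups back onto the frame. The third row (`no_match_here`) and the names `pattern`, `extracted`, and `result` are assumptions added for illustration; they are not part of the question. Rows where the pattern does not match come back as NaN.

```python
import pandas as pd

# Example data from the question, plus one hypothetical row that does not match
df = pd.DataFrame({"KL": ["sem_0405M4209F2057_1.000",
                          "sem_A_0103M5836F4798_1.000",
                          "no_match_here"]})

# Named groups become the column names of the extracted frame
pattern = r"M(?P<M>[0-9]{4})F(?P<F>[0-9]{4})"
extracted = df["KL"].str.extract(pattern)

# Attach the new columns M and F next to the original column
result = df.join(extracted)
print(result)
```

Because the pattern is searched anywhere in the string, the varying position of M and F does not matter.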

You could use split to achieve this, though a better way probably exists:

In [147]:
s = pd.Series(['sem_0405M4209F2057_1.000','sem_A_0103M5836F4798_1.000'])
s

Out[147]:
0      sem_0405M4209F2057_1.000
1    sem_A_0103M5836F4798_1.000
dtype: object

In [153]:
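# Text after 'M', then the part before 'F'; .str[:4] slices each string (a bare [:4] would slice rows)
m = s.str.split('M').str[1].str.split('F').str[0].str[:4]
# Text after 'M', then the part after 'F', keeping its first four characters
f = s.str.split('M').str[1].str.split('F').str[1].str[:4]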
print(m)
print(f)

0    4209
1    5836
dtype: object

0    2057
1    4798
dtype: object
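One detail worth keeping in mind with this approach: indexing a Series with `[:n]` selects rows, while `.str[:n]` slices each string. The sketch below illustrates the difference. It assumes a third, made-up sample value so that the two behaviours are easy to tell apart, and the names `after_m`, `row_slice`, and `char_slice` are purely illustrative.

```python
import pandas as pd

# Two values from the question plus one hypothetical extra value
s = pd.Series(["sem_0405M4209F2057_1.000",
               "sem_A_0103M5836F4798_1.000",
               "sem_B_0001M1111F2222_1.000"])

# Everything after the first 'M' in each string
after_m = s.str.split("M").str[1]

# Plain slicing acts on the Series itself: this keeps the first two ROWS
row_slice = after_m[:2]

# The .str accessor slices each STRING: first four characters of every row
char_slice = after_m.str[:4]

print(row_slice)
print(char_slice)
```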

You can also use regex:

import re

import pandas as pd

def get_data(x):
    # Capture the four digits after 'M' and the four digits after 'F'
    data = re.search(r'M(\d{4})F(\d{4})', x)
    if data:
        m = data.group(1)
        f = data.group(2)
        return m, f
    # No match: return None explicitly
    return None

df = pd.DataFrame(data={'a': ['sem_0405M4209F2057_1.000', 'sem_0405M4239F2027_1.000']})

df['data'] = df['a'].apply(get_data)

print(df)
                          a          data
0  sem_0405M4209F2057_1.000  (4209, 2057)
1  sem_0405M4239F2027_1.000  (4239, 2027)
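If two separate columns are preferred over a single column of tuples, the tuples can be expanded afterwards. The following is a sketch under a few assumptions: the column names `M` and `F` and the variables `parts` and `result` are chosen here for illustration, and unlike the function above this variant returns `(None, None)` when nothing matches so that every row yields a pair.

```python
import re

import pandas as pd

def get_data(x):
    # Return both captured groups, or a pair of None values when there is no match
    match = re.search(r"M(\d{4})F(\d{4})", x)
    return match.groups() if match else (None, None)

df = pd.DataFrame({"a": ["sem_0405M4209F2057_1.000", "sem_0405M4239F2027_1.000"]})

# Build a two-column frame from the tuples, aligned on the original index
parts = pd.DataFrame(df["a"].apply(get_data).tolist(),
                     index=df.index, columns=["M", "F"])

result = df.join(parts)
print(result)
```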
