python pandas parse string based on row values

Question

I have a pandas dataframe containing several columns, including 'text', 'start', and 'tend', and I want to create a new column that extracts a substring of 'text' using each row's own 'start' and 'tend' values.

text               start              tend      subtext
'Sample text'        2                 8        'mple t'
'Sample text'        4                10        'le tex'

This works:

df['subtext']= df['text'].str[2:6]

This produces 'nan' instead of text:

df['subtext']= df['text'].str[df['start']:df['tend']]

I'm guessing it has something to do with passing a series rather than a single value. Any help would be appreciated, and I'm open to another approach if this strategy is bad.
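A minimal sketch of what that guess points at, using plain Python rather than pandas: string slicing expects integer (or None) bounds, so one pair of scalars works while a collection of bounds does not. The list bounds below are only a stand-in for a column of values; how pandas itself reports the failure (NaN versus an exception) may differ by version.

```python
text = "Sample text"

# A single pair of integer bounds slices fine; .str[2:6] applies one such pair to every row.
sliced = text[2:6]
print(sliced)

# Bounds that are collections (a plain list stands in for a column here) are rejected.
try:
    text[[2, 4]:[8, 10]]
    raised = False
except TypeError:
    raised = True
print(raised)
```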

stackoverflow.com/questions/47395993/…

– Effective_cellist, Commented Jun 2, 2021 at 19:31

Muhammad Safwan · Accepted Answer · 2021-06-02 20:03:31Z


Use apply, because each row needs to be processed separately:

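import pandas as pd

data = {
    "text": ["sample text", "sample text"],
    "start": [2, 4],
    "tend": [8, 10],
}
df = pd.DataFrame(data)
# axis=1 passes each row to the lambda, so every string is sliced with its own bounds
df['subtext'] = df.apply(lambda row: row['text'][row['start']:row['tend']], axis=1)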

Output

    text        start   tend    subtext
   sample text   2       8      mple t
   sample text   4      10      le tex
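As a possible alternative to apply, the same row-wise slicing can be sketched with a list comprehension over the zipped columns. This is not part of the original answer; it assumes the same column names ('text', 'start', 'tend') and that 'start' and 'tend' hold integers with no missing values.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["sample text", "sample text"],
    "start": [2, 4],
    "tend": [8, 10],
})

# zip walks the three columns in step, so each string is sliced with its own row's bounds.
df["subtext"] = [
    text[start:end]
    for text, start, end in zip(df["text"], df["start"], df["tend"])
]
print(df)
```

On large frames this form often avoids some of the per-row overhead of apply with axis=1, though that is worth measuring on your own data.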

edited Jun 2, 2021 at 20:03

answered Jun 2, 2021 at 19:28
