I have a pandas DataFrame with several columns, including 'text', 'start', and 'tend', and I want to create a new column that holds the substring of 'text' delimited by each row's 'start' and 'tend' values.

text               start     tend     subtext
'Sample text'        2         8      'mple t'
'Sample text'        4        10      'le tex'

This works:

df['subtext']= df['text'].str[2:6]

This produces 'nan' instead of text:

df['subtext']= df['text'].str[df['start']:df['tend']]

I'm guessing it has something to do with passing a series rather than a single value. Any help would be appreciated, and I'm open to another approach if this strategy is bad.

1 Answer

Use apply with axis=1, because each row needs to be processed separately with its own 'start' and 'tend' values:

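import pandas as pd

data = {
    "text": ["sample text", "sample text"],
    "start": [2, 4],
    "tend": [8, 10],
}
df = pd.DataFrame(data)
# axis=1 passes each row to the lambda, so the slice uses that row's own bounds
df['subtext'] = df.apply(lambda row: row['text'][row['start']:row['tend']], axis=1)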

Output

    text        start   tend    subtext
   sample text   2       8      mple t
   sample text   4      10      le tex
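
As an alternative sketch (not part of the approach above), the same per-row slicing can be written as a list comprehension over the zipped columns. It assumes the same 'text', 'start', and 'tend' columns, with no missing values and integer bounds. It skips building a Series object for every row, which apply with axis=1 does, so it may be faster on large frames, though that is worth measuring on your own data.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["sample text", "sample text"],
    "start": [2, 4],
    "tend": [8, 10],
})

# zip walks the three columns in lockstep, so each string is sliced
# with the bounds from its own row (end index is exclusive, as usual).
df["subtext"] = [
    text[start:end]
    for text, start, end in zip(df["text"], df["start"], df["tend"])
]

print(df["subtext"].tolist())  # ['mple t', 'le tex']
```

If the end position should be included in the result, slice with text[start:end + 1] instead.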