Get a substring from a string with Python

Question

I'm stuck trying to get a substring from a list of strings using pandas. Basically, the application returns data in this way:

['com.server.application.service.sprint.Sprint@20137b52[id=8837,rapidViewId=7061,state=CLOSED,name=name_of_the_sprint_1,startDate=2022-02-21T13:07:00.000Z,endDate=2022-03-11T13:07:00.000Z,completeDate=2022-03-14T17:19:29.271Z,activatedDate=2022-02-21T20:57:03.111Z,sequence=8837,goal=,autoStartStop=false]', 'com.server.application.service.sprint.Sprint@5fcc83c9[id=8919,rapidViewId=7061,state=CLOSED,name=name_of_the_sprint_2,startDate=2022-03-14T14:52:00.000Z,endDate=2022-04-01T14:52:00.000Z,completeDate=2022-04-04T18:25:08.141Z,activatedDate=2022-03-14T20:52:24.680Z,sequence=8919,goal=,autoStartStop=false]']

This list has two items, and what I'm trying to do is get the sprint names, name_of_the_sprint_1 and name_of_the_sprint_2, which come after name=.

What I did until now (I do not know if this is the best and only way to do it) is the following:

df['sprints'].iloc[idx][0].split(',') so it creates a list where I can get the information I want. But I'll need to split it again (I'm gonna find 'name=name_of_the_sprint_1' in this sublist) in order to get only the name I want and need.

Is there a better way to extract this information from my dataframe? I'll need to iterate over a dataframe with 3500 rows and do it for each item.

Thanks, folks, for the help.

Try using the expand=True parameter for pd.Series.str.split (pandas.pydata.org/docs/reference/api/…) then you can do another split and get the data you need. Ex: df['sprints'].iloc[idx][0].split(',', expand=True) Alternatively, a regex or pd.Series.str.extract method would work as well though probably not needed since your data is nicely structured for splitting.
– Coup, Commented Apr 7, 2022 at 2:02
Just curious if every row of dataframe has two elements in your columns? Because your post actually returns two elements
– user16836078, Commented Apr 7, 2022 at 2:26

score 1 · Accepted Answer · 2022-04-07 07:36:21Z

A nested for loop works well if you arrange your code neatly. I tried this with 7000 rows of your data:

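def function(df):
    result = []
    # Each entry in df['sprints'] is one long "key=value,key=value,..." string
    for i in df['sprints']:
        split_string = i.split(',')
        for row in split_string:
            # Keep only the "name=..." field and drop the 5-character "name=" prefix
            if row.startswith('name='):
                aa = row[5:]
                result.append(aa)
    return result

%timeit function(df)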
14.4 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
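
If you end up needing more than the name, the same split-on-comma idea can be pushed one step further by turning each string into a dict. This is only a sketch: parse_fields is a made-up helper name, the sample row is a shortened copy of the one in the question, and it assumes no value ever contains a comma (a free-text goal= field could break that).

```python
def parse_fields(text):
    """Turn 'Class@hash[key=value,key=value,...]' into a dict of strings."""
    # Keep only what sits between the first '[' and the last ']'.
    inner = text[text.index("[") + 1 : text.rindex("]")]
    fields = {}
    for pair in inner.split(","):
        key, sep, value = pair.partition("=")
        if sep:  # skip fragments that have no '='
            fields[key] = value
    return fields


# Shortened version of the first row from the question
row = (
    "com.server.application.service.sprint.Sprint@20137b52"
    "[id=8837,rapidViewId=7061,state=CLOSED,name=name_of_the_sprint_1,"
    "goal=,autoStartStop=false]"
)
fields = parse_fields(row)
print(fields["name"])   # name_of_the_sprint_1
print(fields["state"])  # CLOSED
```

Empty values such as goal= come back as empty strings, so every key is still present in the dict.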

Extra

I've just realized that since you already know the keyword you are looking for, you can just use re.search to get your output:

import re

def function(df):
    # Match the known prefix followed by the sprint number
    return [re.search(r'name_of_the_sprint_\d+', row).group() for row in df['sprints']]

%timeit function(df)
10.9 ms ± 328 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Or, if the names after name= vary, you can try this:

result = [re.search(r'name=\w+', row).group()[5:] for row in df['sprints']]
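
One caveat with \w+ is that it stops at the first space or punctuation mark, and .group() raises an AttributeError on a row that has no name= at all. A slightly more forgiving variant, as a sketch: extract_names and NAME_RE are names I made up, the rows are shortened from the question, and the second and third rows are invented to show the edge cases.

```python
import re

# Capture everything after "name=" up to the next comma or closing bracket.
# \b keeps keys that merely end in "name" (e.g. "username=") from matching.
NAME_RE = re.compile(r"\bname=([^,\]]*)")


def extract_names(rows):
    """Return the sprint name from each string, or None if there is no name= field."""
    names = []
    for row in rows:
        match = NAME_RE.search(row)
        names.append(match.group(1) if match else None)
    return names


rows = [
    "Sprint@20137b52[id=8837,state=CLOSED,name=name_of_the_sprint_1,goal=]",
    "Sprint@5fcc83c9[id=8919,state=CLOSED,name=Sprint 2 (beta),goal=]",  # invented name
    "Sprint@0[id=1,state=FUTURE]",  # invented row without a name
]
print(extract_names(rows))
# ['name_of_the_sprint_1', 'Sprint 2 (beta)', None]
```

Using a capture group also avoids the [5:] slice, since group(1) holds only the part after name=.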

Anonymous · Accepted Answer · 2022-04-07 02:05:17Z

1

The first thing that comes to mind is to slice the string, starting right after the = in name= and ending at the next comma. If the list of strings were named data, it might look like this:

data = ["whatever items, not important, name=your_thing_name, some more random stuff,", "even more random stuff, name=a_different_name, some more random things"]

for d in data:
  # Start just past "name=" and stop at the next comma
  sub = d.index("name=") + len("name=")
  val = d[sub:d.index(",", sub)]
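
Note that .index() raises a ValueError when the marker or the trailing comma is missing. If that can happen in your data, the slicing idea could be wrapped roughly like this. It is a sketch only: name_after_marker is a hypothetical helper, and treating ] as a second possible terminator is my assumption based on the format shown in the question.

```python
def name_after_marker(text, marker="name="):
    """Slice out the value between marker and the next comma or closing bracket."""
    start = text.find(marker)
    if start == -1:
        return None  # no name= field in this string
    start += len(marker)
    # The value ends at the first ',' or ']' after the marker, whichever comes first.
    ends = [pos for pos in (text.find(",", start), text.find("]", start)) if pos != -1]
    end = min(ends) if ends else len(text)
    return text[start:end]


print(name_after_marker("whatever items, name=your_thing_name, more stuff,"))
# your_thing_name
```

str.find returns -1 instead of raising, which is what makes the missing cases easy to handle here.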

As far as performance goes, I ran this on 3500 generated strings and the total time, including building the list and printing each value, was about 0.2 seconds:

from time import perf_counter as pc

start = pc()

data = []
for i in range(3500):
  data.append(f"this, things, name={i}_loop, very cool, ik")

for d in data:
  sub = d.index("name")+5
  val = d[sub:sub+d[sub:].index(",")]
  print(val)

print(pc() - start)
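
Most of that time is likely spent printing rather than slicing. To time only the extraction, something like the following could be used. It is a sketch with the same generated data; extract_all is a made-up name, and I am deliberately not quoting a number since it will vary by machine.

```python
from timeit import timeit

# Same synthetic rows as above
data = [f"this, things, name={i}_loop, very cool, ik" for i in range(3500)]


def extract_all(rows):
    """Collect the value after name= from every row, without printing."""
    out = []
    for d in rows:
        sub = d.index("name=") + len("name=")
        out.append(d[sub:d.index(",", sub)])
    return out


names = extract_all(data)
# Average seconds per full pass over the 3500 rows, across 100 passes
seconds = timeit(lambda: extract_all(data), number=100) / 100
print(f"{len(names)} names extracted, {seconds:.6f} s per pass")
```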

edited Apr 7, 2022 at 2:05

answered Apr 7, 2022 at 1:56

