2

I have a list like the example data below. Every entry follows the pattern 'source/number_something/'. I would like to create a new list, like the output below, whose entries are just the "something" part. I was thinking of using a for loop and splitting each string on _, but some of the "something" parts also contain _. This seems like something that could be done with regex, but I'm not that good at regex. Any tips are greatly appreciated.

example data:

['source/108_cash_total/',
 'source/108_customer/',
 'source/108_daily_units_total/',
 'source/108_discounts/',
 'source/108_employee/',
 'source/56_cash_total/',
 'source/56_customer/',
 'source/56_daily_units_total/',
 'source/56_discounts/',
 'source/56_employee/']

output:

['cash_total',
 'customer',
 'daily_units_total',
 'discounts',
 'employee',
 'cash_total',
 'customer',
 'daily_units_total',
 'discounts',
 'employee']

3 Answers

6

You can use a regular expression:

\d+_([^/]+)

See a demo on regex101.com.


In Python:

import re

lst = ['source/108_cash_total/',
       'source/108_customer/',
       'source/108_daily_units_total/',
       'source/108_discounts/',
       'source/108_employee/',
       'source/56_cash_total/',
       'source/56_customer/',
       'source/56_daily_units_total/',
       'source/56_discounts/',
       'source/56_employee/']

rx = re.compile(r'\d+_([^/]+)')

output = [match.group(1) 
          for item in lst 
          for match in [rx.search(item)] 
          if match]
print(output)

Which yields

['cash_total', 'customer', 'daily_units_total', 
 'discounts', 'employee', 'cash_total', 'customer',
 'daily_units_total', 'discounts', 'employee']
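
For readers less familiar with regex, the pattern above can be read piece by piece. The following is only a sketch that rewrites the same pattern with re.VERBOSE so each part can carry a comment; the variable name `name` is just for illustration.

```python
import re

# Same pattern as above, in verbose mode so each part can be commented.
rx = re.compile(r"""
    \d+      # one or more digits (the number after 'source/')
    _        # the first underscore after the digits
    ([^/]+)  # group 1: everything up to, but not including, the next '/'
""", re.VERBOSE)

match = rx.search('source/108_daily_units_total/')
name = match.group(1)
print(name)  # daily_units_total
```

Because the capture group only stops at a slash, any further underscores in the name are kept.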

2 Comments

This creation of a dummy list to filter out the None search results is an interesting technique. I think I like it.
IMO this use of a list comprehension isn't much shorter than the classical approach (an ordinary for loop) but is very cryptic. In Python 3.8+ you can write this much more elegantly: output = [m.group(1) for item in lst if (m := rx.search(item))].
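
As a rough sketch of the comparison made in the comment above, the two forms can be put side by side. This assumes Python 3.8 or later for the := operator; the shortened list and the non-matching entry 'no_digits_here/' are made up for illustration.

```python
import re

rx = re.compile(r'\d+_([^/]+)')

# Shortened sample; the last entry is a made-up string that should be skipped.
lst = ['source/108_cash_total/', 'source/56_customer/', 'no_digits_here/']

# Ordinary for loop: keep only items where the pattern is found.
loop_output = []
for item in lst:
    match = rx.search(item)
    if match:
        loop_output.append(match.group(1))

# Same logic with an assignment expression (Python 3.8+).
walrus_output = [m.group(1) for item in lst if (m := rx.search(item))]

print(loop_output)
print(walrus_output)
```

Both lists come out the same; the choice is mostly about readability.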
0

You can easily do this without regex, using only an offset and split() with the maxsplit parameter set:

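offset = len("source/")  # length of the fixed prefix
result = []
for item in lst:
    # split only on the first "_" so underscores in the name are kept
    num, data = item[offset:].split("_", 1)
    result.append(data[:-1])  # drop the trailing "/"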

Of course, it's not very flexible, but as long as your data follow the schema, it doesn't matter.
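
If some entries might not follow the schema, one possible variation is to check each piece before using it. This is only a sketch: the helper name `extract_name` and its `prefix` parameter are hypothetical, and returning None for non-conforming entries is an assumption about what you would want.

```python
def extract_name(item, prefix="source/"):
    """Return the text after the first '_', or None if item doesn't fit the schema."""
    # The entry must start with the prefix and end with a slash.
    if not (item.startswith(prefix) and item.endswith("/")):
        return None
    # Strip the prefix and trailing slash, then split on the first underscore only.
    num, sep, data = item[len(prefix):-1].partition("_")
    # Require an underscore and a numeric part before it.
    if not sep or not num.isdigit():
        return None
    return data


lst = ['source/108_daily_units_total/', 'source/56_customer/', 'other/1_x/']
result = [name for name in map(extract_name, lst) if name is not None]
print(result)  # ['daily_units_total', 'customer']
```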

0

This is probably not as clean as the regex approach, but it works using a list comprehension and the split function:

lst = ['source/108_cash_total/',
 'source/108_customer/',
 'source/108_daily_units_total/',
 'source/108_discounts/',
 'source/108_employee/',
 'source/56_cash_total/',
 'source/56_customer/',
 'source/56_daily_units_total/',
 'source/56_discounts/',
 'source/56_employee/']

res = ['_'.join(i.split('_')[1:]).split('/')[:-1][0] for i in lst]

print(res)

# output ['cash_total', 'customer', 'daily_units_total', 'discounts', 'employee', 'cash_total', 'customer', 'daily_units_total', 'discounts', 'employee']
