Python extract part of file name regex

Question

I'm new to Python and trying to analyse some data. I've imported and concatenated all the CSV files in a folder into a single DataFrame. I'm trying to extract part of each file name to use as a header and, after searching, I found that you'd normally use regex.

The filenames are like this: 'Varying Concentration2_20190712-145158_Base Media.csv', 'Varying Concentration2_20190712-145158_250 g per l.csv', etc. So the part I'm trying to extract is after the last _ and before the .csv.

I've tried:

for fname in all_data:
    res = re.findall("(?<=_)(\w+).csv$", fname)
    if not res: continue
    print (res)

and also "(?<=[0-9]+_)(\w+)" but it does not seem to work.

The desired output would be a list containing 'Base Media', '150g per l' and so on.

Tim Biegeleisen · Accepted Answer · 2019-07-19 10:38:22Z

1

Here is an option that avoids regex and instead uses the built-in string split method, twice:

filename = 'Varying Concentration2_20190712-145158_Base Media.csv'
parts = filename.split('_')
nameonly = parts[len(parts)-1].split('.')[0]
print(nameonly)

Output:

Base Media

If the full filename could also contain dots, then this answer might need to be adjusted.
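
One possible adjustment for names with extra dots is to strip only the final extension and split only on the last underscore. This is a minimal sketch, assuming the header is always the text after the last underscore; the helper name header_from_filename and the second file name (with "2.5") are made up for illustration:

```python
from pathlib import Path


def header_from_filename(filename):
    """Return the text after the last underscore, without the final extension."""
    stem = Path(filename).stem       # drops only the last suffix, e.g. ".csv"
    return stem.rsplit('_', 1)[-1]   # keeps only what follows the last "_"


# File name taken from the question
first = header_from_filename('Varying Concentration2_20190712-145158_Base Media.csv')
# Hypothetical file name containing an extra dot
second = header_from_filename('Varying Concentration2_20190712-145158_2.5 g per l.csv')

print(first)
print(second)
```

Using Path.stem rather than split('.')[0] means a dot inside the header part is left alone, since only the last suffix is removed.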


3 Comments

FObersteiner Over a year ago

or just extract = (filename.split('_')[-1]).split('.')[0]

ekhumoro Over a year ago

header = filename.rsplit('_', 1)[-1].rsplit('.', 1)[0].

Tim Biegeleisen Over a year ago

When I saw your better versions I was just like :-O

heemayl · Accepted Answer · 2019-07-19 10:33:21Z

You can do:

(?<=_)[^_]+(?=\.csv$)

(?<=_) is a zero-width positive lookbehind that requires an _ immediately before the match
[^_]+ matches one or more characters that are not _; this is our desired portion
(?=\.csv$) is a zero-width positive lookahead that makes sure .csv follows the match at the end of the string

If you don't want to use lookarounds, you can use a plain pattern and put the desired match in the first (and only) capturing group, then get the output with match.group(1) instead of match.group():

_([^_]+)\.csv$

Example:

In [38]: text = 'Varying Concentration2_20190712-145158_Base Media.csv'

In [39]: re.search(r'(?<=_)[^_]+(?=\.csv$)', text).group()
Out[39]: 'Base Media'

In [40]: text = 'Varying Concentration2_20190712-145158_250 g per l.csv'

In [41]: re.search(r'(?<=_)[^_]+(?=\.csv$)', text).group()
Out[41]: '250 g per l'
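
To build the list the question asks for, the same pattern can be applied to every file name. This is a sketch, assuming the names are available as a plain list of strings; the variable names filenames and headers, and the non-matching name notes.txt, are illustrative and not from the question:

```python
import re

# Text between the last underscore and the trailing ".csv"
pattern = re.compile(r'(?<=_)[^_]+(?=\.csv$)')

filenames = [
    'Varying Concentration2_20190712-145158_Base Media.csv',
    'Varying Concentration2_20190712-145158_250 g per l.csv',
    'notes.txt',  # hypothetical name that does not match; skipped below
]

headers = []
for name in filenames:
    match = pattern.search(name)
    if match:  # search() returns None when nothing matches
        headers.append(match.group())

print(headers)
```

Checking the result of search() before calling group() avoids an AttributeError on names that do not follow the expected layout.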

CinCout · Accepted Answer · 2019-07-19 10:33:53Z

0

Use the following:

^.*_(.*)\.csv$

Because the first .* is greedy, this skips everything up to the last _ and then captures everything before the final .csv.
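
A quick way to check this on the file names from the question (a sketch; the variable names are illustrative):

```python
import re

pattern = re.compile(r'^.*_(.*)\.csv$')

names = [
    'Varying Concentration2_20190712-145158_Base Media.csv',
    'Varying Concentration2_20190712-145158_250 g per l.csv',
]

results = []
for name in names:
    match = pattern.match(name)
    # group(1) is the part captured between the last "_" and ".csv"
    results.append(match.group(1) if match else None)

print(results)
```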


mrzasa · Accepted Answer · 2019-07-19 11:01:09Z

0

You can use:

_([^._]+)\.csv

and take the first captured group.

Explanation:

_([^._]+) finds an _ and, to ensure it is the last one in the string, excludes _ from the repetition. It also excludes a dot, to avoid matching into the extension .csv, which is why the repeated class is [^._]+. Wrapping it in parentheses, ([^._]+), makes it a capturing group that you can use later. The dot before csv is escaped (\.) so that it matches only a literal dot rather than any character.

In Python:

>>> import re
>>> text = 'Varying Concentration2_20190712-145158_Base Media.csv'
>>> re.search(r'_([^._]+)\.csv', text).group(1)
'Base Media'
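
A small sketch of why the dot before csv is best escaped in a pattern like this one. The file name run_dataXcsv is hypothetical and chosen only because it has no .csv extension at all:

```python
import re

unescaped = re.compile(r'_([^._]+).csv')   # "." matches any character
escaped = re.compile(r'_([^._]+)\.csv')    # "\." matches only a literal dot

name = 'run_dataXcsv'  # hypothetical name without a real ".csv" extension

loose = unescaped.search(name)   # still matches: "." consumes the "X"
strict = escaped.search(name)    # no literal ".csv", so no match

print(loose.group(1) if loose else None)
print(strict)
```

Both patterns give the same result on the file names from the question; they differ only on names where something other than a dot precedes csv.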

edited Jul 19, 2019 at 11:01

answered Jul 19, 2019 at 10:34
