
I am new to Python and am having trouble with regex(). I have a parent directory with a subdirectory inside it.

I'm using regex(r'(.*/)?(.+/)(.+)\.bam')

to match files with the '.bam' suffix in the subdirectory. A function uses the regex(), performs some task and produces output, and I need that output to be written to the parent directory.

Here is the full call I am trying to make.

func(task,regex(r'(.*/)?(.+/)(.+)\.bam'),r'\1\3.output')

'.output' is the suffix to be added to the output file name. The call fails with "error: unmatched group". Could anyone help me fix this, or suggest a more elegant way to do it?

  • What is this regex function? Your regex compiles just fine. Commented May 30, 2013 at 17:06
  • Actually, the function is from the Ruffus package. Here is the original call: @transform(task, regex(r'(.*/)?(.+/)(.+)\.bam'), r'\1\3.output'). It takes the input file from task, which must match the pattern in regex(), and uses '.output' as the suffix for the output. If I run it from the parent directory, it should take the input from the subdirectory and write the output to the parent directory. I am not sure whether regex(r'(.*/)?(.+/)(.+)\.bam'), r'\1\3.output' does what I need. Or am I going wrong somewhere? Commented May 30, 2013 at 17:42
  • Can you provide some sample text that you're searching in and the desired output? Commented May 30, 2013 at 18:38

1 Answer

Description

This expression captures the file name, the path of the directory that contains the file, and the path of that directory's parent.

((.*[\/])[^\/]*[\/])([^\/]*?)[.]bam

  • ( start capture group 1
  • ( start capture group 2
  • .*[\/] greedy match of the string up to and including the parent directory's trailing slash
  • ) close capture group 2
  • [^\/]*[\/] require the current directory and its trailing slash
  • ) close capture group 1
  • ( start capture group 3
  • [^\/]*? non-greedy match of all non-/ characters before...
  • ) close capture group 3
  • [.] require the dot character
  • bam require the bam value

Groups

Group 0 gets the entire string

  1. gets the path of the directory that contains the file
  2. gets the path of that directory's parent
  3. gets the file name without the .bam extension
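
Since the question is about Python, here is a minimal sketch of the same expression using the standard re module. The name BAM_PATH and the sample path are only for illustration. The pattern is written with re.VERBOSE so that each part can carry a comment, and it is otherwise the expression above unchanged.

```python
import re

# Same expression as above, spread out with re.VERBOSE so each part is commented.
# BAM_PATH is an illustrative name, not something from Ruffus.
BAM_PATH = re.compile(r"""
    (               # group 1: directory that contains the file
      (.*[\/])      # group 2: greedy match up to the parent directory's slash
      [^\/]*[\/]    # the file's own directory and its trailing slash
    )
    ([^\/]*?)       # group 3: file name without the extension (non-greedy)
    [.]bam          # literal .bam extension
""", re.VERBOSE)

m = BAM_PATH.match('/ParentFolder1/SubFolder1/FileFoobar1.bam')
if m:
    print(m.group(0))  # whole path
    print(m.group(1))  # directory containing the file
    print(m.group(2))  # parent of that directory
    print(m.group(3))  # file name without .bam
```

Note that the pattern needs at least two slashes before the file name, so a relative path such as subdir/file.bam does not match, while parent/subdir/file.bam does.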

Example

I don't know Python well, so here is a PHP example to show how this regex works.

$sourcestring="/ParentFolder1/SubFolder1/FileFoobar1.bam
/Some/Really/Deep/Folder/ParentFolder2/SubFolder2/FileFoobar2.bam";
preg_match_all('/((.*[\/])[^\/]*[\/])([^\/]*?)[.]bam/im',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
 
$matches Array:
(
    [0] => Array
        (
            [0] => /ParentFolder1/SubFolder1/FileFoobar1.bam
            [1] => /Some/Really/Deep/Folder/ParentFolder2/SubFolder2/FileFoobar2.bam
        )

    [1] => Array
        (
            [0] => /ParentFolder1/SubFolder1/
            [1] => /Some/Really/Deep/Folder/ParentFolder2/SubFolder2/
        )

    [2] => Array
        (
            [0] => /ParentFolder1/
            [1] => /Some/Really/Deep/Folder/ParentFolder2/
        )

    [3] => Array
        (
            [0] => FileFoobar1
            [1] => FileFoobar2
        )

)
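
For the original goal of writing the output into the parent directory, a rough Python sketch follows. It uses plain re.sub and assumes that Ruffus's regex() applies its replacement string in the same way, which I have not verified. The helper name output_in_parent is made up for this example. The replacement uses \2 (the parent directory) and \3 (the file name without .bam), and both groups always take part in a match. On Python versions before 3.5, re.sub raises "unmatched group" when the replacement refers to a group that did not take part in the match, which an optional group such as (.*/)? can cause.

```python
import re

# The expression from this answer; groups 2 and 3 always participate in a match.
PATTERN = re.compile(r'((.*[\/])[^\/]*[\/])([^\/]*?)[.]bam')

def output_in_parent(path, suffix='.output'):
    """Hypothetical helper: map <parent>/<subdir>/<name>.bam to <parent>/<name><suffix>."""
    # \2 is the parent directory, \3 the file name without ".bam"
    return PATTERN.sub(r'\2\3' + suffix, path)

print(output_in_parent('/ParentFolder1/SubFolder1/FileFoobar1.bam'))
print(output_in_parent('/Some/Really/Deep/Folder/ParentFolder2/SubFolder2/FileFoobar2.bam'))
```

With Ruffus, the equivalent call would presumably pass the same pattern to regex() and r'\2\3.output' as the output string.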