3

I need to get a substring from a path directory made of 2 numbers preceeded and followed by a '_'. The string is like:

'P:\pgdfecol\71698384737978\INFENTECONTROL\2011_9_43\2011_9_46_43_29_10.ZIP'

and in this case I want to get the 43 following the 46.

The path is stored following the next rule, after the last backslash:

'\TablesPK_twoCharactersClassification_twoCharactersDocumentType_anything.ZIP'

I want to get the classification. The catch is that the table's primary key could be more than one field, although I know in each case how many fields are in the pk.

I got with something like this:

select substring(substring(substring('P:\pgdfecol\71698384737978\INFENTECONTROL\2011_9_43\2011_9_46_43_29_10.ZIP' from '([^\\]*(\.ZIP|zip))') from '([^_]*_){4}') from '[0-9]{2}')

But I would like something simpler.

Other cases:

'P:\pgdfecol\71698384737978\INFENTECONTROL\2011_03_46\2011_03_46_46_48_.ZIP'

(need the second 46)

'P:\pgdfecol\71698384737978\INFCONTABLE\2009_05_INBP\2009_05_INBP_22_28_.ZIP'

(need the 22 after INBP)

'P:\pgdfecol\71698384737978\INFOFICIAL\2007_06_MB\2007_06_MB_29_28_.ZIP'

(need the 29)

'P:\pgdfecol\71698384737978\ASOCIADOS\8010625\8010625_02_04_20110111.ZIP'

(02 after 8010625)

In the last case, the pk is only one field, so I've changed the sentence as:

select substring(substring(substring('P:\pgdfecol\71698384737978\ASOCIADOS\8010625\8010625_02_04_20110111.ZIP' from '([^\\]*(\.ZIP|zip))') from '([^_]*_){2}') from '[0-9]{2}')

For one Pk, I need the second set of ([^_]*_), for three the fourth, and so on..

select substring(substring(substring('P:\pgdfecol\71698384737978\ACTASCOMITE\ACRE123\ACRE123_17_11_.ZIP' from '([^\\]*(\.ZIP|zip))') from '([^_]*_){2}') from '[0-9]{2}')

(I get 17)

I'm using postgres 9.0.

6
  • Would also be helpful to add a couple more example values with possible variations to make clear what you need. And add what your database has to say to SHOW standard_conforming_strings;, please. Commented Oct 26, 2012 at 4:14
  • Why is there no _ before .ZIP in 8010625_02_04_20110111.ZIP? Commented Oct 26, 2012 at 4:47
  • It's not mandatory. After the "TwoCharactersDocumentType" could be anything. The only thing sure is that after the table's primary key always goes the classification and document_type. Commented Oct 26, 2012 at 4:52
  • But it's always _ and digits between the two digits and .zip? I think I got it all in my solution now. Commented Oct 26, 2012 at 4:54
  • No, after 02_04 could go numbers or letters or more '', it's not sure. That's why I start filtering from the last backslash, because before the .zip could go anything. it follows thi "format" \TablesPK_twoCharactersClassification_twoCharactersDocumentType_anything_including_letters.ZIP. The classificafion and document type are always between "" lower hyphen. Commented Oct 26, 2012 at 5:01

1 Answer 1

2

I am beginning to understand. Consider this test case:

WITH x(txt) AS ( VALUES
     ('P:\pgdfecol\71698384737978\INFENTECONTROL\2011_9_43\2011_9_46_43_29_10.ZIP')  -- 43
    ,('P:\pgdfecol\71698384737978\INFENTECONTROL\2011_03_46\2011_03_46_46_48_.ZIP')  --need the second 46
    ,('P:\pgdfecol\71698384737978\INFCONTABLE\2009_05_INBP\2009_05_INBP_22_28_.ZIP') --need the 22 after INBP
    ,('P:\pgdfecol\71698384737978\INFOFICIAL\2007_06_MB\2007_06_MB_29_28_.ZIP')      --need the 29
    )
SELECT txt, substring(txt, '\\(?:[^_\\]+_){3}(\d\d)_[^\\]*\.(?:ZIP|zip)$')
FROM   x

(?:) .. non-capturing parenthesis
[^_\\].. character class with any character except \ and _
\d .. a digit, same as [0-9] effectively
+ .. 1 or more matches (greedy)
$ .. end of string
[_\d] .. character class with digits and _

The case with only one pk needs a different pattern. Use {1} instead of {3}.

Not sure why you escape the backslashes. In modern versions of PostgreSQL standard_conforming_strings is on by default so you don't need to escape backslashes in strings - but still in regular expression of course.

Sign up to request clarification or add additional context in comments.

3 Comments

Hi, thanks for answering but it shows nothing to me, an empty string as result. I've escaped the backslash because it gives me an error "non-standar use of '\\' in string literal..." in pgAdmin 1.14. Besides, the primary key could be of any number of fields, and not only digits.
@nowxue You need to disclose your version of PostgreSQL. Always. Add that to your question. I suspect you have an older version with standard_conforming_strings = off ... pgAdmin is only the GUI and irrelevant to the question.
No, you're right. The escaped backslashes are not needed in the string, but in the regex

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.