I have a column with the following values in a PostgreSQL table.

col1
uniprotkb:P62158(protein(MI:0326), 9606 - Homo sapiens)
uniprotkb:O00602-PRO_0000009136(protein(MI:0326), 9606 - Homo sapiens)

I would like to extract the following value from each of the column values above.

col2
P62158
O00602

I am using the following regexp match on my column:

select regexp_matches(col1, 'uniprotkb:(.*)\-|\([a-zA-Z].*') as col2
from table;

But the above regexp captures the text before the last '-'. I want to capture the text between uniprotkb: and the first occurrence of either '(' or '-'. Any suggestions would be helpful.

  • Well, it seems the requirement you mention does not quite match your pattern. Did you mean to use uniprotkb:(.*?)[-(][a-zA-Z].*? The greedy * is only a part of the problem, isn't it? Commented Mar 2, 2020 at 11:12
  • Are you using regexp_matches? Commented Mar 2, 2020 at 11:30
  • Yes, I am using regexp_matches Commented Mar 2, 2020 at 11:31
  • So, you may even use uniprotkb:(.*?)[-(][a-zA-Z] Commented Mar 2, 2020 at 11:31

1 Answer

You may use

uniprotkb:(.*?)[-(][a-zA-Z]
           ^^^ ^^^^

See the regex demo.

Details

  • uniprotkb: - a literal string
  • (.*?) - Group 1: any zero or more characters, as few as possible (lazy match)
  • [-(] - a - or (
  • [a-zA-Z] - a letter.
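
To check the pattern against both sample values from the question at once, here is a minimal sketch. It assumes an inline VALUES list named samples (a made-up stand-in for the real table) and uses substring(... FROM ...) instead of regexp_matches; with a POSIX pattern that contains parentheses, substring returns the text matched by the first group.

```sql
-- "samples" is a hypothetical inline data set standing in for the real table.
WITH samples(col1) AS (
    VALUES
        ('uniprotkb:P62158(protein(MI:0326), 9606 - Homo sapiens)'),
        ('uniprotkb:O00602-PRO_0000009136(protein(MI:0326), 9606 - Homo sapiens)')
)
SELECT
    col1,
    -- Returns the first parenthesized group, or NULL when nothing matches.
    substring(col1 FROM 'uniprotkb:(.*?)[-(][a-zA-Z]') AS col2
FROM samples;
-- col2 is P62158 for the first row and O00602 for the second.
```

One reason to consider substring here: regexp_matches is set-returning, so when it is called in the SELECT list a row with no match is dropped from the result, whereas substring keeps the row and yields NULL.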

PostgreSQL test:

SELECT (REGEXP_MATCHES (
      'uniprotkb:P62158(protein(MI:0326), 9606 - Homo sapiens)',
      'uniprotkb:(.*?)[-(][a-zA-Z]'
   ))[1]

Outputs:

P62158
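
As an alternative that is not part of the answer above, the stated requirement (everything after uniprotkb: up to the first '(' or '-') can also be written with a negated bracket expression, which avoids the lazy quantifier. This is only a sketch: the CTE name interactions is hypothetical, and unlike the pattern above it does not require a letter after the delimiter, so it would also cut an identifier at a hyphen followed by a digit.

```sql
-- "interactions" is a hypothetical stand-in for the real table.
WITH interactions(col1) AS (
    VALUES
        ('uniprotkb:P62158(protein(MI:0326), 9606 - Homo sapiens)'),
        ('uniprotkb:O00602-PRO_0000009136(protein(MI:0326), 9606 - Homo sapiens)')
)
SELECT
    col1,
    -- [^-(]+ matches one or more characters that are neither '-' nor '('.
    substring(col1 FROM 'uniprotkb:([^-(]+)') AS col2
FROM interactions;
-- col2 is P62158 for the first row and O00602 for the second.
```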
