0

I have a list of values coming from a PGSQL database that looks something like this:

198
199
1S
2
20
997
998
999
C1
C10
A

I'm looking to parse this field a bit into individual components, which I assume would take two regexp_replace function uses in my SQL. Essentially, any non-numeric character that appears before numeric ones needs to be returned for one column, and the other column would show all non-numeric characters appearing AFTER numeric ones.

The above list would then be split into this layout as the result from PG:

Table Format

I have created a function that strips out the non-numeric characters (the last column) and casts it as an Integer, but I can't figure out the regex to return the string values prior to the number, or those found after the number.

All I could come up with so far, with my next to non-existant regex knowledge, was this: regexp_replace(fieldname, '[^A-Z]+', '', 'g'), which just strips out anything not A-Z, but I can;t get to to work with strings before numeric values, or after them.

2
  • 2
    Please edit your question to include the code you've written so far. Commented Nov 22, 2015 at 21:15
  • All I could come up with so far, with my next to non-existant regex knowledge, was this: regexp_replace(fieldname, '[^A-Z]+', '', 'g'), which just strips out anything not A-Z Commented Nov 22, 2015 at 21:20

2 Answers 2

1

For extracting the characters before the digits:

regexp_replace(fieldname, '\d.*$', '')

For extracting the characters after the digits:

regexp_replace(fieldname, '^([^\d]*\d*)', '')

Note that:

  • if there are no digits, the first will return the original value and then second an empty string. This way you are sure that the concatenation is equal to the original value in this case also.
  • the concatenation of the three parts will not return the original if there are non-numerical characters surrounded by digits: those will be lost.
  • This also works for any non-alphanumeric characters like @, [, ! ...etc.

Final SQL

select
  fieldname as original,
  regexp_replace(fieldname, '\d.*$', '') as before_s,
  regexp_replace(fieldname, '^([^\d]*\d*)', '') as after_s,
  cast(nullif(regexp_replace(fieldname, '[^\d]', '', 'g'), '') as integer) as number
from mytable;  

See fiddle.

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks - makes perfect sense.
Welcome, I added some more info after you accepted the answer. Hope it helps.
0

This answer relies on information you delivered, which is

Essentially, any non-numeric character that appears before numeric ones needs to be returned for one column, and the other column would show all non-numeric characters appearing AFTER numeric ones.

  1. Everything non-numeric before a numeric value into 1 column
  2. Everything non-numeric after a numeric value into 2 column

So there's assumption that you have a value that has a numeric value in it.

select 
  val,
  regexp_matches(val,'([a-zA-Z]*)\d+') AS before_numeric,
  regexp_matches(val,'\d+([a-zA-Z]*)') AS after_numeric
from 
  val;

Attached SQLFiddle for a preview.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.