RegExp: Split string alphanumeric / numeric

Question

Can you help me split a string (or, preferably, insert a blank at every split point) after every change from alphanumeric to numeric and vice versa?

So a string like D2c1 22 should become D 2 c 1 22. The best way would be to put a blank at every change from alphanumeric to numeric.

Somewhat related is this question on natural sorting of strings containing mixes of number and non-number sequences. Some of the queries there might prove useful to you. stackoverflow.com/questions/12965463/…
– Craig Ringer, Commented Nov 9, 2012 at 6:28

davidrac · Accepted Answer · 2012-11-09 06:30:57Z

3

You can use this regexp to find the places where it switches:

(?<=\d)(?=\D)|(?<=\D)(?=\d)

This way:

"234kjh23ljkgh34klj2345klj".gsub(/(?<=\d)(?=\D)|(?<=\D)(?=\d)/, " ")
=> "234 kjh 23 ljkgh 34 klj 2345 klj"
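
One caveat for the question's own input: \D also matches the space in "D2c1 22", so the pattern above inserts extra blanks on both sides of it. A minimal Ruby sketch, assuming that collapsing repeated spaces is acceptable (the names BOUNDARY and space_out are hypothetical):

```ruby
# Zero-width boundary between a digit and a non-digit, in either order.
BOUNDARY = /(?<=\d)(?=\D)|(?<=\D)(?=\d)/

# Hypothetical helper: insert a space at each boundary, then collapse
# the runs of spaces that appear next to whitespace already in the input.
def space_out(text)
  text.gsub(BOUNDARY, " ").squeeze(" ")
end

puts space_out("D2c1 22")                    # D 2 c 1 22
puts space_out("234kjh23ljkgh34klj2345klj")  # 234 kjh 23 ljkgh 34 klj 2345 klj
```

Without the squeeze call, the first example would come out with three spaces between "1" and "22".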

Edit: Without zero-length lookahead and lookbehind (note the single-quoted replacement strings, so that \1 and \2 are treated as backreferences):

"234kjh23ljkgh34klj2345klj".gsub(/(\d)(\D)/, '\1 \2').gsub(/(\D)(\d)/, '\1 \2')
=> "234 kjh 23 ljkgh 34 klj 2345 klj"

edited Nov 9, 2012 at 6:30

answered Nov 9, 2012 at 6:14

davidrac

4 Comments

Craig Ringer Over a year ago

Pg (9.2 at least) doesn't seem to understand that regexp, unfortunately:

regress=> select regexp_replace('234kjh23ljkgh34klj2345klj', '(?<=\d)(?=\D)|(?<=\D)(?=\d)', ' ', 'g');
ERROR: invalid regular expression: quantifier operand invalid

See sqlfiddle.com/#!12/d41d8/148

davidrac Over a year ago

It probably can't handle zero-length lookahead and lookbehind. In that case you can use something like this approach (this is Ruby syntax, so you'll have to adjust):

"234kjh23ljkgh34klj2345klj".gsub(/(\d)(\D)/, '\1 \2').gsub(/(\D)(\d)/, '\1 \2')
=> "234 kjh 23 ljkgh 34 klj 2345 klj"

davidrac Over a year ago

Syntax for lookahead in pg (from the doc):

(?=re) positive lookahead matches at any point where a substring matching re begins (AREs only)
(?!re) negative lookahead matches at any point where no substring matching re begins (AREs only)

calimero Over a year ago

Thanks so much for that explanation, but can you translate the above to fit into pg? I am really not that good with regular expressions.

Craig Ringer · Accepted Answer · 2012-11-09 07:39:46Z

2

Here's an approach tested with PostgreSQL and verified to work. It's a bit tortured, so performance might be ... interesting.

CREATE AGGREGATE array_cat_agg (
  BASETYPE = anyarray,
  SFUNC = array_cat,
  STYPE = anyarray
);
SELECT array_to_string(array_cat_agg(a), ' ')
FROM regexp_matches('234kjh23ljkgh34klj2345klj', '(\D*)(\d*)', 'g') x(a);

We need array_cat_agg because regular array_agg can't aggregate arrays of arrays.
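
For comparison, the same run-collecting idea can be sketched outside the database. This is a rough Ruby analogue rather than a translation of the SQL above, and join_runs is a hypothetical name:

```ruby
# Hypothetical helper: collect maximal runs of digits and non-digits,
# then join the runs with a single space.
def join_runs(text)
  text.scan(/\d+|\D+/).join(" ")
end

puts join_runs("234kjh23ljkgh34klj2345klj")  # 234 kjh 23 ljkgh 34 klj 2345 klj
```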

Alternately, a form of @davidrac's approach that'll work with PostgreSQL and probably perform significantly better (though I haven't tested) is:

SELECT regexp_replace(
  regexp_replace(
     '234kjh23ljkgh34klj2345klj', '(\d)(\D)', '\1 \2', 'g'
  ), '(\D)(\d)', '\1 \2', 'g');

This is executing the replacement in two passes. First it's inserting a space where series of digits end and series of non-digits begin. Then in another pass it's inserting spaces where series of non-digits end and series of digits begin.
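
The two passes can be illustrated in Ruby as well. This is a sketch of the same logic, not the PostgreSQL query itself, and two_pass is a hypothetical helper name. The replacement strings are single-quoted so that \1 and \2 stay backreferences:

```ruby
# Hypothetical helper mirroring the two nested regexp_replace calls.
def two_pass(text)
  text
    .gsub(/(\d)(\D)/, '\1 \2')  # pass 1: a digit run ends, a non-digit run begins
    .gsub(/(\D)(\d)/, '\1 \2')  # pass 2: a non-digit run ends, a digit run begins
end

puts two_pass("234kjh23ljkgh34klj2345klj")  # 234 kjh 23 ljkgh 34 klj 2345 klj
```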

Update: Here's an improved formulation:

SELECT trim(regexp_replace('234kjh23ljkgh34klj2345klj', '(?!\d)(\D+)|(?!\D)(\d+)', '\1\2 ', 'g'));

edited Nov 9, 2012 at 7:39

answered Nov 9, 2012 at 6:35

Craig Ringer

7 Comments

Craig Ringer Over a year ago

@davidrac Verified that your 2nd formulation works with Pg. Rephrased for Pg above, along with an earlier (uglier) implementation. +1'd your answer.

calimero Over a year ago

Thanks. But I get that exact same test string (234kjh23ljkgh34klj2345klj) back as the result when I try the above SQL with the double regexp_replace.

Craig Ringer Over a year ago

@calimero Could you be using an old version of PostgreSQL (9.0 or older) from before the standard_conforming_strings change? Try SET standard_conforming_strings = on; then test again. (This is why you always mention your PostgreSQL version in questions). If it works with standard_conforming_strings on, then (a) upgrade PostgreSQL and (b) see postgresql.org/docs/current/static/… for how to make the query work without upgrading PostgreSQL. Basically, instead of '\' use E'\\'; double backslashes and use E''.

calimero Over a year ago

It's psql 8.1, but I am not allowed to change that parameter (ERROR: parameter "standard_conforming_strings" cannot be changed). I guess I have to wait until the admin is here.

Craig Ringer Over a year ago

@calimero No, just rewrite the query to work with your (frankly prehistoric) version of PostgreSQL, as per the docs link above. '(\d)(\D)' becomes E'(\\d)(\\D)' and so on. You urgently need to start planning to upgrade your end-of-life and obsolete PostgreSQL, by the way, and always mention your version in questions, especially since it's so incredibly obsolete. See also postgresql.org/support/versioning .

Slava Semushin · Accepted Answer · 2012-11-09 06:15:41Z

1

The best way would be to put a blank at every change from alphanumeric to numeric.

It's not hard to do:

$ echo "D2c1 22" | sed 's|\([a-zA-Z]\)\([0-9]\)|\1 \2|g;s|\([0-9]\)\([a-zA-Z]\)|\1 \2|g'
D 2 c 1 22

I used sed and its regular expressions here because you didn't mention which language you use. The main idea is to use two substitutions: one matches a letter followed by a digit, the other a digit followed by a letter, and each replaces the pair with the first character, a space, and the second character.

answered Nov 9, 2012 at 6:15

Slava Semushin

1 Comment

calimero Over a year ago

Thanks very much for the quick responses. I'd like to do it in PostgreSQL.

codaddict · Accepted Answer · 2012-11-09 06:17:14Z

1

You can match using the regex

(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])

and replace it with a space.

See it in Perl

answered Nov 9, 2012 at 6:17

codaddict

2 Comments

Craig Ringer Over a year ago

As with @davidrac's solution, it seems Pg's regular expression engine doesn't cope with that one. See sqlfiddle.com/#!12/d41d8/148

Craig Ringer Over a year ago

Generally better to use \d and \D too, so you can cope with any digit and non-digit sequences, not just lower-case alphanumeric.
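
A small Ruby sketch of that difference, using the first token of the question's input. The constant names are made up for illustration:

```ruby
# Lower-case letters only, as in the answer above.
LOWER_ONLY = /(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])/
# Any digit / non-digit boundary.
ANY_BOUNDARY = /(?<=\d)(?=\D)|(?<=\D)(?=\d)/

input = "D2c1"
puts input.gsub(LOWER_ONLY, " ")    # D2 c 1   (the boundary after the upper-case "D" is missed)
puts input.gsub(ANY_BOUNDARY, " ")  # D 2 c 1
```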
