3

can you help me to split a string (or preferred if possible, place a blank at every "split" occasion) after every change from alphanumeric to numeric and vice versa ?

So a string like D2c1 22 should look like D 2 c 1 22. Best way from would be to put a blank at every change from alpha-numeric to numeric.

2
  • 1
    I can help you but can not do it for you. Commented Nov 9, 2012 at 6:11
  • 1
    Somewhat related is this question on natural sorting of strings containing mixes of number and non-number sequences. Some of the queries there might prove useful to you. stackoverflow.com/questions/12965463/… Commented Nov 9, 2012 at 6:28

4 Answers 4

3

You can use this regexp to find the places where it switches:

(?<=\d)(?=\D)|(?<=\D)(?=\d)

This way:

"234kjh23ljkgh34klj2345klj".gsub(/(?<=\d)(?=\D)|(?<=\D)(?=\d)/, " ")
=> "234 kjh 23 ljkgh 34 klj 2345 klj"

Edit: Without zero length look ahead and look behind:

"234kjh23ljkgh34klj2345klj".gsub(/(\d)(\D)/, "#{$1} #{$2}").gsub(/(\D)(\d)/, "#{$2} #{$1}")
=> "23 jk 5 jkgk 5 lk 534 lj"
Sign up to request clarification or add additional context in comments.

4 Comments

Pg (9.2 at least) doesn't seem to understand that regexp, unfortunately. regress=> select regexp_replace( '234kjh23ljkgh34klj2345klj', '(?<=\d)(?=\D)|(?<=\D)(?=\d)', ' ', 'g'); ERROR: invalid regular expression: quantifier operand invalid . See sqlfiddle.com/#!12/d41d8/148
It probably can't handle zero-length lookahead and lookbehind. in such case you can use something like this approach (This is ruby syntax, so you'll have to adjust):"234kjh23ljkgh34klj2345klj".gsub(/(\d)(\D)/, "#{$1} #{$2}").gsub(/(\D)(\d)/, "#{$2} #{$1}") => "23 jk 5 jkgk 5 lk 534 lj"
Syntax for lookahead in pg (from the doc): (?=re) positive lookahead matches at any point where a substring matching re begins (AREs only) (?!re) negative lookahead matches at any point where no substring matching re begins (AREs only)
Thanks so much for that explanation, but can you translate me the above to fit inte pg? I am really not that in regular expressions
2

Here's an approach tested with PostgreSQL and verified to work. It's a bit tortured, so performance might be ... interesting.

CREATE AGGREGATE array_cat_agg (
  BASETYPE = anyarray,
  SFUNC = array_cat,
  STYPE = anyarray
);
SELECT array_to_string(array_cat_agg(a), ' ')
FROM regexp_matches('234kjh23ljkgh34klj2345klj', '(\D*)(\d*)', 'g') x(a);

We need array_cat_agg because regular array_agg can't aggregate arrays of arrays.

Alternately, a form of @davidrac's approach that'll work with PostgreSQL and probably perform significantly better (though I haven't tested) is:

SELECT regexp_replace(
  regexp_replace(
     '234kjh23ljkgh34klj2345klj', '(\d)(\D)', '\1 \2', 'g'
  ), '(\D)(\d)', '\1 \2', 'g');

This is executing the replacement in two passes. First it's inserting a space where series of digits end and series of non-digits begin. Then in another pass it's inserting spaces where series of non-digits end and series of digits begin.

Update: Here's an improved formulation:

SELECT trim(regexp_replace('234kjh23ljkgh34klj2345klj', '(?!\d)(\D+)|(?!\D)(\d+)', '\1\2 ', 'g'));

7 Comments

@davidrac Verified that your 2nd formulation works with Pg. Rephrased for Pg above, along with an earlier (uglier) implementation. +1'd your answer.
Thanks. But i gut that exact same teststring (234kjh23ljkgh34klj2345klj) as result when i try the above sql with the double regexp_replace
@calimero Could you be using an old version of PostgreSQL (9.0 or older) from before the standard_conforming_strings change? Try SET standard_conforming_strings = on; then test again. (This is why you always mention your PostgreSQL version in questions). If it works with standard_conforming_strings on, then (a) upgrade PostgreSQL and (b) see postgresql.org/docs/current/static/… for how to make the query work without upgrading PostgreSQL. Basically, instead of '\' use E'\\'; double backslashes and use E''.
Its psql 8.1. But i am not allowed to change that parameter (ERROR: parameter "standard_conforming_strings" cannot be changed). Guess i have to wait till the admin is here
@calimero No, just rewrite the query to work with your (frankly prehistoric) version of PostgreSQL, as per the docs link above. '(\d)(\D)' becomes E'(\\d)(\\D) and so on. You need to urgently start planning to upgrade your end-of-life and obsolete PostgreSQL by the way, and always mention your version in questions especially since it's so incredibly obsolete. See also postgresql.org/support/versioning .
|
1

Best way from would be to put a blank at every change from alpha-numeric to numeric.

Its not hard to do:

$ echo "D2c1 22" | sed 's|\([a-ZA-Z]\)\([0-9]\)|\1 \2|g;s|\([0-9]\)\([a-ZA-Z]\)|\1 \2|g'
D 2 c 1 22

Here I used sed and its regexp because you doesn't mention which language you use. Main idea is to use 2 regexp which replaces alpha with digit and digit with alpha to first character, space and second character.

1 Comment

Thanks very much for the quick resonses. I'd like to do it in postgresql.
1

You can match using the regex

(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])

and replace it with a space.

See it in Perl

2 Comments

As @davidrac's solution, it seems Pg's regular expression engine doesn't cope with that one. See sqlfiddle.com/#!12/d41d8/148
Generally better to use \d and \D too, so you can cope with any digit and non-digit sequences, not just lower-case alphanumeric.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.