39

I need to clean up a string column whose values contain both spaces and tabs, inside the strings as well as at the beginning and the end (it's a mess!). I want to keep exactly one space between words. Say we have the following string, which covers every possible situation:

mystring = '  one two    three      four    '
  • 2 spaces before 'one'
  • 1 space between 'one' and 'two'
  • 4 spaces between 'two' and 'three'
  • 2 tabs after 'three'
  • 1 tab after 'four'

Here is how I do it:

  1. I delete leading and trailing spaces
  2. I delete leading and trailing tabs
  3. I replace every run of two or more spaces, and every run of tabs, with a single space

WITH

  t1 AS (SELECT '  one two    three      four    '::TEXT AS mystring),

  t2 AS (SELECT TRIM(both ' ' from mystring) AS mystring FROM t1),

  t3 AS (SELECT TRIM(both '\t' from mystring) AS mystring FROM t2)

  SELECT regexp_replace(mystring, '(( ){2,}|\t+)', ' ', 'g') FROM t3 ;

I eventually get the following string, which looks nice, but it still has a trailing space:

'one two three four '

Any idea how to do this more simply and fix this last issue?

Many thanks !

1
  • Any help ? Someone posted a comment yesterday and deleted it... I've had no time to look at it. Thanks ! Commented Sep 19, 2014 at 6:13

3 Answers

87
SELECT trim(regexp_replace(col_name, '\s+', ' ', 'g')) as col_name FROM table_name;

Or, in the case of an update:

UPDATE table_name SET col_name = trim(regexp_replace(col_name, '\s+', ' ', 'g'));

The regexp_replace flags are described in this section of the documentation.
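As a quick check, the query can be run against the sample string from the question. This is only a minimal sketch: the tabs are written with an escape string (E'...') so that they survive copy and paste, and it assumes standard_conforming_strings is on (the default since PostgreSQL 9.1). It also hints at why the original attempt kept a trailing space: in a plain literal, '\t' is a backslash followed by the letter t, not a tab, so TRIM(both '\t' ...) never strips the tab after 'four'.

```sql
-- Sample string from the question, with explicit tabs (\t) in an escape string.
WITH t1 AS (
  SELECT E'  one two    three\t\tfour\t'::text AS mystring
)
SELECT
  -- Collapse every whitespace run to one space, then trim the ends.
  trim(regexp_replace(mystring, '\s+', ' ', 'g')) AS cleaned,
  -- Plain '\t' is the two characters \ and t, so nothing is stripped here.
  trim(both '\t' from mystring) = mystring        AS plain_trim_is_noop,
  -- An escape string makes TRIM treat \t as a real tab.
  trim(both E' \t' from mystring)                 AS trimmed_ends
FROM t1;
-- cleaned: 'one two three four'
```

Because the regular expression already turns any leading or trailing whitespace run into a single space, the default trim (spaces only) is enough to finish the job.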


3 Comments

On Postgres 9.5 use \s instead of \\s.
On Postgres 9.4 and 10.0 it also seems that \s should be used rather than \\s.
What tripped me up is that regexp_replace does not replace globally by default: you must pass the 'g' flag as the fourth parameter. Without it, the operation looked like it had no effect, because my sample strings had single spaces first and multiple spaces only later in the string.
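The effect described in the last comment can be reproduced with a small sketch (the sample string 'a b   c    d' is made up for illustration). Without 'g', only the first match is replaced, and here the first match is a single space that is replaced by a single space, so the result looks unchanged.

```sql
SELECT
  -- No flag: only the first whitespace run (one space after 'a') is replaced.
  regexp_replace('a b   c    d', '\s+', ' ')      AS first_match_only,
  -- 'g' flag: every whitespace run is replaced.
  regexp_replace('a b   c    d', '\s+', ' ', 'g') AS all_matches;
-- first_match_only: 'a b   c    d'
-- all_matches:      'a b c d'
```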
3

SELECT trim(regexp_replace(mystring, '\s+', ' ', 'g')) as mystring FROM t1;

Posting an answer in case folks don't look at comments.

Use '\s+'

Not '\\s+'

Worked for me.
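Why the doubled backslash fails can be seen in a short sketch, assuming standard_conforming_strings is on (the default since PostgreSQL 9.1). In a plain literal the backslash is an ordinary character, so '\\s+' reaches the regex engine as "a literal backslash followed by one or more s". In an escape string (E'...'), the backslash has to be doubled to get the same pattern as plain '\s+'.

```sql
SHOW standard_conforming_strings;  -- expected to report "on" for this sketch

SELECT
  regexp_replace('a   b', '\s+',   ' ', 'g') AS plain_literal,   -- pattern \s+
  regexp_replace('a   b', '\\s+',  ' ', 'g') AS doubled_literal, -- pattern \\s+ (no match)
  regexp_replace('a   b', E'\\s+', ' ', 'g') AS escape_string;   -- pattern \s+
-- plain_literal:   'a b'
-- doubled_literal: 'a   b'
-- escape_string:   'a b'
```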


0

It didn't work for me with trim and regexp_replace, so I came up with another solution:

SELECT trim(
    array_to_string(
        regexp_split_to_array('  test    with many  spaces  for        this   test  ', E'\\s+')
    , ' ')
) as mystring;

First, regexp_split_to_array splits on every run of whitespace, leaving empty strings ("blanks") at the beginning and the end of the array.

-- regexp_split_to_array output:
-- {"",test,with,many,spaces,for,this,test,""}

When using array_to_string all the ',' become spaces

-- array_to_string output ( '_' instead of spaces for viewing ):
-- _test_with_many_spaces_for_this_test_

The trim is to remove the head and tail

-- trim output ( '_' instead of spaces for viewing ):
-- test_with_many_spaces_for_this_test
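The three steps above can also be inspected in a single query. This is a sketch using a shorter, made-up input string; the CTE name parts and the column names are only illustrative.

```sql
WITH parts AS (
  -- Step 1: split on whitespace runs; the ends yield empty strings.
  SELECT regexp_split_to_array('  test    with many  spaces  ', E'\\s+') AS arr
)
SELECT
  arr,                                       -- {"",test,with,many,spaces,""}
  array_to_string(arr, ' ')       AS joined, -- ' test with many spaces '
  trim(array_to_string(arr, ' ')) AS trimmed -- 'test with many spaces'
FROM parts;
```

array_to_string skips NULL elements but keeps empty strings, which is why one leading and one trailing space remain for trim to remove.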

