1

I have a column in sql server 2008 which is a varchar(255). The column is full of 'measurements' and there are around 15,000 records. As the column is of type string and has been manually inputted by human it has become messy e.g.

  • 100x200cm
  • 200x400mm
  • 3" by 9"
  • 30 x 20
  • 12cm long

Basically I need to extract all measurements (that I can) into usable dimensions and store each unit into a new column (height/width/depth).

Can anyone suggest the best way to go about doing this?

I was looking at regular expression but not sure on the best way forward

1
  • 2
    Good luck on that one. I wouldn't do it in SQL anyway: way too many combinations to parse. Commented Jul 31, 2011 at 21:59

1 Answer 1

1

I recommend a very strict parser which throws an exception whenever it encounters something it cannot handle explicitly. This way you are save from surprises.

First of, you should use a scripting language for that.

Fetch the records and iterate over them and for each record you try to parse it:

  • ^([1-9][0-9]*)x([1-9][0-9]*)([a-z]+)$ - if it matches, you match $3 against a list of valid units, for example "cm", "mm", "m", etc.

  • ^([1-9][0-9]*)([^ ]+) by ([1-9][0-9]*)(.+)$ - if it matches, compare the units at match $2 and $4. Match them against a list of valid units again, and if that's successful, but they are not the same, convert one into the other.

You keep adding strict checks for each item which throws an exception. This way, your list of expressions expands with the cases you handle.

Sign up to request clarification or add additional context in comments.

3 Comments

Note: SQL Server doesn't support regex in T-SQL
That's why I noted to use a scripting language.
Sorry it was for OP, not for you. Oracle has regex and some folk get mixed up... :-)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.