1

I need to split a string from one table into two columns in another table. There are a few different variety of numbers and rules but no clear delimiter. Can I use combo of SUBSTR and INSTR or do I need to use if-then loops in PL/SQL to satisfy all the rules?

Input Table

5 Kent Street 
3 A lindt Street
2/15 bold Street 
9/34-36 carmen Road
12/5a sandford Street

Result

Number  |Street
--------------------
5       |Kent Street
3A      |lindt Street
2/15    |bold Street
9/34-36 |carmen Road
12/5a   |sandford Street
5
  • What database? e.g. SQL Server, Oracle, MySQL, etc. Commented Apr 14, 2012 at 13:27
  • @JeffreyKemp: I guess Oracle (because of the PL/SQL in the title) Commented Apr 14, 2012 at 13:31
  • 1
    You state that there are a variety of rules and types of numbers but you don't tell us what sort of rules and formats you need to support. Your example data can all be parsed simply by using the first space character as a delimiter. Commented Apr 14, 2012 at 17:55
  • @Justin: except for the second example (3 A lindt Street). Agreed, however, we need more info about the rules required. Commented Apr 16, 2012 at 6:11
  • You commented below "The requirement for the task is to use PL/SQL to solve this problem", so is this homework? Commented Apr 29, 2012 at 18:30

2 Answers 2

2

I wouldn't do this in pl/sql, it is really unnecessary.

Oracle SQL has REGEXP_SUBSTR, REGEXP_REPLACE, REGEXP_COUNT. You also have IF and CASE expressions you can put in the SELECT clause. Go to the SQL reference, FUNCTIONS section: http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions.htm#i1482196

Your dataset has some interesting variations - letters and fractions in the house/building number. What you don't have (yet) is street names in multiple parts (e.g. Melnea Cass Boulevard here in Boston) or street names with concatenated/missing ("Broadway") or unusual ("Cedar Passway") 'street' designators.

Start with sample_data as a refactored query to hold your input data. You'll probably have a table or view or something instead.

For now we are holding the assumption that each street name has two words. We start by counting them with REGEXP_COUNT. This is the subquery COUNTED_DATA with value WORDS as the count of words. Note I append a space to each line of input, in case there isn't one at the end of the input row of data, so that the count is correct.

I search for each word as

[^ ]+[ ]+

That is, one or more non-spaces followed by one or more spaces. I don't want to use zero or more spaces ([ ]*) because that is ambiguous.

Then we use a regular expression to pick out the last two words and the first (words minus 2) words.

Here's the result:

with sample_data as (
    select '5 Kent Street' as addr from dual
    union all select '3 A lindt Street' as addr from dual
    union all select '2/15 bold Street' as addr from dual
    union all select '9/34-36 carmen Road' as addr from dual
    union all select '12/5a sandford Street' from dual
)
select 
    counted_data.addr as "Original Address",
    substr (regexp_replace (addr || ' ', '(([^ ]+[ ]+){' || (words-2) ||'})([^ ].*)','\1'), 1, 10) as "Number",
    substr (trim (regexp_replace (addr || ' ', '(([^ ]+[ ]+){' || (words-2) ||'})([^ ].*)','\3')), 1, 25) as "Street"
from
(
    select sample_data.addr, regexp_count(addr || ' ', '[ ]+') as words
    from sample_data
) counted_data

Original Address      Number     Street                   
--------------------- ---------- -------------------------
5 Kent Street         5          Kent Street               
3 A lindt Street      3 A        lindt Street              
2/15 bold Street      2/15       bold Street               
9/34-36 carmen Road   9/34-36    carmen Road               
12/5a sandford Street 12/5a      sandford Street           

To make this readable I used 'substr' to cut down the length of the output columns. ("COLUMN" doesn't work in SQL Developer.)

Sign up to request clarification or add additional context in comments.

2 Comments

reply. The requirement for the task is to use PL/SQL to solve this problem.
Is the input a CLOB, VARCHAR2, TABLE OF VARCHAR2?
0

I can suggest you three possibilities:

  1. If the separation character is clear and simple, use as you suggest a combo of SUBSTR and INSTR.

  2. If your database supports regular expression function, and it is simple matching your address field with a regular expression use it. For example MySQL has REGEXP.

  3. But if the parsing of the string is complex and the variants are too many then use an external script with a complete programming language.

    Connect to the database, for example in Java, process the string and insert the results in the new table. Remember to use bulk operations if your rows number is huge, so it will be faster.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.