Split string into two columns (SQL or PL/SQL)

Question

I need to split a string from one table into two columns in another table. There are a few different variety of numbers and rules but no clear delimiter. Can I use combo of SUBSTR and INSTR or do I need to use if-then loops in PL/SQL to satisfy all the rules?

Input Table

5 Kent Street 
3 A lindt Street
2/15 bold Street 
9/34-36 carmen Road
12/5a sandford Street

Result

Number  |Street
--------------------
5       |Kent Street
3A      |lindt Street
2/15    |bold Street
9/34-36 |carmen Road
12/5a   |sandford Street

@JeffreyKemp: I guess Oracle (because of the PL/SQL in the title) — user330315
– user330315, Commented Apr 14, 2012 at 13:31
You state that there are a variety of rules and types of numbers but you don't tell us what sort of rules and formats you need to support. Your example data can all be parsed simply by using the first space character as a delimiter. — Justin Cave
– Justin Cave, Commented Apr 14, 2012 at 17:55
@Justin: except for the second example (3 A lindt Street). Agreed, however, we need more info about the rules required. — Jeffrey Kemp
– Jeffrey Kemp, Commented Apr 16, 2012 at 6:11
You commented below "The requirement for the task is to use PL/SQL to solve this problem", so is this homework? — Burhan Ali
– Burhan Ali, Commented Apr 29, 2012 at 18:30

Andrew Wolfe · Accepted Answer · 2012-04-14 16:22:27Z

I wouldn't do this in pl/sql, it is really unnecessary.

Oracle SQL has REGEXP_SUBSTR, REGEXP_REPLACE, REGEXP_COUNT. You also have IF and CASE expressions you can put in the SELECT clause. Go to the SQL reference, FUNCTIONS section: http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions.htm#i1482196

Your dataset has some interesting variations - letters and fractions in the house/building number. What you don't have (yet) is street names in multiple parts (e.g. Melnea Cass Boulevard here in Boston) or street names with concatenated/missing ("Broadway") or unusual ("Cedar Passway") 'street' designators.

Start with sample_data as a refactored query to hold your input data. You'll probably have a table or view or something instead.

For now we are holding the assumption that each street name has two words. We start by counting them with REGEXP_COUNT. This is the subquery COUNTED_DATA with value WORDS as the count of words. Note I append a space to each line of input, in case there isn't one at the end of the input row of data, so that the count is correct.

I search for each word as

[^ ]+[ ]+

That is, one or more non-spaces followed by one or more spaces. I don't want to use zero or more spaces ([ ]*) because that is ambiguous.

Then we use a regular expression to pick out the last two words and the first (words minus 2) words.

Here's the result:

with sample_data as (
    select '5 Kent Street' as addr from dual
    union all select '3 A lindt Street' as addr from dual
    union all select '2/15 bold Street' as addr from dual
    union all select '9/34-36 carmen Road' as addr from dual
    union all select '12/5a sandford Street' from dual
)
select 
    counted_data.addr as "Original Address",
    substr (regexp_replace (addr || ' ', '(([^ ]+[ ]+){' || (words-2) ||'})([^ ].*)','\1'), 1, 10) as "Number",
    substr (trim (regexp_replace (addr || ' ', '(([^ ]+[ ]+){' || (words-2) ||'})([^ ].*)','\3')), 1, 25) as "Street"
from
(
    select sample_data.addr, regexp_count(addr || ' ', '[ ]+') as words
    from sample_data
) counted_data

Original Address      Number     Street                   
--------------------- ---------- -------------------------
5 Kent Street         5          Kent Street               
3 A lindt Street      3 A        lindt Street              
2/15 bold Street      2/15       bold Street               
9/34-36 carmen Road   9/34-36    carmen Road               
12/5a sandford Street 12/5a      sandford Street

To make this readable I used 'substr' to cut down the length of the output columns. ("COLUMN" doesn't work in SQL Developer.)

reply. The requirement for the task is to use PL/SQL to solve this problem.

dash1e · Accepted Answer · 2012-04-14 04:11:43Z

0

I can suggest you three possibilities:

If the separation character is clear and simple, use as you suggest a combo of SUBSTR and INSTR.
If your database supports regular expression function, and it is simple matching your address field with a regular expression use it. For example MySQL has REGEXP.
But if the parsing of the string is complex and the variants are too many then use an external script with a complete programming language.

Connect to the database, for example in Java, process the string and insert the results in the new table. Remember to use bulk operations if your rows number is huge, so it will be faster.

answered Apr 14, 2012 at 4:11

dash1e

7,8251 gold badge32 silver badges39 bronze badges

Collectives™ on Stack Overflow

Split string into two columns (SQL or PL/SQL)

2 Answers 2

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest