I've been trying to standardize the "Address" field of a table with 400k rows. The problem is that this was designed as a free-form field, so users could enter anything they wanted into it when registering. I've already managed to split this address field into 14 others, each holding one word of the address line, in order.
Now I need to concatenate some of these fields back together into different fields, such as:
- Street Name
- House Number
I wish it were as easy as taking the first 3 fields as the street name and the 4th field as the house number, but because street names vary in length, the house number usually lands in field #4 or #5, and only rarely in #3 or #6.
I've already thought of an approach that could work here: concatenate the fields in a loop and use the first field whose first character is a digit as the break point. (Only the first character is checked because some houses are numbered "10C", "1A", "1B" and so on.)
Since I'm not very good at PL/SQL, I don't know how to put this idea into code.
All I've managed so far is to write a function that checks whether a string starts with a digit, so it can be used in an IF statement.
How can I dynamically "traverse" the fields in a loop using PL/SQL? Would I use an array? Is it even possible?
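As a rough illustration of the loop described above, here is a minimal PL/SQL sketch. It assumes the 14 split fields of one row have been loaded into a nested table; the type name `t_words` and the variables `l_words`, `l_street` and `l_number` are made up for this example, and the inline `REGEXP_LIKE` check stands in for the "starts with a digit" function mentioned earlier.

```sql
DECLARE
  -- Hypothetical collection type holding the already-split words of one address.
  TYPE t_words IS TABLE OF VARCHAR2(200);

  -- Sample data; in practice this would be filled from the 14 split columns of a row.
  l_words  t_words := t_words('Rua', 'Álvaro', 'Peres', 'Filho', '60', 'Casa', 'azul');
  l_street VARCHAR2(4000);
  l_number VARCHAR2(200);
BEGIN
  FOR i IN 1 .. l_words.COUNT LOOP
    -- Unused trailing fields are assumed to be NULL; skip them.
    CONTINUE WHEN l_words(i) IS NULL;

    -- Break point: the first word whose first character is a digit ("60", "10C", "1A").
    IF REGEXP_LIKE(l_words(i), '^[0-9]') THEN
      l_number := l_words(i);
      EXIT;
    END IF;

    -- Otherwise the word still belongs to the street part; append it with a separator.
    l_street := l_street || CASE WHEN l_street IS NOT NULL THEN ' ' END || l_words(i);
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('Street: ' || l_street);
  DBMS_OUTPUT.PUT_LINE('Number: ' || l_number);
END;
/
```

With server output enabled, the sample words should print `Street: Rua Álvaro Peres Filho` and `Number: 60`. To run this over the table, the same loop could sit inside a cursor `FOR` loop that builds the collection from each row's split columns. Note that `CONTINUE` requires Oracle 11g or later.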
EDIT: Examples of what this address field contains (in Portuguese):
Avenida Doutor Theomário Pinto da Costa 450 Condominio Renaissance, rua 1, casa 1
Rua Álvaro Peres Filho 60 Casa azul em frente ao orelhão
Travessa Delegado Zé Lima 61 antiga Praça Rio Branco
Rua Finlândia 28 Qd 111
Alameda Áustria 107 Condomínio Jardim Europa I
There is a clear pattern in the data, which leads me to believe that this address data was collected in a segmented format and then concatenated into a single table field. What I need to do is basically the reverse.
The pattern goes:
Avenida(1) Doutor Theomário Pinto da Costa (2) 450(3) Condominio Renaissance, rua 1, casa 1(4)
1 = Street type (avenue, street, etc.)
2 = Street Name
3 = House Number
4 = Reference (usually used to help with directions; this part is open by design. Things like floor, building and condo name go here.)
As you can see from the examples above, the street names vary widely in length, which makes it impossible to treat a fixed number of fields as the street name.
CASE clause. Also, sometimes you can process data in stages using multiple CTEs.
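To make the staged idea concrete, here is a hedged sketch in plain Oracle SQL that works on the original free-form column instead of the 14 split fields. The CTE names (`sample_addresses`, `located`, `parts`) and the column `num_pos` are invented for this example, and the sample rows are taken from the question. It assumes words are separated by single spaces and that the first word starting with a digit is the house number, so a street whose name itself contains a number would be split too early.

```sql
WITH sample_addresses AS (
  -- Stand-in for the real table; replace with the actual table and column.
  SELECT 'Rua Finlândia 28 Qd 111' AS address FROM dual UNION ALL
  SELECT 'Travessa Delegado Zé Lima 61 antiga Praça Rio Branco' FROM dual
),
located AS (
  -- Stage 1: find where the first word starting with a digit begins.
  -- Padding with a leading space means the match position in the padded
  -- string equals the digit's position in the original string (0 if none).
  SELECT address,
         REGEXP_INSTR(' ' || address, ' [0-9]') AS num_pos
  FROM   sample_addresses
),
parts AS (
  -- Stage 2: pull out the first word and the house number token.
  SELECT address,
         num_pos,
         REGEXP_SUBSTR(address, '^[^ ]+') AS street_type,
         CASE WHEN num_pos > 0
              THEN REGEXP_SUBSTR(address, '[^ ]+', num_pos)
         END AS house_number
  FROM   located
)
-- Stage 3: everything between type and number is the street name,
-- everything after the number is the reference.
SELECT street_type,
       CASE WHEN num_pos > 0
            THEN TRIM(SUBSTR(address,
                             LENGTH(street_type) + 1,
                             num_pos - LENGTH(street_type) - 1))
       END AS street_name,
       house_number,
       CASE WHEN num_pos > 0
            THEN TRIM(SUBSTR(address, num_pos + LENGTH(house_number)))
       END AS reference
FROM   parts;
```

The `CASE` guards leave the street name, house number and reference as NULL for rows where no number is found, so those rows can be reviewed separately instead of raising an error on an invalid position argument.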