Suppose you have two tables with substitutions which MUST be kept as they are and another table containing a body of names. How could I get all the possible substitutions?
Substitution Table
--------------------------------------
word subs_list
MOUNTAIN MOUNTAIN, MOUNT, MT, MTN
HOUSE HAUS, HOUSE
VIEW VU, VIEW
Synonyms table
-------------------------------------------------
EDUCATION SCHOOL, UNIVERSITY, COLLEGE, TRAINING
FOOD STORE, FOOD, CAFE
STORE FOOD, STORE, MARKET
REFRIGERATION FOODLOCKER, FREEZE, FRIDGE
names table
------------------------------------------------
MOUNT VU FOOD USA
MOUNTAIN VU STORE CA
Note: I know that it would be desirable to have just one substitution table, but both substution tables must remain because they served to additional purposes than the one explained above, those tables are already in used. In addition, the list of replacements in both tables are just a varchar with a string separated by commas
Considering the previous, the problem is to generate possible names derived by substitution. For instance, the name MOUNT VU FOOD USA should be decomposed to MOUNTAIN VIEW FOOD USA and MOUNTAIN VIEW STORE USA, the same fashion would apply for the second.
I have been able to get the replacements in a wrong order and all together in function, there is a way to get an array as output with the different names generated after replacement? So far I have created this function for replacement:
create or replace function replace_companies_array(i_sentence IN VARCHAR) returns VARCHAR[] AS $p_replaced$
DECLARE
p_replaced VARCHAR[];
subs RECORD;
flag boolean:= True;
cur_s CURSOR(i_sentence VARCHAR)
FOR SELECT w.input, coalesce(x.word, w.input) as word, count(*) OVER (PARTITION BY w.input) as counter
FROM regexp_split_to_table(trim(i_sentence), '\s') as w(input)
LEFT JOIN (
select s.word, trim(s1.token) as token
from subs01 s
cross join unnest(string_to_array(s.subs_list, ',')) s1(token)
union
select sy.word, trim(s2.token) as token
from syns01 sy
cross join unnest(string_to_array(sy.syn_list, ',')) s2(token)
) as x on lower(trim(w.input)) = lower(x.token)
order by counter;
BEGIN
OPEN cur_s(i_sentence);
LOOP
--fetch row into the substitutions
FETCH cur_s INTO subs;
--Exit when no more rows to fetch
EXIT WHEN NOT FOUND;
SELECT REGEXP_REPLACE(i_sentence,'(^|[^a-z0-9])' || subs.input || '($|[^a-z0-9])','\1' || UPPER(subs.word) || '\2','g')
INTO i_sentence;
END LOOP;
p_replaced:=array_append(p_replaced, i_sentence);
RETURN p_replaced;
END;
$p_replaced$ LANGUAGE plpgsql;
Thank you so much for your contributions