I am in the process of converting the schema of an existing database to PostgreSQL. I want to automate as much of this as possible, to avoid manual errors.
The original database uses CLUSTERED indices, but PostgreSQL does not (really) have clustered indices. I want to write a bash script that replaces all occurrences of CLUSTERED indices with a PostgreSQL equivalent.
Essentially, I want to SUBSTITUTE lines like this:
CREATE clustered INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);
with a 2 line replacement like this:
CREATE INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);
CLUSTER foobar;
I think I have worked out the matching logic; I just need help with the regex, as I am not very familiar with regular expressions. The matching logic that seems to work is as follows:
1. Find a line that starts with CREATE clustered INDEX (the line may begin with one or more non-newline whitespace characters)
2. Store the name of the table (it follows one or more whitespace characters after the ON keyword)
3. Remove the word clustered from the line matched in step 1 to create the substitute text
4. Append "\nCLUSTER $tablename;" to the substitute text from step 3
5. Replace the line matched in step 1 with the substitute text obtained in step 4
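The steps above can be sketched in plain bash using its built-in regex matching. This is only a minimal sketch under some assumptions: each CREATE statement sits on a single line, table names are unquoted identifiers, and keyword case may vary. The function name convert_clustered is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: convert_clustered is a hypothetical helper name.
# Reads SQL on stdin and writes the converted SQL to stdout.
# The body runs in a subshell ( ... ) so that the nocasematch
# setting does not leak into the calling shell.
convert_clustered() (
  # Capture groups:
  #   1 = leading whitespace
  #   2 = everything from INDEX to the end of the line
  #   3 = table name (text after ON, up to whitespace or "(")
  local re='^([[:space:]]*)CREATE[[:space:]]+clustered[[:space:]]+(INDEX[[:space:]]+[^[:space:]]+[[:space:]]+ON[[:space:]]+([^[:space:](]+).*)$'
  local line
  shopt -s nocasematch   # match CREATE/clustered/INDEX/ON in any case
  # "|| [[ -n $line ]]" keeps a final line that lacks a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      # Steps 3-5: drop "clustered", then append the CLUSTER statement
      printf '%sCREATE %s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
      printf '%sCLUSTER %s;\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}"
    else
      # Any other line passes through unchanged
      printf '%s\n' "$line"
    fi
  done
)
```

Saved in a script, a final line such as convert_clustered < "$1" would let the file to be processed be passed as the first argument, with the result written to stdout. One caveat worth checking: PostgreSQL's CLUSTER only works without an index name if the table has been clustered before, so on a fresh table the second printf may need to emit CLUSTER tablename USING indexname; instead.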
Could someone help me incorporate this logic into a bash script, so that I can pass it the file to be processed?
Incidentally, I thought I could possibly use sed for this, but I don't know whether a bash script would be easier to write and understand than a sed one-liner. I am open to suggestions.