bash script to parse (and replace) all occurrences of a block of text in a file

Question

I am in the process of converting the schema of an existing database to PostgreSQL. I want to automate as much of this as possible, to avoid manual errors.

The original database uses CLUSTERED indices; however, PG does not (really) have clustered indices. I want to write a bash script to replace all occurrences of CLUSTERED indices with a PostgreSQL equivalent.

Essentially, I want to SUBSTITUTE lines like this:

CREATE clustered INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);

with a 2 line replacement like this:

CREATE INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);
CLUSTER foobar;

I think I have worked out the matching logic; I just need help with the regex, as I am not very familiar with regular expressions. The matching logic that seems to work is as follows:

1. Find a line that starts with CREATE clustered INDEX (the line may begin with one or more non-newline whitespace characters).
2. Store the name of the table (it follows one or more whitespace characters after the ON keyword).
3. Remove the word clustered from the line matched in step 1 to create the substitute text.
4. Append "\nCLUSTER $tablename" to the substitute text from step 3.
5. Replace the line matched in step 1 with the substitute text obtained in step 4.
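The five steps above can be sketched in plain bash as follows. This is only a sketch under a few assumptions: each statement sits on one line, the index and table names contain no whitespace, and pg_declust is a hypothetical function name. It deviates from step 4 in one respect: it emits CLUSTER table USING index; rather than CLUSTER table;, because PostgreSQL only accepts the short form for a table that has already been clustered on some index.

```shell
#!/usr/bin/env bash
# Sketch: rewrite one-line "CREATE CLUSTERED INDEX idx ON tbl (...);" statements
# for PostgreSQL. Usage: ./pg_declust.sh schema.sql > schema_pg.sql
set -euo pipefail

shopt -s nocasematch  # let CREATE / clustered / INDEX / ON match in any case

# Steps 1 and 2. Groups: 1 = indentation, 2 = index name, 3 = table name, 4 = rest of line.
re='^([[:space:]]*)CREATE[[:space:]]+clustered[[:space:]]+INDEX[[:space:]]+([^[:space:]]+)[[:space:]]+ON[[:space:]]+([^[:space:](]+)(.*)$'

# pg_declust is a hypothetical helper: reads SQL on stdin, writes the result to stdout.
pg_declust() {
  local line indent index table rest
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $re ]]; then
      indent=${BASH_REMATCH[1]} index=${BASH_REMATCH[2]}
      table=${BASH_REMATCH[3]} rest=${BASH_REMATCH[4]}
      # Steps 3 and 5: the same statement without "clustered".
      printf '%sCREATE INDEX %s ON %s%s\n' "$indent" "$index" "$table" "$rest"
      # Step 4: the CLUSTER statement on its own line, in the explicit USING form.
      printf '%sCLUSTER %s USING %s;\n' "$indent" "$table" "$index"
    else
      printf '%s\n' "$line"  # every other line passes through unchanged
    fi
  done
}

# Only process a file when one is given, e.g. ./pg_declust.sh schema.sql
if (( $# > 0 )); then
  pg_declust < "$1"
fi
```

Doing the match with [[ =~ ]] and BASH_REMATCH keeps each step visible as a named variable, which may be easier to maintain than a single sed expression.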

Could someone help me incorporate this logic into a bash script so I can pass it the file to be processed?

Incidentally, I thought I could possibly use sed to do this, but I don't know whether a bash script would be easier (i.e. easier to understand) than attempting it as a sed one-liner; I am open to suggestions.

NeronLeVelu · Accepted Answer · 2013-11-03 15:14:32Z

1

sed --posix "/CREATE clustered INDEX/ {
   s/ *clustered */ /
   s/ON *\([^( ]*\) *(.*$/& CLUSTER \1;/
   }"

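--posix makes GNU sed stick to POSIX behaviour, so the same script should also work with non-GNU seds (the flag itself is GNU-specific, so leave it out there). I used a different regex from Bob Schuster's (a very good one) just to offer an alternative that makes it easier to modify the line further if needed, for example to insert a comment into the script.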

Here is the session in a Cygwin bash (one-line version):

$ cat sample.txt
CREATE clustered INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);
blabla;

$ sed --posix "/CREATE clustered INDEX/ {s/ *clustered */ /;s/ON *\([^( ]*\) *(.*$/& CLUSTER \1;/;}" sample.txt
CREATE INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5); CLUSTER foobar;
blabla;
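The question asks for the CLUSTER statement on a separate line. A sketch of the same script that does that portably: in POSIX sed, a backslash immediately followed by a newline in the replacement inserts a newline, whereas \n in the replacement is a GNU extension. declust is a hypothetical wrapper name, and the short CLUSTER table; form from the question is kept as is.

```shell
#!/bin/sh
# Sketch: the answer's script, but with the CLUSTER statement on its own line.
# declust is a hypothetical wrapper; pass file names, or pipe SQL on stdin.
declust() {
  sed '/CREATE clustered INDEX/ {
s/ *clustered */ /
s/ON *\([^( ]*\) *(.*$/&\
CLUSTER \1;/
}' "$@"
}

printf '%s\n' 'CREATE clustered INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);' 'blabla;' | declust
```

With the sample input this prints CREATE INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);, then CLUSTER foobar; on its own line, then blabla;.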

edited Nov 3, 2013 at 15:14

answered Nov 3, 2013 at 8:06

NeronLeVelu



5 Comments

Homunculus Reticulli Over a year ago

Hi NeronLeVelu, thanks for your example; however, I tried it and it does not seem to work (the second line containing CLUSTER $tablename is not displayed):

$ echo 'CREATED clustered INDEX idx_foo ON foobar(f1, f2, f3, f4,f5);' | sed --posix "/CREATE clustered INDEX/ {s/ *clusterd *//;s/ON *\([^( ]*\) *.*$/&; CLUSTER \1;/;}"
CREATED clustered INDEX idx_foo ON foobar(f1, f2, f3, f4,f5);

NeronLeVelu Over a year ago

Error between keyboard and chair: I forgot the last "e" in clustered and the "(", so here you are:

sed --posix "/CREATE clustered INDEX/ {s/ *clustered *//;s/ON *\([^( ]*\) *(.*$/&; CLUSTER \1;/;}"

Homunculus Reticulli Over a year ago

Haha NeronLeVelu, I must have the same error interface; I missed that completely too. However, did you test the corrected command at the command line using echo? I think you will find that it still doesn't work. I must be doing something wrong; I can't believe that none of the answers I have received so far work at the command line! Please try it at the command line and let me know whether you get the desired result. Thanks

NeronLeVelu Over a year ago

I just tested it from a Cygwin session (bash by default), reading from a file rather than from echo, so maybe your echo is modifying the content. Session:

$ sed --posix "/CREATE clustered INDEX/ {s/ *clustered *//;s/ON *\([^( ]*\) *(.*$/&; CLUSTER \1;/;}" sample.txt
CREATEINDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);; CLUSTER foobar;
blabla;

Contents of sample.txt:

CREATE clustered INDEX idx_foobar ON foobar (f1, f2, f3, f4,f5);
blabla;
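One thing worth checking in the echo tests quoted in these comments: their input lines begin with CREATED, not CREATE, so the address /CREATE clustered INDEX/ never matches them and sed prints them unchanged. A minimal sketch of the same check using printf and single quotes (so the shell passes the SQL through untouched), assuming the one-line script from the answer; the variable name script is only for illustration.

```shell
#!/bin/sh
# The one-line script from the answer, kept in a variable so both runs use the same text.
script='/CREATE clustered INDEX/ {s/ *clustered */ /;s/ON *\([^( ]*\) *(.*$/& CLUSTER \1;/;}'

# Spelled "CREATED" as in the comments: the address does not match, nothing changes.
printf '%s\n' 'CREATED clustered INDEX idx_foo ON foobar(f1, f2, f3, f4,f5);' | sed "$script"

# Spelled "CREATE" as in the question: the line is rewritten
# (no space between the table name and "(" is fine for this pattern).
printf '%s\n' 'CREATE clustered INDEX idx_foo ON foobar(f1, f2, f3, f4,f5);' | sed "$script"
```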

Denis de Bernardy Over a year ago

Isn't the Postgres syntax more like cluster table using index?

Bob Schuster · Accepted Answer · 2013-11-03 04:24:20Z

0

You could try sed, for example:

sed -r 's/^\s*(CREATE\s*)clustered\s*(INDEX.*ON\s*)(\w*)(\s+\(.*;)$/\1\2\3\4\nCLUSTER \3;/gi' original.txt > updated.txt

I followed your guidelines, which is why the regex is a bit bulky, but you can revise the regex based on the actual content of your input file and whether you want to preserve extraneous spaces.

One good place to experiment with regex is: http://regex101.com
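Note that this pattern needs at least one whitespace character (\s+) between the table name and the opening parenthesis, so a line such as ON foobar(f1, ...) is left as it is. A sketch of a variant with \s* at that point, assuming GNU sed (-r, \s, \w and \n in the replacement are GNU features); declust_gnu is a hypothetical wrapper name.

```shell
#!/bin/sh
# The answer's command with \s* instead of \s+ before "(" (GNU sed only).
declust_gnu() {
  sed -r 's/^\s*(CREATE\s*)clustered\s*(INDEX.*ON\s*)(\w*)(\s*\(.*;)$/\1\2\3\4\nCLUSTER \3;/i' "$@"
}

# Works with and without a space before the column list.
printf '%s\n' 'CREATE clustered INDEX idx_foo ON foobar(f1, f2, f3, f4, f5);' \
              'CREATE clustered INDEX idx_bar ON barbaz (f1, f2);' | declust_gnu
```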

edited Nov 3, 2013 at 4:24

answered Nov 3, 2013 at 4:06

Bob Schuster


1 Comment

Homunculus Reticulli Over a year ago

Hi Bob, thanks for your answer. I tried a sample SQL line on it, but as you can see below, it appears to have the same problem as NeronLeVelu's example in that the last line (CLUSTER $tablename) is not generated:

$ echo 'CREATED clustered INDEX idx_foo ON foobar(f1, f2, f3, f4, f5);' | sed -r 's/^\s*(CREATE\s*)clustered\s*(INDEX.*ON\s*)(\w*)(\s+\(.*;)$/\1\2\3\4\nCLUSTER \3;/gi'
CREATED clustered INDEX idx_foo ON foobar(f1, f2, f3, f4, f5);

potong · Accepted Answer · 2013-11-03 11:10:14Z

0

This might work for you (GNU sed):

sed -r 's/^(\s*CREATE) (cluster)ed(.* (\S+) \(.*\);)\s*$/\1\3\n\U\2 \L\4;/' file

answered Nov 3, 2013 at 11:10

potong



Denis de Bernardy · Accepted Answer · 2013-11-03 13:13:30Z

0

Be wary that clustering in Postgres isn't necessarily the same as in the original database you're using (I presume SQL Server?). Per the docs:

Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order. (If one wishes, one can periodically recluster by issuing the command again. (...))

http://www.postgresql.org/docs/current/static/sql-cluster.html

This means that replacing create clustered index on table (...); with create index on table (...); cluster table; isn't going to work the way you're expecting.

In light of that, stick to removing clustered with sed, or make sure you add the additional USING index part. If you do the latter, you'll also want to add an extra CLUSTER table statement at the very end of the import, to actually cluster the data.

Methinks you ought to remove the clustered references altogether, and worry about adding them at the very end of your import, either manually or by generating an additional SQL file as part of or prior to the removal script.
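A minimal sketch of that last approach, with two portable sed passes over the same schema file: one strips clustered, the other collects CLUSTER table USING index; statements to run once the data has been loaded. It assumes the keywords are spelled exactly CREATE clustered INDEX ... ON as in the question and that each statement fits on one line; strip_clustered, cluster_statements and the file names are hypothetical.

```shell
#!/bin/sh
# Pass 1 (hypothetical helper): the schema without the "clustered" keyword.
strip_clustered() {
  sed 's/\(CREATE\)[[:space:]]\{1,\}clustered[[:space:]]\{1,\}\(INDEX\)/\1 \2/' "$@"
}

# Pass 2 (hypothetical helper): one "CLUSTER table USING index;" line per clustered index;
# -n plus the p flag prints only the lines that were rewritten.
cluster_statements() {
  sed -n 's/^[[:space:]]*CREATE[[:space:]]\{1,\}clustered[[:space:]]\{1,\}INDEX[[:space:]]\{1,\}\([^[:space:]]\{1,\}\)[[:space:]]\{1,\}ON[[:space:]]\{1,\}\([^[:space:](]\{1,\}\).*/CLUSTER \2 USING \1;/p' "$@"
}

# Example usage (file names are placeholders):
#   strip_clustered schema.sql > schema_pg.sql
#   cluster_statements schema.sql > cluster_after_import.sql
# Run schema_pg.sql, load the data, then run cluster_after_import.sql.
```

Keeping the CLUSTER statements in their own file means they run once, after the import, which matches the point that PostgreSQL clustering is a one-time operation.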

answered Nov 3, 2013 at 13:13

Denis de Bernardy


3 Comments

Homunculus Reticulli Over a year ago

Thanks for your input Denis. I was under the impression (from the Postgresql doc) that CLUSTER table is an abbreviated form for CLUSTER table USING index. From the docs:

Cluster the table employees on the basis of its index employees_ind:

CLUSTER employees USING employees_ind;

Cluster the employees table using the same index that was used before:

CLUSTER employees;

Is my understanding wrong?

Denis de Bernardy Over a year ago

It's an abbreviation if, and only if, the table has already been clustered. You first need to issue a CLUSTER ... USING .... Only then can you use the abbreviated form, which is really syntactic sugar to avoid repeating the index name over and over again when reclustering.

Denis de Bernardy Over a year ago

Also, note that writing this kind of script in Ruby, Python, or whatever other language is a lot less error-prone and a lot more readable. Stick to shell if and only if you need unusual levels of portability. (And avoid bashisms when you do, so that the script conforms to POSIX /bin/sh.)
