
I currently do this manually, using lines read from a file, and I am trying to extract all table DDLs where the table name begins with a_

An input of this:

Other stuff: 

Other stuff: 
create table a_table1 (
    id number(10,0) not null,
    timestamp number(19,0) not null,
    primary key (id)
)
stuff
create table a_table2 (
    id number(10,0) not null,
    primary key (id)
)

Other stuff: 
create table b_table1 (
    id number(10,0) not null,
    timestamp number(19,0) not null,
    primary key (id)
)
other stuff 

other stuff

should output only this:

create table a_table1 (
    id number(10,0) not null,
    timestamp number(19,0) not null,
    primary key (id)
)
create table a_table2 (
    id number(10,0) not null,
    primary key (id)
)

Currently I am using LineReaders: I remember when I see create table and then read everything until I see a line containing only )
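
For reference, the line-based approach described above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: the class `LineBasedDdlExtractor` and its `extract` method are hypothetical names, and it assumes every statement starts at the beginning of a line and ends with a line containing only `)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the original post): collects every
// "create table a_..." statement from a sequence of lines.
final class LineBasedDdlExtractor {

    static List<String> extract(Iterable<String> lines) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = null; // non-null only while inside a wanted statement
        for (String line : lines) {
            if (current == null) {
                // Assumption: the statement starts at the beginning of a line.
                if (line.startsWith("create table a_")) {
                    current = new StringBuilder(line);
                }
            } else {
                current.append('\n').append(line);
                // Assumption: a line holding only ")" ends the statement.
                if (line.trim().equals(")")) {
                    statements.add(current.toString());
                    current = null;
                }
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Other stuff:",
                "create table a_table1 (",
                "    id number(10,0) not null,",
                "    primary key (id)",
                ")",
                "stuff",
                "create table b_table1 (",
                "    id number(10,0) not null,",
                "    primary key (id)",
                ")");
        for (String ddl : extract(lines)) {
            System.out.println(ddl);
        }
    }
}
```

A `List<String>` from `java.nio.file.Files.readAllLines` can be passed to `extract` directly, so the whole file never has to be held in one string.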

Is this the most efficient way? Is there some fancy regex I could use?

I tried the following regex, but it didn't work: it just returns the whole string again. Perhaps the newlines are breaking it

"^.*create.*a_(.*?)\\).*$", "$1")

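The newlines are indeed the likely culprit: by default `.` in `java.util.regex` does not match line terminators, so an anchored pattern such as the one above cannot span a multi-line string, and `replaceAll` returns the input unchanged. The sketch below illustrates this; the class name and the sample text are made up for the demonstration.

```java
import java.util.regex.Pattern;

// Illustrative only: class name and sample input are assumptions for this sketch.
final class DotNewlineDemo {

    // The regex from the question, unchanged.
    static final String REGEX = "^.*create.*a_(.*?)\\).*$";

    // No flags: '.' stops at line terminators, so nothing matches
    // a multi-line input and it comes back untouched.
    static String withoutDotAll(String input) {
        return input.replaceAll(REGEX, "$1");
    }

    // With DOTALL, '.' also matches line terminators, so the pattern can match.
    static boolean matchesWithDotAll(String input) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "Other stuff:\ncreate table a_table1 (\n    primary key (id)\n)\n";
        System.out.println(withoutDotAll(input).equals(input)); // prints "true"
        System.out.println(matchesWithDotAll(input));           // prints "true"
    }
}
```

Enabling DOTALL alone would still not isolate the statements, because the lazy group stops at the first `)` it meets, which on the sample input is the one inside `number(10,0)`.
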
Any advice would be appreciated

Thanks

  • I have done something very similar recently -- a SQL script parser which extracts SQL statements from a script and loads them into Java objects. I can tell you that it took A LOT of regex parsing -- it is hard to do it in a single statement -- you have to program the logic to recognize different syntactic patterns (e.g. create table, insert into etc.). So it will be a little more complex than a single regex. Commented Jan 2, 2013 at 18:37
  • @foampile Thanks for the reply. I am only considering the create tables, so I was hoping it would not be too complicated. I may just stick with the line reader then. Commented Jan 2, 2013 at 18:41
  • In that case, I would try to find everything that starts with "create table". Can you assume that the last ) is alone on the line as an end-of-statement delimiter? Commented Jan 2, 2013 at 18:45
  • @foampile Yes, the ) is the only thing on that last row. I can use create table but have to consider ignoring all those tables that don't start with a_. At the moment I am splitting the string into an array and doing a manual lookahead. As you said, it gets quite complicated! Commented Jan 2, 2013 at 18:54
  • One of the problems is that Java regex doesn't know the concept of a new line, i.e. if you slurp the whole file into a single string, rather than going line by line, it is difficult to test for a new line. Commented Jan 2, 2013 at 19:32

2 Answers

Try something like this:

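    // Needs org.apache.commons.io.IOUtils (Apache Commons IO),
    // java.io.ByteArrayOutputStream and java.util.regex.{Matcher, Pattern}.
    ByteArrayOutputStream baos = new ByteArrayOutputStream();

    // Read the whole resource into memory so the regex can span several lines.
    IOUtils.copyLarge(getClass().getClassLoader().getResourceAsStream("input.txt"), baos);
    String org = baos.toString();

    // (?s) lets '.' match newlines; the lazy .*? stops at the first line that holds only ")".
    final Pattern compile = Pattern.compile("(?s)(create table a_.*?\n\\)\n)");
    final Matcher matcher = compile.matcher(org);
    while (matcher.find()) {
        // Each match already ends with a newline, so print() avoids blank lines in the output.
        System.out.print(matcher.group());
    }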

input.txt

Other stuff:

Other stuff:
create table a_table1 (
    id number(10,0) not null,
    timestamp number(19,0) not null,
    primary key (id)
)
stuff
create table a_table2 (
    id number(10,0) not null,
    primary key (id)
)

Other stuff:
create table b_table1 (
    id number(10,0) not null,
    timestamp number(19,0) not null,
    primary key (id)
)
other stuff

output

create table a_table1 (
    id number(10,0) not null,
    timestamp number(19,0) not null,
    primary key (id)
)
create table a_table2 (
    id number(10,0) not null,
    primary key (id)
)

3 Comments

+1, but it should be mentioned that this only works because the closing parenthesis is the first one that occurs in the first column of the text. So as long as the input is formatted exactly this way, this will work, but only then.
Exactly, it's a one in a million regexp ;) It won't work unless the input looks more or less exactly like the example and the closing parenthesis is placed as you say.
This didn't work for me. I slurped the whole file into a single String. Besides, I don't think Java regex supports \n
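
The comments above raise two concerns: that the pattern depends on the exact layout, and whether Java regex handles `\n` at all. `java.util.regex` does match `\n` in a pattern; what more often breaks a literal `\n\)\n` is Windows `\r\n` line endings or stray spaces around the closing parenthesis. Below is a minimal sketch of a more tolerant variant. It still assumes the statement ends on a line containing only `)` plus optional spaces or tabs, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class TolerantDdlRegex {

    // (?m): ^ and $ match at line boundaries (for \n and for \r\n).
    // (?s): '.' also matches line terminators, so .*? can span lines.
    private static final Pattern DDL = Pattern.compile(
            "(?ms)^create table a_.*?^[ \\t]*\\)[ \\t]*$");

    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = DDL.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "stuff\r\ncreate table a_t (\r\n  id number(10,0),\r\n ) \r\n"
                + "create table b_t (\r\n  id number(10,0)\r\n)\r\n";
        for (String ddl : extract(text)) {
            System.out.println(ddl);
        }
    }
}
```

Anchoring on line boundaries instead of literal `\n` characters keeps the pattern short while removing the dependence on one specific line-ending style.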

The following regex-based code works as long as the parentheses in the create table SQL statements are nested at most two levels deep:

String sql = "Other stuff: \n\nOther stuff: \ncreate table a_table1 (\nid number(10,0) not null,\ntimestamp number(19,0) not null,\nprimary key (id)\n)\nstuff\ncreate table a_table2 (\nid number(10,0) not null,\nprimary key (id)\n)\n\nOther stuff: \ncreate table b_table1 (\nid number(10,0) not null,\ntimestamp number(19,0) not null,\nprimary key (id)\n)\nother stuff \n\nother stuff\n\n";
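// Needs java.util.regex.Pattern and java.util.regex.Matcher.
// (?:[^()]+|\([^()]+\))* consumes plain text or one inner (...) group,
// so the final \) is the parenthesis that closes the create table statement.
Pattern p = Pattern.compile(
   "(?i)create\\s+table\\s+a_\\w+\\s+\\((?:[^()]+|\\([^()]+\\))*\\)"
);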
Matcher m = p.matcher(sql);
while (m.find()) {
    System.out.println(m.group());
}

OUTPUT

create table a_table1 (
id number(10,0) not null,
timestamp number(19,0) not null,
primary key (id)
)
create table a_table2 (
id number(10,0) not null,
primary key (id)
)
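
If deeper nesting is a concern, one alternative to a pure regex is to locate the start of each statement with a regex and then count parentheses by hand. The following is a sketch under stated assumptions, not part of the answer above: `BalancedDdlExtractor` is a hypothetical name, and it assumes no parentheses appear inside string literals or comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: handles any nesting depth by counting parentheses.
final class BalancedDdlExtractor {

    // Matches the start of a wanted statement, up to and including its opening "(".
    private static final Pattern START =
            Pattern.compile("(?i)create\\s+table\\s+a_\\w+\\s*\\(");

    static List<String> extract(String sql) {
        List<String> result = new ArrayList<>();
        Matcher m = START.matcher(sql);
        int from = 0;
        while (m.find(from)) {
            int depth = 1;     // the opening "(" was consumed by START
            int i = m.end();
            while (i < sql.length() && depth > 0) {
                char c = sql.charAt(i++);
                if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                }
            }
            if (depth != 0) {
                break;         // unbalanced input: stop rather than guess
            }
            result.add(sql.substring(m.start(), i));
            from = i;          // continue scanning after this statement
        }
        return result;
    }

    public static void main(String[] args) {
        String sql = "stuff\ncreate table a_t (\nid number(10,0) not null,\n"
                + "flag number(1,0) default ((0)),\nprimary key (id)\n)\nother stuff\n";
        for (String ddl : extract(sql)) {
            System.out.println(ddl);
        }
    }
}
```

The regex stays simple because it only has to find the statement header; the loop does the balancing that regular expressions in Java cannot express for arbitrary depth.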

3 Comments

Thanks @anubhava. This worked, but I preferred Peter's solution as I think that is more maintainable. +1 for your effort. Thanks again.
Sure, it's your prerogative to choose any answer as the accepted one, but to me that other answer appears to be very much input-format dependent. If the closing parenthesis has a space before or after it, \n\\)\n will fail to match.
The accepted answer didn't even use regex as the question requested.
