2

I have a MySQL dump file that is over 1 terabyte in size. I need to extract the CREATE TABLE statements from it so I can provide the table definitions.

I purchased Hex Editor Neo, but I'm somewhat disappointed that I did. I wrote the regex CREATE\s+TABLE(.|\s)*?(?=ENGINE=InnoDB) to extract the CREATE TABLE clause, and it seems to work well when tested in Notepad++.

However, the ETA for extracting all instances is over 3 hours, and I can't be sure it is doing so correctly. I don't even know whether the matched lines can be exported when it finishes.

Is there a quick way I can do this on my Ubuntu box using grep or something?

UPDATE

I ran this overnight and the output file came out blank. I created a smaller subset of the data and the procedure still does not work. The pattern works in regex testers, but grep does not like it and yields empty output. Here is the command I'm running. I'd provide the sample, but I don't want to breach my client's confidentiality. It's just a standard MySQL dump.

grep -oP "CREATE\s+TABLE(.|\s)+?(?=ENGINE=InnoDB)" test.txt > plates_schema.txt

UPDATE: It seems not to match the newlines that come right after the CREATE\s+TABLE part.

4
  • Does each of your CREATE TABLE statements fit on one line? If not, what delimiter are you using? Commented Jun 3, 2015 at 17:31
  • Shoot, good point. I'll update. Nevertheless, I need a more automated way to do this. Commented Jun 3, 2015 at 17:32
  • Is there no way to run the MySQL dump again, outputting only the CREATE TABLE statements? Commented Jun 3, 2015 at 17:33
  • I wish. This file is the constraint I must work with. Commented Jun 3, 2015 at 17:35

4 Answers 4

2

You can use Perl for this task... this should be really fast.

Perl's .. (range) operator is stateful: it remembers its state between evaluations. In practice, if each table definition starts with CREATE TABLE and ends with something like ENGINE=InnoDB DEFAULT CHARSET=utf8; then the command below will do what you want.

perl -ne 'print if /CREATE TABLE/../ENGINE=InnoDB/' INPUT_FILE.sql > OUTPUT_FILE.sql

EDIT:

Since you are working with a really large file and would probably like to know the progress, pv can give you this also:

pv INPUT_FILE.sql | perl -ne 'print if /CREATE TABLE/../ENGINE=InnoDB/' > OUTPUT_FILE.sql

This will show you a progress bar, the transfer speed, and an ETA.


8 Comments

I'm all for fast. I'm having issues with pcregrep on the large file: it says the --buffer-size is too small. So I will give this a try next.
I ran the test on the small sample. Worked perfectly. Running it on the 1 TB data dump file now. If this avoids the performance issues of regular expressions, you will be my hero.
Well, whatever tool you use here, disk I/O is what really takes the time: you need to read 1 TB of data. Hope you're on an SSD :)
Just adjust the closing regex and you're good to go with this solution. If you have both InnoDB and MyISAM tables, then: perl -ne 'print if /CREATE TABLE/../ENGINE=(InnoDB|MyISAM)/' INPUT_FILE.sql > OUTPUT_FILE.sql
Yeah I ended up using perl -ne 'print if /CREATE TABLE/../ENGINE= and omitted the ENGINE argument altogether. It worked perfectly after that.
2

You can use the following:

grep -ioP "^CREATE\s+TABLE[\s\S]*?(?=ENGINE=InnoDB)" file.txt > output.txt

2 Comments

This looks too easy... I'll get off my Windows machine and head to the Linux box to test this.
I'll check on this in two hours. I will let you know if this cranks it out successfully.
1

If you can run mysqldump again, simply add --no-data.
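For completeness, a sketch of what that could look like. The wrapper name, the user, and the database name are placeholders invented for illustration; only the --no-data option comes from this answer.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: dump table definitions only, without row data.
# Usage: dump_schema_only USER DATABASE > schema.sql
dump_schema_only() {
  local user="$1" database="$2"
  # --no-data omits the INSERT statements; -p prompts for the password.
  mysqldump --no-data --user="$user" -p "$database"
}
```

The resulting file contains the CREATE TABLE statements without any rows, so no post-processing of a huge dump is needed.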

Comments

0

Got it! grep matches one line at a time by default, so the pattern can never match across line breaks. I found this question helpful and ended up using pcregrep instead.

pcregrep -M "CREATE\s+TABLE(.|\n|\s)+?(?=ENGINE=InnoDB)" test.txt > plates.schema.txt
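Since the definitions start and end on recognizable lines, a line-oriented range is another option that avoids multi-line regexes, and with them pcregrep's buffer limit. A minimal sketch with sed, assuming every definition begins on a line starting with CREATE TABLE and ends on a line containing ENGINE=; the function name and the sample data are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: -n suppresses sed's default output, and the p
# command prints only the lines inside the /start/,/end/ address range.
# sed reads the input line by line, so the file is never held in memory.
schema_with_sed() {
  sed -n '/^CREATE TABLE/,/ENGINE=/p' "$1"
}

# Made-up sample resembling mysqldump output (assumption).
sample="$(mktemp)"
cat > "$sample" <<'EOF'
INSERT INTO `old` VALUES (1);
CREATE TABLE `plates` (
  `id` int(11) NOT NULL,
  `label` varchar(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `plates` VALUES (1,'x');
EOF

schema_with_sed "$sample"
rm -f "$sample"
```

Unlike the lookahead version, this keeps the whole closing line, including whatever follows ENGINE= on it.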

1 Comment

Unless there are comments with ';', go all the way to the next semicolon; there could be other important info, such as DEFAULT CHARACTER SET.
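Following that suggestion, here is a rough sketch that prints each definition through the line that closes the statement with a semicolon, instead of stopping at ENGINE=. It assumes the terminating ; is the last non-blank character on its line, as in typical mysqldump output; the function name and sample are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: track whether we are inside a CREATE TABLE
# statement and print lines until one ends with a semicolon.
create_statements_to_semicolon() {
  awk '
    /^CREATE TABLE/        { inside = 1 }   # a statement starts here
    inside                 { print }        # emit lines while inside it
    inside && /;[ \t\r]*$/ { inside = 0 }   # this line closes the statement
  ' "$1"
}

# Made-up sample where table options continue past the ENGINE= line.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
CREATE TABLE `a` (
  `id` int(11) NOT NULL
) ENGINE=InnoDB
  DEFAULT CHARSET=utf8;
INSERT INTO `a` VALUES (1);
EOF

create_statements_to_semicolon "$sample"
rm -f "$sample"
```

A line that merely contains a ; in the middle (for example inside a column comment followed by a comma) does not end the range, because only a trailing semicolon is treated as the terminator.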
