Extracting CREATE TABLE definitions from MySQL dump?

Question

I have a MySQL dump file that is over 1 terabyte in size. I need to extract the CREATE TABLE statements from it so I can provide the table definitions.

I purchased Hex Editor Neo, but I'm rather disappointed with it. I wrote the regex CREATE\s+TABLE(.|\s)*?(?=ENGINE=InnoDB) to extract the CREATE TABLE clause, and it seems to work well when tested in Notepad++.

However, the ETA for extracting all instances is over 3 hours, and I cannot even be sure that it is doing it correctly. I don't even know whether the matched lines can be exported when it finishes.

Is there a quick way I can do this on my Ubuntu box using grep or something?

UPDATE

I ran this overnight and the output file came out blank. I created a smaller subset of the data and the procedure still does not work. The regex works in regex testers, but grep yields empty output. Here is the command I'm running. I would provide the sample, but I don't want to breach my client's confidentiality. It's just a standard MySQL dump.

grep -oP "CREATE\s+TABLE(.|\s)+?(?=ENGINE=InnoDB)" test.txt > plates_schema.txt

UPDATE: It does not seem to match across the newlines that follow the CREATE\s+TABLE part.

Do your CREATE TABLE lines finish on one line? Otherwise, what delimiter are you using?
– karthik manchala, Commented Jun 3, 2015 at 17:31
Shoot, good point. I'll update. Nevertheless, I need a more automated way to do this.
– tmn, Commented Jun 3, 2015 at 17:32
Is there no way to run the MySQL dump again, outputting only the CREATE TABLE statements?
– ragerory, Commented Jun 3, 2015 at 17:33

Michal Gasek · Accepted Answer · 2015-06-05 01:00:21Z

2

You can use Perl for this task; it should be really fast.

Perl's .. (range) operator is stateful: it remembers its state between evaluations. This means that if each table definition starts with CREATE TABLE and ends with something like ENGINE=InnoDB DEFAULT CHARSET=utf8; then the command below will do what you want.

perl -ne 'print if /CREATE TABLE/../ENGINE=InnoDB/' INPUT_FILE.sql > OUTPUT_FILE.sql

EDIT:

Since you are working with a really large file and would probably like to see the progress, pv can show you that as well:

pv INPUT_FILE.sql | perl -ne 'print if /CREATE TABLE/../ENGINE=InnoDB/' > OUTPUT_FILE.sql

This will show you a progress bar, the speed, and an ETA.

edited Jun 5, 2015 at 1:00

answered Jun 5, 2015 at 0:24

8 Comments

tmn Over a year ago

I'm all for fast. I'm having issues with pcregrep on the large file; it says the --buffer-size is too small. So I will give this a try next.

tmn Over a year ago

I ran the test on the small sample. Worked perfectly. Running it on the 1 TB data dump file now. If this avoids the performance issues of regular expressions, you will be my hero.

Michal Gasek Over a year ago

Well, whatever tool you use here, disk I/O is what really takes the time: you need to read 1 TB of data. Hope you're on an SSD :)

Michal Gasek Over a year ago

Just adjust the closing regex and you're good to go with this solution. If you have both InnoDB and MyISAM tables, then: perl -ne 'print if /CREATE TABLE/../ENGINE=(InnoDB|MyISAM)/' INPUT_FILE.sql > OUTPUT_FILE.sql

tmn Over a year ago

Yeah I ended up using perl -ne 'print if /CREATE TABLE/../ENGINE= and omitted the ENGINE argument altogether. It worked perfectly after that.

karthik manchala · Accepted Answer · 2015-06-05 18:22:44Z

2

You can use the following:

grep -ioP "^CREATE\s+TABLE[\s\S]*?(?=ENGINE=InnoDB)" file.txt > output.txt

edited Jun 5, 2015 at 18:22

answered Jun 3, 2015 at 17:33

2 Comments

tmn Over a year ago

This looks too easy... I'll get off my Windows machine and head to the Linux box to test this.

tmn Over a year ago

I'll check on this in two hours. I will let you know if this cranks it out successfully.

Rick James · Accepted Answer · 2015-06-08 20:03:09Z

1

If you can run mysqldump again, simply add --no-data.
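
A minimal sketch of that approach, wrapped in a hypothetical helper so nothing runs until it is called. The function name, the database name, and the output path are placeholders, and it assumes connection credentials are already configured (for example in an option file); only the --no-data flag comes from the answer above.

```shell
# Hypothetical helper: dump only the schema (no row data) of one database.
# Usage: dump_schema_only DATABASE OUTPUT_FILE
dump_schema_only() {
    db_name=$1
    out_file=$2
    # --no-data leaves out the row contents, so the output holds
    # the CREATE TABLE statements without the INSERT statements.
    mysqldump --no-data "$db_name" > "$out_file"
}
```

For example, dump_schema_only my_database schema.sql would write the table definitions to schema.sql without reading back any of the row data.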

answered Jun 8, 2015 at 20:03

Community · Accepted Answer · 2017-05-23 11:58:34Z

0

Got it! grep matches one line at a time by default, so the pattern cannot span multiple lines. I found this question helpful and ended up using pcregrep instead.

pcregrep -M "CREATE\s+TABLE(.|\n|\s)+?(?=ENGINE=InnoDB)" test.txt > plates.schema.txt

edited May 23, 2017 at 11:58

answered Jun 5, 2015 at 0:24

tmn

1 Comment

Rick James Over a year ago

Unless there are comments with ';', go all the way to the next semicolon; there could be other important info, such as DEFAULT CHARACTER SET.
