I have a MySQL database with 6 tables and about 2 million rows altogether.

I want to migrate all the data into MongoDB.

I decided to do this by converting the SQL tables to JSON and importing the JSON into MongoDB.

I wrote a program in Golang to extract the data and output it as JSON.

This is the main function of the program:

func main() {
    // Open a database connection
    var err error
    db, err = sql.Open("mysql", "root:password@tcp(127.0.0.1:3306)/employees")
    checkErr(err)
    // Check if reachable
    if err = db.Ping(); err != nil {
        log.Fatal("Database is unreachable:", err)
    }
    // Populate variables with data
    err = populateVars()
    checkErr(err)
    // Marshal variables into JSON
    binaryJSON, err := json.Marshal(collection)
    checkErr(err)
    // Write JSON to a file
    err = writeStringToFile("/home/user01/Temporary/sql2data.json", string(binaryJSON))
    checkErr(err)
}
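
checkErr and writeStringToFile are small helpers; a sketch of them is below for completeness (the exact versions are in the repository linked further down):

// Sketch of the helpers used above; uses the standard log and
// io/ioutil packages.
func checkErr(err error) {
    if err != nil {
        log.Fatal(err)
    }
}

func writeStringToFile(path, content string) error {
    // 0644: owner read/write, group and others read-only
    return ioutil.WriteFile(path, []byte(content), 0644)
}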

The problem is that the output is inconsistent.

Every time I run the program, the resulting file has a different size and some random fields are missing.

What could be causing this?

It doesn't seem like it's a problem with the logic of the program since everything executes without errors, and most fields are populated just fine.

Could I be reading the information too fast, so that some things get lost occasionally?

Or is there something else that I'm missing?

Edit:

Most of the work happens inside the populateVars() function call.

It has multiple blocks of code that execute a given SQL query and populate struct variables according to the schema.

This is one such block:

rows, err = db.Query("SELECT emp_no, dept_emp.dept_no, dept_name, from_date, to_date FROM dept_emp JOIN departments ON departments.dept_no = dept_emp.dept_no;")
checkErr(err)
rowCount := 0
for rows.Next() {
    var id int
    var depNumber string
    var depName string
    var fromDate string
    var toDate string
    var position = "Employee"
    err = rows.Scan(&id, &depNumber, &depName, &fromDate, &toDate)
    if err != nil {
        return err
    }
    // For debugging purposes:
    fmt.Println(id, depNumber, depName, fromDate, toDate, position, rowCount)
    for i := range collection {
        if collection[i].ID == id {
            collection[i].Departments = append(collection[i].Departments, Department{DepartmentNumber: depNumber, DepartmentName: depName, FromDate: fromDate, ToDate: toDate, Position: position})
            // For debugging purposes:
            fmt.Println(collection[i].Departments)
        }
    }
    rowCount++
}
// rows.Next() returns false on any error as well as at the end of the
// result set, so check rows.Err() to surface aborted iterations.
if err = rows.Err(); err != nil {
    return err
}
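
As an aside on that block: the inner loop scans the entire collection slice for every returned row, which is quadratic over millions of rows. A map from employee ID to slice index (a sketch, not code from the repo) would make each lookup constant-time:

// Sketch: build the index once, after collection has been populated
// with employees.
index := make(map[int]int, len(collection))
for i := range collection {
    index[collection[i].ID] = i
}

// Then, inside the rows.Next() loop, instead of scanning collection:
if j, ok := index[id]; ok {
    collection[j].Departments = append(collection[j].Departments,
        Department{DepartmentNumber: depNumber, DepartmentName: depName,
            FromDate: fromDate, ToDate: toDate, Position: position})
}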

Here's a GitHub link to the whole program: https://github.com/dchmie01/mysql_to_json/blob/master/main.go

Edit 2:

It seems like the issue has to do with query timeout.

Each query takes about 10 minutes to execute, but at about 6 minutes in I get this error and the program stops executing the query:

[mysql] 2017/04/29 17:35:16 packets.go:66: unexpected EOF
[mysql] 2017/04/29 17:35:16 packets.go:412: busy buffer
2017/04/29 17:35:16 driver: bad connection

And in the MySQL log file it says:

2017-04-29T16:28:49.975805Z 102 [Note] Aborted connection 102 to db: 'employees' user: 'root' host: 'localhost' (Got timeout writing communication packets)

So far I have tried playing around with the MySQL timeout variables to disable any timeouts that might be present, but no luck.

I think the issue might be with the MySQL driver for Go (go-sql-driver/mysql, judging by the packets.go lines in the log above).
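
For reference, the "Got timeout writing communication packets" message corresponds to MySQL's net_write_timeout, which aborts a connection when the server waits too long to write result packets to a slow-reading client. A sketch of the knobs involved (the values are illustrative, and the DSN parameter names are from the go-sql-driver/mysql documentation):

// Raise the server-side timeout (default 60 seconds). SET GLOBAL
// affects new connections; SET SESSION via db.Exec would only apply
// to whichever pooled connection happens to run it.
_, err = db.Exec("SET GLOBAL net_write_timeout = 3600")
checkErr(err)

// The driver accepts I/O deadlines as DSN parameters; make sure
// neither is set below the expected query runtime.
db, err = sql.Open("mysql",
    "root:password@tcp(127.0.0.1:3306)/employees?readTimeout=30m&writeTimeout=30m")
checkErr(err)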

  • You know you can use CSV: MySQL's SELECT ... INTO OUTFILE and mongoimport --type csv. Commented Apr 29, 2017 at 14:27
  • Not easily; I need to restructure the schema, not just dump all the tables. Commented Apr 29, 2017 at 14:28
  • I usually use aggregation for this, and if you do something really complex, I assume the inconsistency may come from there. The code in the question is not sufficient to give any answer on that part. Commented Apr 29, 2017 at 14:34
  • Thanks, I added a longer explanation. I'm mostly just appending struct variables to a slice of structs when reading each row. Commented Apr 29, 2017 at 14:54
  • Is the size of the collection the same as the number of rows returned from the query? That is, did you preallocate the collection to a known number of returned rows? And do all of your queries, whose results you use to populate the same collection slice, return the same number of rows? Commented Apr 29, 2017 at 15:22

1 Answer


Consider using MySQL's SELECT ... INTO OUTFILE and mongoimport --type csv instead.

The only thing the program does is embed 1-to-many and many-to-many related documents, which can easily be done with the aggregation framework.

A step-by-step example:

  1. Export CSV from MySQL:

    SELECT * FROM employees INTO OUTFILE '/tmp/employees.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * FROM salaries INTO OUTFILE '/tmp/salaries.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * FROM titles INTO OUTFILE '/tmp/titles.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * FROM departments INTO OUTFILE '/tmp/departments.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * FROM dept_emp INTO OUTFILE '/tmp/dept_emp.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * FROM dept_manager INTO OUTFILE '/tmp/dept_manager.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    
  2. Import the CSV files into MongoDB (define each 'field spec' according to your schema; the employees command shows an example of a full field spec):

    mongoimport -d <dbname> -c tmp_employees -f 'id.int32(),birth_date.date(2006-01-02),first_name.string(),last_name.string(),gender.string(),hire_date.date(2006-01-02)' --columnsHaveTypes --type csv --file /tmp/employees.csv --drop 
    mongoimport -d <dbname> -c tmp_salaries -f 'field spec' --columnsHaveTypes --type csv --file /tmp/salaries.csv --drop 
    mongoimport -d <dbname> -c tmp_titles -f 'field spec' --columnsHaveTypes --type csv --file /tmp/titles.csv --drop 
    mongoimport -d <dbname> -c tmp_departments -f 'field spec' --columnsHaveTypes --type csv --file /tmp/departments.csv --drop 
    mongoimport -d <dbname> -c tmp_dept_emp -f 'field spec' --columnsHaveTypes --type csv --file /tmp/dept_emp.csv --drop 
    mongoimport -d <dbname> -c tmp_dept_manager -f 'field spec' --columnsHaveTypes --type csv --file /tmp/dept_manager.csv --drop 
    
  3. Embed the data from the mongo shell:

    db.tmp_employees.aggregate([
        // 1-to-many joins
        {$lookup: {
            from: 'tmp_salaries',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'salaries'
        }},
        {$lookup: {
            from: 'tmp_titles',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'titles'
        }},
        // many-to-many joins
        {$lookup: {
            from: 'tmp_dept_emp',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'dept_emp'
        }},
        {$lookup: {
            from: 'tmp_dept_manager',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'dept_manager'
        }},
        {$unwind: { path: '$dept_emp', preserveNullAndEmptyArrays: true }},
        {$lookup: {
            from: 'tmp_departments',
            localField: 'dept_emp.dept_no',
            foreignField: 'dept_no',
            as: 'dept_emp_deps'
        }},    
        {$unwind: { path: '$dept_emp_deps', preserveNullAndEmptyArrays: true }},
        {$group: {
            _id: '$_id',
            root: {$first: '$$ROOT'},
            dept_manager: {$first: '$dept_manager'},
            departments_emp: {$push: {
                department_number: '$dept_emp.dept_no',
                department_name: '$dept_emp_deps.dept_name',
                from_date: '$dept_emp.from_date',
                to_date: '$dept_emp.to_date',
                position: '$dept_emp.position'
            }},
        }},
        {$unwind: { path: '$dept_manager', preserveNullAndEmptyArrays: true }},
        {$lookup: {
            from: 'tmp_departments',
            localField: 'dept_manager.dept_no',
            foreignField: 'dept_no',
            as: 'dept_manager_deps'
        }},    
        {$unwind: { path: '$dept_manager_deps', preserveNullAndEmptyArrays: true }},
        {$group: {
            _id: '$_id',
            root: {$first: '$root'},
            departments_emp: {$first: '$departments_emp'},
            departments_manager: {$push: {
                department_number: '$dept_manager.dept_no',
                department_name: '$dept_manager_deps.dept_name',
                from_date: '$dept_manager.from_date',
                to_date: '$dept_manager.to_date',
                position: '$dept_manager.position'
            }},
        }},
        // combine departments to a single array
        {$project: {
            root: 1,
            departments_all: {$concatArrays: [ "$departments_emp", "$departments_manager" ] }
        }},
        //final reshape
        {$project: {
            id: '$root.id',
            birth_date: '$root.birth_date',
            first_name: '$root.first_name',
            last_name: '$root.last_name',
            gender: '$root.gender',
            hire_date: '$root.hire_date',
            salaries: '$root.salaries',
            titles: '$root.titles',
            departments: {$filter: {
                input: "$departments_all",
                as: "departments",
                cond: { $ne: [ "$$departments", {} ] }}}
        }},
        { $out : "employees" }
    ])
    
  4. Drop the temporary import collections from the mongo shell:

    db.tmp_employees.drop();
    db.tmp_salaries.drop();
    db.tmp_titles.drop();
    db.tmp_departments.drop();
    db.tmp_dept_emp.drop();
    db.tmp_dept_manager.drop();
    