I have a MySQL database with 6 tables and about 2 million rows altogether.

I want to migrate all the data into MongoDB.

I decided to do this by converting the SQL tables to JSON and importing the JSON into MongoDB.

I wrote a program in Golang to extract the data and output it as JSON.

This is the main function of the program:

func main() {
    // Open a database connection
    var err error
    db, err = sql.Open("mysql", "root:password@tcp(127.0.0.1:3306)/employees")
    checkErr(err)
    // Check if reachable
    if err = db.Ping(); err != nil {
        log.Fatal("Database is unreachable:", err)
    }
    // Populate variables with data
    err = populateVars()
    checkErr(err)
    // Marshal variables into JSON
    binaryJSON, err := json.Marshal(collection)
    checkErr(err)
    // Write JSON to a file
    err = writeStringToFile("/home/user01/Temporary/sql2data.json", string(binaryJSON))
    checkErr(err)
}
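
checkErr and writeStringToFile are small helpers; a sketch of them is below for completeness (the exact versions are in the repository linked further down):

// Sketch of the helpers used above; uses the standard log and
// io/ioutil packages.
func checkErr(err error) {
    if err != nil {
        log.Fatal(err)
    }
}

func writeStringToFile(path, content string) error {
    // 0644: owner read/write, group and others read-only
    return ioutil.WriteFile(path, []byte(content), 0644)
}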

The problem is that the output is inconsistent.

Every time I run the program, the resulting file has a different size and some random fields are missing.

What could be causing this?

It doesn't seem like it's a problem with the logic of the program since everything executes without errors, and most fields are populated just fine.

Could I be reading the information too fast, so that some things get lost occasionally?

Or is there something else that I'm missing?

Edit:

Most of the work happens inside the populateVars() function call.

It has multiple blocks of code that execute a given SQL query and populate struct variables according to the schema.

This is one such block:

rows, err = db.Query("SELECT emp_no, dept_emp.dept_no, dept_name, from_date, to_date FROM dept_emp JOIN departments ON departments.dept_no = dept_emp.dept_no;")
checkErr(err)
rowCount := 0
for rows.Next() {
    var id int
    var depNumber string
    var depName string
    var fromDate string
    var toDate string
    var position = "Employee"
    err = rows.Scan(&id, &depNumber, &depName, &fromDate, &toDate)
    if err != nil {
        return err
    }
    // For debugging purposes:
    fmt.Println(id, depNumber, depName, fromDate, toDate, position, rowCount)
    for i := range collection {
        if collection[i].ID == id {
            collection[i].Departments = append(collection[i].Departments, Department{DepartmentNumber: depNumber, DepartmentName: depName, FromDate: fromDate, ToDate: toDate, Position: position})
            // For debugging purposes:
            fmt.Println(collection[i].Departments)
        }
    }
    rowCount++
}
// rows.Next() returns false on any error as well as at the end of the
// result set, so check rows.Err() to surface aborted iterations.
if err = rows.Err(); err != nil {
    return err
}
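
As an aside on that block: the inner loop scans the entire collection slice for every returned row, which is quadratic over millions of rows. A map from employee ID to slice index (a sketch, not code from the repo) would make each lookup constant-time:

// Sketch: build the index once, after collection has been populated
// with employees.
index := make(map[int]int, len(collection))
for i := range collection {
    index[collection[i].ID] = i
}

// Then, inside the rows.Next() loop, instead of scanning collection:
if j, ok := index[id]; ok {
    collection[j].Departments = append(collection[j].Departments,
        Department{DepartmentNumber: depNumber, DepartmentName: depName,
            FromDate: fromDate, ToDate: toDate, Position: position})
}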

Here's a GitHub link to the whole program: https://github.com/dchmie01/mysql_to_json/blob/master/main.go

Edit 2:

It seems like the issue has to do with query timeout.

Each query takes about 10 minutes to execute, but at about 6 minutes in I get this error and the program stops executing the query:

[mysql] 2017/04/29 17:35:16 packets.go:66: unexpected EOF
[mysql] 2017/04/29 17:35:16 packets.go:412: busy buffer
2017/04/29 17:35:16 driver: bad connection

And in the MySQL log file it says:

2017-04-29T16:28:49.975805Z 102 [Note] Aborted connection 102 to db: 'employees' user: 'root' host: 'localhost' (Got timeout writing communication packets)

So far I have tried playing around with the MySQL timeout variables to disable any timeouts that might be present, but no luck.

I think the issue might be with the MySQL driver for Go (go-sql-driver/mysql, judging by the packets.go lines in the log above).
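
For reference, the "Got timeout writing communication packets" message corresponds to MySQL's net_write_timeout, which aborts a connection when the server waits too long to write result packets to a slow-reading client. A sketch of the knobs involved (the values are illustrative, and the DSN parameter names are from the go-sql-driver/mysql documentation):

// Raise the server-side timeout (default 60 seconds). SET GLOBAL
// affects new connections; SET SESSION via db.Exec would only apply
// to whichever pooled connection happens to run it.
_, err = db.Exec("SET GLOBAL net_write_timeout = 3600")
checkErr(err)

// The driver accepts I/O deadlines as DSN parameters; make sure
// neither is set below the expected query runtime.
db, err = sql.Open("mysql",
    "root:password@tcp(127.0.0.1:3306)/employees?readTimeout=30m&writeTimeout=30m")
checkErr(err)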

  • You know you can use CSV: MySQL's SELECT ... INTO OUTFILE and mongoimport --type csv. Commented Apr 29, 2017 at 14:27
  • Not easily; I need to restructure the schema, not just dump all the tables. Commented Apr 29, 2017 at 14:28
  • I usually use aggregation for this, and if you do something really complex, I assume the inconsistency may come from there. The code in the question is not sufficient to give any answer on that part. Commented Apr 29, 2017 at 14:34
  • Thanks, I added a longer explanation. I'm mostly just appending struct variables to a slice of structs when reading each row. Commented Apr 29, 2017 at 14:54
  • Is the size of the collection the same as the number of rows returned from the query? That is, did you preallocate the collection to a known number of returned rows? And do all of your queries, whose results you use to populate the same collection slice, return the same number of rows? Commented Apr 29, 2017 at 15:22

1 Answer


Consider using MySQL's SELECT ... INTO OUTFILE and mongoimport --type csv instead.

The only thing the program does is embed 1-to-many and many-to-many related documents, which can easily be done with the aggregation framework.

A step-by-step example:

  1. Export CSV from MySQL:

    SELECT * FROM employees INTO OUTFILE '/tmp/employees.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * FROM salaries INTO OUTFILE '/tmp/salaries.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * FROM titles INTO OUTFILE '/tmp/titles.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * FROM departments INTO OUTFILE '/tmp/departments.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * FROM dept_emp INTO OUTFILE '/tmp/dept_emp.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    SELECT * FROM dept_manager INTO OUTFILE '/tmp/dept_manager.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"';
    
  2. Import the CSV files into MongoDB (define each 'field spec' according to your schema; the employees command shows an example of a full field spec):

    mongoimport -d <dbname> -c tmp_employees -f 'id.int32(),birth_date.date(2006-01-02),first_name.string(),last_name.string(),gender.string(),hire_date.date(2006-01-02)' --columnsHaveTypes --type csv --file /tmp/employees.csv --drop 
    mongoimport -d <dbname> -c tmp_salaries -f 'field spec' --columnsHaveTypes --type csv --file /tmp/salaries.csv --drop 
    mongoimport -d <dbname> -c tmp_titles -f 'field spec' --columnsHaveTypes --type csv --file /tmp/titles.csv --drop 
    mongoimport -d <dbname> -c tmp_departments -f 'field spec' --columnsHaveTypes --type csv --file /tmp/departments.csv --drop 
    mongoimport -d <dbname> -c tmp_dept_emp -f 'field spec' --columnsHaveTypes --type csv --file /tmp/dept_emp.csv --drop 
    mongoimport -d <dbname> -c tmp_dept_manager -f 'field spec' --columnsHaveTypes --type csv --file /tmp/dept_manager.csv --drop 
    
  3. Embed the data from the mongo shell:

    db.tmp_employees.aggregate([
        // 1-to-many joins
        {$lookup: {
            from: 'tmp_salaries',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'salaries'
        }},
        {$lookup: {
            from: 'tmp_titles',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'titles'
        }},
        // many-to-many joins
        {$lookup: {
            from: 'tmp_dept_emp',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'dept_emp'
        }},
        {$lookup: {
            from: 'tmp_dept_manager',
            localField: 'id',
            foreignField: 'emp_no',
            as: 'dept_manager'
        }},
        {$unwind: { path: '$dept_emp', preserveNullAndEmptyArrays: true }},
        {$lookup: {
            from: 'tmp_departments',
            localField: 'dept_emp.dept_no',
            foreignField: 'dept_no',
            as: 'dept_emp_deps'
        }},    
        {$unwind: { path: '$dept_emp_deps', preserveNullAndEmptyArrays: true }},
        {$group: {
            _id: '$_id',
            root: {$first: '$$ROOT'},
            dept_manager: {$first: '$dept_manager'},
            departments_emp: {$push: {
                department_number: '$dept_emp.dept_no',
                department_name: '$dept_emp_deps.dept_name',
                from_date: '$dept_emp.from_date',
                to_date: '$dept_emp.to_date',
                position: '$dept_emp.position'
            }},
        }},
        {$unwind: { path: '$dept_manager', preserveNullAndEmptyArrays: true }},
        {$lookup: {
            from: 'tmp_departments',
            localField: 'dept_manager.dept_no',
            foreignField: 'dept_no',
            as: 'dept_manager_deps'
        }},    
        {$unwind: { path: '$dept_manager_deps', preserveNullAndEmptyArrays: true }},
        {$group: {
            _id: '$_id',
            root: {$first: '$root'},
            departments_emp: {$first: '$departments_emp'},
            departments_manager: {$push: {
                department_number: '$dept_manager.dept_no',
                department_name: '$dept_manager_deps.dept_name',
                from_date: '$dept_manager.from_date',
                to_date: '$dept_manager.to_date',
                position: '$dept_manager.position'
            }},
        }},
        // combine departments to a single array
        {$project: {
            root: 1,
            departments_all: {$concatArrays: [ "$departments_emp", "$departments_manager" ] }
        }},
        //final reshape
        {$project: {
            id: '$root.id',
            birth_date: '$root.birth_date',
            first_name: '$root.first_name',
            last_name: '$root.last_name',
            gender: '$root.gender',
            hire_date: '$root.hire_date',
            salaries: '$root.salaries',
            titles: '$root.titles',
            departments: {$filter: {
                input: "$departments_all",
                as: "departments",
                cond: { $ne: [ "$$departments", {} ] }}}
        }},
        { $out : "employees" }
    ])
    
  4. Drop the temporary import collections from the mongo shell:

    db.tmp_employees.drop();
    db.tmp_salaries.drop();
    db.tmp_titles.drop();
    db.tmp_departments.drop();
    db.tmp_dept_emp.drop();
    db.tmp_dept_manager.drop();
    