
I wrote a naive Node.js script to migrate a MySQL table to an ArangoDB collection.

It works quite well, except that records are always missing, as if the connection were closed too early. The number of missing documents is not random, however; it is always the same:

  • There are 68,750 records in the source,

  • my self-built buffer has a size of 1,000 and

  • 68,682 (-68) documents are created in ArangoDB

var mysql = require('mysql');
var arango = require('arango');

var docs = [];

function processRow(row, connection) {
    if (docs.length < 1000 && row !== false) {
        docs.push(row);
    } else {
        connection.pause();
        db.import.importJSONData(
            "target_collection",
            JSON.stringify(docs, function(key, value) {
                if (value == null || (typeof value === "string" && !value.trim())) {
                    return undefined;
                } else {
                    return value;
                }
            }),
            {
                createCollection: true,
                waitForSync: false
            },
            function(err, ret) {
                docs = [];
                connection.resume();
                if (row === false) process.exit();
            }
        );
    }
}

var connection = mysql.createConnection({
    host: 'localhost',
    user: 'root',
    password: ''
});

var db = arango.Connection("http://localhost:8529/my_database");
connection.connect();

var query = connection.query('SELECT * FROM my_database.source_table');
var i = 0;

query
    .on('error', function(err) {
        console.log(err);
    })
    .on('result', function(row) {
        i++;
        if (i % 1000 == 0) console.log(i);

        processRow(row, connection);

    })
    .on('end', function() {
        processRow(false, connection);
    });

Another version of the script I wrote uses a transform stream and imports exactly 68,744 records, and a third imports all of the records but only creates the target collection and documents once it finishes, although it should write every n source records.

Is there something obvious I am missing here?

A counter variable confirms that all 68,750 records are read, and there are no source records that are completely empty (all columns NULL): every row has at least a primary key integer. I also tried without the custom JSON.stringify replacer.


Solution:

Do something with every nth row when the buffer is full; credit goes to mscdex and mchacki for finding this obvious mistake!

Fixed stream_array_join.js
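The off-by-one can be reproduced without MySQL or ArangoDB by simulating the buffer logic in isolation. In this sketch, `simulate()` and `flush()` are illustrative stand-ins (not part of the original script): `flush()` only counts what `db.import.importJSONData()` would have written, and the `fixed` flag switches between the buggy behavior and the corrected one.

```javascript
// Minimal simulation of the script's buffering, without MySQL or ArangoDB.
// flush() stands in for db.import.importJSONData(); rows are plain integers.
function simulate(totalRows, fixed) {
    var docs = [];
    var imported = 0;

    function flush() {
        imported += docs.length;
    }

    function processRow(row) {
        if (docs.length < 1000 && row !== false) {
            docs.push(row);
        } else {
            flush();
            // The bug: the row that triggered the flush is silently dropped.
            // The fix: start the next buffer with that row instead.
            docs = (fixed && row !== false) ? [row] : [];
        }
    }

    for (var i = 1; i <= totalRows; i++) processRow(i);
    processRow(false); // flush the remainder, as the 'end' handler does

    return imported;
}

console.log(simulate(68750, false)); // buggy: drops one row per flush
console.log(simulate(68750, true));  // fixed: imports every row
```

With 68,750 rows the buggy variant flushes 1,000 rows per cycle of 1,001 (68 full cycles plus a 682-row remainder), so it imports 68 × 1,000 + 682 = 68,682 and loses exactly 68 rows, matching the observed count.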

2 Answers


There is a slight error in your processRow function. It is called with one row at a time and pushes each row into the docs array. Once docs holds 1,000 rows, the next call writes them to ArangoDB, and here is the error: the row that triggered that write is never stored in docs at any point. One possible fix:

db.import.importJSONData(
    "target_collection",
    JSON.stringify(docs, function(key, value) {
        if (value == null || (typeof value === "string" && !value.trim())) {
            return undefined;
        } else {
            return value;
        }
    }),
    {
        createCollection: true,
        waitForSync: false
    },
    function(err, ret) {
        docs = [row]; // Insert row here
        connection.resume();
        if (row === false) process.exit();
    }
);



In processRow() you're not doing anything with row in your else branch when row is not false.

So you might need to change:

function(err, ret) {
  docs = [];
  connection.resume();
  if (row === false) process.exit();
}

to something like:

function(err, ret) {
  if (row)
    docs = [row];
  else
    docs = [];
  connection.resume();
  if (row === false) process.exit();
}

1 Comment

Thanks, that was obvious and hard to spot at the same time!
