This is a question that suggests confusion about an asynchronous streaming API and seems to ask at least three things.
- How do I get
output to contain an array-of-arrays representing the parsed CSV data?
That output will never exist at the top-level, like you (and many other programmers) hope it would, because of how asynchronous APIs operate. All the data assembled neatly in one place can only exist in a callback function. The next best thing syntactically is const output = await somePromiseOfOutput() but that can only occur in an async function and only if we switch from streams to promises. That's all possible, and I mention it so you can check it out later on your own. I'll assume you want to stick with streams.
An array consisting of all the rows can only exist after reading the entire stream. That's why all the rows are only available in the author's "Stream API" example only in the .on('end', ...) callback. If you want to do anything with all the rows present at the same time, you'll need to do it in the end callback.
From https://csv.js.org/parse/api/ note that the author:
- uses the on readable callback to push single records into a previously empty array defined externally named
output.
- uses the on error callback to report errors
- uses the on end callback to compare all the accumulated records in output to the expected result
...
const output = []
...
parser.on('readable', function(){
let record
while (record = parser.read()) {
output.push(record)
}
})
// Catch any error
parser.on('error', function(err){
console.error(err.message)
})
// When we are done, test that the parsed output matched what expected
parser.on('end', function(){
assert.deepEqual(
output,
[
[ 'root','x','0','0','root','/root','/bin/bash' ],
[ 'someone','x','1022','1022','','/home/someone','/bin/bash' ]
]
)
})
- As to the goal on interfacing with sqlite, this is essentially building a customized streaming endpoint.
In this use case, implement a customized writable stream that accepts the output of parser and sends rows to the database.
Then you simply chain pipe calls as
fs.createReadStream(__dirname+'/fs_read.csv')
.pipe(parser)
.pipe(your_writable_stream)
Beware: This code returns immediately. It does not wait for the operations to finish. It interacts with a hidden event loop internal to node.js. The event loop often confuses new developers who are arriving from another language, used to a more imperative style, and skipped this part of their node.js training.
Implementing such a customized writable stream can get complicated and is left as an exercise for the reader. It will be easiest if the parser emits a row, and then the writer can be written to handle single rows. Make sure you are able to notice errors somehow and throw appropriate exceptions, or you'll be cursed with incomplete results and no warning or reason why.
A hackish way to do it would have been to replace console.log(data) in let parser = ... with a customized function writeRowToSqlite(data) that you'll have to write anyway to implement a custom stream. Because of asynchronous API issues, using return data there does not do anything useful. It certainly, as you saw, fails to put the data into the output variable.
- As to why
output in your modified posting does not contain the data...
Unfortunately, as you discovered, this is usually wrong-headed:
const output = fs.createReadStream(__dirname + "/users.csv").pipe(parser);
console.log(output);
Here, the variable output will be a ReadableStream, which is not the same as the data contained in the readable stream. Put simply, it's like when you have a file in your filesystem, and you can obtain all kinds of system information about the file, but the content contained in the file is accessed through a different call.
.importcommand to import the csv directly from sqlite. You could also spawn a process from node.js that will run a shell script and do the import procedure.