
I have a CSV file that can contain around a million records. How can I remove the columns whose names start with _ and generate a resulting CSV?

For the sake of simplicity, consider I have the CSV below:

Sr.No Col1 Col2 _Col3   Col4 _Col5
1     txt  png  676766  win  8787
2     jpg  pdf  565657  lin  8787
3     pdf  jpg  786786  lin  9898

I would want the output to be:

Sr.No Col1 Col2 Col4
1     txt  png  win 
2     jpg  pdf  lin 
3     pdf  jpg  lin

Do I need to read the entire file into memory to achieve this, or is there a better approach?

const csv = require('csv-parser');
const fs = require('fs');

fs.createReadStream('data.csv')
  .pipe(csv())
  .on('data', (row) => {
    // generate a new CSV here, dropping the _-prefixed columns
  })
  .on('end', () => {
    console.log('CSV file successfully processed');
  });

Any help on how I can achieve this would be appreciated.

Thanks.

4 Answers


To anyone who stumbles on this post:

I was able to transform the CSVs with the code below, using the fs and csv modules.

const csv = require('csv');
const fs = require('fs');

fs.createReadStream(m.path)
  .pipe(csv.parse({delimiter: '\t', columns: true}))
  .pipe(csv.transform((input) => {
    // drop every column whose name starts with an underscore
    for (const key of Object.keys(input)) {
      if (key.startsWith('_')) delete input[key];
    }
    return input;
  }))
  .pipe(csv.stringify({header: true}))
  .pipe(fs.createWriteStream(transformedPath))
  .on('finish', () => {
    console.log('finish....');
  })
  .on('error', (err) => {
    console.log('error.....', err);
  });

Source: https://gist.github.com/donmccurdy/6cbcd8cee74301f92b4400b376efda1d
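For comparison, the same filtering can be built with the csv-parser module from the question feeding the csv package's transform and stringify stages. This is a minimal sketch, assuming data.csv and output.csv as placeholder input/output paths:

const csvParser = require('csv-parser'); // the parser module from the question
const { transform, stringify } = require('csv'); // same csv package as above
const fs = require('fs');

fs.createReadStream('data.csv') // placeholder input path
  .pipe(csvParser())
  .pipe(transform((row) => {
    // drop every column whose name starts with an underscore
    for (const key of Object.keys(row)) {
      if (key.startsWith('_')) delete row[key];
    }
    return row;
  }))
  .pipe(stringify({header: true}))
  .pipe(fs.createWriteStream('output.csv')) // placeholder output path
  .on('finish', () => console.log('CSV file successfully processed'));

Because every stage is a stream, only a handful of rows is in memory at any time, so the size of the file does not matter; there is no need to read the whole file first.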



Try this with the csv library:

const csv = require('csv');
const fs = require('fs');

// note: no leading whitespace inside the template literal,
// otherwise the spaces become part of the second row's first field
const csvString = `col1,col2
value1,value2`;

csv.parse(csvString, {columns: true})
  .pipe(csv.transform(({col1, col2}) => ({col1}))) // keep col1, drop col2
  .pipe(csv.stringify({header: true}))
  .pipe(fs.createWriteStream('./file.csv'));
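The destructuring trick above works when the column names are known ahead of time. To drop every _-prefixed column regardless of its name, the transform stage can filter keys dynamically instead; a sketch using the same csv package and the csvString from above:

csv.parse(csvString, {columns: true})
  .pipe(csv.transform((row) =>
    // rebuild each row without the keys that start with an underscore
    Object.fromEntries(Object.entries(row).filter(([key]) => !key.startsWith('_')))
  ))
  .pipe(csv.stringify({header: true}))
  .pipe(fs.createWriteStream('./file.csv'));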



You can handle this by combining two npm packages:

https://www.npmjs.com/package/csvtojson to convert your CSV to JSON format,

then https://www.npmjs.com/package/json2csv to write the result back out as CSV.

With the second library, if you know exactly which fields you want, you can pass options to select them:

const { Parser } = require('json2csv');

const fields = ['field1', 'field2', 'field3'];
const opts = { fields };

try {
  const parser = new Parser(opts);
  const csv = parser.parse(myData); // myData: your rows, as parsed from the CSV via csvtojson
  console.log(csv);
} catch (err) {
  console.error(err);
}

Or you can modify the JSON objects manually to drop those columns.
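A rough sketch of that manual approach in streaming form, assuming csvtojson's subscribe API (one of the streaming mechanisms linked in the comments below) with json2csv's Parser stringifying each surviving row; data.csv and output.csv are placeholder paths:

const csvtojson = require('csvtojson');
const { Parser } = require('json2csv');
const fs = require('fs');

const out = fs.createWriteStream('output.csv'); // placeholder output path
let rowParser = null;

csvtojson()
  .fromFile('data.csv') // placeholder input path
  .subscribe(
    (row) => {
      // drop every column whose name starts with an underscore
      for (const key of Object.keys(row)) {
        if (key.startsWith('_')) delete row[key];
      }
      if (!rowParser) {
        // derive the surviving columns from the first row and write the header once
        const fields = Object.keys(row);
        out.write(fields.join(',') + '\n'); // assumes header names need no quoting
        rowParser = new Parser({ fields, header: false });
      }
      out.write(rowParser.parse(row) + '\n');
    },
    (err) => console.error(err),
    () => out.end()
  );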

2 Comments

My file can be really big, 500 MB or more. It looks like your solution loads the entire data set into memory, which might crash the program; please correct me if I am wrong.

Of course, loading a 500 MB file into memory might crash the program depending on your available resources, but both of those libraries provide streaming mechanisms; see their documentation: npmjs.com/package/csvtojson#use-stream and npmjs.com/package/json2csv#json2csv-async-parser-streaming-api

This function accomplishes the column removal from a CSV string:

function removeCol(csv, col) {
  const lines = csv.split("\n");
  const headers = lines[0].split(",");
  const index = headers.findIndex((h) => h.trim() === col);
  if (index === -1) return csv; // column not found, return the input unchanged
  return lines
    .map((line) => {
      const fields = line.split(",");
      fields.splice(index, 1);
      return fields.join(",");
    })
    .join("\n");
}
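A hypothetical usage, calling the function once per _-prefixed header. Note that this variant holds the whole file in memory and splits naively on commas, so it does not handle quoted fields; the file names are placeholders:

const fs = require('fs');

let data = fs.readFileSync('data.csv', 'utf8');
// collect the headers first, then strip each _-prefixed column by name
for (const header of data.split('\n')[0].split(',')) {
  if (header.trim().startsWith('_')) {
    data = removeCol(data, header.trim());
  }
}
fs.writeFileSync('output.csv', data);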

