1

I have a CSV file that looks like this:

account, name, email,
123, John, [email protected]
123, John, [email protected]
1234, Alex, [email protected]

I need to remove duplicate rows. I tried to do it like this:

$inputHandle = fopen($inputfile, "r");
$csv = fgetcsv($inputHandle, 1000, ",");

$accounts_unique = array();

$accounts_unique = array_unique($csv);  

print("<pre>".print_r($accounts_unique, true)."</pre>");

But print_r shows only the first row (the headers). What do I need to do so that (1) the CSV file is cleaned of duplicate rows, and (2) I get a list of those duplicates (maybe stored in another CSV)?

1
  • fgetcsv only gets one row. If you need all rows, you need to loop. Commented Jul 1, 2013 at 13:48
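
As a rough sketch of the loop this comment describes: fgetcsv() is called repeatedly until it returns false, and each row is sorted into "first seen" or "duplicate". The function name splitDuplicateRows is made up for illustration, and rows are compared field by field exactly as read, so differences in spacing count as differences.

```php
<?php
// Hypothetical helper (not from the question): split the rows of a CSV file
// into first occurrences and repeated rows.
function splitDuplicateRows(string $inputFile): array
{
    $handle = fopen($inputFile, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $inputFile");
    }

    $seen = [];       // row key => true, for rows already encountered
    $unique = [];     // first occurrence of every row
    $duplicates = []; // every later repeat

    // fgetcsv() returns one parsed row per call and false at end of file.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // a blank line is returned as an array holding a single null
        }
        $key = implode("\0", $row); // join the fields into one comparable string
        if (isset($seen[$key])) {
            $duplicates[] = $row;
        } else {
            $seen[$key] = true;
            $unique[] = $row;
        }
    }
    fclose($handle);

    return [$unique, $duplicates];
}
```

Each of the two returned lists can then be written to its own file with fputcsv(), one row per call.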

3 Answers

4

A simple solution, but it requires a lot of memory if the file is really big, because the whole file is read into an array.

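$lines = file('csv.csv', FILE_IGNORE_NEW_LINES); // read all lines, without their trailing newlines
$lines = array_unique($lines); // keep only the first occurrence of each line
file_put_contents('csv.csv', implode(PHP_EOL, $lines) . PHP_EOL); // write the result back to the same file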

4 Comments

Hmm, I think I need some more logic there... How can I make a note of the duplicate rows?
And by the way, the duplicates are not removed when I run this.
@sectus -- just suggesting that you might want to use array_keys(array_flip()) or array_flip(array_flip()) rather than array_unique(), given the significant performance difference. @Alex -- array_diff_key($before, $after) will give you the dropped item keys if you used array_unique() or array_flip(array_flip()).
@Alex, sorry, I have changed the answer (added the missing $lines = assignment).
1

I would go this route, which will be faster than array_unique:

// Read every line of the file; a single fgetcsv() call would return only one row.
$csv = file($inputfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$data = array_flip(array_flip($csv)); // removes lines that are exact duplicates
$dropped = array_diff_key($csv, $data); // the lines that were removed

Note -- array_unique() and array_flip(array_flip()) only treat lines as duplicates when they are exactly the same, character for character.
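
To illustrate that note, here is a small sketch of one way to catch near-identical lines as well: parse each line and compare a normalized key instead of the raw text. The helper name normalizedKey, the sample lines, and the choice to ignore spacing and letter case are all assumptions made for this example; whether such lines should really count as duplicates depends on the data. It needs PHP 7.4 or later for the arrow function.

```php
<?php
// Hypothetical helper: build a comparison key that ignores the spacing
// around each field and the letter case. Both rules are assumptions.
function normalizedKey(string $line): string
{
    $fields = str_getcsv($line, ',', '"', '\\');
    $fields = array_map(fn($field) => strtolower(trim((string) $field)), $fields);
    return implode("\0", $fields);
}

// Invented sample lines: the first two differ only in spacing and case.
$lines = [
    '123, John, a@example.com',
    '123,John,A@example.com',
    '1234, Alex, b@example.com',
];

$kept = [];    // normalized key => first line seen with that key
$dropped = []; // lines whose key had already been seen
foreach ($lines as $line) {
    $key = normalizedKey($line);
    if (isset($kept[$key])) {
        $dropped[] = $line;
    } else {
        $kept[$key] = $line;
    }
}
$kept = array_values($kept); // discard the keys, keep the original lines
```

With plain array_unique() all three sample lines would survive, because the first two are not byte-for-byte equal.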

Updated to include information from my comments.


1

If you are going to loop over the data from the CSV anyway, I think it would be best to do something like this:

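$dataset = array();
foreach ($lines as $line) { // $lines: the lines already read from the CSV
    // Identical lines produce the same hash, so a repeat overwrites the earlier entry.
    $dataset[sha1($line)] = $line;
}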

