7

I already managed to split the CSV file using this regex: "/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/"

But I ended up with an array of strings that contain the opening and ending double quotes. Now I need a regex that would strip those strings of the delimiter double quotes.

As far as I know the CSV format can encapsulate strings in double quotes, and all the double quotes that are already a part of the string are doubled. For example:

My "other" cat

becomes

"My ""other"" cat"

What I basically need is a regex that will replace every sequence of N double quotes with a sequence of N/2 (rounded down) double quotes.

Or is there a better way? Thanks in advance.

6 Answers

20

There is a function for reading CSV files: fgetcsv
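As a rough illustration of that suggestion, here is a minimal sketch that feeds CSV text through fgetcsv. The helper name parse_csv_text and the in-memory stream are assumptions made for the example; with a real file you would simply fopen() it.

```php
<?php
// Hypothetical helper: parse CSV text held in a string using fgetcsv().
function parse_csv_text(string $text): array
{
    // fgetcsv() works on a stream, so copy the text into a memory stream.
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $text);
    rewind($handle);

    $rows = [];
    // Length 0 means "no line length limit"; separator, enclosure and
    // escape character are passed explicitly.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() reports a blank line as array(null)
        }
        $rows[] = $row;
    }
    fclose($handle);

    return $rows;
}

$rows = parse_csv_text("1,\"My \"\"other\"\" cat\",\"a,b\"\n");
print_r($rows);
```

The enclosing quotes are removed and the doubled quotes are collapsed by the parser, so the second field should come back as My "other" cat without any further regex work.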


4 Comments

+1 You're crazy to use regex for CSV in PHP when there's a built-in function that does exactly what you want.
Yes. Why would you want to re-invent the wheel when there is something out there that is very well tested and would solve your issue?
Because maybe you get a CSV export from a 3rd party who doesn't quote text fields correctly and fgetcsv incorrectly interprets the string 1.15 as a float with the value of 1.1499999999. However, in the end it was easier to write a quick script to fix the CSV file and then use fgetcsv :o)
fgetcsv doesn't cope well when the data contains DBCS characters such as Chinese: it will eat the SBCS prefix character from a DBCS character. You have to call setlocale correctly first, so I prefer a regular expression solution.
4

Why do you bother splitting the file with a regex when there's the fgetcsv function that does all the hard work for you?

You can pass in the field separator and the quote (enclosure) character, and it will handle the rest.
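To show what passing those parameters looks like, here is a small sketch using str_getcsv, the string-based sibling of fgetcsv. The sample line, the semicolon separator, and the single-quote enclosure are made up for the example.

```php
<?php
// Assumed sample data: semicolon-separated, fields enclosed in single quotes.
$line = "1;'it''s';'a;b'";

// Arguments: input string, separator, enclosure, escape character.
$fields = str_getcsv($line, ';', "'", '\\');

print_r($fields);
```

The separator inside the quoted field is left alone, and the doubled enclosure character is collapsed to a single one, the same way doubled double quotes are handled with the default settings.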

1 Comment

Yes, as simple as the CSV format is, processing it with regexes is annoyingly awkward. If you've got a purpose-made parser available, by all means use that.
2

I agree with the others who said you should use the fgetcsv function instead of regexes. A regex may work okay on well-formed CSV data, but if the CSV is malformed or corrupt, the regex will silently fail, probably returning bogus results in the process.

However, the question was specifically about stripping unwanted quotation marks after the initial split. The one proposed solution (so far) is too naive, and it only deals with the escaped quotes inside a field, not the actual delimiters. (I know the OP didn't ask about those, but they do need to be removed, so why not do them at the same time as the others?) Here's my solution:

$csv_field = preg_replace('/"(.|$)/', '\1', $csv_field);

This regex matches a quotation mark followed by any character or by the end of the string, and replaces the matched character(s) with the second character, or with the empty string if it was the $ that matched. According to the spec, CSV fields can contain line separators; that doesn't seem to happen much, but you can add the 's' modifier to the regex if you need to.
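For anyone who wants to try it, here is a small sketch that wraps the same pattern in a function, with the 's' modifier already added. The function name strip_csv_quotes is only an illustrative choice.

```php
<?php
// Hypothetical wrapper around the preg_replace() call shown above.
function strip_csv_quotes(string $field): string
{
    // A quote followed by any character (or by the end of the string) is
    // replaced by that following character. The 's' modifier lets '.'
    // match line separators inside a field as well.
    return preg_replace('/"(.|$)/s', '\1', $field);
}

echo strip_csv_quotes('"My ""other"" cat"'), "\n"; // My "other" cat
echo strip_csv_quotes('plain'), "\n";               // plain
```

One edge case to be aware of: an empty quoted field, written as two quotes with nothing between them, matches as "quote followed by a character" and comes out as a single quote rather than an empty string, so it may need separate handling.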


2

For those of you who want to use regex instead of fgetcsv, here is a complete example of how to create an HTML table from CSV using a regex.

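    $data = file_get_contents('test.csv');
    $pieces = explode("\n", $data);

    // Initialise the buffer before appending to it.
    $html = "<table border='1'>\n";
    foreach (array_filter($pieces) as $line) {
        $html .= "<tr>\n";
        // Split on commas that are followed by an even number of quotes,
        // i.e. commas that sit outside a quoted field.
        $keywords = preg_split('/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/', $line, -1, PREG_SPLIT_DELIM_CAPTURE);

        foreach ($keywords as $col) {
            // Strip the enclosing quotes from the field.
            $html .= "<td>".trim($col, '"')."</td>\n";
        }
        $html .= "</tr>\n";
    }
    $html .= "</table>\n";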


2
preg_split('/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/', $line,-1,PREG_SPLIT_DELIM_CAPTURE);

This has problems with " inside strings, such as "Toys"R"Us".

So you should use this instead:

preg_split('/'.$seperator.'(?=(?:[^\"])*(?![^\"]))/', $line,-1, PREG_SPLIT_DELIM_CAPTURE);

2 Comments

This doesn't remove the double quotes around the string or convert the escaped double quotes (expressed as either "" or \") within it. So I added this code: array_walk($m, create_function('&$item,$key','$item = str_replace(array(\'""\',\'\\"\'),\'"\',trim($item, \'"\'));')); , where $m is the array returned by the preg_split statement (note: I use create_function because the PHP version may be older than 5.3).
This doesn't work for a CSV line with a comma inside a string.
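create_function was deprecated in PHP 7.2 and removed in PHP 8.0, so on current versions the clean-up step from the first comment needs a closure instead. A minimal sketch of the same trim-and-replace logic follows; the function name unquote_fields is hypothetical.

```php
<?php
// Hypothetical helper: same clean-up as the create_function() snippet in
// the comment above, written with a closure so it runs on PHP 5.3 and later.
function unquote_fields(array $fields): array
{
    return array_map(function ($item) {
        // Strip the enclosing quotes, then turn "" and \" into a plain quote.
        return str_replace(array('""', '\\"'), '"', trim($item, '"'));
    }, $fields);
}

print_r(unquote_fields(array('"My ""other"" cat"', 'plain')));
```

Because trim() removes every leading and trailing quote, a field whose content itself begins or ends with a quote loses it, so this is only a rough clean-up and not a full CSV unescape.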
0

Here's my quick attempt at it, although it will only work on word boundaries.

preg_replace('/([\W]){2}\b/', '\1', $csv)

