1

I have a JS script that pulls data from a public Google Sheets feed which returns data in a sort of JSON-CSV format I need to parse.

Rows are comma-separated, but commas inside an item are not escaped, e.g.:

"a: Feb 21, 10:11, b: some content, c: more, d: even more"

Desired outcome:

{
  "a": "Feb 21, 10:11",
  "b": "some content",
  "c": "more",
  "d": "even more"
}

Split attempt

data.split(',') returns:

[
  "a: Feb 21",
  " 10:11",
  " b: some content",
  " c: more",
  " d: even more"
]

Regex attempt

The following regex is the closest I could get, but the value still stops at the comma inside the date. I need the value to run up to the start of the next property: (?[^,]+): (?[^,]+)

Attempt image - Regex101

4 Answers

1

See EDIT below for new answer.


The desired output cannot be derived unambiguously: a comma can be either part of a cell or a separator, and nothing in the data says which one it is.

Instead of JSON, you can publish a sheet as a .csv as follows:

  • select sheet to publish, such as Sheet2
  • select format Comma-separated values (.csv)

After that you can access the CSV data with a URL of this form:

https://docs.google.com/spreadsheets/d/e/<spreadsheet-id>/pub?gid=<tab-id>&single=true&output=csv

Example sheet:

| A1 cell   | B1 |
| A2, stuff | B2 |

Example returned CSV content:

A1 cell,B1
"A2, stuff",B2

There are plenty of tools to parse CSV, such as https://github.com/peterthoeny/parse-csv-js
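
To illustrate what such a parser has to do with the quoted "A2, stuff" cell above, here is a minimal hand-written sketch. It is not the API of parse-csv-js or any other library; the function name parseCsv is made up for this example, and it assumes standard CSV quoting (fields wrapped in double quotes, embedded quotes doubled).

```javascript
// Minimal CSV parser sketch (hypothetical helper, not a library API).
// Handles quoted fields, doubled quotes ("") and commas or newlines
// inside quotes. For real data, a tested library is the safer choice.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // doubled quote inside a quoted field
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // commas and newlines are literal here
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      row.push(field);
      field = '';
    } else if (ch === '\n') {
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else if (ch !== '\r') {
      field += ch;
    }
  }

  // Flush the last field and row when the text has no trailing newline.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const csvSample = 'A1 cell,B1\n"A2, stuff",B2';
const csvRows = parseCsv(csvSample);
console.log(csvRows);
// [ [ 'A1 cell', 'B1' ], [ 'A2, stuff', 'B2' ] ]
```

The quoting is what removes the ambiguity: the comma in "A2, stuff" is kept because it sits between quotes, which the JSON-like feed does not provide.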


EDIT, after learning that Esteban can't make changes to the data source:

You can use a split and forEach to construct the desired object:

const input = 'a: Feb 21, 10:11, b: some content, c: more, d: even more';
let result = {};
input.split(/, (?=[a-z]+:)/)
.forEach(item => {
  let key = item.replace(/:.*/, '');
  let val = item.replace(/.*?: */, '');
  result[key] = val;
});
console.log(JSON.stringify(result, null, ' '));

Output:

{
 "a": "Feb 21, 10:11",
 "b": "some content",
 "c": "more",
 "d": "even more"
}

Explanation:

  • let result = {}; - initialize an empty result object
  • .split(/, (?=[a-z]+:)/) - split between key/value pairs using a positive lookahead for an alpha: key
  • .forEach():
    • extract the key and value from the item
    • add new property to result object, property name is key

2 Comments

I can't make changes to the data source so I have to parse the data via Javascript/jQuery
@EstebanOrtega: See EDIT in updated answer.
0

Here is a way to go:

(?<tag>\w+): (?<value>.+?)(?:,(?= \w+: )|$)

Demo & explanation
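
A minimal sketch of how a pattern like this might be applied in JavaScript with matchAll and named groups. The names pairPattern and parsePairs are made up for this example. The lookahead used here requires a space after the colon (` \w+: `), so that a time such as 10:11 in the sample string is not mistaken for the start of a new key; this assumes every real key is followed by ": ".

```javascript
// Hypothetical helper: collects tag/value pairs from one row of the feed.
// The global flag is required by String.prototype.matchAll.
const pairPattern = /(?<tag>\w+): (?<value>.+?)(?:,(?= \w+: )|$)/g;

function parsePairs(row) {
  const result = {};
  for (const match of row.matchAll(pairPattern)) {
    // match.groups holds the named captures "tag" and "value".
    result[match.groups.tag] = match.groups.value;
  }
  return result;
}

const pairs = parsePairs('a: Feb 21, 10:11, b: some content, c: more, d: even more');
console.log(pairs);
// { a: 'Feb 21, 10:11', b: 'some content', c: 'more', d: 'even more' }
```

A value that itself contains ", word: " would still be split at that point, so this only works while values never look like a key.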

1 Comment

This is perfect, @Toto! I was able to implement it in Javascript like this. Still having issues with a euro sign, but this answers my question. Thank you! imgur.com/EEz23BG
0

Supposing that the sample string from the post were in cell A2, try this:

=SPLIT(REGEXREPLACE(" "&A2,",\s([\w]+[\d]*:)","|$1"),"|")

The plain-English explanation of the REGEX is this: "Find a comma followed by a whitespace character, followed by one or more word-type characters, followed by any number of digits, followed by a colon. If found, replace the leading comma and space with a pipe symbol and keep the rest as it was."

Then SPLIT acts on that pipe symbol.

A space is prepended to the entire string first so that "space followed by comma" can be found for that first label.
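
For a script rather than a sheet, the same replace-then-split idea could be sketched in JavaScript as follows. The function name splitOnLabels is made up for this example. Two assumptions differ from the formula: labels are assumed to start with a letter ([A-Za-z]\w*), so the "10:" inside the date is not treated as a label, and the prepended space is left out. It also assumes the data never contains a pipe symbol.

```javascript
// Hypothetical helper: mark each ", label:" boundary with a pipe, then split.
// Assumes labels start with a letter and the data contains no "|".
function splitOnLabels(row) {
  return row
    .replace(/,\s([A-Za-z]\w*:)/g, '|$1') // ", b:" becomes "|b:"
    .split('|');
}

const labelled = splitOnLabels('a: Feb 21, 10:11, b: some content, c: more, d: even more');
console.log(labelled);
// [ 'a: Feb 21, 10:11', 'b: some content', 'c: more', 'd: even more' ]
```

Each resulting item can then be split once at its first ": " to separate the label from the value.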

4 Comments

How would I apply this regex in my script? I can't make changes to the data source so I have to parse the data via Javascript/jQuery
What do you mean by not being able to make changes to the data source? Are you able to copy and paste the raw data from the data source to another sheet? Or use IMPORTRANGE to bring it into another sheet and process it there? The parameters of your situation are not clear from your post (nor the nature or context of the script). I think the best I can offer is the REGEX above. How that fits into the larger scope of your real-world project, script, etc., I'll have to leave with you. Or perhaps you can submit another post with more complete details and someone else can add to what I've provided.
I am programmatically pulling (jquery fetch) data that happens to come from a Google sheet's feed. I simply need a regular expression to capture the second part of the date (text containing commas) here: i.sstatic.net/h7fwX.png
See my newly added post.
0

Try this:

(?[^,\s]+): (?.+:\d\d|[^,]+)

The key here is to use the pipe symbol to create an alternation that first tries the specific pattern of the date-time string and then falls back to the pattern for everything else (the one you started with).
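
A minimal sketch of how the alternation behaves in JavaScript. The group names key and value are assumptions, since the names are not shown in the pattern above, and extractFields is a made-up helper name. The first alternative (.+:\d\d) is greedy, so this assumes a row contains at most one time-like value.

```javascript
// Hypothetical helper using assumed group names "key" and "value".
// Alternative 1 matches text ending in a time such as "10:11";
// alternative 2 matches anything up to the next comma.
const dateAwarePattern = /(?<key>[^,\s]+): (?<value>.+:\d\d|[^,]+)/g;

function extractFields(row) {
  const fields = {};
  for (const m of row.matchAll(dateAwarePattern)) {
    fields[m.groups.key] = m.groups.value;
  }
  return fields;
}

const fields = extractFields('a: Feb 21, 10:11, b: some content, c: more, d: even more');
console.log(fields);
// { a: 'Feb 21, 10:11', b: 'some content', c: 'more', d: 'even more' }
```

If a later field also ended in a colon followed by two digits, the greedy first alternative would swallow everything up to it, so this fits the sample data rather than the general case.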

In your regex101 builder, it looked like this:

[regex101 screenshot]
