JSON -> csv creating header line and padding header if found empty field

Question

I have a Bash program that processes JSON Lines files containing several million objects, one per line, each like this (see source):

{
  "company_number": "09626947",
  "data": {
    "address": {
      "address_line_1": "Troak Close",
      "country": "England",
      "locality": "Christchurch",
      "postal_code": "BH23 3SR",
      "premises": "9",
      "region": "Dorset"
    },
    "country_of_residence": "United Kingdom",
    "date_of_birth": {
      "month": 11,
      "year": 1979
    },
    "etag": "7123fb76e4ad7ee7542da210a368baa4c89d5a06",
    "kind": "individual-person-with-significant-control",
    "links": {
      "self": "/company/09626947/persons-with-significant-control/individual/FFeqke7T3LvGvX6xmuGqi5SJXAk"
    },
    "name": "Ms Angela Lynette Miller",
    "name_elements": {
      "forename": "Angela",
      "middle_name": "Lynette",
      "surname": "Miller",
      "title": "Ms"
    },
    "nationality": "British",
    "natures_of_control": [
      "significant-influence-or-control"
    ],
    "notified_on": "2016-06-06"
  }
}

My jq query looks like this:

for file in psc_chunk_*; do
jq --slurp --raw-output 'def pad($n): range(0;$n) as $i | 
.[$i]; ([.[] | .data.natures_of_control | length] | max) as $mx |
.[] | 
select(.data) |
[.company_number, .data.kind, .data.address.address_line_1, .data.address.country, .data.address.locality, .data.address.postal_code, .data.address.premises, .data.identification.country_registered, .data.identification.legal_authority, .data.identification.legal_form, .data.identification.place_registered, .data.identification.registration_number, .data.ceased_on, .data.country_of_residence, "\(.data.date_of_birth.year)-\(.data.date_of_birth.month)", .data.etag, .data.links.self, .data.name, .data.name_elements.title, .data.name_elements.forename, .data.name_elements.middle_name, .data.name_elements.surname, .data.nationality, .data.notified_on, (.data.natures_of_control | pad($mx))] |
@csv' "$file" > "$file.csv";
done

This probably hurts the eyes of the jq pros out there: it extracts the key:value pairs inefficiently, and if the provider ever renames a key, my code will stop working.

Is there a way to simply flatten all the JSON into a CSV, keeping the keys as headers? The extra difficulty is that the list natures_of_control has a varying number of entries (I used the pad function to get a rectangular result).

aborruso · Accepted Answer · 2020-01-06 21:00:28Z

1

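Try Miller (https://github.com/johnkerl/miller). Its unsparsify verb fills in the fields a record is missing, so every record ends up with the same set of keys and the CSV gets a single header line:

mlr --j2c unsparsify input.json > output.csv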
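
If you would rather stay in jq, the same idea (flatten every object, then use the union of all keys as the header) can be sketched as below. This is a minimal sketch and not part of the original answer: the function name json_to_csv and the dot-joined column names such as data.address.country are assumptions made for illustration. Like the query in the question, it slurps the whole file into memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and column naming are assumptions):
# flatten each object of a JSON Lines file into one CSV row, using the
# union of all flattened key paths as the header line.
json_to_csv() {
  jq --slurp --raw-output '
    # Turn one nested object into a flat object keyed by dot-joined paths,
    # e.g. {"data":{"kind":"x"}} becomes {"data.kind":"x"}.
    def flat:
      . as $in
      | reduce paths(type != "object" and type != "array") as $p
          ({}; .[$p | map(tostring) | join(".")] = ($in | getpath($p)));

    # Keep only records that have a .data member, as in the question.
    map(select(.data) | flat) as $rows
    # Header: every key seen in any row, sorted alphabetically.
    | ([$rows[] | keys_unsorted[]] | unique) as $cols
    # Emit the header, then one row per record; missing keys become
    # null, which @csv renders as an empty field.
    | $cols, ($rows[] | [.[$cols[]]])
    | @csv
  ' "$1"
}
```

Inside the loop from the question it would be called as json_to_csv "$file" > "$file.csv". Array entries become numbered columns (data.natures_of_control.0, data.natures_of_control.1, and so on), so no explicit padding is needed. Because the header is sorted as plain strings, a column ending in .10 sorts before one ending in .2.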
