
I have a large dataset with one column, where each row contains text. I would like to transform each row into a JSON object and dump each one to a folder path, so the folder will contain as many JSON files as there are rows in the dataset, with every JSON file containing the id and the text of one row.

Is this possible? For similar cases I have only seen how to create one huge JSON object, which is not what I want here. Here is my code so far:

SOLVED

import pandas as pd
import os
import sys
from os.path import expanduser as ospath
import simplejson as json
import numpy as np



sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..')))
data_folder = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "data", "model", 'Final.xlsx'))
single_response = pd.read_excel(ospath(data_folder), sheet_name='Sheet 1')
answers_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "processes"))


class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.int64):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        elif isinstance(obj, dict):
            return dict(obj)
        else:
            return super(MyEncoder, self).default(obj)
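The custom encoder is needed because the standard serializers reject NumPy scalar types such as np.int64. A quick self-contained illustration (using stdlib json and throwaway values; simplejson exposes the same JSONEncoder interface):

```python
import json

import numpy as np


class NumpyEncoder(json.JSONEncoder):
    # Convert NumPy scalars/arrays to built-in types that json can serialize.
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return super().default(obj)


print(json.dumps({"id": np.int64(3), "score": np.float64(0.5)}, cls=NumpyEncoder))
# → {"id": 3, "score": 0.5}
```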

#TODO: function will return idx and text and dump json files for each answer (idx, value) to "answers" path

def create_answer_process(Answer, idx):
    #answers = []
    for idx, value in single_response.iterrows():
        answer = {
            "id": idx,
            "pattern": value['Answer']
        }
        #answers.append(answer)

        #process = json.dumps(answers, cls=MyEncoder, indent=2)

        with open(os.path.join(answers_path, str(idx) + '.json'), 'w') as f:
            json.dump(answer, f, cls=MyEncoder, indent=2)

    return idx

Thanks @keredson !
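For anyone skimming, the solved flow can be reduced to a self-contained sketch. An in-memory DataFrame and a temporary directory stand in for the Excel file and the real answers_path, and int(idx) replaces the custom encoder for this simple case:

```python
import json
import os
import tempfile

import pandas as pd

# Stand-in for the Excel sheet: one 'Answer' column, three rows.
single_response = pd.DataFrame({"Answer": ["yes", "no", "maybe"]})
answers_path = tempfile.mkdtemp()  # stand-in for the real output folder

# One JSON file per row, named after the row index.
for idx, value in single_response.iterrows():
    answer = {"id": int(idx), "pattern": value["Answer"]}  # int() avoids np.int64
    with open(os.path.join(answers_path, str(idx) + ".json"), "w") as f:
        json.dump(answer, f, indent=2)

print(sorted(os.listdir(answers_path)))  # → ['0.json', '1.json', '2.json']
```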

1 Answer


You look pretty close. The problem is this line:

process = json.dumps(answers, cls=MyEncoder, indent=2)

You should dump answer, not answers. You likely don't need answers at all. So something like:

def create_answer_process(Answer, idx):
    for idx, value in single_response.iterrows():
        answer = {
            "id": idx,
            "pattern": value['Answer']
        }
        with open(os.path.join(answers_path, idx), 'w') as f:
            json.dump(answer, f, cls=MyEncoder, indent=2)
    return idx

3 Comments

What you suggest makes sense. I changed my code, but now I get TypeError: join() argument must be str or bytes, not 'int64' here: with open(os.path.join(answers_path, idx), 'w') as f:. If I change this line to with open(os.path.join(answers_path), 'w') as f: I get a Permission denied error, and if I change it to with open((answers_path, idx), 'w') as f: I get TypeError: expected str, bytes or os.PathLike object, not tuple
In this case: with open(os.path.join(answers_path), 'w') as f: I don't know why I get a Permission denied error. I wonder if it has to do with the fact that each JSON object being dumped should be given a name / extension. My answers_path is a path to the folder where I want to store all the JSON objects; if I were creating one huge JSON object, I would use something like answers_path = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "processes", "processes.json")). But I want each JSON to be dumped separately. How should I reform my path?
You get Permission denied with just os.path.join(answers_path) because you're trying to open a directory for writing, not a file in the directory. The first error is because idx is not a string. What filename would you like in the dir? os.path.join(answers_path, str(idx)) should do it, as would os.path.join(answers_path, '%s.json' % idx), etc.
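The fix the comments converge on is just about building a valid file path: convert the index to a string before joining, and include a filename with an extension so you open a file rather than the directory itself. A short sketch (folder name and index value are hypothetical):

```python
import os

answers_path = "processes"   # hypothetical output folder
idx = 7                      # e.g. an int64 row index from iterrows()

# os.path.join requires string components, so convert the index first:
filename = os.path.join(answers_path, str(idx) + ".json")
print(filename)  # e.g. processes/7.json on POSIX

# Passing the raw integer raises TypeError, as seen in the comments:
try:
    os.path.join(answers_path, idx)
except TypeError as e:
    print("join rejects non-strings:", e)
```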
