I have a script which predicts product names from input files. The code is as follows:

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
with open('eng_productnames.csv', newline='') as myFile:
    reader = csv.reader(myFile)
    for rowz in reader:
        try:
            filenamez = rowz[1]
            file = open(DIR+filenamez, "r", encoding ='utf-8')
            filecontentszz = file.read()
            for s in filecontentszz:
                filecontentszz = re.sub(r'\s+', ' ', filecontentszz)
                #filecontents = filecontents.encode().decode('unicode-escape')
                filecontentszz = ''.join([line.lower() for line in filecontentszz]) 
                doc2 = nlp2(filecontentszz)
                for ent in doc2.ents:
                    print(filenamez, ent.label_, ent.text)

                break

        except Exception as e:

which gives me output in the form of strings such as:

07-09-18 N021024s16PASBUNDLEACK - Acknowledgement P.txt PRODUCT ABC1
06-22-18 Letter from Supl.txt PRODUCT ABC2
06-22-18 Letter from Req to Change .txt PRODUCT ABC3

Now I want to export all these details to a CSV with two columns: FILENAME, holding the filenames, and PRODUCT, holding the product names. In each string, the product name follows the label PRODUCT. How can I do this?

Output csv should look like:

Filename                                                             PRODUCT
  07-09-18 Acknowledgement P.txt                                 ABC1
  06-22-18 Letter Req to Change.txt                              ABC2
  • Can you show a sample of your input and what you expect the output to look like? Commented Feb 10, 2019 at 0:24
  • I have added the sample output to the existing code. The input is the output of the for ent in doc2.ents: print(filenamez, ent.label_, ent.text) statement, which returns a string like '10-26-18 Letter from Req - Written Resp.txt PRODUCT ABC3'. Commented Feb 10, 2019 at 0:42
  • It's not clear what the relationship between the file name in your printed output and the filename column in the desired CSV is. Please can you clarify? Commented Feb 10, 2019 at 1:42

2 Answers


You can make a csv.writer to write each row to the output file, using writerow instead of printing to the screen.

import csv
import re

import spacy

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
# newline='' on both files lets the csv module handle line endings itself
with open('eng_productnames.csv', newline='') as input_file, \
        open('output.csv', 'w', newline='') as output_file:
    reader = csv.reader(input_file)
    writer = csv.writer(output_file)
    writer.writerow(["Filename", "Product"])  # this is the header row
    for rowz in reader:
        try:
            filenamez = rowz[1]
            with open(DIR + filenamez, "r", encoding='utf-8') as file:
                filecontentszz = file.read()
            # collapse whitespace and lowercase once per file
            filecontentszz = re.sub(r'\s+', ' ', filecontentszz).lower()
            doc2 = nlp2(filecontentszz)
            for ent in doc2.ents:
                # one CSV row per predicted entity
                writer.writerow([filenamez, ent.text])
        except Exception as e:
            # report the problem and move on to the next row
            print("Skipping row", rowz, "because of", e)

I'm assuming here that filenamez and ent.text contain the information you want in each column. If that's not the case then you can manipulate them to get what you need before writing to the CSV.
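If all you have is the printed strings, as in the sample output in the question, one way to get the two columns is to split each line on the PRODUCT label. This is only a minimal sketch, assuming every line has the form 'filename PRODUCT name'. The helper names split_prediction and predictions_to_csv are hypothetical, and the in-memory buffer stands in for a real output file:

```python
import csv
import io


def split_prediction(line, label="PRODUCT"):
    """Split 'filename LABEL entity' into (filename, entity)."""
    # rpartition splits at the LAST occurrence of the label
    filename, sep, product = line.rpartition(f" {label} ")
    if not sep:
        raise ValueError(f"no {label!r} label in {line!r}")
    return filename.strip(), product.strip()


def predictions_to_csv(lines, out):
    """Write a header row plus one (filename, product) row per line."""
    writer = csv.writer(out)
    writer.writerow(["Filename", "Product"])
    for line in lines:
        writer.writerow(split_prediction(line))


# Sample strings copied from the question
lines = [
    "07-09-18 N021024s16PASBUNDLEACK - Acknowledgement P.txt PRODUCT ABC1",
    "06-22-18 Letter from Supl.txt PRODUCT ABC2",
    "06-22-18 Letter from Req to Change .txt PRODUCT ABC3",
]
buffer = io.StringIO(newline="")  # swap for open('output.csv', 'w', newline='')
predictions_to_csv(lines, buffer)
print(buffer.getvalue())
```

Splitting from the right means a filename that itself contains ' PRODUCT ' is still handled. If a product name could contain the label instead, partition would be the better choice.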


There are many ways to achieve this. The one I prefer is pandas, a powerful library for working with tabular data such as CSV files. You can create a dictionary:

predicted_products = {'FILENAME': [], 'PRODUCT': []}

and iteratively append filenames and products to the corresponding lists.

After that is done, convert predicted_products to a DataFrame and call its to_csv method:

import pandas as pd
predicted_products_df = pd.DataFrame.from_dict(predicted_products)
# index=False leaves out the row index, so the CSV has only the two columns
predicted_products_df.to_csv('your_path/file_name.csv', index=False)

I prefer this way, since you can edit the data more easily before you save the file.
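As a rough illustration of editing before saving, the sketch below upper-cases the product names and drops repeated rows. The two rows are made-up sample data in the same shape as the dictionary above, and the in-memory buffer stands in for a real file path:

```python
import io

import pandas as pd

# Hypothetical predictions: the same product found twice in one file
predicted_products = {
    'FILENAME': ['06-22-18 Letter from Supl.txt', '06-22-18 Letter from Supl.txt'],
    'PRODUCT': ['abc2', 'abc2'],
}
df = pd.DataFrame.from_dict(predicted_products)

# Edit the data before saving
df['PRODUCT'] = df['PRODUCT'].str.upper()  # normalise the product names
df = df.drop_duplicates()                  # keep one row per (file, product) pair

buffer = io.StringIO()                     # swap for a file path to write to disk
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Which clean-up steps make sense depends on your data. These two are only examples.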

Applied to your existing code, assuming print(filenamez, ent.label_, ent.text) is what produces the output:

import csv
import re

import pandas as pd
import spacy

output_dir = "C:\\Users\\Lenovo\\.spyder-py3\\NER_training"
DIR = 'C:\\Users\\Lenovo\\.spyder-py3\\Testing\\'
print("Loading from", output_dir)
nlp2 = spacy.load(output_dir)
predicted_products = {'FILENAME': [], 'PRODUCT': []}
with open('eng_productnames.csv', newline='') as myFile:
    reader = csv.reader(myFile)
    for rowz in reader:
        try:
            filenamez = rowz[1]
            with open(DIR + filenamez, "r", encoding='utf-8') as file:
                filecontentszz = file.read()
            # collapse whitespace and lowercase once per file
            filecontentszz = re.sub(r'\s+', ' ', filecontentszz).lower()
            doc2 = nlp2(filecontentszz)
            for ent in doc2.ents:
                print(filenamez, ent.label_, ent.text)
                # the label (PRODUCT in the sample output) is left out of the filename column
                predicted_products['FILENAME'].append(filenamez)
                predicted_products['PRODUCT'].append(ent.text)
        except Exception as e:
            # report the problem and move on to the next row
            print("Skipping row", rowz, "because of", e)

predicted_products_df = pd.DataFrame.from_dict(predicted_products)
# index=False leaves out the row index, so the CSV has only the two columns
predicted_products_df.to_csv('your_path/file_name.csv', index=False)

1 Comment

But there is no dict object; it's a string. Can you edit that part into my code so I can understand it better?
