With Python and Pandas, I'm writing a script that passes text data from a csv through the pylanguagetool library to count the grammatical errors in each text. The script runs, but it appends the data to the end of the csv instead of adding it as a new column.

The structure of the csv is:

CSV1

The working code is:

import pandas as pd
from pylanguagetool import api

df = pd.read_csv("Streamlit\stack.csv")

text_data = df["text"].fillna('')
length1 = len(text_data)

for i, x in enumerate(range(length1)):
    # this is the pylanguagetool operation
    errors = api.check(text_data, api_url='https://languagetool.org/api/v2/', lang='en-US')
    result = str(errors)
    # this pulls the error count "message" from the pylanguagetool json
    error_count = result.count("message")
    output_df = pd.DataFrame({"error_count": [error_count]})
    output_df.to_csv("Streamlit\stack.csv", mode="a", header=(i == 0), index=False)

The output is:

CSV2

Expected output:

CSV3

What changes are necessary to append the output like this?

Btw, your use of enumerate doesn't make much sense. It simply gives you tuples like (0, 0), (1, 1), ... I think what you meant to do was for i, x in enumerate(text_data). Commented Jul 26, 2021 at 13:49
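
To illustrate the point about enumerate, here is a small sketch comparing the two loops. The sample texts are made up for illustration and are not from the original csv.

```python
# Hypothetical sample data standing in for the "text" column.
texts = ["first text", "second text", "third text"]

# enumerate(range(n)) only yields each index twice; the text never appears.
pairs_from_range = list(enumerate(range(len(texts))))

# enumerate(texts) yields each index together with the text at that position.
pairs_from_texts = list(enumerate(texts))

print(pairs_from_range)
print(pairs_from_texts)
```

With the second form, x is the text itself, so it can be passed to the checker one row at a time.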

2 Answers

4

Instead of using a loop, you might consider apply with a lambda, which would accomplish what you want in one line:

df["error_count"] = df["text"].fillna("").apply(lambda x: len(api.check(x, api_url='https://languagetool.org/api/v2/', lang='en-US')["matches"]))

>>> df
   user_id  ... error_count
0       10  ...           2
1       11  ...           0
2       12  ...           0
3       13  ...           0
4       14  ...           0
5       15  ...           2

Edit:

You can write the above to a .csv file with:

df.to_csv(r"Streamlit\stack.csv", index=False)

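You don't want to use mode="a", as that opens the file in append mode and adds the new rows below the existing data. You want write mode (the default), which overwrites the file with the full DataFrame, new column included.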
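
The difference between the two modes can be seen in a small self-contained sketch. The DataFrame contents, the counts, and the file names below are made up for illustration, and the files are written to a temporary directory rather than to the original csv.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the original csv; the values are made up.
df = pd.DataFrame({"user_id": [10, 11], "text": ["some text", "more text"]})
counts = [2, 0]  # pretend error counts, one per row

tmp_dir = tempfile.mkdtemp()
appended_path = os.path.join(tmp_dir, "appended.csv")
written_path = os.path.join(tmp_dir, "written.csv")

# Append mode: the counts land as extra rows below the existing data.
df.to_csv(appended_path, index=False)
pd.DataFrame({"error_count": counts}).to_csv(appended_path, mode="a", index=False)

# Write mode (the default): add the column first, then write the whole frame.
df["error_count"] = counts
df.to_csv(written_path, index=False)

with open(appended_path) as f:
    appended_lines = f.read().splitlines()
with open(written_path) as f:
    written_lines = f.read().splitlines()

print(appended_lines)
print(written_lines)
```

The appended file ends up with the error_count header and values stacked under the original rows, which matches the symptom in the question, while the written file carries error_count as a third column.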


3 Comments

It's also safer to take the len of matches, as the word "message" might appear inside the text of one of the errors.
Many thanks! I'm not sure where my mistake is, but I'm still encountering the problem (sorry, new to Python/Pandas). What do I need to change in the second line to write df to the csv? I put in df.to_csv("Streamlit\stack.csv", mode="a", header=False, index=False), but the problem is replicated.
Solved!! Many thanks for the assistance and the patience!
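
On the comment about taking the len of matches: a minimal sketch of why counting the substring can overcount. The response below is a hypothetical dict shaped like the result described in this thread (a "matches" list whose entries each carry a "message"); its contents are made up and no request is sent.

```python
# Hypothetical response; only the "matches" and "message" keys come from
# the thread, the error texts are invented for illustration.
response = {
    "matches": [
        {"message": "Possible spelling mistake found."},
        {"message": "This message is missing a comma."},
    ]
}

# Counting the substring also picks up "message" inside the error text.
count_by_substring = str(response).count("message")

# Counting the list entries gives one per reported error.
count_by_matches = len(response["matches"])

print(count_by_substring, count_by_matches)
```

Here the substring count is one higher than the number of matches, because the second error text happens to contain the word itself.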
1

My strategy would be to keep the error counts in a list, then add that list as a new column in the original DataFrame, and finally write that DataFrame to csv:

text_data = df["text"].fillna('')
error_count_lst = []
for text in text_data:
    # check one text at a time rather than passing the whole Series
    errors = api.check(text, api_url='https://languagetool.org/api/v2/', lang='en-US')
    result = str(errors)
    # count occurrences of "message" in the stringified response
    error_count = result.count("message")
    error_count_lst.append(error_count)

# add the counts as a new column on the DataFrame (not on the text Series)
df['error_count'] = error_count_lst
df.to_csv('file.csv', index=False)
