Python & Pandas: appending data to new column

Question

With Python and Pandas, I'm writing a script that passes text data from a csv through the pylanguagetool library to count the number of grammatical errors in each text. The script runs successfully, but it appends the data to the end of the csv instead of adding it as a new column.

The structure of the csv is:

The working code is:

import pandas as pd
from pylanguagetool import api

df = pd.read_csv(r"Streamlit\stack.csv")

text_data = df["text"].fillna('')
length1 = len(text_data)

for i, x in enumerate(range(length1)):
    # this is the pylanguagetool operation
    errors = api.check(text_data, api_url='https://languagetool.org/api/v2/', lang='en-US')
    result = str(errors)
    # this pulls the error count "message" from the pylanguagetool json
    error_count = result.count("message")
    output_df = pd.DataFrame({"error_count": [error_count]})
    output_df.to_csv(r"Streamlit\stack.csv", mode="a", header=(i == 0), index=False)

The output is:

Expected output:

What changes are necessary to append the output like this?

Btw, your use of enumerate doesn't make much sense. It simply yields tuples like (0, 0), (1, 1), .... I think what you meant to do was for i, x in enumerate(text_data).
– not_speshal, commented Jul 26, 2021 at 13:49
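A quick pure-Python check of what the comment describes. The list texts is a made-up stand-in for the text column; iterating over a pandas Series with enumerate behaves the same way, yielding (position, value) pairs.

```python
# Hypothetical stand-in for df["text"]
texts = ["First text.", "Second text.", "Third text."]

# enumerate(range(n)) pairs each index with itself: (0, 0), (1, 1), ...
pairs_from_range = list(enumerate(range(len(texts))))

# enumerate(texts) pairs each index with the actual text
pairs_from_texts = list(enumerate(texts))

for i, text in enumerate(texts):
    print(i, text)
```

Inside the loop, text is the single string to check, which is what api.check should receive instead of the whole column.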

not_speshal · Accepted Answer · 2021-07-26 14:12:13Z

4

Instead of a loop, you can use Series.apply with a lambda, which accomplishes what you want in one line:

df["error_count"] = df["text"].fillna("").apply(lambda x: len(api.check(x, api_url='https://languagetool.org/api/v2/', lang='en-US')["matches"]))

>>> df
   user_id  ... error_count
0       10  ...           2
1       11  ...           0
2       12  ...           0
3       13  ...           0
4       14  ...           0
5       15  ...           2
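The same apply pattern can be tried offline. In this sketch fake_check is a hypothetical stand-in for api.check that returns a dict with a "matches" list (one entry per occurrence of "teh"), so no request is sent to LanguageTool; the sample DataFrame is made up.

```python
import pandas as pd


def fake_check(text):
    """Hypothetical offline stand-in for api.check: returns a dict shaped like the
    LanguageTool response, with one entry in "matches" per occurrence of "teh"."""
    return {"matches": [{"message": "Possible typo"} for _ in range(text.count("teh"))]}


df = pd.DataFrame({"user_id": [10, 11, 12],
                   "text": ["teh cat saw teh dog", None, "All good."]})

# One check per row, count the matches, store the counts as a new column.
df["error_count"] = df["text"].fillna("").apply(lambda x: len(fake_check(x)["matches"]))
print(df)
```

Because apply returns a Series aligned with df's index, assigning it to df["error_count"] adds a column rather than new rows.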

Edit:

You can write the above to a .csv file with:

df.to_csv(r"Streamlit\stack.csv", index=False)

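Don't use mode="a": it opens the file in append mode, so whatever you write is added below the existing rows. You want the default write mode (mode="w"), which overwrites the file with the whole DataFrame, including the new column.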
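A small sketch of the difference, using a throwaway file in a temporary directory (the path and data are made up). The first part mimics the question's loop; the second adds the column in memory and rewrites the file.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"user_id": [10, 11], "text": ["a", "b"]})
path = os.path.join(tempfile.mkdtemp(), "stack.csv")  # hypothetical temporary file
df.to_csv(path, index=False)

# Question's approach: append one-column rows below the existing data.
for i, count in enumerate([2, 0]):
    pd.DataFrame({"error_count": [count]}).to_csv(path, mode="a", header=(i == 0), index=False)
with open(path) as f:
    appended_lines = f.read().splitlines()

# Answer's approach: add the column in memory, then overwrite the file (default mode="w").
df["error_count"] = [2, 0]
df.to_csv(path, index=False)
with open(path) as f:
    rewritten_lines = f.read().splitlines()

print(appended_lines)   # error_count header and values end up below the data
print(rewritten_lines)  # error_count is a third column
```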

edited Jul 26, 2021 at 14:12

answered Jul 26, 2021 at 13:43

not_speshal



3 Comments

Henry Ecker Over a year ago

It's also safer to take the len of matches, since the word "message" might appear inside one of the partial errors and inflate the count.
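To illustrate the point, here is a hand-made dict loosely shaped like a LanguageTool response (only a few fields, invented values): it has one real match, but the checked text itself contains the word "message".

```python
# Hypothetical, hand-made response: one match whose context text mentions "message".
response = {
    "language": {"name": "English (US)", "code": "en-US"},
    "matches": [
        {
            "message": "Possible spelling mistake found.",
            "context": {"text": "Teh message was clear.", "offset": 0, "length": 3},
        }
    ],
}

count_by_string = str(response).count("message")  # counts the key AND the word in the text
count_by_matches = len(response["matches"])        # counts actual matches
print(count_by_string, count_by_matches)
```

The string count reports two errors where there is only one; len(response["matches"]) does not depend on what the text says.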

Daniel Hutchinson Over a year ago

Many thanks! I'm not sure where my mistake is, but I'm still encountering the problem (sorry, new to Python/Pandas). What do I need to change in the second line, df, to write the data to the csv? I put in df.to_csv("Streamlit\stack.csv", mode="a", header=False, index=False), but the problem persists.

Daniel Hutchinson Over a year ago

Solved!! Many thanks for the assistance and the patience!

Muhammad Rasel · Accepted Answer · 2021-07-26 13:41:56Z

1

My strategy would be to keep the error counts in a list, then create a separate column in the original DataFrame, and finally write that DataFrame to csv:

import pandas as pd
from pylanguagetool import api

df = pd.read_csv(r"Streamlit\stack.csv")
text_data = df["text"].fillna('')
error_count_lst = []
for text in text_data:
    # check one text at a time, not the whole column
    errors = api.check(text, api_url='https://languagetool.org/api/v2/', lang='en-US')
    # one entry in "matches" per detected error
    error_count_lst.append(len(errors["matches"]))

# add the counts as a new column of the DataFrame, then write it out
df['error_count'] = error_count_lst
df.to_csv('file.csv', index=False)
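The list-then-column strategy can be tried offline as well. fake_check is a hypothetical stand-in for api.check (one "match" per word "teh"), and the DataFrame is made up. The sketch also shows the one requirement of this approach: the list needs exactly one entry per row, otherwise pandas refuses the assignment with a ValueError.

```python
import pandas as pd


def fake_check(text):
    """Hypothetical offline stand-in for api.check (one "match" per word "teh")."""
    return {"matches": [{} for _ in range(text.lower().split().count("teh"))]}


df = pd.DataFrame({"text": ["Teh end.", "Fine.", "teh teh"]})

error_count_lst = []
for text in df["text"].fillna(""):
    error_count_lst.append(len(fake_check(text)["matches"]))

# One entry per row: the list becomes a new column.
df["error_count"] = error_count_lst

# A list of the wrong length is rejected.
length_error = None
try:
    df["too_short"] = error_count_lst[:-1]
except ValueError as exc:
    length_error = exc
print(df)
print("ValueError:", length_error)
```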

answered Jul 26, 2021 at 13:41

Muhammad Rasel

