I'm working with a pandas DataFrame that contains three columns: drugName, review and rating. I want to select reviews by their rating: if the rating is greater than or equal to 6, the review is positive and I need to write it to a CSV file. Here is my code:

import csv
import pandas as pd
filename = "C:\\Users\\Amin Chaari\\Desktop\\Book1.csv"

def user_text(filename):
    with open(filename, encoding="utf8") as f:
        datas = csv.reader(f, delimiter=';')
        lines = [row for row in datas]
    user = {}
    try:
        for i in range(1, 5):
            if lines[0][i] != 'condition':
                print(lines[0][i])
                grouped_column = []
                for j, row in enumerate(lines):
                    if j > 0:
                        grouped_column.append(row[i])
                        user.update({lines[0][i]: grouped_column})
    except IndexError:
        pass
    df1 = pd.DataFrame(user)
    df1.groupby(['review'])
    return df1

df = user_text(filename)
for i in range(0, 40303):
    df['rating'][i] = float(df['rating'][i])

for i in range(0, 40303):
    if df['rating'][i] >= 6:
        df['review'].to_csv("C:\\Users\\rev_pos.csv", encoding='utf8')

This is the error that I get:

 AttributeError: 'str' object has no attribute 'to_csv'

3 Answers

1

Change the end of your code to the following:

df.loc[df['rating'] >= 6, 'review'].to_csv("C:\\Users\\rev_pos.csv", encoding='utf8')

This filters the 'review' column with a boolean mask built from the whole 'rating' column (not a single element such as df['rating'][i]) and saves the result to a CSV in one step, so no loop is needed.
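As a minimal sketch of the filtering idea, assuming a small made-up DataFrame in place of the real file (the sample rows and the in-memory buffer are illustrative, not from the question), the mask-then-save step could look like this:

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the real file.
df = pd.DataFrame({
    "drugName": ["A", "B", "C"],
    "review": ["worked well", "no effect", "great"],
    "rating": ["9", "3", "6"],  # strings, as produced by csv.reader
})

# Convert the whole column at once instead of looping over rows.
df["rating"] = df["rating"].astype(float)

# Boolean mask: one True/False value per row.
mask = df["rating"] >= 6
positive = df.loc[mask, "review"]

# Written to an in-memory buffer here; a file path can be passed instead.
buffer = io.StringIO()
positive.to_csv(buffer, index=False, header=True)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the comparison runs on the entire column, the result is a Series, which does have a to_csv method; a single element such as a review string does not.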

2 Comments

Do you mean only the last line, or the whole for loop?
The last 3 lines, that is, the whole final loop.
0

I cannot write it as a comment but here are some further suggestions for your code:

  • Use the read_csv function of the pandas module instead of the csv module:

import pandas as pd

def user_text(filename):
    # read_csv parses the header row and builds the DataFrame directly
    df = pd.read_csv(filename, sep=';')
    return df

  • Specify the datatype upon read-in instead of iterating over the array:

import pandas as pd
import numpy as np
...
# assume the columns are called a and b
df = pd.read_csv(filename, sep=';', dtype={'a': np.float32, 'b': np.float32})

  • Iterate over dataframes with df.iterrows:

for i, row in df.iterrows():
    # do_something is a placeholder for your own per-row function
    do_something(row)
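The three suggestions above can be combined in one short sketch. It assumes a semicolon-separated sample held in memory (the rows and the labels list are made up for illustration) with the column names from the question:

```python
import io
import pandas as pd

# Hypothetical sample with the same layout as the question's file.
sample = io.StringIO(
    "drugName;review;rating\n"
    "A;worked well;9\n"
    "B;no effect;3\n"
)

# dtype converts 'rating' while parsing, so no conversion loop is needed.
df = pd.read_csv(sample, sep=";", dtype={"rating": float})

# iterrows yields (index label, row as a Series). It is handy for
# inspection, but column-wise operations are usually faster on large frames.
labels = []
for i, row in df.iterrows():
    labels.append("positive" if row["rating"] >= 6 else "negative")

print(df.dtypes)
print(labels)
```

With a real file, the io.StringIO object would simply be replaced by the file path.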

Hope that helps

0

I've found a way to fix this problem. Here is the code:

pos_rev = []  # collects the reviews rated 6 or higher
for i in range(0, 40303):  # 40303 is the row count used in the question
    if df.rating[i] >= 6:
        pos_rev.append(df.review[i])

df1 = pd.DataFrame(pos_rev)
file2 = "C:/Users/Amin Chaari/Desktop/pos.csv"
df1.to_csv(file2, sep='\t', encoding='utf8')
