Python error when writing data to a CSV file

Question

I wrote a Python program that writes data to a .csv file, but every item in the file has a "b'" before the content, and there are blank lines that I do not know how to remove. Some items are also unrecognizable, such as "b'\xe7\xbe\x85\xe5\xb0\x91\xe5\x90\x9b'". Some of the data is in Chinese and Japanese, so I think something goes wrong when that data is written to the .csv file. Please help me solve the problem. The program is:

#write data in .csv file
def data_save_csv(type,data,id_name,header,since = None):
    #get the date when storage data
    date_storage()
    #create the data storage directory
    csv_parent_directory = os.path.join("dataset","csv",type,glovar.date)
    directory_create(csv_parent_directory)
    #write data in .csv
    if type == "group_members":
        csv_file_prefix = "gm"
    if since:
        csv_file_name = csv_file_prefix + "_" + since.strftime("%Y%m%d-%H%M%S") + "_" + time_storage() + id_name + ".csv"
    else:
        csv_file_name = csv_file_prefix + "_"  + time_storage() + "_" + id_name + ".csv"
    csv_file_directory = os.path.join(csv_parent_directory,csv_file_name)

    with open(csv_file_directory,'w') as csvfile:

        writer = csv.writer(csvfile,delimiter=',',quotechar='"',quoting=csv.QUOTE_MINIMAL)

        #csv header
        writer.writerow(header)

        row = []
        for i in range(len(data)):
            for k in data[i].keys():
                row.append(str(data[i][k]).encode("utf-8"))
            writer.writerow(row)
            row = []

the .csv file

Because you are explicitly appending bytes to your row: str(data[i][k]).encode("utf-8"). Just remove encode.
– juanpa.arrivillaga, Commented Jan 9, 2017 at 20:52
You do str(data[i][k])... what is this thing in data[i][k] that needs to be cast to a string?
– tdelaney, Commented Jan 9, 2017 at 20:58
@tdelaney well, whatever it is, it's converted to bytes right after. I assume it's a string... For example, those bytes decoded give '羅少君'
– juanpa.arrivillaga, Commented Jan 9, 2017 at 20:59
@juanpa.arrivillaga using values is certainly better, but there is no need to cast to str. CSV does that. Better yet, use csv.DictWriter.
– tdelaney, Commented Jan 9, 2017 at 21:19

tdelaney · Accepted Answer · 2017-01-09 21:12:45Z

1

You have a couple of problems. The funky "b" thing happens because csv converts each value to a string before writing it to a column. When you did str(data[i][k]).encode("utf-8"), you got a bytes object whose string representation is b'...', filled with UTF-8 encoded data. You should handle encoding when you open the file. In Python 3, open defaults to the locale's preferred encoding (locale.getpreferredencoding(False)), so it's a good idea to be explicit about what you want to write.
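As a minimal sketch of the point above, the following compares writing a bytes value and a str value through csv.writer. The helper name write_row and the in-memory buffer are assumptions made for illustration; they are not part of the original program.

```python
import csv
import io

name = "羅少君"
encoded = name.encode("utf-8")

# str() of a bytes object is its repr, including the b'...' wrapper.
print(str(encoded))


def write_row(values):
    """Write one row to an in-memory CSV and return the resulting text.

    Hypothetical helper, used only to make the output easy to inspect.
    """
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(values)
    return buffer.getvalue()


bad_row = write_row([encoded])  # bytes: csv stringifies them, b'...' ends up in the file
good_row = write_row([name])    # str: the text itself is written

print(repr(bad_row))
print(repr(good_row))
```

Passing str values and letting open handle the encoding keeps the b'...' wrapper out of the file.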

Next, there's nothing that says that two dicts will enumerate their keys in the same order. The csv.DictWriter class is built to pull data from dictionaries, so use it instead. In my example I assumed that header has the names of the keys you want. It could be that header is different, and in that case, you'll also need to pass in the actual dict key names you want.

Finally, you can just strip out empty dicts while you are writing the rows.

with open(csv_file_directory,'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=header, delimiter=',',
        quotechar='"',quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(d for d in data if d)
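For a self-contained way to try this, here is a sketch that writes a few sample rows to a temporary file and reads the raw bytes back. The function name save_rows, the sample data, and the file name are hypothetical; it also assumes, as the answer does, that header lists the dict keys.

```python
import csv
import os
import tempfile


def save_rows(path, header, data):
    """Write non-empty dicts from `data` as CSV rows, columns ordered by `header`."""
    # newline="" lets the csv module control line endings;
    # encoding="utf-8" makes the output encoding explicit.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=header)
        writer.writeheader()
        writer.writerows(d for d in data if d)


# Sample data (assumed): keys appear in different orders, and one dict is empty.
header = ["id", "name"]
data = [
    {"name": "羅少君", "id": 1030715900405221},
    {},
    {"id": 2, "name": "Tanaka"},
]

path = os.path.join(tempfile.mkdtemp(), "gm_example.csv")
save_rows(path, header, data)

# Read the raw bytes back to inspect the encoding and the line endings.
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```

Each row should end in a single \r\n, and the columns follow header regardless of the key order inside each dict.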

answered Jan 9, 2017 at 21:12


6 Comments

bin Over a year ago

And why are there blank lines? How do I remove them?

tdelaney Over a year ago

@bin You would get empty lines if some of the dictionaries didn't have anything in them. I filter out empty ones with d for d in data if d.

bin Over a year ago

Hi tdelaney, I modified the program as you said, but the csv file still contains unrecognizable characters such as à¸ˆà¸µà¹€à¸™à¸µà¸¢à¸£à¹Œà¸ª à¸à¸²à¸£à¹Œà¹€à¸Šà¸à¸£à¹Œ, and there are still blank lines.

tdelaney Over a year ago

You may need a different encoding such as utf-16 or the data could be bad in the dict itself. Try to figure out which of the dicts in data is producing the odd data, print it, and post that in your post. As for the blank lines, is it every other line? What tool did you use to read it? My example (if you did newline='') writes \r\n per line but that may confuse your reader. Do print(open(csv_file_directory, 'rb').read(500)) to see what the line endings look like.

bin Over a year ago

Hi tdelaney, one of the items in data is an id, which is a long number. In the .csv file this number is always missing its last digit. For example, the id should be 1030715900405221, but in the .csv file it is 103071590040522, missing the final "1". The id numbers are all more than 10E15, so they are very long. Is there a limit on the length of a number in a .csv file?
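The symptoms in the comments above can be checked in isolation. The sketch below rests on an assumption the thread never confirms: that the file was viewed in a tool that decoded it as cp1252 and that handles numbers itself, such as a spreadsheet. The helper name misread_utf8 is hypothetical.

```python
import csv
import io


def misread_utf8(text, wrong_encoding="cp1252"):
    """Return what `text` looks like when its UTF-8 bytes are decoded with the wrong codec."""
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


# The Thai letter U+0E08 is three UTF-8 bytes; read as cp1252 they become three
# Latin-looking characters, the same pattern as the garbled text in the comment.
garbled = misread_utf8("\u0e08")
print(garbled)

# "utf-8-sig" prepends a byte order mark that some spreadsheet tools use to detect UTF-8.
with_bom = "\u0e08".encode("utf-8-sig")
print(with_bom[:3])

# The csv module itself writes every digit of a large integer.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow([1030715900405221])
id_line = buffer.getvalue()
print(repr(id_line))
```

If the raw file contains the full id, the missing digit is probably introduced by the viewing tool and not by the CSV format, which has no limit on number length. Opening the file with encoding='utf-8-sig' in place of 'utf-8' may help a viewer pick the right decoding, though that depends on the tool.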


Laughing Horse · Accepted Answer · 2017-01-09 20:56:22Z

0

It sounds like at least some of your issues have to do with incorrect Unicode handling.

Try implementing the snippet below in your existing code. As the comments say, the first part takes your input and decodes it from UTF-8 into Unicode text.

The second part returns your output in the expected ASCII format.

import codecs
import unicodedata

# Take the input and decode it from UTF-8 into Unicode text
with codecs.open('path/to/textfile.txt', mode='r', encoding='utf-8') as f:
    for line in f:
        # Ensure the output is ASCII (non-ASCII characters are dropped)
        line = unicodedata.normalize('NFKD', line).encode('ascii', 'ignore')

edited Jan 9, 2017 at 20:56

answered Jan 9, 2017 at 20:52


2 Comments

Laughing Horse Over a year ago

I interpreted "unreadable" to mean that the text hadn't been converted back to ASCII / human-readable plain text.

juanpa.arrivillaga Over a year ago

Right, but ASCII won't be able to handle all human-readable text and is rather limited, and the OP explicitly mentioned that they need Chinese and Japanese characters...
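The limitation raised in this comment is easy to see in a small sketch. The helper name to_ascii is hypothetical; it applies the same normalize-then-encode step as the snippet in this answer.

```python
import unicodedata


def to_ascii(text):
    """Apply NFKD normalization, then drop every character that is not ASCII."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore")


# Accented Latin letters decompose into a base letter plus a combining mark,
# so the base letter survives.
latin = to_ascii("café")

# CJK characters have no ASCII decomposition, so nothing is left.
cjk = to_ascii("羅少君")

print(latin, cjk)
```

For data that contains Chinese or Japanese names, this approach discards the text entirely, so writing UTF-8 is the safer choice.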
