
I wrote a Python program to write data to a .csv file, but every item in the .csv file has a "b'" before the content, and there are blank lines that I do not know how to remove. Some items in the .csv file are also unrecognizable characters, such as "b'\xe7\xbe\x85\xe5\xb0\x91\xe5\x90\x9b'". Because some of the data is in Chinese and Japanese, I think something may be going wrong when writing this data to the .csv file. Please help me solve the problem. The program is:

import csv
import os

#write data in .csv file
def data_save_csv(type,data,id_name,header,since = None):
    #get the date when storage data
    date_storage()
    #create the data storage directory
    csv_parent_directory = os.path.join("dataset","csv",type,glovar.date)
    directory_create(csv_parent_directory)
    #write data in .csv
    if type == "group_members":
        csv_file_prefix = "gm"
    if since:
        csv_file_name = csv_file_prefix + "_" + since.strftime("%Y%m%d-%H%M%S") + "_" + time_storage() + id_name + ".csv"
    else:
        csv_file_name = csv_file_prefix + "_"  + time_storage() + "_" + id_name + ".csv"
    csv_file_directory = os.path.join(csv_parent_directory,csv_file_name)

    with open(csv_file_directory,'w') as csvfile:

        writer = csv.writer(csvfile,delimiter=',',quotechar='"',quoting=csv.QUOTE_MINIMAL)

        #csv header
        writer.writerow(header)

        row = []
        for i in range(len(data)):
            for k in data[i].keys():
                row.append(str(data[i][k]).encode("utf-8"))
            writer.writerow(row)
            row = []

(Screenshot of the resulting .csv file)

  • Using PyCharm, which uses Python 3.6. Commented Jan 9, 2017 at 20:43
  • Because you are explicitly appending bytes to your row: str(data[i][k]).encode("utf-8"). Just remove the encode call. Commented Jan 9, 2017 at 20:52
  • You do str(data[i][k])... What is in data[i][k] that needs to be cast to a string? Commented Jan 9, 2017 at 20:58
  • @tdelaney Well, whatever it is, it's converted to bytes right after. I assume it's a string. For example, those bytes decoded give '羅少君'. Commented Jan 9, 2017 at 20:59
  • @juanpa.arrivillaga Using values is certainly better, but there is no need to cast to str; csv does that. Better yet, use csv.DictWriter. Commented Jan 9, 2017 at 21:19

2 Answers


You have a couple of problems. The odd "b" prefix appears because csv converts every non-string value to a string with str() before writing it to a column. When you did str(data[i][k]).encode("utf-8"), you got a bytes object, and the string representation of a bytes object is b'...' filled with the escaped UTF-8 bytes. Handle the encoding when you open the file instead. In Python 3, open() without an encoding argument uses the locale's preferred encoding (locale.getpreferredencoding(False)), which is often not UTF-8 (on Windows it usually isn't), so it's a good idea to be explicit about what you want to write.
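A small sketch of the effect described above, using io.StringIO as an in-memory stand-in for the real file; the sample name '羅少君' is the decoded value quoted in the comments. A bytes value goes through str() and comes out as its b'...' representation, while a str value is written as the characters themselves.

```python
import csv
import io

name = '羅少君'  # sample value quoted in the comments

buf = io.StringIO()  # in-memory stand-in for the real .csv file
writer = csv.writer(buf)
writer.writerow([name.encode('utf-8')])  # bytes: csv calls str() and writes "b'\xe7...'"
writer.writerow([name])                  # str: written as the characters themselves
print(buf.getvalue())
```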

Next, nothing guarantees that two dicts enumerate their keys in the same order (each follows its own insertion order at best), so rows built from .keys() can put values in the wrong columns. The csv.DictWriter class is built to pull data from dictionaries by key, so use it instead. In my example I assumed that header holds the names of the keys you want. If header is different, you'll also need to pass in the actual dict key names as the fieldnames.
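A minimal sketch of the ordering problem, with hypothetical field names id and name: the second dict has the same keys inserted in a different order, so its .values() come out reversed, while csv.DictWriter looks each value up by field name and keeps the columns aligned.

```python
import csv
import io

header = ['id', 'name']  # hypothetical field names
data = [
    {'id': 1, 'name': 'a'},
    {'name': 'b', 'id': 2},  # same keys, different insertion order
]

# Building rows from .values() follows each dict's own key order
print([list(d.values()) for d in data])  # [[1, 'a'], ['b', 2]]

# DictWriter looks values up by field name, so every column lines up
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=header)
writer.writeheader()
writer.writerows(data)
print(buf.getvalue())
```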

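Finally, you can strip out empty dicts while you are writing the rows. Also open the file with newline='': the csv module writes its own \r\n line endings, and without newline='' text mode on Windows translates the \n again, which shows up as a blank line after every row.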

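import csv

# newline='' stops text mode from translating csv's own \r\n line endings;
# encoding='utf-8' writes Chinese/Japanese text as real characters
with open(csv_file_directory, 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=header, delimiter=',',
                            quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(d for d in data if d)  # skip empty dicts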
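Why newline='' in the snippet above matters for the blank lines: on Windows, text mode translates every '\n' written to '\r\n' by default, and io.TextIOWrapper with newline='\r\n' reproduces that translation on any platform. This sketch (write_rows is a hypothetical helper) compares the raw bytes produced with and without translation. Without newline='', every row ends in '\r\r\n', which many readers display as an extra empty line.

```python
import csv
import io

def write_rows(newline):
    """Hypothetical helper: write two CSV rows and return the raw bytes."""
    raw = io.BytesIO()
    # newline='\r\n' reproduces the default Windows text-mode translation;
    # newline='' turns translation off, as the csv documentation recommends
    text = io.TextIOWrapper(raw, encoding='utf-8', newline=newline)
    writer = csv.writer(text)  # csv.writer ends every row with '\r\n'
    writer.writerow(['id', 'name'])
    writer.writerow([1, 'abc'])
    text.flush()
    return raw.getvalue()

print(write_rows('\r\n'))  # b'id,name\r\r\n1,abc\r\r\n'
print(write_rows(''))      # b'id,name\r\n1,abc\r\n'
```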

6 Comments

And why are there blank lines? How do I remove them?
@bin You would get empty lines if some of the dictionaries didn't have anything in them. I filter out empty ones with d for d in data if d.
Hi tdelaney, I modified the program as you said, but the .csv file still shows unrecognizable characters such as จีเนียร์ส อาร์เชอร์, and there are still blank lines.
You may need a different encoding, such as utf-16, or the data in the dict itself could be bad. Try to figure out which of the dicts in data is producing the odd data, print it, and add that to your post. As for the blank lines, is it every other line? What tool did you use to read the file? My example (if you used newline='') writes \r\n per line, but that may confuse your reader. Run print(open(csv_file_directory, 'rb').read(500)) to see what the line endings look like.
Hi tdelaney, one item in data is id, a long number, but in the .csv file this number is always missing its last digit. For example, the id should be 1030715900405221, but in the .csv file it's 103071590040522, missing the final "1". All the id numbers are larger than 10^15. Is there some limit on the length of numbers in a .csv file?
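The csv module itself has no limit on the length of numbers: it writes str() of the value, so all 16 digits of the id quoted above reach the file, as this sketch shows. The digit is most likely lost in the program used to open the file; spreadsheet applications such as Excel keep only 15 significant digits of a number.

```python
import csv
import io

record_id = 1030715900405221  # the 16-digit id quoted in the comment

buf = io.StringIO()
csv.writer(buf).writerow([record_id])
print(repr(buf.getvalue()))  # '1030715900405221\r\n'

# Reading it back gives the full digit string; int() restores the exact value
value = next(csv.reader(io.StringIO(buf.getvalue())))[0]
print(int(value) == record_id)  # True
```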

It sounds like at least some of your issues have to do with incorrect Unicode handling.

Try working the snippet below into your existing code. As the comments say, the first part reads your input and decodes it from UTF-8 into Unicode text.

The second part converts each line to plain ASCII (characters with no ASCII equivalent are dropped).

import codecs
import unicodedata

# Take input and decode it from UTF-8 into Unicode (str) text
with codecs.open('path/to/textfile.txt', mode='r', encoding='utf-8') as f:
    for line in f:
        # Ensure output is in ASCII: decompose accents, drop non-ASCII characters
        line = unicodedata.normalize('NFKD', line).encode('ascii', 'ignore').decode('ascii')
        print(line, end='')

2 Comments

I interpreted "unreadable" to mean that the text hadn't been converted back to ASCII / human-readable plain text.
Right, but ASCII can't handle all human-readable text; it's rather limited, and the OP explicitly mentioned needing Chinese and Japanese characters.
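A quick check of the point made in these comments, using the name quoted under the question (to_ascii is a hypothetical wrapper around the answer's two calls): NFKD normalization followed by encode('ascii', 'ignore') keeps a Latin word such as 'café' readable as 'cafe', but removes Chinese characters entirely because they have no ASCII equivalent.

```python
import unicodedata

def to_ascii(text):
    # Decompose characters (e.g. 'é' -> 'e' + combining accent), then
    # silently drop every code point that ASCII cannot represent
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')

print(repr(to_ascii('café')))    # 'cafe'
print(repr(to_ascii('羅少君')))  # '' (nothing survives)
```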
