0

I have written a Python 2.7 program to retrieve data from a database table and copy it into a CSV file. Some of the values are in a non-printable (unicode) format and contain \n and \r. Because of the \n and \r, I am not able to get the data out exactly as it is in the table.

I have tried the following

str.replace('\n','').replace('\r',' ')
str.replace('\n','\\n').replace('\r', '\\r')

but neither worked.

csv code

import csv

count = 0
# cur is an already-open database cursor
cur.execute('select * from db.table_name')
with open('test.csv', 'wb') as csv_file:  # Python 2: the csv module expects binary mode
    csv_writer = csv.writer(csv_file)
    for row in cur:
        print "row = ", count
        count = count + 1
        newrow = []
        for value in row:
            # replace line breaks in string columns only
            if isinstance(value, str):
                value = value.replace("\n", " ").replace("\r", " ")
            newrow.append(value)
        csv_writer.writerow(newrow)
  • I'm confused with that second replace line, what exactly do you want to happen there? Commented Jun 12, 2016 at 17:52
  • Why would you want to get rid of \r\n (they are linebreaks) and why wouldn't the replace work? please post some examples too Commented Jun 12, 2016 at 17:54
  • Show a small sample of code that generates your CSV incorrectly and we can likely show you how to fix it so these replacements are not needed. Commented Jun 12, 2016 at 17:57
  • Add a print(repr(value)) and post the output. Does .replace("\\r"," ") have a different effect? Commented Jun 12, 2016 at 18:18
  • @Padraic Cunningham Thank you very much, you saved my day. Commented Jun 12, 2016 at 18:28

4 Answers

3

str.replace() returns a new string, so you have to assign it to the original string to change it:

s = s.replace('\n','').replace('\r','')
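
To illustrate the point, here is a minimal sketch (the sample string and the variable names are made up for the example) showing that calling replace() without assigning the result leaves the original string untouched:

```python
# Hypothetical sample value containing a Windows-style line break
s = "line one\r\nline two"

# The result of this call is thrown away; s itself is not modified
s.replace("\n", "").replace("\r", "")
unchanged = s

# Assigning the result keeps the cleaned-up copy
cleaned = s.replace("\n", "").replace("\r", "")

print(repr(unchanged))
print(repr(cleaned))
```

Strings are immutable in Python, so every string method that "changes" a string actually returns a new one.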

5 Comments

I am really sorry, I have used the same as above
c'mon that was a simple thing that's why I didn't mention.
@kickbhatwoski: You'd be surprised how many times the problem is something simple like that.
@kickbhatwoski You won't be surprised that people post very incomplete questions with insufficient information.
Yes Sir my bad, but I corrected my question by editing it.
2

Unicode has external serialized representations such as UTF-8 and UTF-16 and language-dependent internal implementations such as WCHAR. Your database read appears to have given you a UTF-16 serialized version of the string and all you have to do is decode it. You certainly don't want to remove the \r and \n because they are part of the multi-byte sequence and not really carriage return or newline at all.

As a simple example, I can remove all the database and looping stuff and just work with the string you posted:

>>> value = '\r\xaeJ\x92>J\xe7\x1d\n\x89`\xc6\xf8\x9c<\x18'
>>> decoded = value.decode('UTF-16')
>>> print repr(decoded)
u'\uae0d\u924a\u4a3e\u1de7\u890a\uc660\u9cf8\u183c'
>>> print decoded
긍鉊䨾ᷧ褊왠鳸ᠼ
>>> 
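
The session above relies on the codec's default byte order. As a rough sketch of the same mechanics with the byte order spelled out (assuming little-endian, which matches the output shown), the snippet below decodes the bytes and lists the resulting code points. It only demonstrates how \r and \n get absorbed into 16-bit units; it does not establish that the column really holds UTF-16 text.

```python
# Same bytes as in the example above, written as a bytes literal
value = b'\r\xaeJ\x92>J\xe7\x1d\n\x89`\xc6\xf8\x9c<\x18'

# Assumption: little-endian UTF-16, stated explicitly instead of relying on a default
decoded = value.decode('utf-16-le')

# Each pair of bytes becomes one code point; 0x0d and 0x0a are the low bytes of two of them
code_points = ['%04x' % ord(ch) for ch in decoded]
print(code_points)

# Encoding with the same codec gives back the original bytes
round_trip = decoded.encode('utf-16-le')
print(round_trip == value)
```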

8 Comments

Thank you, but @Padraic Cunningham gave the same answer a few minutes ago.
Padraic asked you to post the result of print(repr(value)) which is important to figure out how to interpret the back-slashes in the example string you gave us. You mentioned you are reading unicode data and I'm not convinced that you will solve the problem without decoding the unicode into a python unicode string.
sorry I didn't get you
That's a nonsense string. Definitely not UTF-16.
It's a mix of Korean, Chinese, Mongolian and undefined codepoints...nothing coherent.
2

You can use a regular expression to simplify your code:

For example:

import re
s = "Salut \n Comment ca va ?"
s = re.sub("\n|\r|\t", "",  s)

print(s)

The output will be (note the two spaces left where the \n was removed):

Salut  Comment ca va ?
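
A small variation, offered as a sketch rather than a fix for the original problem: a character class does the same removal, and collapsing whitespace runs avoids the doubled space that plain removal leaves behind. The variable names are made up for the example.

```python
import re

s = "Salut \n Comment ca va ?"

# Same removal as above, written as a character class
removed = re.sub(r"[\r\n\t]", "", s)

# Alternative: turn every run of whitespace into a single space
collapsed = re.sub(r"\s+", " ", s).strip()

print(repr(removed))
print(repr(collapsed))
```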


1

You can add .strip() at the end of the input, e.g. n = input().strip(). It removes leading and trailing whitespace, including '\r' and '\n', but it does not touch line breaks inside the string.
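
A quick sketch of what strip() does and does not remove, using a made-up sample string:

```python
# Hypothetical value with line breaks at both ends and in the middle
raw = "\r\nfirst\r\nsecond\r\n"

# strip() trims whitespace from the ends only
stripped = raw.strip()

print(repr(stripped))
```

The line break between the two words survives, so strip() alone is not enough for values with embedded \r or \n.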
