How can I avoid UnicodeDecodeError ascii error from my Redshift Python UDF?

Question

I'm using a Redshift user-defined function (UDF) to interpret text from PostgreSQL, but I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128).

None of the Python code actually calls decode(), so the decoding seems to be happening in the background, and I don't know how to stop it.

The return type of the UDF is VARCHAR.

Don't know why you got the downvote... though showing your code would be rather useful. I don't do Redshift, sorry, so can't help much. Consider contacting Amazon's support.
– Craig Ringer, Commented Oct 2, 2015 at 8:20

Joe Harris · Accepted Answer · 2021-01-26 18:55:58Z

3

Since Redshift UDFs currently use Python 2.7, whose default encoding is ASCII, you need to set the default encoding to UTF-8 inside the function.

CREATE OR REPLACE FUNCTION f_utf8_test(value VARCHAR(128))
    RETURNS VARCHAR(128)
STABLE
AS $$
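  import sys
  # Python 2 removes setdefaultencoding() at startup; reload(sys) restores it.
  reload(sys)
  # Make implicit str/unicode conversions use UTF-8 instead of ASCII.
  sys.setdefaultencoding("utf-8")
  return value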
$$ LANGUAGE plpythonu;


devopslife · Accepted Answer · 2015-10-15 16:13:47Z

0

How did 0xff get in there? Redshift encodes text in UTF-8, and 0xff is never a valid byte in UTF-8, so it shouldn't be there. Try to locate it and track down why it's there.
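One way to locate such bytes is to scan the raw data outside Redshift. The following is only a sketch: the helper name `find_invalid_utf8` is hypothetical, and it assumes you can get the stored value as a byte string (for example from an export file). It is written to run on both Python 2.7 and Python 3.

```python
def find_invalid_utf8(raw):
    """Return (offset, byte value) pairs for bytes that cannot be decoded as UTF-8."""
    bad = []
    start = 0
    while start < len(raw):
        try:
            raw[start:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start is relative to the slice, so shift it back.
            offset = start + exc.start
            bad.append((offset, ord(raw[offset:offset + 1])))
            start = offset + 1  # resume scanning after the offending byte
    return bad


sample = b"ab\xffcd\xfe"
positions = find_invalid_utf8(sample)
```

Each reported offset points at a byte where UTF-8 decoding fails, which should help narrow down which rows or source systems introduced it.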


1 Comment

devopslife Over a year ago

Is it showing up normally in the client? If so, then it's not saved as 0xff, and you'll have to provide the code so we can see where you convert from UTF-8 to ASCII.

matt2000 · Accepted Answer · 2019-03-20 21:41:45Z

0

Redshift's Python engine is Python 2, so strings are byte strings, not unicode strings, and Redshift strangely assumes that the byte string returned from a Python UDF is ASCII. Assuming you're returning a VARCHAR, you probably just need to call .decode('utf-8') on your Python string before you return it.
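A rough illustration of the explicit decode, in plain Python outside Redshift. The helper name `to_text` is hypothetical, and falling back to replacement characters is an assumption for input that is not valid UTF-8 (a lone 0xff byte, as in the question, never is). The code is meant to run on both Python 2.7 and Python 3.

```python
def to_text(raw, encoding="utf-8"):
    """Decode a byte string explicitly instead of relying on an implicit ASCII decode."""
    if not isinstance(raw, bytes):
        return raw  # already a text (unicode) string
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Assumption: substituting U+FFFD is acceptable for undecodable bytes.
        return raw.decode(encoding, "replace")


utf8_bytes = u"caf\u00e9".encode("utf-8")  # the bytes 'caf' + 0xc3 0xa9

# Decoding the same bytes as ASCII fails in the way the question describes.
try:
    utf8_bytes.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

decoded = to_text(utf8_bytes)   # decodes cleanly as UTF-8
replaced = to_text(b"\xff")     # invalid UTF-8, so the fallback is used
```

Inside a UDF, the equivalent step would presumably be to return the decoded value rather than the raw byte string; that has not been verified against Redshift here.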
