2

I'm using a Redshift user-defined function to interpret text from PostgreSQL, but I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128).

None of the Python code actually calls decode(), so the decoding seems to be happening in the background, and I don't know how to stop it.

The return type of the UDF is VARCHAR.

1
  • Don't know why you got the downvote... though showing your code would be rather useful. I don't do Redshift, sorry, so can't help much. Consider contacting Amazon's support. Commented Oct 2, 2015 at 8:20

3 Answers

3

Redshift UDFs currently run on Python 2.7, whose default encoding is ASCII, so you need to set the default encoding to UTF-8.

CREATE OR REPLACE FUNCTION f_utf8_test(value VARCHAR(128))
    RETURNS VARCHAR(128)
STABLE
AS $$
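  import sys
  # Python 2.7 removes sys.setdefaultencoding at startup; reload(sys) restores it.
  reload(sys)
  # Make implicit str/unicode conversions use UTF-8 instead of ASCII.
  sys.setdefaultencoding("utf-8")
  a = value
  return a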
$$ LANGUAGE plpythonu;

0

How did you get 0xff in there? Redshift encodes in UTF-8, so that byte shouldn't be in there. Try to locate it and track down why it's there.
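To track the byte down, it can help to pull a sample of the raw value out and inspect it locally. The following is a minimal sketch, not Redshift-specific code: `find_non_ascii` and `is_valid_utf8` are hypothetical helper names, and `sample` is made-up input standing in for the real column value. It relies on the fact that the byte 0xff never occurs in well-formed UTF-8.

```python
def find_non_ascii(raw):
    """Return (offset, byte value) pairs for every byte outside the ASCII range."""
    return [(i, b) for i, b in enumerate(bytearray(raw)) if b > 0x7F]


def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: a stray 0xff followed by plain ASCII.
sample = b"\xffabc"

print(find_non_ascii(sample))         # [(0, 255)]
print(is_valid_utf8(sample))          # False
print(is_valid_utf8(b"caf\xc3\xa9"))  # True
```

If the value fails the UTF-8 check, the data was probably written in some other encoding upstream, and fixing it there is likely cleaner than working around it in the UDF.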

1 Comment

Is it showing up normally in the client? If so, then it's not saved as 0xff, and you'll have to provide the code so we can see where you convert from UTF-8 to ASCII.

0

Redshift's Python engine is Python 2, so strings are byte strings, not unicode strings, and Redshift strangely assumes the byte string returned from a Python UDF is ASCII. You don't specify, but I assume you're returning a VARCHAR. You probably just need to call .decode('utf-8') on your Python string before you return it.
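As a rough illustration of the difference between an ASCII decode and an explicit UTF-8 decode, here is a small sketch that runs in a plain Python interpreter (2.7 or 3) rather than inside Redshift. `to_text` is a hypothetical helper, and replacing undecodable bytes with U+FFFD is an assumption about what you would want for bad input; drop the `"replace"` argument if you would rather have the error raised.

```python
def to_text(raw, encoding="utf-8"):
    """Decode a byte string explicitly, replacing undecodable bytes with U+FFFD."""
    return raw.decode(encoding, "replace")


# Decoding as ASCII reproduces the error from the question.
try:
    b"\xffabc".decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

print(ascii_error)

# Valid UTF-8 decodes to the expected text.
print(to_text(b"caf\xc3\xa9") == u"caf\xe9")  # True

# The invalid 0xff byte becomes the replacement character instead of raising.
print(to_text(b"\xffabc") == u"\ufffdabc")    # True
```

Inside the UDF, the equivalent step would be decoding the value just before the return statement, as described above.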
