
I want to use Python libraries in Redshift UDFs, specifically the ua-parser library.

The process for using custom Python libraries in Redshift is described here: http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_LIBRARY.html

To get the library with all of its dependencies, I used PipLibraryInstaller from AWS Labs, which should fetch the library and its dependencies the same way a regular pip command does and put them on S3.

But I cannot get the ua-parser library to work this way.

I created the library archive and uploaded it to S3 using the following command:

./installPipModuleAsRedshiftLibrary.sh -m ua-parser -s s3://bucket_location -r region_name

I then used the following command to create the library:

CREATE OR REPLACE LIBRARY ua_parser
LANGUAGE plpythonu
from 's3://bucket/ua-parser.zip'
WITH CREDENTIALS AS 'aws_access_key_id=AWS_key;aws_secret_access_key=secret_key'
region 'region_name'

Then I created the function:

create function f_user_agent_parse (user_agent varchar) returns varchar IMMUTABLE 
as $$
from ua_parser import user_agent_parser as parser

parsed_string = parser.Parse(user_agent)

return type(parsed_string)
$$ 
language plpythonu;

When I try to execute the following:

select f_user_agent_parse('facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)') as s

I get the following error:

ERROR: XX000: ImportError: No module named _regexes. Please look at svl_udf_log for more information

It looks like _regexes is not in the library. But when I downloaded the library from S3 and looked inside it, I could see the expected files (screenshot of the archive contents, not reproduced here).

What is the problem here? Am I doing something wrong, or is there a problem with the library?

2 Answers


Actually, the problem was that I was running this command on Windows; it does not work from a Windows environment.

It is really strange: the native client for Redshift is Aginity, which runs only on Windows, yet from there we cannot use the Python functionality that Redshift offers.


Works for me with:

$ python --version
Python 2.7.10
$ pip --version
pip 7.1.2 from /Library/Python/2.7/site-packages/pip-7.1.2-py2.7.egg (python 2.7)

And executing the script from aws-labs:

Collecting ua-parser
  Using cached ua_parser-0.7.1-py2.py3-none-any.whl
  Saved /private/var/folders/ty/fw4v8qq54330h_b6tz47c8r40000gn/T/.ua-parser/ua_parser-0.7.1-py2.py3-none-any.whl

However, I have another problem executing the code you posted.
Upon executing the query in Redshift I got:

ERROR:  TypeError: expected string or Unicode object, type found. Please look at svl_udf_log for more information

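The function is declared to return varchar, but type(parsed_string) returns a type object rather than a string, hence the TypeError. I changed return type(parsed_string) to return parsed_string['user_agent']['family']: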

db=# select f_user_agent_parse('facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)'::varchar(200));
 f_user_agent_parse
--------------------
 FacebookBot
(1 row)
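
To make the fix above a bit more robust, the lookup can be wrapped in a small helper that always returns a string or None. This is only a sketch: family_from_parsed is a hypothetical name, and the sample dict is hand-written to mimic the shape implied by parsed_string['user_agent']['family'], so it runs without ua-parser installed.

```python
def family_from_parsed(parsed):
    """Return the user agent family from a parsed result, or None if absent.

    `parsed` is assumed to be a dict shaped like the value that
    parsed_string holds in the UDF above.
    """
    user_agent = parsed.get("user_agent") or {}
    return user_agent.get("family")


# Hand-written stand-in for a parse result (an assumption, not real
# library output), so the sketch is self-contained.
sample = {
    "user_agent": {"family": "FacebookBot", "major": "1", "minor": "1"},
    "string": "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
}

print(family_from_parsed(sample))
print(family_from_parsed({}))
```

Inside the UDF body, the last line would then be return family_from_parsed(parsed_string), which hands Redshift a string (or NULL) instead of a type object.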

Folder structure inside ua-parser.zip:

$ unzip ua-parser.zip
Archive:  ua-parser.zip
  inflating: ua_parser/__init__.py
  inflating: ua_parser/_regexes.py
  inflating: ua_parser/user_agent_parser.py
  inflating: ua_parser/user_agent_parser_test.py
  inflating: ua_parser-0.7.1.dist-info/DESCRIPTION.rst
  inflating: ua_parser-0.7.1.dist-info/metadata.json
  inflating: ua_parser-0.7.1.dist-info/top_level.txt
  inflating: ua_parser-0.7.1.dist-info/WHEEL
  inflating: ua_parser-0.7.1.dist-info/METADATA
  inflating: ua_parser-0.7.1.dist-info/RECORD
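
If you want to compare your archive against the listing above without unzipping it by hand, something like the following could help. It is a rough sketch: check_library_zip and the REQUIRED tuple are names I made up, and the required paths are simply taken from the listing above. The demo archives are built in memory, so nothing is read from S3 or disk.

```python
import io
import zipfile

# Paths taken from the listing above; adjust if your package differs.
REQUIRED = (
    "ua_parser/__init__.py",
    "ua_parser/_regexes.py",
    "ua_parser/user_agent_parser.py",
)


def check_library_zip(zip_file):
    """Return the REQUIRED paths that are missing from the archive.

    `zip_file` may be a filename or a file-like object.
    """
    with zipfile.ZipFile(zip_file) as archive:
        names = set(archive.namelist())
    return [name for name in REQUIRED if name not in names]


def demo_archive(names):
    """Build an in-memory zip containing empty files with the given names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name in names:
            archive.writestr(name, "")
    buf.seek(0)
    return buf


good = demo_archive(REQUIRED)
bad = demo_archive(["ua_parser/__init__.py", "ua_parser/user_agent_parser.py"])

print(check_library_zip(good))  # nothing missing
print(check_library_zip(bad))   # reports the missing _regexes.py entry
```

For a real check, pass the path of the downloaded file instead, e.g. check_library_zip("ua-parser.zip"). An empty list only means the expected entries are present under those exact names; it does not prove that Redshift can import them.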

9 Comments

I know the code might not be correct, but I never get far enough to see that kind of error. I get that "_regexes" is not an available module. I have Python 2.7.12 and pip 8.1.2. So after you execute the script from aws-labs, does it upload the library directly to S3, and do you create the library using the same statement as the one I provided above?
I created a new environment with Python 2.7.12 and pip 8.1.2, but everything still works perfectly fine. I used the script from aws-labs as well and it can upload to S3 without problems. From your question it sounds like you get an error upon executing the function, but now you write that it already dies upon uploading to S3? Can you clarify?
No, no, uploading to S3 works fine. Here are the steps I performed: 1) using the aws-labs script, created and uploaded ua_parser.zip to S3 -> works fine; 2) created the library in Redshift using the Aginity tool -> succeeded without errors; 3) created the function in Redshift using the Aginity client -> succeeded without errors; 4) tried to execute the select statement using the previously created function (as described in the question) -> I get the error that there is no module called _regexes. Thank you.
I'm afraid I cannot reproduce your error then. The only difference between the steps we have performed is the client we've used (I issued all commands and statements from the command line via psql). I'd suggest deleting everything, i.e. issuing drop function f_user_agent_parse (user_agent varchar) and drop library ua_parser, and trying again. If all else fails I could send you my S3 file; maybe the files differ in some detail.
Can you send me the file, please? I am interested in the folder structure; it might be different. Can you send it to me via personal message?