I am completely new to Python, but I found a package that I need to use and am testing it. The package in question is pywurfl.

I have written a simple script, based on the package's example, that reads the User-Agent (UA) strings from one column of a tab-separated text file. There are a very large number of UAs, and some may contain non-ASCII characters. The file containing the UAs was produced by a Perl script whose output was redirected with the shell's ">" operator, for example: perl somescript.pl > outfile.txt.

However, when I run the following code on that file, I get an error.

#!/usr/bin/python

import fileinput
import sys

from wurfl import devices
from pywurfl.algorithms import LevenshteinDistance


for line in fileinput.input():
    line = line.rstrip("\r\n")    # equiv of chomp
    H = line.split('\t')

    if H[27]=='Mobile':

        user_agent = H[23].decode('utf8')           
        search_algorithm = LevenshteinDistance()
        device = devices.select_ua(user_agent, search=search_algorithm)

        sys.stdout.write( "%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s" % (user_agent, device.devid, device.devua, device.fall_back, device.actual_device_root, device.brand_name, device.marketing_name, device.model_name, device.device_os, device.device_os_version, device.mobile_browser, device.mobile_browser_version, device.model_extra_info, device.pointing_method, device.has_qwerty_keyboard, device.is_tablet, device.has_cellular_radio, device.max_data_rate, device.wifi, device.dual_orientation, device.physical_screen_height, device.physical_screen_width,device.resolution_height, device.resolution_width, device.full_flash_support, device.built_in_camera, device.built_in_recorder, device.receiver, device.sender, device.can_assign_phone_number, device.is_wireless_device, device.sms_enabled) + "\n")

    else:
        # do something else
        pass

Here H[23] is the column that holds the UA string. Running the script gives an error that looks like this:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 0: unexpected code byte

When I replaced 'utf8' with 'latin1', I got the following error instead:

 sys.stdout.write(................) # with the .... as in the code
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 0: ordinal not in range(128).

Am I doing anything wrong here? I need to convert the UA string to Unicode because the package expects Unicode input. I am not well versed in Unicode, especially in Python. How should I handle this error? For instance, how can I find the UA string that triggers it, so that I can ask a better-informed question?

1 Answer

It looks like you have two separate problems.

The first is that you're assuming the input file is UTF-8 when it isn't: byte 0xa9 cannot start a UTF-8 sequence, whereas in Latin-1 it is the copyright sign (©). Changing the input encoding to latin-1 addresses that issue.
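To find out which UA strings are not valid UTF-8, one option is to attempt the decode on each raw line and collect the failures. The following is a minimal sketch, assuming Python 3 bytes input (in Python 2 the same .decode calls work on a plain str). The names decode_ua and find_non_utf8 and the sample lines are hypothetical and not part of pywurfl.

```python
def decode_ua(raw):
    """Decode raw bytes as UTF-8, falling back to Latin-1.

    Returns (text, encoding_used).
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every possible byte to a code point,
        # so this fallback cannot raise.
        return raw.decode("latin-1"), "latin-1"


def find_non_utf8(lines):
    """Return (line_number, raw_bytes) for each line that is not valid UTF-8."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, raw))
    return bad


# Hypothetical sample data standing in for lines of the real file.
sample = [b"Mozilla/5.0 (iPhone)", b"\xa9 SomeVendor Browser"]
offenders = find_non_utf8(sample)
```

With the real data, the lines would come from the file opened in binary mode, for example open("outfile.txt", "rb"), so that no decoding happens before the check.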

The second issue is that your stdout seems to be set up for ASCII output, so the write fails when Python implicitly encodes the Unicode string u'\xa9'. For that, this question may help.
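One way around the implicit ASCII encoding is to encode each output row explicitly before writing it. This is a hedged sketch rather than a definitive fix: format_row and write_row are hypothetical helpers, UTF-8 is an assumed choice of output encoding, and the field values are made up. Setting the PYTHONIOENCODING environment variable to utf-8 before running the script is another option.

```python
import io


def format_row(fields):
    """Join fields with tabs and return the row as UTF-8 encoded bytes."""
    text = u"\t".join(u"%s" % (field,) for field in fields) + u"\n"
    return text.encode("utf-8")


def write_row(stream, fields):
    """Write one encoded row to a byte stream."""
    stream.write(format_row(fields))


# Demonstration with an in-memory byte stream and made-up values.
# In the real script the stream would be sys.stdout on Python 2,
# or sys.stdout.buffer on Python 3.
buf = io.BytesIO()
write_row(buf, [u"\xa9 SomeVendor", 320, True])
```

Building the row from a list of fields also avoids having to keep a long "%s\t%s..." format string in step with the number of device attributes.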
