I am writing a script in Python 2.7 that works fine, except that after running for a few seconds it fails with this error:

Enter Shopify website URL (without HTTP):  store.highsnobiety.com
Scraping! Check log file @ z:\shopify_output.txt to see output.
!!! Also make sure to clear file every hour or so !!!
Copper Bracelet - 3mm - Polished ['3723603267']
Traceback (most recent call last):
  File "shopify_sitemap_scraper.py", line 38, in <module>
    print(prod, variants).encode('utf-8')
AttributeError: 'NoneType' object has no attribute 'encode'

The script is meant to fetch data from a Shopify website and print it to the console. Here is the code:

# -*- coding: utf-8 -*-
from __future__ import print_function
from lxml.html import fromstring
import requests
import time
import sys

reload(sys)
sys.setdefaultencoding('utf-8')

# Log file location, change "z://shopify_output.txt" to your location.
logFileLocation = "z:\shopify_output.txt"

log = open(logFileLocation, "w")

# URL of Shopify website from user input (for testing, just use store.highsnobiety.com during input)
url = 'http://' + raw_input("Enter Shopify website URL (without HTTP):  ") + '/sitemap_products_1.xml'

print ('Scraping! Check log file @ ' + logFileLocation + ' to see output.')
print ("!!! Also make sure to clear file every hour or so !!!")
while True :

    page = requests.get(url)
    tree = fromstring(page.content)

    # skip first url tag with no image:title
    url_tags =  tree.xpath("//url[position() > 1]")

    data = [(e.xpath("./image/title//text()")[0],e.xpath("./loc/text()")[0]) for e in  url_tags]

    for prod, url in data:
    # add xml extension to url
        page = requests.get(url + ".xml")
        tree = fromstring(page.content)
        variants = tree.xpath("//variants[@type='array']//id[@type='integer']//text()")
        print(prod, variants).encode('utf-8')

The strangest part is that when I remove .encode('utf-8'), I get a UnicodeEncodeError instead:

Enter Shopify website URL (without HTTP):  store.highsnobiety.com
Scraping! Check log file @ z:\shopify_output.txt to see output.
!!! Also make sure to clear file every hour or so !!!
Copper Bracelet - 3mm - Polished ['3723603267']
Copper Bracelet - 5mm - Brushed ['3726247811']
Copper Bracelet - 7mm - Polished ['3726253635']
Highsnobiety x EARLY - Leather Pouch ['14541472963', '14541473027', '14541473091']
Traceback (most recent call last):
  File "shopify_sitemap_scraper.py", line 38, in <module>
    print(prod, variants)
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xae' in position 13: character maps to <undefined>

Any ideas? I have no idea what else to try after hours of googling.

2 Answers

snakecharmerb almost got it, but missed the cause of your first error. Your code

print(prod, variants).encode('utf-8')

means you print the values of the prod and variants variables, then try to call the encode() method on the return value of print(). Unfortunately, print() (as a function in Python 2 and always in Python 3) returns None, and None has no encode attribute. To fix it, encode the string before printing:

print(prod.encode("utf-8"), variants)
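
As a minimal sketch of the point above (the title string is a made-up example, not data from the asker's site), the following shows that print() returns None and that encoding has to happen on the string itself. The __future__ import mirrors the asker's script, so the snippet should behave the same way on Python 2.7 and Python 3:

```python
from __future__ import print_function

# print() is an ordinary function here, so it returns None
# like any function without an explicit return value.
result = print("hello")

# Calling .encode() on that return value is what raised the AttributeError:
# result.encode('utf-8')  ->  'NoneType' object has no attribute 'encode'

# Encode the text itself, before handing it to print().
title = u"Registered\xae"          # hypothetical product title containing the (R) sign
encoded = title.encode("utf-8")    # bytes: 'Registered' followed by '\xc2\xae'
print(encoded, ['3723603267'])
```

Because variants is a list, it is passed to print() unchanged; only the unicode title needs encoding.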

2 Comments

Still getting the "AttributeError: 'list' object has no attribute 'encode'" with the new code
@DanielYveson sorry, I didn't realize that variants was a list. See my edited answer above.

Your console has a default encoding of cp437, and cp437 is unable to represent the character u'\xae'.

>>> print (u'\xae')
®
>>> print (u'\xae'.encode('utf-8'))
b'\xc2\xae'
>>> print (u'\xae'.encode('cp437'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/encodings/cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\xae' in position 0: character maps to <undefined>

You can see that it's trying to convert to cp437 in the traceback: File "C:\Python27\lib\encodings\cp437.py", line 12, in encode

(I reproduced the problem in Python 3.5, but the cause is the same in both versions of Python.)
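
One possible workaround, sketched here under the assumption that the console really is cp437 (as the traceback suggests) and that losing the unsupported characters is acceptable: round-trip the text through the console encoding with the "replace" error handler before printing. The helper name console_safe and the sample title are hypothetical, not part of the asker's script.

```python
from __future__ import print_function


def console_safe(text, encoding="cp437"):
    """Return text with characters the given encoding cannot represent replaced by '?'."""
    # errors="replace" substitutes '?' instead of raising UnicodeEncodeError.
    return text.encode(encoding, "replace").decode(encoding)


title = u"Highsnobiety\xae Pouch"   # made-up title containing the (R) sign
safe_title = console_safe(title)
print(safe_title, ['14541472963'])
```

This only changes what is shown on the console. If the full text matters, writing it to a file opened with an explicit UTF-8 encoding is likely the safer route.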

1 Comment

See @MattDMo's answer.
