
I'm trying to make a desktop notifier, and for that I'm scraping news from a site. When I run the program, I get the following error.

news[child.tag] = child.encode('utf8')
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'encode'

How do I resolve it? I'm completely new to this. I tried searching for solutions, but none of them worked for me.

Here is my code:

import requests
import xml.etree.ElementTree as ET


# url of news rss feed
RSS_FEED_URL = "http://www.hindustantimes.com/rss/topnews/rssfeed.xml"


def loadRSS():
    '''
    utility function to load RSS feed
    '''
    # create HTTP request response object
    resp = requests.get(RSS_FEED_URL)
    # return response content
    return resp.content


def parseXML(rss):
    '''
    utility function to parse XML format rss feed
    '''
    # create element tree root object
    root = ET.fromstring(rss)
    # create empty list for news items
    newsitems = []
    # iterate news items
    for item in root.findall('./channel/item'):
        news = {}
        # iterate child elements of item
        for child in item:
            # special checking for namespace object content:media
            if child.tag == '{http://search.yahoo.com/mrss/}content':
                news['media'] = child.attrib['url']
            else:
                news[child.tag] = child.encode('utf8')
        newsitems.append(news)
    # return news items list
    return newsitems


def topStories():
    '''
    main function to generate and return news items
    '''
    # load rss feed
    rss = loadRSS()
    # parse XML
    newsitems = parseXML(rss)
    return newsitems
  • I have not worked with XML, but the error says that child is not a string object. So it seems you need to convert your Element instance child to a string before calling encode. Commented Jun 30, 2017 at 2:42
  • Just checking the docs: how about child.text.encode? Commented Jun 30, 2017 at 2:46
  • Yes, I tried that too, but I'm getting the same error @Leonard2 Commented Jun 30, 2017 at 2:50
  • news[child.tag] = child.text.encode('utf8') AttributeError: 'NoneType' object has no attribute 'encode' Commented Jun 30, 2017 at 2:51
  • I guess this problem can be data-dependent. I mean, it depends on whether each child has some text or not. Commented Jun 30, 2017 at 2:56

1 Answer


You're trying to convert a str to bytes, and then store those bytes in a dictionary. The problem is that the object you're doing this to is an xml.etree.ElementTree.Element, not a str.

You probably meant to get the text from within or around that element, and then encode() that. The docs suggest using the itertext() method:

''.join(child.itertext())

This will evaluate to a str, which you can then encode().
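
To make the difference concrete, here is a small sketch comparing text with itertext(). The XML snippet is a made-up sample, not taken from the real feed:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample element (an assumption for illustration only).
item = ET.fromstring(
    "<item><title>Top <b>story</b> today</title><description/></item>"
)

title = item.find("title")
description = item.find("description")

# .text holds only the text before the first child element.
print(repr(title.text))

# itertext() walks the element and all of its descendants.
title_text = "".join(title.itertext())
print(repr(title_text))

# An empty element has text set to None, but itertext() simply yields nothing,
# so the join produces an empty str instead of raising.
description_text = "".join(description.itertext())
print(repr(description.text), repr(description_text))
```

Because the join always produces a str, there is no None case to special-case before storing the value in the dictionary.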

Note that the text and tail attributes might not contain text (emphasis added):

Their values are usually strings but may be any application-specific object.

If you want to use those attributes, you'll have to handle None or non-string values:

head = '' if child.text is None else str(child.text)
tail = '' if child.tail is None else str(child.tail)
# Do something with head and tail...

Even this is not really enough. On Python 2, for example, calling str() on a unicode value that contains non-ASCII characters raises a UnicodeEncodeError, because str() tries to encode it as ASCII.
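
A more defensive variant might look like the sketch below. It assumes Python 3 and assumes that any bytes values are UTF-8; as_text is a hypothetical helper name, not part of ElementTree:

```python
import xml.etree.ElementTree as ET


def as_text(value):
    """Return value as a str, mapping None to '' (hypothetical helper)."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        # Assumption: bytes are UTF-8. Undecodable bytes are replaced
        # with U+FFFD instead of raising an exception.
        return value.decode("utf-8", errors="replace")
    return str(value)


# Made-up sample: <title/> has no text, but it does have a tail.
root = ET.fromstring(
    "<item><title/>after title<link>http://example.com</link></item>"
)
title = root.find("title")

head = as_text(title.text)   # text is None for an empty element
tail = as_text(title.tail)   # text that follows the closing tag
print(repr(head), repr(tail))
```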

Strings versus Bytes

I suggest leaving the text as a str, and not encoding it at all. Encoding text to a bytes object is intended as the last step before writing it to a binary file, a network socket, or some other hardware.
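
As a rough illustration of "encode last", the sketch below keeps the text as a str while working with it and encodes only when writing to a binary stream. The in-memory io.BytesIO stands in for a file or socket, and the headline is a made-up value:

```python
import io

# A str containing non-ASCII quotation marks (made-up sample text).
headline = "a \u201cfun ride\u201d"

# Keep working with str inside the program...
shout = headline.upper()

# ...and encode only at the boundary, here an in-memory binary stream.
buffer = io.BytesIO()
buffer.write(shout.encode("utf-8"))

data = buffer.getvalue()
print(type(data).__name__)            # bytes
print(data.decode("utf-8") == shout)  # decoding recovers the original str
```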

For more on the difference between bytes and characters, see Ned Batchelder's "Pragmatic Unicode, or, How Do I Stop the Pain?" (36 minute video from PyCon US 2012). He covers both Python 2 and 3.

Example Output

Using the child.itertext() method, and not encoding the strings, I got a reasonable-looking list-of-dictionaries from topStories():

[
  ...,
  {'description': 'Ayushmann Khurrana says his five-year Bollywood journey has '
                  'been “a fun ride”; adds success is a lousy teacher while '
                  'failure is “your friend, philosopher and guide”.',
    'guid': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
    'link': 'http://www.hindustantimes.com/bollywood/i-am-a-hardcore-realist-and-that-s-why-i-feel-my-journey-has-been-a-joyride-ayushmann-khurrana/story-KQDR7gMuvhD9AeQTA7tbmI.html',
    'media': 'http://www.hindustantimes.com/rf/image_size_630x354/HT/p2/2017/06/26/Pictures/actor-ayushman-khurana_24f064ae-5a5d-11e7-9d38-39c470df081e.JPG',
    'pubDate': 'Mon, 26 Jun 2017 10:50:26 GMT ',
    'title': "I am a hardcore realist, and that's why I feel my journey "
             'has been a joyride: Ayushmann...'},
]
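
For reference, a minimal sketch of the parsing function with that change applied. It runs against an inline sample instead of the live feed, so SAMPLE_RSS, MEDIA_CONTENT, and parse_items are illustrative names, and the sample's contents are invented:

```python
import xml.etree.ElementTree as ET

# Namespaced tag for <media:content>, as used in the question's code.
MEDIA_CONTENT = "{http://search.yahoo.com/mrss/}content"

# Made-up stand-in for resp.content (bytes), so the example needs no network.
SAMPLE_RSS = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Sample headline</title>
      <description/>
      <media:content url="http://example.com/pic.jpg"/>
    </item>
  </channel>
</rss>"""


def parse_items(rss):
    """Parse an RSS document into a list of dicts, keeping values as str."""
    root = ET.fromstring(rss)
    items = []
    for item in root.findall("./channel/item"):
        news = {}
        for child in item:
            if child.tag == MEDIA_CONTENT:
                # .get() returns None instead of raising if url is missing.
                news["media"] = child.get("url")
            else:
                # Joined itertext() is always a str, even for empty elements.
                news[child.tag] = "".join(child.itertext())
        items.append(news)
    return items


stories = parse_items(SAMPLE_RSS)
print(stories)
```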

Comments

When I use just child.text, I get the following error in my notification program: message.append(signature=signature, *args) TypeError: Expected a string or unicode object
@ani: I don't see message.append in your code anywhere. Anyway, like I highlighted in my answer, text and tail can contain any object, so don't assume they're text in any form, let alone Unicode strings. (If you did get str strings and then used the encode() method to convert them to bytes, see my update about leaving them as strings.)
No, I'm using message.append in another program.
When I use str(child.text), I get the following error: UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 136: ordinal not in range(128)
@ani: Remember the bit that said "may be any application-specific object"? That includes "text" in unspecified (and possibly incorrect) encodings. Are you sure you can't get what you want from itertext()? I added an example of the output I got from it.