
I am new to Python. My Python script takes an XML file, "html.xml", as input. I built a list that contains HTML tag names:

from xml.etree import ElementTree
tree = ElementTree.parse("html.xml")
olp = tree.findall(".//tag_Name")  # relative path; a leading "//" triggers a FutureWarning
mylist = [t.text for t in olp]
print(mylist)
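Since html.xml itself is not shown, here is a minimal, self-contained sketch that assumes a hypothetical layout of the form `<tags><tag_Name> a </tag_Name>...</tags>`. It uses the relative path ".//tag_Name" and calls strip() so the names lose the padding spaces visible in the output below.

```python
from xml.etree import ElementTree

# Hypothetical stand-in for html.xml; the real file's layout is not shown.
SAMPLE_XML = """<tags>
  <tag_Name> a </tag_Name>
  <tag_Name> abbr </tag_Name>
  <tag_Name> option </tag_Name>
</tags>"""

root = ElementTree.fromstring(SAMPLE_XML)

# ".//tag_Name" finds every <tag_Name> element at any depth below the root.
# strip() removes the padding spaces around each name (' a ' -> 'a').
mylist = [t.text.strip() for t in root.findall(".//tag_Name")]
print(mylist)  # ['a', 'abbr', 'option']
```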

The output is:

[' !--...-- ', ' !DOCTYPE ', ' a ', ' abbr ', ' acronym ', ' address ',
' applet ', ' area ', ' article ', ' aside ', ' audio ', ' b ', ' base ',
' basefont ', ' bdi ', ' bdo ', ' big ', ' blockquote ', ' body ', ' br ',
' button ', ' canvas ', ' caption ', ' center ', ' cite ', ' code ', ' col ',
' colgroup ', ' command ', ' datalist ', ' dd ', ' del ', ' details ', ' dfn ',
' dialog ', ' dir ', ' div ', ' dl ', ' dt ', ' em ', ' embed ', ' fieldset ',
' figcaption ', ' figure ', ' font ', ' footer ', ' form ', ' frame ',
' frameset ', ' h1 to h6 ', ' head ', ' header ', ' hgroup ', ' hr ', ' html ',
' i ', ' iframe ', ' img ', ' input ', ' ins ', ' kbd ', ' keygen ', ' label ',
' legend ', ' li ', ' link ', ' map ', ' mark ', ' menu ', ' meta ', ' meter ',
' nav ', ' noframes ', ' noscript ', ' object ', ' ol ', ' optgroup ',
' option ', ' output ', ' p ', ' param ', ' pre ', ' progress ', ' q ', ' rp ',
' rt ', ' ruby ', ' s ', ' samp ', ' script ', ' section ', ' select ',
' small ', ' source ', ' span ', ' strike ', ' strong ', ' style ', ' sub ',
' summary ', ' sup ', ' table ', ' tbody ', ' td ', ' textarea ', ' tfoot ',
' th ', ' thead ', ' time ', ' title ', ' tr ', ' track ', ' tt ', ' u ', ' ul ',
' var ', ' video ', ' wbr ']

From the list above, I want to randomly select some tags and build a tree from them, for example: root node 'abbr' with child nodes 'a' and 'option'

'a' with child nodes 'video' and 'title'

'option' with child nodes 'output' and 'source' ......

Basically, I want to generate an HTML page from the tree.

Can anyone tell me how to do that? What would the Python code look like? I am using Python 2.7.
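A minimal sketch of one way to do this with the standard library's xml.etree.ElementTree (a swapped-in approach, not taken from the answers): strip the padding, drop entries that are not element names ('!--...--', '!DOCTYPE', 'h1 to h6'), then recursively attach randomly chosen children and serialize with method="html". The short sample list, the build_tree helper and its depth/children_per_node parameters are hypothetical choices for illustration.

```python
import random
from xml.etree import ElementTree

# A few entries from the question's list, still padded with spaces.
mylist = [' !--...-- ', ' !DOCTYPE ', ' a ', ' abbr ', ' h1 to h6 ',
          ' option ', ' output ', ' title ', ' video ']

# Keep only names usable as element names: isalnum() rejects the comment,
# the doctype and the ' h1 to h6 ' range entry.
tags = [t.strip() for t in mylist if t.strip().isalnum()]

def build_tree(parent, depth, children_per_node=2):
    """Attach randomly chosen child elements to parent, depth levels deep."""
    if depth == 0:
        return
    for _ in range(children_per_node):
        child = ElementTree.SubElement(parent, random.choice(tags))
        build_tree(child, depth - 1, children_per_node)

root = ElementTree.Element(random.choice(tags))
build_tree(root, depth=2)

# method="html" writes explicit <tag></tag> pairs for non-void elements;
# tostring() returns bytes on Python 3, hence the decode().
html = ElementTree.tostring(root, method="html").decode("utf-8")
print(html)
```

Tags are picked without regard to HTML nesting rules, so the result is well-formed markup but rarely valid HTML; void elements such as br or img should only ever be used as leaves.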

  • Could you explain in more detail what you want? (Commented Mar 10, 2013 at 19:31)

3 Answers


Have a look at BeautifulSoup; it will probably do what you want. The documentation is excellent.

BeautifulSoup does XML too.

If you genuinely want a pseudo-random selection of tags from that list (why?), you can use random.choice:

import random

a_random_tag = random.choice(list_of_tags)
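random.choice returns one element per call, so repeated calls can return the same tag. If you need several distinct tags at once, random.sample does that without modifying the list. A short sketch; the contents of list_of_tags here are a hypothetical stand-in for the question's mylist.

```python
import random

# Hypothetical stand-in for the question's (stripped) mylist.
list_of_tags = ['a', 'abbr', 'option', 'output', 'source', 'title', 'video']

# random.choice returns a single element; separate calls may repeat it.
one_tag = random.choice(list_of_tags)

# random.sample returns k distinct elements and leaves the list unchanged.
three_tags = random.sample(list_of_tags, 3)
print(three_tags)
```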

2 Comments

They don't have an example with XML as input; they take an HTML page as input. Do you have an example for that? Can you show me the Python code for randomly selecting tags from the list above and building a tree out of them?
I've edited my answer to address some of your concerns. Read the BeautifulSoup documentation carefully; you'll learn a lot in the process and should then be able to do what you want.

If you are interested in building a tree in Python from parsed data, you could use autovivification:

Autovivification is a feature of some programming languages (notably Perl) involving the automatic, dynamic creation of data structures.

from collections import defaultdict

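def tree():
    # a dict whose missing keys automatically become new nested trees
    return defaultdict(tree)

lupin = tree()
lupin["express"][3] = "stand and deliver"  # lupin["express"] is created on the fly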

https://en.wikipedia.org/wiki/Autovivification
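A sketch applying this to the example tree from the question, assuming the tag names are already stripped of spaces: merely reading a missing key creates the nested node, and a small recursive helper (to_html, hypothetical and not part of collections) turns the nested dicts into nested tags. Child order follows dict iteration order, which is insertion order on Python 3.7+ but arbitrary on Python 2.7.

```python
from collections import defaultdict

def tree():
    """A dict whose missing keys automatically become new nested trees."""
    return defaultdict(tree)

# The question's example: 'abbr' -> 'a', 'option'; 'a' -> 'video', 'title';
# 'option' -> 'output', 'source'. Reading a missing key creates it.
page = tree()
page["abbr"]["a"]["video"]
page["abbr"]["a"]["title"]
page["abbr"]["option"]["output"]
page["abbr"]["option"]["source"]

def to_html(node):
    """Render nested dict keys as nested open/close tag pairs."""
    return "".join("<%s>%s</%s>" % (tag, to_html(children), tag)
                   for tag, children in node.items())

html = to_html(page)
print(html)
```

To build the tree randomly, the keys could come from random.choice(mylist) instead of being written out by hand.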

2 Comments

How can I create a tree from the list above, i.e. mylist?
Please add your input XML to the post.

To randomly select tags from mylist in Python, you can use this loop:

import random

while len(mylist) > 0:
    # pick a random index in the range 0 .. len(mylist) - 1
    idx = random.randint(1, len(mylist)) - 1
    tag = mylist[idx]

    # this next line is critical or the loop will never exit
    del mylist[idx]  # this removes it from the list

    # ... do whatever you want with tag (add it to your tree, create a new node, etc.)

There are other ways too, but that should get you going, and you can optimize it from there.
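One of those other ways, sketched here as an alternative rather than the answerer's own code: shuffle a copy of the list once with random.shuffle and iterate over it. Each tag comes out exactly once in random order, and since a for loop ends when the list is exhausted, no del is needed. The while loop above only terminates because del mylist[idx] shrinks the list; leaving that line out makes it run forever. The short mylist below is a hypothetical sample.

```python
import random

# Hypothetical sample of stripped tag names.
mylist = ['a', 'abbr', 'option', 'output', 'source', 'title', 'video']

# Shuffle a copy so the original list is left intact.
shuffled = list(mylist)
random.shuffle(shuffled)

# Each tag is visited exactly once, in random order; the loop cannot
# run forever because it stops at the end of the shuffled copy.
visited = []
for tag in shuffled:
    visited.append(tag)  # ... add tag to your tree, create a node, etc.

print(visited)
```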

6 Comments

There's no need to faff about with randint, just use choice.
did you remember to import random? (edit: looks like the comment I was replying to was deleted)
@PaulEtherton: yes, just trying to show the mechanics of it though.
Yes, I ran it again, but it's still running and never stops. I think it got stuck somewhere. I used your code after my code like this: while len(mylist) > 0: idx = random.randint(1,len(mylist))-1 tag = mylist[idx] print tag
Any idea why it is stuck in the while loop?