HTML Coding With Python

Question

I'm trying to convert an XML file to HTML using Python. We have the .css file that contains the formatting rules for the output. We have been trying to run the following code:

def main():
    infile = open("WTExcerpt.xml", "r", encoding="utf8")
    headline=[]
    text = infile.readline()
    outfile = open("DemoWT.html", "w")
    print("<html>\n<head>\n<title>Winter's Tale</title>\n",file=outfile)
    print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n",file=outfile)               
    while text!="":
        #print(text)
        text = infile.readline()
        text = text.replace("<w>", "")

        if "<title>" in text and "</title>" in text:
            print("<h1>",text,"</h1>\n",file=outfile)
        elif text=="<head>":
            while text!="</head>":
                headline.append(text)
                print("<h3>headline<\h3>\n",file=outfile)       


main()

But we don't know how to make Python treat "text" and "headline" as our variables (whose values change every time the loop is executed) instead of as literal strings. Do you have any idea? Thank you very much.

You should read about templating in Python. For example, Jinja2 would probably make your life much easier: jinja.pocoo.org/docs/dev
– Assaf Lavie, Commented Mar 26, 2016 at 21:28
I think this is one of those situations where you might get an answer to your question and it might solve your issue, but if you took a different approach the issue probably wouldn't even arise in the first place. One could point out that you're not closing your files after reading/writing, that maybe you should use with open(filename, "r") as f: for line in f: ... rather than open() and readline, that you can add the contents of headline to an h3 element by writing "<h3>{}</h3>\n".format(" ".join(headline)), etc. But really, why not just use an actual XML parsing module?
– jDo, Commented Mar 26, 2016 at 21:34

Community · Accepted Answer · 2017-05-23 12:23:50Z

You seem already to have worked out how to output a variable along with some string literals:

print("<h1>",text,"</h1>\n",file=outfile)

or alternatively

print("<h1>{content}</h1>\n".format(content=text), file=outfile)

or just

print("<h1>" + text + "</h1>\n", file=outfile)
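On Python 3.6 or later, an f-string is a fourth option. The sketch below is only an illustration: the helper name `heading` is made up here, and the call to `html.escape` from the standard library assumes the source text may contain characters such as `&` or `<` that should not reach the HTML output unescaped.

```python
import html


def heading(text, level=1):
    """Wrap text in an <hN> element (hypothetical helper, not from the question)."""
    # strip() removes the trailing newline that file lines carry;
    # quote=False escapes only &, < and >, leaving apostrophes alone.
    clean = html.escape(text.strip(), quote=False)
    return f"<h{level}>{clean}</h{level}>"


print(heading("Leontes & Hermione\n", level=3))
```

The same string could be written to the output file with print(heading(text), file=outfile).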

The problem is more with how your loop reads in the headline - you need something like a flag variable (in_headline) to keep track of whether we are currently parsing text that is inside a <head> tag or not.

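def main():
    # The with block closes both files, even if an error occurs part-way through.
    with open("WTExcerpt.xml", "r", encoding="utf8") as infile, open("DemoWT.html", "w", encoding="utf8") as outfile:
        print("<html>\n<head>\n<title>Winter's Tale</title>\n", file=outfile)
        print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n", file=outfile)
        in_headline = False  # True while we are between <head> and </head>
        headline = ""
        for line in infile:
            # strip() drops the trailing newline; without it, text == "<head>" never matches.
            text = line.replace("<w>", "").strip()
            if "<title>" in text and "</title>" in text:
                print("<h1>", text, "</h1>\n", file=outfile)
            elif text == "<head>":
                in_headline = True
                headline = ""
            elif text == "</head>":
                in_headline = False
                print("<h3>", headline.strip(), "</h3>\n", file=outfile)
            elif in_headline:
                # Collect every line of the headline, separated by spaces.
                headline += text + " "
        # Close the elements opened at the top of the file.
        print("</body>\n</html>", file=outfile)

main()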

However, it is advisable to use an XML parser instead of, effectively, writing your own. Hand-rolled parsing quickly becomes a complicated exercise: for example, this code will break if <title>s are ever split across multiple lines, or if anything else is ever on the same line as the <head> tag.
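As a rough illustration of the parser route, here is a minimal sketch using `xml.etree.ElementTree` from the standard library. The sample document, the `<play>` root element and the function name `xml_to_html` are assumptions made for this example; the real WTExcerpt.xml may be structured differently, and the mapping of `<title>` to `<h1>` and `<head>` to `<h3>` simply mirrors the code above.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample; the real WTExcerpt.xml may look different.
SAMPLE = """<play>
<title>The Winter's Tale</title>
<head>
<w>ACT</w> <w>I</w>
</head>
</play>"""

# Assumed mapping from XML tags to HTML heading tags.
TAG_MAP = {"title": "h1", "head": "h3"}


def xml_to_html(xml_text):
    """Return HTML headings for every <title> and <head> element."""
    root = ET.fromstring(xml_text)
    parts = []
    for elem in root.iter():
        tag = TAG_MAP.get(elem.tag)
        if tag is None:
            continue
        # itertext() collects the text of the element and its children,
        # so nested <w> tags and line breaks need no special handling.
        text = " ".join("".join(elem.itertext()).split())
        parts.append(f"<{tag}>{html.escape(text, quote=False)}</{tag}>")
    return "\n".join(parts)


print(xml_to_html(SAMPLE))
```

Because the parser works on elements rather than lines, a `<head>` that spans several lines, as in the sample, is handled the same way as one written on a single line.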

n1c9 · Accepted Answer · 2016-03-26 21:21:48Z

A couple of issues I see:

1. Instead of initially creating headline as an empty list, why not assign it inside the loop?
2. Your inner 'while' loop would never finish once entered, because text is never re-read inside it. Instead of using while loops, you should use a for loop like so:

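def main():
    infile = open("WTExcerpt.xml", "r", encoding="utf8")
    outfile = open("DemoWT.html", "w", encoding="utf8")
    print("<html>\n<head>\n<title>Winter's Tale</title>\n", file=outfile)
    print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n", file=outfile)
    in_headline = False  # must exist before the loop first tests it
    for line in infile:
        # strip() removes the trailing newline so the == comparisons can match
        text = line.replace("<w>", "").strip()
        if "<title>" in text and "</title>" in text:
            print("<h1>", text, "</h1>\n", file=outfile)
        elif text == "<head>":
            in_headline = True
            headline = ""
        elif text == "</head>":
            in_headline = False
            print("<h3>", headline.strip(), "</h3>\n", file=outfile)
        elif in_headline:
            headline += text + " "
    print("</body>\n</html>", file=outfile)
    # Close both files so the output is flushed to disk.
    infile.close()
    outfile.close()
main()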

You should iterate over the file object instead of using a while loop: first, because the inner while loop as written never reads a new line and so would never end once entered, and second, because it's much more "Pythonic" :).

edited Mar 26, 2016 at 21:21

answered Mar 26, 2016 at 21:14


3 Comments

Stuart Over a year ago

elif text=="<head>": while text!="</head>": is not going to do anything

n1c9 Over a year ago

whoops, didn't even notice that part. you're quite correct - I'll edit my answer

n1c9 Over a year ago

and give your answer an upvote for seeing it initially :-)
