1

I'm trying to convert an XML file to HTML using Python. We have a .css file that defines the formatting of the output. We have been trying to run the following code:

def main():
    infile = open("WTExcerpt.xml", "r", encoding="utf8")
    headline=[]
    text = infile.readline()
    outfile = open("DemoWT.html", "w")
    print("<html>\n<head>\n<title>Winter's Tale</title>\n",file=outfile)
    print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n",file=outfile)               
    while text!="":
        #print(text)
        text = infile.readline()
        text = text.replace("<w>", "")

        if "<title>" in text and "</title>" in text:
            print("<h1>",text,"</h1>\n",file=outfile)
        elif text=="<head>":
            while text!="</head>":
                headline.append(text)
                print("<h3>headline<\h3>\n",file=outfile)       


main()

but we don't know how to make Python treat "text" and "headline" as our variables (whose values change on every pass through the loop) instead of as literal strings. Do you have any idea? Thank you very much.

2
  • You should read about templating in Python. For example, Jinja2 would probably make your life much easier: jinja.pocoo.org/docs/dev Commented Mar 26, 2016 at 21:28
  • I think this is one of those situations where you might get an answer to your question and it might solve your issue, but if you took a different approach the issue probably wouldn't even arise in the first place. One could point out that you're not closing your files after reading/writing, that maybe you should use with open(filename, "r") as f: for line in f: ... rather than open() and readline(), that you can add the contents of headline to an h3 element by writing "<h3>{}</h3>\n".format(" ".join(headline)), etc. But really, why not just use an actual XML parsing module? Commented Mar 26, 2016 at 21:34

2 Answers

1

You seem already to have worked out how to output a variable along with some string literals:

print("<h1>",text,"</h1>\n",file=outfile)

or alternatively

print("<h1>{content}</h1>\n".format(content=text), file=outfile)

or just

print("<h1>" + text + "</h1>\n", file=outfile)

The problem is more with how your loop reads in the headline - you need something like a flag variable (in_headline) to keep track of whether we are currently parsing text that is inside a <head> tag or not.

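def main():
    with open("WTExcerpt.xml", "r", encoding="utf8") as infile, \
         open("DemoWT.html", "w", encoding="utf8") as outfile:
        print("<html>\n<head>\n<title>Winter's Tale</title>\n", file=outfile)
        print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n", file=outfile)
        in_headline = False  # True while we are between <head> and </head>
        headline = ""
        for line in infile:
            # strip() removes the trailing newline so the == comparisons below can match
            text = line.replace("<w>", "").strip()
            if "<title>" in text and "</title>" in text:
                print("<h1>", text, "</h1>\n", file=outfile)
            elif text == "<head>":
                in_headline = True
                headline = ""
            elif text == "</head>":
                in_headline = False
                print("<h3>", headline.strip(), "</h3>\n", file=outfile)
            elif in_headline:
                # collect every line of the headline, separated by spaces
                headline += text + " "
        # close the tags opened at the top of the file
        print("</body>\n</html>", file=outfile)

main()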

However, it is advisable to use an XML parser instead of, effectively, writing your own, which quickly becomes a complicated exercise. For example, this code will break if a <title> is ever split across multiple lines, or if anything else is ever on the same line as the <head> tag.
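
As a rough sketch of the parser route, here is what it might look like with the standard library's xml.etree.ElementTree. The layout of WTExcerpt.xml isn't shown in the question, so the sample document, the <play> root element and the title-to-h1 / head-to-h3 mapping are all assumptions for illustration; xml_to_html is a made-up helper name.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample input: the real WTExcerpt.xml layout is not shown,
# so the element names and nesting here are assumptions.
SAMPLE_XML = """<play>
  <title>The Winter's Tale</title>
  <head><w>Actus</w> <w>Primus</w></head>
  <head><w>Scoena</w> <w>Prima</w></head>
</play>"""


def xml_to_html(xml_text):
    """Turn <title> into <h1> and each <head> into <h3> (assumed mapping)."""
    root = ET.fromstring(xml_text)
    parts = [
        "<html>", "<head>", "<title>Winter's Tale</title>",
        "<link rel='stylesheet' type='text/css' href='Shakespeare.css'>",
        "</head>", "<body>",
    ]
    for elem in root.iter():
        # itertext() yields the text of the element and of all its children,
        # so nested <w> tags vanish without any string replacement.
        content = " ".join("".join(elem.itertext()).split())
        if elem.tag == "title":
            parts.append("<h1>{}</h1>".format(escape(content, quote=False)))
        elif elem.tag == "head":
            parts.append("<h3>{}</h3>".format(escape(content, quote=False)))
    parts.extend(["</body>", "</html>"])
    return "\n".join(parts)


html_out = xml_to_html(SAMPLE_XML)
print(html_out)
```

Because the parser works on elements rather than lines, it no longer matters whether a tag is split across lines or shares a line with other content. For the real file you would read it with ET.parse("WTExcerpt.xml").getroot() and adjust the tag names to whatever the document actually uses.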

0

A couple of issues I see:

1. Instead of initially creating headline as an empty list, why not just assign it inside the loop?
2. Your 'while' loop is fragile: the first line is read before the loop and then overwritten before it is ever examined, so it is silently skipped. Instead of using a while loop, you should use a for loop like so:

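def main():
    infile = open("WTExcerpt.xml", "r", encoding="utf8")
    outfile = open("DemoWT.html", "w", encoding="utf8")
    print("<html>\n<head>\n<title>Winter's Tale</title>\n", file=outfile)
    print("<link rel='stylesheet' type='text/css' href='Shakespeare.css'>\n</head>\n<body>\n", file=outfile)
    in_headline = False  # must exist before the first "elif in_headline" check
    for line in infile:
        # strip() drops the trailing newline so the == comparisons can match
        text = line.replace("<w>", "").strip()
        if "<title>" in text and "</title>" in text:
            print("<h1>", text, "</h1>\n", file=outfile)
        elif text == "<head>":
            in_headline = True
            headline = ""
        elif text == "</head>":
            in_headline = False
            print("<h3>", headline.strip(), "</h3>\n", file=outfile)
        elif in_headline:
            headline += text + " "
    # close the tags opened above, then release both files
    print("</body>\n</html>", file=outfile)
    infile.close()
    outfile.close()

main()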

You should iterate over the file object instead of using a while loop: first, because the while loop as written is easy to get wrong (it silently skips the first line of the file), and second, because a for loop over the file is the idiomatic ("Pythonic") way to read it line by line :).
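
One caveat when iterating over a file: each line keeps its trailing newline, so a comparison like text == "<head>" only matches after stripping. A minimal sketch of that, using io.StringIO as a stand-in for the opened file (the three sample lines are made up):

```python
import io

# io.StringIO stands in for the opened XML file; the sample lines are made up.
fake_file = io.StringIO("<head>\nActus Primus\n</head>\n")

raw_matches = []
stripped_matches = []
for line in fake_file:               # the loop stops by itself at end of file
    if line == "<head>":             # never true: line is "<head>\n"
        raw_matches.append(line)
    if line.strip() == "<head>":     # true once the newline is removed
        stripped_matches.append(line.strip())

print(raw_matches)       # []
print(stripped_matches)  # ['<head>']
```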

3 Comments

elif text=="<head>": while text!="</head>": is not going to do anything
whoops, didn't even notice that part. you're quite correct - I'll edit my answer
and give your answer an upvote for seeing it initially :-)
