I am writing a script to convert HTML to AMP, and I have this code:

#!/usr/bin/python3

import argparse
from amp_tools import TransformHtmlToAmp
import codecs

arg_parser = argparse.ArgumentParser( description = "Copy source_file as target_file." )
arg_parser.add_argument( "source_file" )
arg_parser.add_argument( "target_file" )
arguments = arg_parser.parse_args()

source = arguments.source_file
target = arguments.target_file
html = ""
with codecs.open(source, encoding='utf-8', mode='r+') as f:
    for line in f:
        html = html + line.rstrip()
        
valid_amp = str(TransformHtmlToAmp(html)())

with codecs.open(target, encoding='utf-8', mode='w+') as f:
    f.write(valid_amp.rstrip())
    f.seek(0)

#print(str(valid_amp))
print( target, "successfully created !!" )

This works, but the content of the saved file is enclosed in b''. I don't want that. Is there a way to avoid the b prefix and quotes in the output file?

Sample input: <!doctype html> <html lang="en"> <head> <title>News Article</title> <link href="base.css" rel="stylesheet" /> <script type="text/javascript" src="base.js"></script> </head> <body> <header> News Site </header> <article> <h1>Article Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam egestas tortor sapien, non tristique ligula accumsan eu.</p> </article> <img src="https://www.travelmanagers.com.au/wp-content/uploads/2012/08/AdobeStock_254529936_Railroad-to-Denali-National-Park-Alaska_750x500.jpg"> </body> </html>

Output: b'<div lang="en" class="amp-text"> <head> <title>News Article</title> <link href="base.css" rel="stylesheet"> <script type="text/javascript" src="base.js"></script> </head> <body> <header> News Site </header> <article> <h1>Article Name</h1> <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam egestas tortor sapien, non tristique ligula accumsan eu.</p> </article> <amp-img src="https://www.travelmanagers.com.au/wp-content/uploads/2012/08/AdobeStock_254529936_Railroad-to-Denali-National-Park-Alaska_750x500.jpg" width="750" height="500" layout="responsive"></amp-img> </body></div>'

    Use the .decode() method, not str(), to convert a byte string to a string. Commented Aug 5, 2022 at 14:51
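To illustrate the comment above, here is a minimal sketch of the difference between the two calls. The sample bytes value is made up for the demonstration and only assumes that the converter returns UTF-8 encoded bytes, which the b'' in the output suggests.

```python
# Hypothetical sample: UTF-8 encoded bytes, similar in kind to what the
# converter appears to return (assumption based on the b'' in the output).
raw = b"<p>caf\xc3\xa9</p>"

# str() on a bytes object returns its repr, including the b prefix and quotes.
as_repr = str(raw)

# .decode() interprets the bytes as text in the given encoding.
as_text = raw.decode("utf-8")

# str() with an explicit encoding is equivalent to .decode().
also_text = str(raw, "utf-8")

print(as_repr)
print(as_text)
```

The first print shows the b'' wrapper from the question; the second shows plain text.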

1 Answer

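The converter returns bytes, and calling str() on bytes produces its b'...' representation rather than the text. Replace the line: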

valid_amp = str(TransformHtmlToAmp(html)())

with:

valid_amp = bytes(TransformHtmlToAmp(html)()).decode("utf-8") 
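
As a rough sketch of how the whole read, convert, write flow might look with this fix, the example below uses a hypothetical stand-in, `fake_transform`, in place of `TransformHtmlToAmp(html)()` so that it runs without amp_tools installed. It assumes the real call returns UTF-8 bytes. Unlike the script in the question, it reads the file in one go instead of stripping each line.

```python
import os
import tempfile


def fake_transform(html: str) -> bytes:
    """Hypothetical stand-in for TransformHtmlToAmp(html)(); returns UTF-8 bytes."""
    return html.encode("utf-8")


def convert(source: str, target: str) -> None:
    # Read the whole source file as text.
    with open(source, encoding="utf-8") as f:
        html = f.read()

    result = fake_transform(html)

    # Decode bytes to text; fall back to str() if the result is not bytes.
    text = result.decode("utf-8") if isinstance(result, bytes) else str(result)

    # Write the decoded text, so no b'' wrapper ends up in the file.
    with open(target, "w", encoding="utf-8") as f:
        f.write(text.rstrip())


# Small demonstration using temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "in.html")
    dst = os.path.join(tmp, "out.html")
    with open(src, "w", encoding="utf-8") as f:
        f.write("<p>Article</p>\n")
    convert(src, dst)
    with open(dst, encoding="utf-8") as f:
        written = f.read()

print(written)
```

The isinstance check is only a defensive choice in case the converter's return type differs from what is assumed here.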