Parsing and updating markdown file with Python

Question

I'm creating a script that will traverse a markdown file and update any image tags from

![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif)

to

![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif?alt-text="Daffy Duck")

I'm new to Python, so I'm unsure about the syntax and my approach. My current thinking is to create a new empty string and traverse the original markdown line by line; if an image tag is detected, splice the alt text into the correct location, then add each line to the new markdown string. The code I have so far looks like:

import markdown
from markdown.treeprocessors import Treeprocessor
from markdown.extensions import Extension


originalMarkdown = '''
## New Article
Lorem ipsum dolor sit amet, consectetur adipiscing elit. In pretium nunc ligula. Quisque bibendum vel lectus sed pulvinar. Phasellus id magna ac arcu iaculis facilisis. Curabitur tincidunt sed ipsum vel lacinia. Nulla et semper urna. Quisque ultrices hendrerit magna nec tempor. 

![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif)
Quisque accumsan sem mi. Nunc orci justo, laoreet vel metus nec, interdum euismod ipsum. 
![Bugs Bunny](http://www.nationalnannies.com/wp-content/uploads/2012/03/bugsbunny.png)
 Suspendisse augue odio, pharetra ac erat eget, volutpat ornare velit. Sed ac luctus quam. Sed id mauris erat. Duis lacinia faucibus metus, nec vehicula metus consectetur eu.
'''

updatedMarkdown = ""

# First create the treeprocessor
class AltTextExtractor(Treeprocessor):
    def run(self, doc):
        "Find all alt_text and append to markdown.alt_text. "
        self.markdown.alt_text = []
        for image in doc.findall('.//img'):
         self.markdown.alt_text.append(image.get('alt'))

# Then traverse the markdown file and concatenate the alt text to the end of any image tags
class ImageTagUpdater(Treeprocessor):
    def run(self, doc):
      # Set a counter
      count = 0
      # Go through markdown file line by line
        for line in doc:
          # if line is an image tag
          if line > ('.//img'):
            # grab the array of the alt text
            img_ext = ImgExtractor(md)
            # find the second to last character of the line
            location = len(line) - 1
            # insert the alt text
            line += line[location] + '?' +  '"' + img_ext[count] +  '"'
            # add line to new markdown file 
        updatedMarkdownadd.add(line)

The above code is pseudo code. I'm able to successfully extract the strings I need from the original file but I'm unable to concatenate those strings to their respective image tags and update the original file.

What exactly is your question? What is your code doing wrong? Please give examples.
– glibdud, Commented May 29, 2019 at 17:23
code is meant to be pseudo. i'm able to successfully extract the strings i need but i'm unable to concatenate those strings to the image tags and save the original file.
– unfollow, Commented May 29, 2019 at 17:27
What markdown module are you using (since Python does not come with one)?
– martineau, Commented May 29, 2019 at 17:59
Seems to me, from reading the module's documentation on extensions, that you could probably do what you want in one step by preprocessing the input file with your own custom Treeprocessor subclass.
– martineau, Commented May 29, 2019 at 23:47

Jason Shaffner · Accepted Answer · 2019-05-29 17:50:14Z

2

Provided your files aren't huge, it might be easier to overwrite the file, rather than try to wedge little bits in here or there.

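filename = 'article.md'  # hypothetical path; point this at your own markdown file
orig = '![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif)'
new = '![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif?alt-text="Daffy Duck")'

with open(filename, 'r') as f:
    # splitlines() drops each line's trailing newline, so a line can compare equal to orig
    lines = f.read().splitlines()

# Swap in the new tag wherever a whole line is exactly the original tag
new_text = "\n".join(new if line == orig else line for line in lines) + "\n"

with open(filename, 'w') as f:
    f.write(new_text)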

You could also use a regex with re.sub, but I suppose it's a matter of preference.
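For what it's worth, the re.sub route might look something like the sketch below. It is only an illustration: the names IMAGE_TAG and add_alt_text are made up for this example, and the pattern assumes simple image tags of the form ![alt](url), with no title string and no spaces in the URL. It also assumes the URL has no query string yet, since it always appends with "?".

```python
import re

# Matches a simple Markdown image tag: ![alt text](url)
# Group 1 is the alt text; group 2 is the URL (no spaces, no closing parenthesis).
IMAGE_TAG = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)\)')


def add_alt_text(markdown_text):
    """Append ?alt-text="<alt>" to the URL of every simple image tag."""
    def replace(match):
        alt, url = match.group(1), match.group(2)
        return f'![{alt}]({url}?alt-text="{alt}")'

    return IMAGE_TAG.sub(replace, markdown_text)


sample = '![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif)'
updated = add_alt_text(sample)
print(updated)
# ![Daffy Duck](http://www.nonstick.com/wp-content/uploads/2015/10/Daffy.gif?alt-text="Daffy Duck")
```

Unlike the exact-line comparison, this picks up every image tag in the text and reads the alt text from the tag itself, so nothing has to be hardcoded. Running it twice over the same text can append the parameter again when the alt text is a single word, so it is probably best applied once to the original file contents.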

