How would you go about parsing markdown?

Posted by John Leidegren on Stack Overflow
Published on 2009-03-03T07:27:37Z Indexed on 2010/05/03 8:18 UTC

Filed under: markdown | parsing

You can find the syntax here.

The thing is, the source that comes with the download is written in Perl, which I have no intention of honoring. It is riddled with regexes and relies on MD5 hashes to escape certain characters. Something is just wrong about that!

I'm about to hard-code a parser for Markdown, and I'm wondering whether anyone has experience with this.

Edit:

If you don't have anything meaningful to say about the actual parsing of Markdown, spare me the time. (This might sound harsh, but yes, I'm looking for insight, not a solution, i.e. a third-party library.)

To help a bit with the answers: regexes are meant to identify patterns, NOT to parse an entire grammar. That people consider doing so is FUBAR.

If you think about Markdown, it's fundamentally based around the concept of paragraphs.
As such, a reasonable approach might be to split the input into paragraphs.
There are many kinds of paragraphs, e.g. heading, text, list, blockquote, and code.
The challenge is thus to identify these paragraphs and in what context they occur.
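
To make the idea concrete, here is a minimal Python sketch of the first pass, not a finished parser. It assumes paragraphs are separated by blank lines and that the first line (or the indentation) is enough to guess the kind. The names `split_blocks` and `classify` are made up for illustration. It deliberately ignores context (nesting, lazy continuation lines, horizontal rules, blank lines inside code), which is exactly the hard part mentioned above.

```python
import re

# A list item starts with up to three spaces, then a bullet or "1." style
# marker, then whitespace. This only identifies a line-level pattern.
LIST_MARKER = re.compile(r" {0,3}(?:[*+-]|\d+\.)[ \t]+")


def split_blocks(text):
    """Split the source into blocks (lists of lines) at blank lines."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    blocks, current = [], []
    for line in text.split("\n"):
        if line.strip() == "":
            if current:  # a blank line closes the current block
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    return blocks


def is_setext_underline(line):
    """True for a line made only of '=' or only of '-' characters."""
    stripped = line.strip()
    return bool(stripped) and set(stripped) in ({"="}, {"-"})


def classify(block):
    """Guess the kind of one block, looking at that block in isolation."""
    if all(line.startswith(("    ", "\t")) for line in block):
        return "code"
    first = block[0]
    if first.startswith("#"):
        return "heading"
    if len(block) == 2 and is_setext_underline(block[1]):
        return "heading"
    if first.lstrip(" ").startswith(">"):
        return "blockquote"
    if LIST_MARKER.match(first):
        return "list"
    return "text"
```

Calling `[classify(b) for b in split_blocks(source)]` gives one label per paragraph. A second pass would then have to decide how adjacent blocks relate, for example whether an indented block following a list item is code or a continuation of that item.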

I'll be back with a solution once I find it worthy of sharing.

© Stack Overflow or respective owner
