The Simplest PHP Markdown Parser

<?php

/*!
 * =======================================================
 *  Author  : Taufik Nurrohman
 *  URL     : https://github.com/tovic
 *  License : MIT
 * =======================================================
 *
 * -- CODE: ----------------------------------------------
 *
 *    echo parseMD('this is a **bold** text');
 *
 * -------------------------------------------------------
 *
 */

// escape: put a backslash in front of every character listed in $x
function __MDE($str, $x) {
    return preg_replace('#([' . preg_quote($x, '#') . '])#', '\\\$1', $str);
}

// un-escape: remove the backslash in front of every character listed in $x
function __MDD($str, $x) {
    return preg_replace('#\\\\([' . preg_quote($x, '#') . '])#', '$1', $str);
}

// main function
function parseMD($content) {

    // character(s) to escape
    $x = '`~!#^*()-_+={}[]:\'"<>.';

    // URL pattern
    $url = '(?:ht|f)tps?:\/\/[a-z0-9\-._~!$&\'()*+,;=:\/?\#[\]@%]+';

    // normalize white-space
    $content = trim(str_replace(array("\r\n", "\r"), "\n", $content));

    // parse code block
    $content = preg_replace_callback('#^( {4}|\t)(.*?)$#m', function($matches) use($x) {
        $s = str_replace("\t", ' ', htmlentities($matches[2], ENT_NOQUOTES));
        return '<pre><code>' . __MDE($s, $x) . '</code></pre>';
    }, $content);

    // parse code inline
    $content = preg_replace_callback('#`([^\n\\\]+?)`#', function($matches) use($x) {
        $s = htmlentities($matches[1], ENT_NOQUOTES);
        return '\\<code>' . __MDE($s, $x) . '</code>';
    }, $content);

    // parse image and link
    $content = preg_replace_callback('#(!)?\[(.*?)\]\((.*?)( +([\'"])(.*?)\5)?\)#', function($matches) use($x) {
        $s2 = $matches[2];
        $s3 = __MDE($matches[3], $x);
        $s6 = ! empty($matches[4]) && ! empty($matches[6]) ? __MDE($matches[6], $x) : "";
        $s6 = $s6 ? ' title="' . htmlentities($s6) . '"' : "";
        if( ! empty($matches[1])) {
            $str = '\\<img alt="' . htmlentities($s2) . '" src="' . $s3 . '"' . $s6 . '>';
        } else {
            $str = '\\<a href="' . $s3 . '"' . $s6 . '>' . $s2 . '</a>';
        }
        return $str;
    }, $content);

    // parse automatic link
    $content = preg_replace_callback('#<(' . $url . ')>#', function($matches) use($x) {
        return '<a href="' . __MDE($matches[1], $x) . '">' . $matches[1] . '</a>';
    }, $content);

    // parse ATX header(s)
    $content = preg_replace_callback('#^(\#{1,6})\s*([^\#]+?)\s*\#*$#m', function($matches) {
        $i = strlen($matches[1]);
        return '<h' . $i . '>' . $matches[2] . '</h' . $i . '>';
    }, $content);

    $content = preg_replace(
        array(
            // parse Setext header(s)
            '#^(.+?)\n={2,}$#m',
            '#^(.+?)\n-{2,}$#m',
            // parse horizontal rule
            '#^ {0,3}([*\-+] *){3,}$#m',
            // parse bold-italic text
            '#([*_]{2})([*_])([^\n\\\]+?)\2\1#',
            // parse bold text
            '#([*_]{2})([^\n\\\]+?)\1#',
            // parse italic text
            '#([*_])([^\n\\\]+?)\1#',
            // parse strike text
            // '#(~{2})([^\n\\\]+?)\1#',
            // parse unordered-list
            '#^ *[*\-+] +(.*?)$#m',
            // parse ordered-list
            '#^ *\d+\. +(.*?)$#m',
            // clean-up list ...
            '#\s*<\/(ol|ul)>\s*<\1>\s*#',
            // parse blockquote
            '#^(?:>|&gt;) +(.*?)$#m',
            // clean-up blockquote ...
            '#\s*<\/blockquote>\s*<blockquote>\s*#',
            // clean-up code block ...
            '#<\/code><\/pre>(\s*)<pre><code(>| .*?>)#',
            // parse two or more white-space(s) at the end of text into a line break
            '#(\S) {2,}\n#'
        ),
        array(
            '<h1>$1</h1>',
            '<h2>$1</h2>',
            '<hr>',
            '\\<strong><em>$3</em></strong>',
            '\\<strong>$2</strong>',
            '\\<em>$2</em>',
            // '\\<del>$2</del>',
            "<ul>\n <li>$1</li>\n</ul>",
            "<ol>\n <li>$1</li>\n</ol>",
            "\n ",
            "<blockquote>\n <p>$1</p>\n</blockquote>",
            "\n ",
            '$1',
            "$1<br>\n"
        ),
    $content);

    // parse new-line to paragraph
    // (iterate over a real variable: a by-reference foreach over an
    // assignment expression does not reliably write back to $content)
    $lines = explode("\n\n", $content);
    foreach($lines as &$line) {
        if(
            $line !== "" // not empty
            && strpos($line, '    ') !== 0 // not a code block
            && strpos($line, "\t") !== 0 // --ibid
            && strpos($line, '<') !== 0 // not a HTML tag
        ) {
            $line = '<p>' . trim($line) . '</p>';
        }
    }
    unset($line);
    $content = implode("\n\n", $lines);

    // typography (anything outside the HTML tag)
    $content = preg_replace_callback('#(^|<\/?[a-z]+[^>\n]*?>)(.*?)(<\/?[a-z]+[^>\n]*?>|$)#', function($matches) {
        $s = str_replace(
            array(
                '&',
                '<',
                '>',
                '---',
                '--',
                '...'
            ),
            array(
                '&amp;',
                '&lt;',
                '&gt;',
                '—',
                '–',
                '…'
            ),
        $matches[2]);
        return $matches[1] . preg_replace(
            array(
                '#\'([^\'"]*?)\'#',
                '#"([^"]*?)"#',
                '#\b\'#',
                '#\'\b#',
                '#&amp;(.*?);#' // restore the encoded html entity
            ),
            array(
                '‘$1’',
                '“$1”',
                '’',
                '‘',
                '&$1;'
            ),
        $s) . $matches[3];
    }, $content);

    // un-escape character(s)
    $content = __MDD($content, $x);

    // output the result
    return $content;

}

// test code ...

/*

Title Here
==========

Sub-Title Here
--------------

Lorem ipsum **bold** dolor *italic* sit ***bold-italic*** amet. I have a [link](http://example.com) here, a [link](http://example.com "Example Title") with title attribute is also possible. Another test [`link`](http://example.com/foo?q=this+is+not+an_italic_text).

# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6
####### Heading 7 (invalid)

# Heading 1 #
## Heading 2 ##
### Heading 3 ###
#### Heading 4 ####
##### Heading 5 #####
###### Heading 6 ######
####### Heading 7 (invalid) #######

## Heading 2 ####

Lorem ipsum __bold__ dolor _italic_ sit ___bold-italic___ amet, consectetuer <mark>adipiscing</mark> elit, sed diam nonummy nibh euismod tincidunt ut <abbr title="Hyper Text Markup Language">HTML</abbr> laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

---

----

-----

- - -

- - - -

- - - - -

--- --- ---

    --- (this is a code block)

***

+++

Lorem ipsum `code` dolor sit amet, consectetuer `code with <html dir="ltr"> tag in it` adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

* list item 1
* list item 2
* list item 3

* not a list item.

Lorem ipsum dolor sit amet.

1. list item 1
2. list item 2
3. list item 3

4. not a list item.

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

![image 1](http://lorempixel.com/200/200/animals/1)

![image 2](http://lorempixel.com/200/200/animals/1 "Example Title")

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

* list item 1
* list item 2
  * this simple plugin ...
  * ... does not have ability ...
  * ... to parse nested list-item
* list item 3

Lorem ipsum dolor sit amet.

1. list item 1
2. list item 2
  1. this simple plugin ...
  2. ... does not have ability ...
  3. ... to parse nested list-item
3. list item 3

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

Some text with a line break.
This text does not have a line break.

Lorem ipsum dolor sit amet, consectetuer <http://example.com> adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

> Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.
> Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.
> > This simple plugin does not ...
> > > ... have ability to parse ...
> > ... nested blockquote
> Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.
> Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

    <!-- Do not parse **Markdown** pattern in _code block_. This is just a sample `code`. -->
    <!DOCTYPE html>
    <html dir="ltr">
    <head>
    <meta charset="utf-8">
    <title>Test HTML</title>
    </head>
    <body>
    <h1>Page Title</h1>
    <p>Page content.</p>
    <p>This is not a [link](http://example.com)</p>
    </body>
    </html>

<!-- W . H . I . T . E . S . P . A . C . E -->

<p>wow</p>

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

Test for Long Line-Break
========================

Paragraph 1

Paragraph 2

Paragraph 3

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

Test for Plain Text
===================

Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit lobortis nisl ut aliquip ex ea commodo consequat.

4 > 1

I & You

I ♥ U

8 < 21

Proper Typography Characters
----------------------------

Lorem "ipsum dolor" sit amet, consectetuer adipiscing elit, sed 'diam nonummy nibh euismod' tincidunt ut laoreet dolore magna aliquam erat volutpat. Ut wisi "eni'm ad minim" veniam, quis 'nostrud exerci tation ullamcorper suscipit 2009--2016 nisl ut ---aliquip ex--- ea commodo consequat...

*/
The simplest Markdown parser written in PHP.
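
The parser leans on one trick throughout: once a span has been converted, the Markdown-significant characters inside it get a backslash prefix so that later patterns skip them, and a final pass strips those backslashes again. The sketch below isolates that round trip. The names `md_escape` and `md_unescape`, the shortened character list, and the simplified italic pattern are illustrative stand-ins for `__MDE`, `__MDD`, `$x` and the snippet's own regex, not part of the snippet itself.

```php
<?php
// Illustrative stand-ins for the snippet's __MDE() / __MDD() helpers.

// Prefix every character listed in $chars with a backslash.
function md_escape(string $str, string $chars): string
{
    return preg_replace('#([' . preg_quote($chars, '#') . '])#', '\\\\$1', $str);
}

// Remove the backslash in front of every character listed in $chars.
function md_unescape(string $str, string $chars): string
{
    return preg_replace('#\\\\([' . preg_quote($chars, '#') . '])#', '$1', $str);
}

$chars = '*_`'; // shortened list, for this example only

$url = 'http://example.com/foo?q=not_an_italic_text';
$protected = md_escape($url, $chars);

// Simplified italic pattern that, like the snippet's, refuses to span a backslash.
$italic = '#([*_])([^\n\\\\]+?)\1#';

var_dump(preg_match($italic, $url));       // the raw URL contains _an_, so the pattern matches
var_dump(preg_match($italic, $protected)); // escaped underscores are no longer matched
var_dump(md_unescape($protected, $chars) === $url); // the round trip restores the input
```

This is why the link test in the snippet's sample text (`foo?q=this+is+not+an_italic_text`) should come out without `<em>` tags: the URL is escaped when the link is built, long before the italic pattern runs.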

1 Response

Update: now supports fenced code blocks and table markup :)
