Extract HTML Attributes with Regex in PHP

<?php

/*! Based on <https://github.com/mecha-cms/cms/blob/master/system/kernel/converter.php> */

function extract_html_attributes($input) {
    if( ! preg_match('#^(<)([a-z0-9\-._:]+)((\s)+(.*?))?((>)([\s\S]*?)((<)\/\2(>))|(\s)*\/?(>))$#im', $input, $matches)) return false;
    $matches[5] = preg_replace('#(^|(\s)+)([a-z0-9\-]+)(=)(")(")#i', '$1$2$3$4$5<attr:value>$6', $matches[5]);
    $results = array(
        'element' => $matches[2],
        'attributes' => null,
        'content' => isset($matches[8]) && $matches[9] == '</' . $matches[2] . '>' ? $matches[8] : null
    );
    if(preg_match_all('#([a-z0-9\-]+)((=)(")(.*?)("))?(?:(\s)|$)#i', $matches[5], $attrs)) {
        $results['attributes'] = array();
        foreach($attrs[1] as $i => $attr) {
            $results['attributes'][$attr] = isset($attrs[5][$i]) && ! empty($attrs[5][$i]) ? ($attrs[5][$i] != '<attr:value>' ? $attrs[5][$i] : "") : $attr;
        }
    }
    return $results;
}

/**
 * Usage:
 * ------
 *
 * var_dump(extract_html_attributes('<div id="foo">content</div>'));
 * var_dump(extract_html_attributes('<img src="foo.jpg">'));
 *
 */

Some people would suggest using the DOM extension instead, but this regex approach is safe enough for a single, well-formed tag whose attribute values are double-quoted.
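
For comparison, here is a rough sketch of what the DOM-based route could look like. It is not part of the original snippet: the function name `extract_html_attributes_dom` is hypothetical, and wrapping the input in `<body>` is an assumption made to keep the parsed element easy to locate. It relies on PHP's bundled DOM extension being enabled.

```php
<?php

// Hypothetical helper, not part of the original snippet. It returns the same
// array shape as extract_html_attributes(), but lets libxml do the parsing.
function extract_html_attributes_dom($input) {
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    // Assumption: wrapping in <body> keeps the element easy to find afterwards
    $loaded = $doc->loadHTML('<body>' . $input . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ( ! $loaded) return false;
    $body = $doc->getElementsByTagName('body')->item(0);
    if ( ! $body) return false;
    // Take the first element node inside <body>
    $element = null;
    foreach ($body->childNodes as $node) {
        if ($node instanceof DOMElement) {
            $element = $node;
            break;
        }
    }
    if ( ! $element) return false;
    $results = array(
        'element' => $element->nodeName,
        'attributes' => null,
        'content' => null
    );
    if ($element->hasAttributes()) {
        $results['attributes'] = array();
        foreach ($element->attributes as $attr) {
            $results['attributes'][$attr->nodeName] = $attr->nodeValue;
        }
    }
    if ($element->hasChildNodes()) {
        // Serialize each child node to rebuild the inner HTML
        $results['content'] = '';
        foreach ($element->childNodes as $child) {
            $results['content'] .= $doc->saveHTML($child);
        }
    }
    return $results;
}

/**
 * Usage:
 * ------
 *
 * var_dump(extract_html_attributes_dom('<div id="foo">content</div>'));
 * var_dump(extract_html_attributes_dom('<img src="foo.jpg">'));
 *
 */
```

This sketch is not a drop-in replacement. Valueless attributes such as `disabled` come back with whatever value libxml reports rather than the attribute name, elements that belong in `<head>` may not end up inside `<body>`, and non-ASCII input may need an explicit charset hint. In exchange, it copes with single-quoted and unquoted attribute values, which the regex version does not match.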
