html2text is a very simple script that uses PHP's DOM methods to load HTML, then iterates over the resulting DOM tree to output correctly formatted plain text. For example:
```html
<html>
<title>Ignored Title</title>
<body>
  <h1>Hello, World!</h1>

  <p>This is some e-mail content.
  Even though it has whitespace and newlines, the e-mail converter
  will handle it correctly.

  <p>Even mismatched tags.</p>

  <div>A div</div>
  <div>Another div</div>
  <div>A div<div>within a div</div></div>

  <a href="http://foo.com">A link</a>

</body>
</html>
```
This will be converted into:
```
Hello, World!

This is some e-mail content. Even though it has whitespace and newlines, the e-mail converter will handle it correctly.

Even mismatched tags.

A div
Another div
A div
within a div

[A link](http://foo.com)
```
See the original blog post or the related StackOverflow answer.
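To make the "load with DOM, then walk the tree" idea concrete, here is a minimal PHP sketch. It is not html2text's actual code: the function names `sketch_html_to_text()` and `sketch_node_to_text()` are hypothetical, and the rules are deliberately simplified assumptions. Every heading, paragraph and div becomes its own blank-line-separated block, links are written as `[text](href)`, and `<head>`, `<title>`, `<script>` and `<style>` are dropped. The real converter handles more elements and spacing rules than this.

```php
<?php
// A minimal sketch of the "load with DOM, then walk the tree" approach.
// NOT html2text's real implementation: sketch_html_to_text() and
// sketch_node_to_text() are hypothetical names, and the rules are simplified.

function sketch_node_to_text(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        // Collapse runs of HTML whitespace (space, tab, LF, FF, CR),
        // including source newlines, into a single space.
        return preg_replace('/[ \t\n\f\r]+/', ' ', $node->nodeValue);
    }
    if (!($node instanceof DOMElement)) {
        return ''; // comments, processing instructions, etc.
    }

    $name = strtolower($node->nodeName);
    // Drop content a browser would not render in the page body.
    if (in_array($name, ['head', 'title', 'script', 'style'], true)) {
        return '';
    }

    // Convert the children first, depth-first.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= sketch_node_to_text($child);
    }

    switch ($name) {
        case 'h1': case 'h2': case 'h3': case 'h4': case 'h5': case 'h6':
        case 'p': case 'div':
            // Block elements become their own blank-line-separated paragraph.
            return "\n\n" . trim($inner) . "\n\n";
        case 'br':
            return "\n";
        case 'a':
            // Links are written Markdown-style as [text](href).
            $href = $node->getAttribute('href');
            return $href === '' ? $inner : '[' . trim($inner) . '](' . $href . ')';
        default:
            // Other elements (html, body, span, b, ...) contribute only their text.
            return $inner;
    }
}

function sketch_html_to_text(string $html): string
{
    if (trim($html) === '') {
        return ''; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // libxml repairs malformed markup such as an unclosed <p>;
    // collect its warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->documentElement;
    $text = $root === null ? '' : sketch_node_to_text($root);

    // Trim each line, then allow at most one blank line between blocks.
    $text = implode("\n", array_map('trim', explode("\n", $text)));
    $text = preg_replace("/\n{3,}/", "\n\n", $text);
    return trim($text);
}
```

Calling `echo sketch_html_to_text($html);` on the sample HTML above keeps the same text and link formatting, but under these simplified rules every block, including each nested div, ends up as its own blank-line-separated paragraph. Whitespace is collapsed using only the ASCII characters HTML treats as insignificant, so non-breaking spaces from `&nbsp;` survive.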
You can use Composer to add the package to your project:
```json
{
  "require": {
    "soundasleep/html2text": "~0.5"
  }
}
```
And then use it quite simply:
```php
$text = Html2Text\Html2Text::convert($html);
```
You can also include the supplied `html2text.php` and use `$text = convert_html_to_text($html);` instead.
Some very basic tests are provided in the `tests/` directory. Run them with `composer install --dev && vendor/bin/phpunit`.
`html2text` is dual-licensed under EPL v1.0 and LGPL v3.0, making it suitable for both Eclipse and GPL projects.
Also see html2text_ruby, a Ruby implementation.