P E Schoen
I recently discovered a security risk in my Perl script, which takes input
from an HTML form, saves it in a database, and converts it to HTML for
display. When I tried adding HTML to the content I was pleased with the
results, but I also realized that malformed markup could cause problems and,
even worse, that malicious code could be inserted for XSS mischief.
So I searched for a way to prevent this and found www.HTMLpurifier.org,
which offers a PHP utility that fixes errors and blocks malicious code. But my
script is in Perl, and I didn't want to rewrite it in PHP, so I found a way
to use the PHP script from my Perl EventProcessor.pl.
Essentially, I get the environment variables from the form's POST and
write each one to a "Raw.htm" file. Then I invoke the PHP script by printing
the following line from a heredoc:
<iframe src="HTMLfilter.php"> </iframe>
and then (after a 1-second sleep to allow for processing) I read the
"Pure.htm" file back and use its contents for the variable. There may be
better ways to do this, but I'm just glad I got it to work, and I feel better
about having people use the submission form with this safeguard in place. If
you have any ideas on how to do this better or differently, please let me
know. I could not find a Perl script that accomplishes the same thing,
although there seem to be a few other utilities available, such as
http://www.delorie.com/web/purify.html.
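A minimal sketch of the Perl side of this round trip might look like the following. The helper names (write_raw, filter_iframe, read_pure) are hypothetical and are not taken from EventProcessor.pl; only the file names "Raw.htm" and "Pure.htm" and the iframe line come from the description above.

```perl
use strict;
use warnings;

# Hypothetical helpers; the real EventProcessor.pl may be organized differently.

# Write one untrusted form value to Raw.htm so the PHP filter can read it.
sub write_raw {
    my ($value, $path) = @_;
    $path = 'Raw.htm' unless defined $path;
    open(my $out, '>', $path) or die "can't write $path: $!";
    print {$out} $value;
    close($out) or die "can't close $path: $!";
    return $path;
}

# Return the markup that makes the browser request HTMLfilter.php.
sub filter_iframe {
    return <<'END_HTML';
<iframe src="HTMLfilter.php"> </iframe>
END_HTML
}

# Read the purified result back; returns undef if the file cannot be opened.
sub read_pure {
    my ($path) = @_;
    $path = 'Pure.htm' unless defined $path;
    open(my $in, '<', $path) or return;
    local $/;                      # slurp the whole file in one read
    my $html = <$in>;
    close($in);
    return $html;
}
```

In use, the script would call write_raw($value), print filter_iframe(), sleep for one second, and then call read_pure(). Having read_pure return undef when "Pure.htm" is missing lets the caller decide what to do if the filter has not finished yet.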
The PHP script is as follows:
<?php
// HTMLfilter.php - PES - January 20, 2011
// This converts raw HTML to purified (safe) HTML
// Called from EventProcessor.pl
// Read Raw.htm, Write to Pure.htm
require_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
// configuration goes here:
$config->set('Core.Encoding', 'UTF-8'); // replace with your encoding
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional'); // replace with your doctype
$purifier = new HTMLPurifier($config);
// untrusted input HTML read from "Raw.htm"
$fHTMLrawfile = "Raw.htm";
$fHTMLraw = fopen($fHTMLrawfile, 'r') or die("can't open file");
$html = fread($fHTMLraw, filesize($fHTMLrawfile));
fclose($fHTMLraw);
$pure_html = $purifier->purify($html);
// write purified HTML to "Pure.htm"
$fHTMLpurefile = "Pure.htm";
$fHTMLpure = fopen($fHTMLpurefile, 'w') or die("can't open file");
fwrite($fHTMLpure, $pure_html); // write the purified HTML, not the raw input
fclose($fHTMLpure);
echo '<pre>' . htmlspecialchars($pure_html) . '</pre>';
// vim: et sw=4 sts=4