HTML purifier for Perl

P E Schoen

I recently discovered a security risk in my Perl script, which takes input
from an HTML form, saves it in a database, and converts it to HTML for
display. When I tried adding HTML to the content I was pleased with the
results, but I also realized that malformed markup could break the page.
Even worse, malicious code could be inserted for XSS mischief.

So I searched for a way to prevent this and found www.HTMLpurifier.org,
which offers a PHP library that fixes markup errors and blocks malicious
code. But my script is in Perl, and I didn't want to rewrite it in PHP, so I
found a way to use the PHP script from my Perl EventProcessor.pl.

Essentially, I take each field from the form's POST and write it to a
"Raw.htm" file. Then I invoke the PHP script by printing an iframe from a
heredoc, as follows:

<iframe src="HTMLfilter.php"> </iframe>

and then (after a 1-second sleep to allow for processing) I read the
"Pure.htm" file back and use its contents for the variable. There may be
better ways to do this, but I'm just glad I got it to work, and I feel better
about having people use the submission form with this safeguard in place. If
you have any ideas on how to do this better or differently, please let me
know. I could not find a Perl script that accomplishes the same thing,
although there seem to be a few other utilities available, such as
http://www.delorie.com/web/purify.html.
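One possible alternative to the iframe and the sleep, shown only as a minimal
sketch: if the PHP command-line interpreter is installed on the server (an
assumption, as is the command name "php"), the Perl script could run
HTMLfilter.php directly and wait for it to finish. The helper name
purify_via_php is hypothetical and is not part of EventProcessor.pl.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original EventProcessor.pl.
# Writes the untrusted HTML to Raw.htm, runs the filter command, waits for
# it to exit, and returns the contents of Pure.htm.
sub purify_via_php {
    my ($raw_html, @cmd) = @_;
    @cmd = ('php', 'HTMLfilter.php') unless @cmd;  # assumes the PHP CLI is on PATH

    open(my $raw, '>:encoding(UTF-8)', 'Raw.htm')
        or die "Cannot write Raw.htm: $!";
    print {$raw} $raw_html;
    close($raw) or die "Cannot close Raw.htm: $!";

    unlink 'Pure.htm';  # never read back a stale result from an earlier run

    # List-form pipe open bypasses the shell. Reading to end-of-file and then
    # closing the handle waits for the child, so no sleep is needed.
    open(my $php, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    my @ignored = <$php>;  # discard whatever the script echoes to stdout
    close($php) or die "Filter command failed (wait status $?)";

    open(my $pure, '<:encoding(UTF-8)', 'Pure.htm')
        or die "Cannot read Pure.htm: $!";
    local $/;              # slurp the whole file in one read
    my $pure_html = <$pure>;
    close($pure);
    return $pure_html;
}
```

Called as purify_via_php($form_value), this would return the purified text
or die if the filter fails. Because the file names are fixed, two
submissions arriving at the same moment could still overwrite each other's
Raw.htm, so per-request file names might be worth considering.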

The PHP script is as follows:

<?php

// HTMLfilter.php - PES - January 20, 2011
// This converts raw HTML to purified (safe) HTML
// Called from EventProcessor.pl
// Read Raw.htm, Write to Pure.htm

require_once 'library/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();

// configuration goes here:
$config->set('Core.Encoding', 'UTF-8'); // replace with your encoding
$config->set('HTML.Doctype', 'XHTML 1.0 Transitional'); // replace with your doctype

$purifier = new HTMLPurifier($config);

// untrusted input HTML read from "Raw.htm"
$fHTMLrawfile = "Raw.htm";
$fHTMLraw = fopen($fHTMLrawfile, 'r') or die("can't open file");
$html = fread($fHTMLraw, filesize($fHTMLrawfile));
fclose($fHTMLraw);

$pure_html = $purifier->purify($html);

// write purified HTML to "Pure.htm"
$fHTMLpurefile = "Pure.htm";
$fHTMLpure = fopen($fHTMLpurefile, 'w') or die("can't open file");
fwrite($fHTMLpure, $pure_html); // write the purified HTML, not the raw input
fclose($fHTMLpure);

echo '<pre>' . htmlspecialchars($pure_html) . '</pre>';

// vim: et sw=4 sts=4
 
thomasrobert

responding to
http://www.1-script.com/forums/HTML-purifier-for-Perl-article122413--6.htm
thomasrobert wrote:
HTML Purifier is a standards-compliant HTML filter library written in PHP.
HTML Purifier will not only remove all malicious code (better known as XSS)
with a thoroughly audited, secure yet permissive whitelist, it will also
make sure your documents are standards compliant, something only achievable
with a comprehensive knowledge of W3C's specifications.
 
P E Schoen

"Sherm Pendley" wrote in message
How hard did you look? I went to <http://search.cpan.org>, typed
in HTML, and in about three minutes I found HTML::Declaw on the
second page of results:

Perl generally doesn't rely on separate, self-contained scripts to
do such things. Instead, HTML::Declaw is a module that one can
use in one's own script.

I admit that I just did a quick Dogpile search rather than searching CPAN. By
that time I already had some of the HTML Purifier system working, and I was
impressed by the quick response from the support forum and by the references
from various prestigious sites that use it. Also, I am not very proficient
in Perl (or PHP, or even HTML for that matter), so I really needed a simple
example from which I could build my application. It was also a good learning
experience to become familiar with at least a little basic PHP, and with
ways to interface between PHP and Perl.

Thanks for the information.

Paul
 
