RXParse module (by robic0), Version 0.1000

R

robic0

Version 0.1000
Here it is. About %85 complete.
Not quite there yet. I know how to finish it but too lazy
right now. I may get a hair up my butt and finish it in a 2
day timeframe.

Whats not finished yet:
- Entities.. (I don't care to explain why, has placeholder)
- Attlist.. (same, has placeholder
- Dtd.. (same, matter of what I'm going to do in the way of a combined html package or not)
- Encoding (same, I'm %90 on this subject, but looking at the %10 for overall integration)
- other misc.. (same, its in my head)
- Class/package interface.. (I haven't digested/read the total bullshit on this yet, have to cut aside a day for this)

The parse framework for Xml/Xhtml is completely there. The standard used is Xml 1.1
The code, though incomplete, is %100 compliant.

This is hard to benchmark because it depends on the entities and complexity. In general,
I used a very complex html file for harcore numbers that runs about 340k in size.
Actually the most complex html file I could find. Xml will be significantly faster.
A 380k file parsed in .25 second with debug turned off. Of course I have a 3 drive
Hitachi 750 gig (total) raid 0 array on a Dfi SLI-DR, Amd 3700+ San Diego, overclocked to 3ghz, 620 ddr.
I haven't tested it but this might be faster than the current dll (C) alternatives.

It dies with my custum, accurate, full error reporting, exact line/column numbers. Just like the big boys.
The regexp's are designed with speed in mind. There is no faster a regexp processor can go in an all Perl
solution.
The nice part is that its open. This is just the core. Its not complicated. Customization can occurr within
the module itself (hey u gonna customize my code?) in reference to not only handlers but the core itself.

I know VanderDick is going to steel it. I will work on completeness in my idea of what it could be. My idea
will always be better than anybody elses out there, thats why I'm posting it here.

If you don't think this is of any use, you may have been kicked by a mule at one point in your life.
I don't care to answer why I'm posting it. Lifes too short I guess, could drop dead tommorow...
(Hey but some of you are definetly 'not worthy')

use strict;
use warnings;
$|=1;


#######################
# Useage examples
#######################

use Benchmark ':hireswallclock';

my $t0 = new Benchmark;

my $p = new RXParse();

#my $fname = "some.html";
my $fname = "config.html";

if (1) {
open DATA, $fname or die "can't open config.html...";
my $parse_ln = "";
$p->setDebugMode(1);
$parse_ln = join ('', <DATA>);
$p->parse(\$parse_ln);
#$p->parse($parse_ln);
close DATA;
}
else {
open DATA, $fname or die "can't open config.html...";
$p->setDebugMode(1);
$p->parse(*DATA);
close DATA;
#open my $fref, "config.html" or die "can't open config.html...";
#$p->parse($fref, 1);
#close $fref;
}

my $t1 = new Benchmark;
my $td = timediff($t1, $t0);
print STDERR "the code took:",timestr($td),"\n";
exit;




############################################
# XML - RXParse module (by robic0)
# ------------------------------------
# Compliant w3c XML: 1.1
# Resources:
# Extensible Markup Language (XML) 1.1
# W3C Recommendation 04 February 2004,
# 15 April 2004
# http://www.w3.org/TR/xml11/#NT-PITarget
############################################

package RXParse;
use vars qw(@ISA);
@ISA = qw();

#############################
# ReXparse private globals
# =========================
my (
%ErrMsg,%Dflth,
$Nstrt,$Nchar,$Name,
@UC_Nstart,@UC_Nchar,
$RxParse,$RxAttr,
$RxAttr_DL1,
$RxAttr_DL2,
$RxAttr_RM,
$RxPi,
$RxENTITY,
$Entities,
$RxEntConv,
%ents
);
my $initflg = 0;


####################
# ReXparse methods
# =================
sub new {
my ($class, @args) = @_;
my $self = {};
if (!$initflg) {
InitVars();
$initflg = 1;
}
$self->{'InParse'} = 0;
$self->{'debug'} = 0;
$self->{'origcontent'} = undef;
setDfltHandlers ($self);
return bless ($self, $class);
}

sub original_content {
my $self = shift;
if (defined $self->{'origcontent'} &&
ref($self->{'origcontent'}) eq 'SCALAR') {
return ${$self->{'origcontent'}};
} else {return "";}
}

sub setDebugMode {
my $self = shift;
my $dmode = shift;
if (defined $dmode && $dmode) {$dmode = 1;}
else {$dmode = 0;}
$self->{'debug'} = $dmode;
}

sub setDfltHandlers {
my ($self,$name) = @_;
if (defined $name) {
$name =~ s/^\s+//s; $name =~ s/\s+$//s;
my $hname = "h".lc($name);
if (exists $Dflth{$hname}) {
$self->{$hname} = $Dflth{$hname};
}
} else {
foreach my $key (%Dflth) {
$self->{$key} = $Dflth{$key};
}
}
}

sub setHandlers {
my ($self, @args) = @_;
my %oldh = ();
if (!scalar(@args)) {
while (my ($name,$val) = splice (@args, 0, 2)) {
$name =~ s/^\s+//s; $name =~ s/\s+$//s;
my $hname = "h".lc($name);
if (exists $self->{$hname}) {
$oldh{$name} = $self->{$hname};
if (ref($val) eq 'CODE') {
$self->{$hname} = $val;
} else {
# if its not a CODE ref,
# just set default handler
$self->setDfltHandlers ($name);
}
}
}
}
return %oldh;
}

sub parse {
my ($self, $data) = @_;
throwX ('30') unless (!$self->{'InParse'});
throwX ('31') unless (defined $data);
$self->{'InParse'} = 1;

# call processor
if (ref($data) eq 'SCALAR') {
print "SCALAR ref\n" if ($self->{'debug'});
eval {Processor($self, 1, $data);};
if ($@) {
Cleanup($self); die $@;
}
}
elsif (ref(\$data) eq 'SCALAR') {
print "SCALAR string\n" if ($self->{'debug'});
eval {Processor($self, 1, \$data);};
if ($@) {
Cleanup($self); die $@;
}
} else {
if (ref($data) ne 'GLOB' && ref(\$data) ne 'GLOB') {
$self->{'InParse'} = 0;
die "rp_error_parse, data source not a string or filehandle nor reference to one\n";
}
print "GLOB ref or filehandle\n" if ($self->{'debug'});
eval {Processor($self, 0, $data);};
if ($@) {
Cleanup($self); die $@;
}
}
$self->{'InParse'} = 0;
}

##############################
# ReXparse private func's
# ========================

sub Cleanup
{
my $obj = shift;
$obj->{'origcontent'} = undef;
$obj->{'debug'} = 0;
$obj->{'InParse'} = 0;
$obj->setDfltHandlers();
}

sub Processor
{
my ($obj, $BUFFERED, $rpl_mk) = @_;
my ($markup_file);
my $parse_ln = '';
my $dyna_ln = '';
my $ref_parse_ln = \$parse_ln;
my $ref_dyna_ln = \$dyna_ln;
if ($BUFFERED) {
$ref_parse_ln = $rpl_mk;
$ref_dyna_ln = \$dyna_ln;
} else {
# assume its a ref to a global or global itself
$markup_file = $rpl_mk;
$ref_dyna_ln = $ref_parse_ln;
}
my $ln_cnt = 0;
my $complete_comment = 0;
my $complete_cdata = 0;
my @Tags = ();
my $havroot = 0;
my $last_cpos = 0;
my $done = 0;
my $content = '';
my $altcontent = undef;

$obj->{'origcontent'} = \$content;

while (!$done)
{
$ln_cnt++;

# stream processing (if not buffered)
if (!$BUFFERED) {
if (!($_ = <$markup_file>)) {
# just parse what we have
$done = 1;
# boundry check for runnaway
if (($complete_comment+$complete_cdata) > 0) {
$ln_cnt--;
}
} else {
$$ref_parse_ln .= $_;

## buffer if needing comment/cdata closure
next if ($complete_comment && !/-->/);
next if ($complete_cdata && !/\]\]>/);

## reset comment/cdata flags
$complete_comment = 0;
$complete_cdata = 0;

## flag serialized comments/cdata buffering
if (/(<!--)|(<!\[CDATA\[)/)
{
if (defined $1) { # complete comment
if ($$ref_parse_ln !~ /<!--.*?-->/s) {
$complete_comment = 1;
next;
}
}
elsif (defined $2) { # complete cdata
if ($$ref_parse_ln !~ /<!\[CDATA\[.*?\]\]>/s) {
$complete_cdata = 1;
next;
}
}
}
## buffer until '>' or eof
next if (!/>/);
}
} else {
$ln_cnt = 1;
$done = 1;
}

## REGEX Parsing loop
while ($$ref_parse_ln =~ /$RxParse/g)
{
## handle contents
if (defined $14) {
$content .= $14;
$last_cpos = pos($$ref_parse_ln);
next;
}
## valid content here ... can be taken off
print "-"x20,"\n" if ($obj->{'debug'});
if (length ($content)) {
## check reserved characters in content
if ($content =~ /[<>]/) {
#$content =~ s/^\s+//s; $content =~ s/\s+$//s;
## mark-up characters in content
throwX('01', $ref_parse_ln, $last_cpos, $content, $ln_cnt);
}
if (!scalar(@Tags)) {
#$content =~ s/^\s+//s; $content =~ s/\s+$//s;
if ($content =~ /[^\s]/s) {
## content at root level
throwX('02', $ref_parse_ln, $last_cpos, $content, $ln_cnt);
}
}
# substitute special xml characters, then call content handler with $content
# ------------------------------------------------------
# $content has to be a constant if xml reserved chars
# are found, copy altered string to pass to handler
# otherwise pass original $content
# ------------------------------------------------------
if (defined ($altcontent = convertEntities (\$content))) {
$obj->{'hchar'}($obj, $$altcontent, $obj->{'debug'});
} else {
$obj->{'hchar'}($obj, $content, $obj->{'debug'});
}
#print "14 $content\n" if ($obj->{'debug'});
print "-"x20,"\n" if ($obj->{'debug'});
$content = '';
}
#if ($show_pos && $debug) {
# my $rr = pos $$ref_parse_ln;
# print "$rr ";
#}

## <tag> or </tag> or <tag/>
if (defined $2)
{
my ($l1,$l3) = (length($1),length($3));
if (($l1+$l3)==0) { ## <tag>
if (!scalar(@Tags) && $havroot) {
## new root node <tag>
throwX('03', $ref_parse_ln, pos($$ref_parse_ln), $2, $ln_cnt);
}
push @Tags,$2;
$havroot = 1;
# call start tag handler with $2
$obj->{'hstart'}($obj, $2);
}
elsif ($l1==1 && $l3==0) { ## </tag>
my $pval = pop @Tags;
if (!defined $pval) {
## missing start tag </tag>
throwX('04', $ref_parse_ln, pos($$ref_parse_ln), $2, $ln_cnt);
}
if ($2 ne $pval) {
## expected closing tag </tag>
throwX('05', $ref_parse_ln, pos($$ref_parse_ln), $pval, $ln_cnt);
}
# call end tag handler with $2
$obj->{'hend'}($obj, $2);
}
elsif ($l1==0 && $l3==1) { ## <tag/>
if (!scalar(@Tags) && $havroot) {
## new root node <tag/>
throwX('06', $ref_parse_ln, pos($$ref_parse_ln), $2, $ln_cnt);
}
$havroot = 1; # first and only <root/>
# call start tag handler, then end tag handler, with $2
$obj->{'hstart'}($obj, $2);
$obj->{'hend'}($obj, $2);
} else {
## <//node//> errors
throwX('07', $ref_parse_ln, pos($$ref_parse_ln), "$1$2$3", $ln_cnt);
}
#print "2 TAG: $1$2$3\n" if ($obj->{'debug'});
}
## <tag attrib/> or <tag attrib>
elsif (defined $5)
{
my $l7 = length($7);

## attributes
my $attref = getAttrARRAY($6);
unless (ref($attref)) {
## missing or extra token
throwX('08', $ref_parse_ln, pos($$ref_parse_ln), $attref, $ln_cnt);
}
if ($l7==0) { ## <tag attrib>
if (!scalar(@Tags) && $havroot) {
## new root node
throwX('03', $ref_parse_ln, pos($$ref_parse_ln), $5, $ln_cnt);
}
push @Tags,$5;
$havroot = 1;
# call start tag handler with $5 and attributes @{$attref}
$obj->{'hstart'}($obj, $5, @{$attref});
}
elsif ($l7==1) { ## <tag attrib/>
if (!scalar(@Tags) && $havroot) {
## new root node
throwX('06', $ref_parse_ln, pos($$ref_parse_ln), $7, $ln_cnt);
}
$havroot = 1; # first and only <root attrib/>
# call start tag handler with $5 and attributes @{$attref}, then end tag handler with $5
$obj->{'hstart'}($obj, $5, @{$attref});
$obj->{'hend'}($obj, $5);
} else {
## syntax error
throwX('07', $ref_parse_ln, pos($$ref_parse_ln), "$5$6$7", $ln_cnt);
}
#if ($obj->{'debug'}) {
# print "5,6 TAG: $5 Attr: $6$7\n" ;
#}
}
## XMLDECL or PI (processing instruction)
elsif (defined $8)
{
my $pi = $8;
# xml declaration ?
if ($pi =~ /^xml(.*?)$/) {
my $attref = getAttrARRAY($1);
unless (ref($attref)) {
## missing or extra token in xmldecl
throwX('14', $ref_parse_ln, pos($$ref_parse_ln), $attref, $ln_cnt);
}
if (!scalar(@{$attref})) {
## missing xmldecl parameters
throwX('15', $ref_parse_ln, pos($$ref_parse_ln), $pi, $ln_cnt);
}
my ($version,$encoding,$standalone);
while (my ($name,$val) = splice (@{$attref}, 0, 2)) {
if ('version' eq lc($name) && !defined $version) {
if ($val !~ /^[0-9]\.[0-9]+$/) {
## invalid version character data in xmldecl
throwX('16', $ref_parse_ln, pos($$ref_parse_ln), "$name = '$val'", $ln_cnt);
}
$version = $val;
} elsif ('encoding' eq lc($name) && !defined $encoding) {
if ($val !~ /^[A-Za-z][\w\.-]*$/) {
## invalid encoding character data in xmldecl
throwX('17', $ref_parse_ln, pos($$ref_parse_ln), "$name = '$val'", $ln_cnt);
}
$encoding = $val;
} elsif ('standalone' eq lc($name) && !defined $standalone) {
if ($val !~ /^(?:yes)|(?:no)$/) {
## invalid standalone character data in xmldecl
throwX('18', $ref_parse_ln, pos($$ref_parse_ln), "$name = '$val'", $ln_cnt);
}
$standalone = ($val eq 'yes' ? 1 : 0);
} else {
## unknown xmldecl parameter
throwX('19', $ref_parse_ln, pos($$ref_parse_ln), "$name = '$val'", $ln_cnt);
}
}
if (!defined $version) {
# missing version in xmldecl
throwX('20', $ref_parse_ln, pos($$ref_parse_ln), $pi, $ln_cnt);
}
# call xmldecl handler
$obj->{'hxmldecl'}($obj, $version,$encoding,$standalone);
}
# PI - processing instruction
elsif ($pi =~ /$RxPi/) {
# call pi handler
$obj->{'hproc'}($obj, $1, $2);
} else {
# unknown PI data
throwX('21', $ref_parse_ln, pos($$ref_parse_ln), $pi, $ln_cnt);
}
#print "8 VERSION: $8\n" if ($obj->{'debug'});
}
## META
elsif (defined $4) {
# If doctype is HTML then META is not closed
# parse meta data, call handler
$obj->{'hmeta'}($obj, $4, $obj->{'debug'});
#print "4 META: $4\n" if ($obj->{'debug'});
}
## DOCTYPE
elsif (defined $9) {
# parese doctype, call handler
$obj->{'hdoctype'}($obj, $9, $obj->{'debug'});
#print "9 DOCTYPE: $9\n" if ($obj->{'debug'});
}
## CDATA
elsif (defined $10) {
if (!scalar(@Tags)) {
## CDATA content at root
throwX('09', $ref_parse_ln, pos($$ref_parse_ln), $10, $ln_cnt);
}
# call cdata handler
$obj->{'hcdata'}($obj, $10, $obj->{'debug'});
#print "10 CDATA: $10\n" if ($obj->{'debug'});
}
## COMMENT
elsif (defined $11) {
# call comment handler
$obj->{'hcomment'}($obj, $11, $obj->{'debug'});
#print "11 COMMENT: $11\n" if ($obj->{'debug'});
}
## ATTLIST
elsif (defined $12) {
# parese attlist, call handler
$obj->{'hattlist'}($obj, $12, $obj->{'debug'});
#print "12 ATTLIST: $12\n" if ($obj->{'debug'});
}
## ENTITY
elsif (defined $13) {
# parese entity, call handler
my $entdata = $13;
if ($entdata =~ /$RxENTITY/) {
if (defined $1) {
# general entity replacement
# add general entity $3 (EntityDef)
} else {
# parameter entity replacement
# add parameter entity $3 (PEDef)
}
}
else {
# unknown ENTITY data
# throwx
}
$obj->{'hentity'}($obj, $13, $obj->{'debug'});
#print "13 ENTITY: $13\n" if ($obj->{'debug'});
}
}
$$ref_dyna_ln = $content;
$content = '';
}
if (!$havroot) {
# not valid xml
throwX('10', undef, undef, undef, undef);
}
if (scalar(@Tags)) {
my $str = '';
while (defined (my $etag = pop @Tags)) {
$str .= ", /$etag";
}
$str =~ s/^, +//;
# missing end tag
throwX('11', undef, undef, $str, undef);
}
if ($$ref_dyna_ln =~ /[^\s]/s) {
if ($$ref_dyna_ln =~ /[<>]/) {
#mark-up characters in content
throwX('12',undef, undef, $$ref_dyna_ln, undef);
} else {
# content at root level (end)
throwX('13',undef, undef, $$ref_dyna_ln, undef);
}
}
$obj->{'origcontent'} = undef;
return 1;
}

sub getAttrARRAY
{
my $attrstr = shift;
my $aref = [];
my ($alt_attval,$attval,$rx);

while ($attrstr =~ s/$RxAttr//) {
push @{$aref},$1;
if ($2 eq "'") {
$rx = \$RxAttr_DL1;
} else {
$rx = \$RxAttr_DL2;
}
if ($attrstr =~ s/$$rx//) {
if (defined $1) {
push @{$aref},$1;
next;
}
$attval = $2;
if (defined ($alt_attval = convertEntities (\$attval))) {
push @{$aref},$$alt_attval;
next;
}
push @{$aref},$attval;
next;
}
return $attrstr;
}
if ($attrstr=~/$RxAttr_RM/) {
$attrstr =~ s/^\s+//s; $attrstr =~ s/\s+$//s;
return $attrstr if (length($attrstr));
}
return $aref;
}

sub convertEntities
{
my $str_ref = shift;
my $alt_str = '';
my $res = 0;
my ($entchr);
while ($$str_ref =~ /$RxEntConv/gc) {
if (defined $4) {
# decimal
$entchr = chr($4) ;
if (exists $ents{$entchr}) {
$alt_str .= "$1".$entchr;
}
} elsif (defined $5) {
if (length($5) < 9) {
# hex
$entchr = chr(hex($5));
if (exists $ents{$entchr}) {
$alt_str .= "$1".$entchr;
}
}
} else {
$alt_str .= "$1$ents{$3}";
$res = 1;
}
}
if ($res) {
$alt_str .= substr $$str_ref, pos($$str_ref);
return \$alt_str;
}
return undef;
}

sub throwX
{
my ($errno, $lrefseg, $cseg_err, $datastr, $l_tot) = @_;
my ($line,$col) = (0,0);
if (defined $lrefseg) {
($line,$col) = getRealColumn($lrefseg, $l_tot, $cseg_err);
}
die "No such error message ($errno)\n" if (!exists $ErrMsg{$errno});
my $estr = '';
my $ctmpl = "\$estr = sprintf ($ErrMsg{$errno});";
eval $ctmpl;
die $estr;
}

sub getRealColumn
{
my ($lrefseg, $l_tot, $cseg_err) = @_;
my $cseg_offset = 0;
pos($$lrefseg) = 0;
my ($lseg_tot, $lseg_offset) = (0,1);
while ($$lrefseg =~ /\n/g) {
$lseg_tot++;
if (pos($$lrefseg) < $cseg_err) {
$cseg_offset = pos($$lrefseg);
$lseg_offset++;
next;
}
if ($l_tot <= 1) {
$lseg_tot = $l_tot;
last;
}
}
return ($l_tot-$lseg_tot+$lseg_offset, $cseg_err-$cseg_offset);
}


# Globals Init
# =============
sub InitVars
{
%Dflth = (
'hstart' => \&dflt_start,
'hend' => \&dflt_end,
'hchar' => \&dflt_char,
'hcdata' => \&dflt_cdata,
'hcomment' => \&dflt_comment,
'hmeta' => \&dflt_meta,
'hattlist' => \&dflt_attlist,
'hentity' => \&dflt_entity,
'hdoctype' => \&dflt_doctype,
'helement' => \&dflt_element,
'hxmldecl' => \&dflt_xmldecl,
'hproc' => \&dflt_proc,
);

@UC_Nstart = (
"\\x{C0}-\\x{D6}",
"\\x{D8}-\\x{F6}",
"\\x{F8}-\\x{2FF}",
"\\x{370}-\\x{37D}",
"\\x{37F}-\\x{1FFF}",
"\\x{200C}-\\x{200D}",
"\\x{2070}-\\x{218F}",
"\\x{2C00}-\\x{2FEF}",
"\\x{3001}-\\x{D7FF}",
"\\x{F900}-\\x{FDCF}",
"\\x{FDF0}-\\x{FFFD}",
"\\x{10000}-\\x{EFFFF}",
);
@UC_Nchar = (
"\\x{B7}",
"\\x{0300}-\\x{036F}",
"\\x{203F}-\\x{2040}",
);
$Nstrt = "[A-Za-z_:".join ('',@UC_Nstart)."]";
$Nchar = "[-\\w:\\.".join ('',@UC_Nchar).join ('',@UC_Nstart)."]";
$Name = "(?:$Nstrt$Nchar*?)";
#die "$Name\n";

$RxParse =
qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:META(.*?))|(?:($Name)((?:\s+$Name\s*=\s*["'][^<]*['"])+)\s*(\/*))|(?:\?(.*?)\?)|(?:!(?:(?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?[^-])--)|(?:ATTLIST(.*?))|(?:ENTITY(.*?)))))>)|(.+?)/s;
# ( <( ( 1 12 2 3 3)|( 4 4)|( 5 56( ) 6 7 7)|( 8 8 )|( !( ( 9 9)|( 0 0 )|( 1 1 )|(
2 2)|( 3 3))))>)|4 4

$RxAttr = qr/^\s+($Name)\s*=\s*("|')/;
$RxAttr_DL1 = qr/^(?:([^'&]*?)|([^']*?))'/;
$RxAttr_DL2 = qr/^(?:([^"&]*?)|([^"]*?))"/;
$RxAttr_RM = qr/[^\s\n]+/;
$RxPi = qr/^($Name)\s+(.*?)$/s;


#[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
#[53] AttDef ::= S Name S AttType S DefaultDecl



$RxENTITY = qr/\s+($Name)|(?:%\s+($Name))\s+(.*?)/s;
# 1 1 ( 2 2) 3 3
$Entities = "(?:amp)|(?:gt)|(?:lt)|(?:apos)|(?:quot)|(?:#(?:([0-9]+)|(x[0-9a-fA-F]+)))"; # cat more entities
# ( #( 4 4|5 5))
$RxEntConv = qr/(.*?)(&|%)($Entities);/s;
# 1 12 23 3
%ents = (
'amp' =>'&',
'gt' =>'>',
'lt' =>'<',
'apos'=>"'",
'quot'=>"\"",
'&' => '',
'<' => '',
'>' => '',
"'" => '',
"\"" => ''
);

%ErrMsg = (
'01' => "\"rp_error_01, mark-up or reserved characters in content (line %s, col %s), malformed element? '%s'\\n\", \$line, \$col, \$datastr",
'02' => "\"rp_error_02, content at root level (line %s, col %s): '%s'\\n\", \$line, \$col, \$datastr",
'03' => "\"rp_error_03, element wants to be new root node (line %s, col %s): '%s'\\n\", \$line, \$col, \$datastr",
'04' => "\"rp_error_04, missing start tag for '/%s' (line %s, col %s)\\n\", \$datastr, \$line, \$col",
'05' => "\"rp_error_05, expected closing tag '/%s' (line %s, col %s)\\n\", \$datastr, \$line, \$col",
'06' => "\"rp_error_06, element wants to be new root node (line %s, col %s): '%s/'\\n\", \$line, \$col, \$datastr",
'07' => "\"rp_error_07, tag syntax '%s' (line %s, col %s)\\n\", \$datastr, \$line, \$col",
'08' => "\"rp_error_08, invalid, missing or extra tokens in attribute asignment (line %s, col %s): %s\\n\", \$line, \$col, \$datastr",
'09' => "\"rp_error_09, CDATA content at root level (line %s, col %s): '%s'\\n\", \$line, \$col, \$datastr",
'10' => "\"rp_error_10, not a valid xml document\\n\"",
'11' => "\"rp_error_11, missing end tag '%s'\\n\", \$datastr",
'12' => "\"rp_error_12, mark-up or reserved characters in content (end), malformed element? '%s'\\n\", \$datastr",
'13' => "\"rp_error_13, content at root level (end): '%s'\\n\", \$datastr",
'14' => "\"rp_error_14, invalid, missing or extra tokens in xmldecl asignment (line %s, col %s): %s\\n\", \$line, \$col, \$datastr",
'15' => "\"rp_error_15, missing xmldecl parameters (line %s, col %s): %s\\n\", \$line, \$col, \$datastr",
'16' => "\"rp_error_16, invalid 'version' character data in xmldecl (line %s, col %s): %s\\n\", \$line, \$col, \$datastr",
'17' => "\"rp_error_17, invalid 'encoding' character data in xmldecl (line %s, col %s): %s\\n\", \$line, \$col, \$datastr",
'18' => "\"rp_error_18, invalid 'standalone' character data in xmldecl (line %s, col %s): %s\\n\", \$line, \$col, \$datastr",
'19' => "\"rp_error_19, unknown xmldecl parameter (line %s, col %s): %s\\n\", \$line, \$col, \$datastr",
'20' => "\"rp_error_20, missing xmldecl 'version' parameter (line %s, col %s): %s\\n\", \$line, \$col, \$datastr",
'21' => "\"rp_error_21, unknown or missing processing instruction parameters (line %s, col %s): '%s'\\n\", \$line, \$col, \$datastr",
'30' => "\"rp_error_30, already in parse\\n\"",
'31' => "\"rp_error_31, data source not defined\\n\"",
);
}

# Default Handlers
# =================
sub dflt_start {
my ($obj, $el, @attr) = @_;
if ($obj->{'debug'}) {
print "start _: $el\n";
while (my ($name,$val) = splice (@attr, 0,2)) {
print " "x12,"$name = $val\n";
}
}
}

sub dflt_end {my ($obj, $el) = @_;print "end _: /$el\n" if ($obj->{'debug'});}
sub dflt_char {my ($obj, $str) = @_;print "char _: $str\n" if ($obj->{'debug'});}
sub dflt_cdata {my ($obj, $str) = @_;print "cdata _: $str\n" if ($obj->{'debug'});}
sub dflt_comment {my ($obj, $str) = @_;print "comnt _: $str\n" if ($obj->{'debug'});}
sub dflt_meta {my ($obj, $str) = @_;print "meta _: $str\n" if ($obj->{'debug'});}
sub dflt_attlist {my ($obj, $parm) = @_;print "attlist_h _: $parm\n" if ($obj->{'debug'});}
sub dflt_entity {my ($obj, $parm) = @_;print "entity_h _: $parm\n" if ($obj->{'debug'});}
sub dflt_doctype {my ($obj, $parm) = @_;print "doctype_h _: $parm\n" if ($obj->{'debug'});}
sub dflt_element {my ($obj, $parm) = @_;print "element_h _: $parm\n" if ($obj->{'debug'});}

sub dflt_xmldecl {
my ($obj, $version,$encoding,$standalone) = @_;

if ($obj->{'debug'}) {
print "xmldecl_h _: version = $version\n" if (defined $encoding);
print " "x14,"encoding = $encoding\n" if (defined $encoding);
print " "x14,"standalone = $standalone\n" if (defined $standalone);
}
}
sub dflt_proc {
my ($obj, $target,$data) = @_;

if ($obj->{'debug'}) {
print "proc_h _: target = $target\n";
print " "x14,"data = $data\n";
}
}
__END__

#$RxParse =
qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:META(.*?))|(?:($Name)((?:\s+$Name\s*=\s*["'][^<]*['"])+)\s*(\/*))|(?:\?(.*?)\?)|(?:!(?:(?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?[^-])--)|(?:ATTLIST(.*?))|(?:ENTITY(.*?)))))>)|(.+?)/s;
# ( <( ( 1 12 2 3 3)|( 4 4)|( 5 56( ) 6 7 7)|( 8 8 )|( !( ( 9 9)|( 0 0 )|( 1 1 )|(
2 2)|( 3 3))))>)|4 4
#$Entities = "(?:amp)|(?:gt)|(?:lt)|(?:apos)|(?:quot)|(?:#(?:([0-9]+)|(x[0-9a-fA-F]+)))";
# ( #( 3 3 4 4))
#$RxEntConv = qr/(.*?)&($Entities);/s;
 
R

robic0

On Sun, 16 Apr 2006 18:11:30 -0700, robic0 wrote:

By the way, you won't find 1-liners in this code. I tried to be as exessively verbose as possilbe.
By the time your 1/4 of the way through the Processor, you head will be so dizzy you might
have a brain infarction.

Warning, trying to understand this code may cause you to go into a coma!!!!!!!!
I made every attempt to put placeholders where appropriate. When your finished with simple code
and your still concious, congratulations!!!

Just cut and past into a file and run the damned thing.
There is not much parameter error checking. Its not a polished module. Be carefull of that.
The thrust was always on the parsing, not a finished module.
Its an asskicking Xml/Xhtml parser.

Forgot to mention that you have the option of passing in a reference to a large buffered file as
well as a globe to a handle. Of course the buffer refference runs %30 faster.

Hehehe
 
R

robic0

On Sun, 16 Apr 2006 18:11:30 -0700, robic0 wrote:

Not bad for a drunky redneck hick with eithg grade educating living
in skid row with no job. My last job was begging for money outside a 7-11.
Wrote the code on paper and went to libraby to post
 
U

Uri Guttman

normally i avoid this troll and of course posting any code reviews of
his crappy modules. but i am still drawn to skim its code for laughs and
for future examples of how not to code in perl. and i came across this
nugget:


r> while ($$ref_parse_ln =~ /$RxParse/g)
r> {
r> ## handle contents
r> if (defined $14) {
r> $content .= $14;

and MUCH MUCH later:

r> $RxParse =
r> qr/(?:<(?:(?:(\/*)($Name)\s*(\/*))|(?:META(.*?))|(?:($Name)((?:\s+$Name\s*=\s*["'][^<]*['"])+)\s*(\/*))|(?:\?(.*?)\?)|(?:!(?:(?:DOCTYPE(.*?))|(?:\[CDATA\[(.*?)\]\])|(?:--(.*?[^-])--)|(?:ATTLIST(.*?))|(?:ENTITY(.*?)))))>)|(.+?)/s;

yes, that is one long line of regex with at least 14 grabs. read it and
weep. you may want to gouge out your eyes and i sympathize with you.

the regex is assigned way away from where it is used (and it's a
horrible regex to boot. does this moron really think he can parse SGML
type stuff with a regex?). but the use of $14 is one of the worst pieces
of perl code i have ever seen. and there is an amazing amount of bad
perl out there (easy to find on the web and in too much of cpan). but i
have never seen $14 used before. that takes a really microencephalic
brain to use a numbered grab that large, with such an ugly regex and
being so far away from the regex. but we know this troll well enough to
know it can code this poorly and now we have proof.

so flame back at me. take some of my cpan code and try to code review
it. hell, i would love to see you do that as you might even learn
something which is always possible, even for a lump of granite.

and this post is for google news searches so there is one post on record
which states that this code or any other by this troll should ever be
used. if it ever appears on cpan, i will lead a campaign (should be very
easy to do) to make sure its rating will approach negative infinity.

have fun,

uri
 
U

Uri Guttman

"r" == robic0 <robic0> writes:

r> read'm an weep fraud....... its god code, the numbers say it all.
r> bottom decl is sweet. you didn't run it and don't. i posted this gem
r> aimed at your rectum. seems it hit target

i made a mistake. it uses at least up to $17 which is worse than $14. i
apologize for that error in my review.

you don't even understand the criticism. i knew that would be the
case. but i have to do it for others who may read this sometime in the
future.

and what about my challenge for you to review some of my code? try it
and we will laugh away at your comments. please do it. i really need
your take on my code. only you could possibly review it properly. try
doing it for file::slurp or even better, sort::maker. i am sure you
could rewrite those to greatly improve their speed and coding style. go
for it!

uri
 
A

A. Sinan Unur

r> read'm an weep fraud....... its god code, the numbers say it all.
r> bottom decl is sweet. you didn't run it and don't.
r> ... seems it hit target

Nah, my guess is Uri had also been procrastinating right up to the tax
deadline, and, like me, was looking for amusement in-between filling in
forms. Lucky for us, you are here to provide that amusement.
i made a mistake. it uses at least up to $17 which is worse than $14.
i apologize for that error in my review.

Without your post, I would never have seen his code, so, thank you. On
the other hand, let's agree that once the toes join the count, we are in
fubar territory ; -)
and what about my challenge for you to review some of my code?
....

you could rewrite those to greatly improve their speed and coding
style.

Oh no!

Sinan
--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
 
J

John Bokma

Uri Guttman said:
normally i avoid this troll and of course posting any code reviews of
his crappy modules.

A troll does things on purpose because he has a lot of fun when people,
especially those that should know better, bite.
 
R

robic0

r> read'm an weep fraud....... its god code, the numbers say it all.
r> bottom decl is sweet. you didn't run it and don't. i posted this gem
r> aimed at your rectum. seems it hit target

i made a mistake. it uses at least up to $17 which is worse than $14. i
apologize for that error in my review.

you don't even understand the criticism. i knew that would be the
case. but i have to do it for others who may read this sometime in the
future.

and what about my challenge for you to review some of my code? try it
and we will laugh away at your comments. please do it. i really need
your take on my code. only you could possibly review it properly. try
doing it for file::slurp or even better, sort::maker. i am sure you
could rewrite those to greatly improve their speed and coding style. go
for it!

uri

You don't know XMl at all. Regexp has presidence just like anything else.
There is no 'shortcut' in a bunch of OR's, and XML has a shitload of OR's,
with presidence. It don't matter if its C or asm. Common chars are factored.
Presedince is observed to the standard, with internal exclusive chunks not regex'd
parsed in that line. There is nothing (period) faster than that main regexp, in
any language!

If fact, regexp is used to describe the 1.1 standard. What I have done is to
factor out all overhead and sub-groups. There is nothing faster than that
'line' of $14 (there's only 14) caps, nothing!! Its the extreme minimal.
Some groups (like entities and others) have sub-processing regexp's.
Content is in the correct position. Every position, every single detail
you will find in every single compliant 1.1 processor. I know this, not because
I have looked at any of them, I know it becase of logic.

Perhaps you could code the anti-robic0 module and post it to cpan.
I may look at that.

I'm a homeless guy without a job. I have to pan for bucks to post here...
 
R

robic0

Nah, my guess is Uri had also been procrastinating right up to the tax
deadline, and, like me, was looking for amusement in-between filling in
forms. Lucky for us, you are here to provide that amusement.


Without your post, I would never have seen his code, so, thank you. On
the other hand, let's agree that once the toes join the count, we are in
fubar territory ; -)


Oh no!

Sinan

All the jealousy is coming out, I see it. You all know its real good.
I mean real good.....
 
R

robic0

A troll does things on purpose because he has a lot of fun when people,
especially those that should know better, bite.

I didn't write this complex piece of work for fun dude....
 
U

Uri Guttman

"r" == robic0 <robic0> writes:

r> Perhaps you could code the anti-robic0 module and post it to cpan.
r> I may look at that.

no, the challenge is to you to code review one of my cpan modules. you
are typical of trolls in that you are basically a bully and a
coward. classic cliche stereotype that is also too often true as in your
case. you prey upon newbies who don't know you and you antagonize the
regulars who haven't yet killfiled you. so in your own deluded world you
get little negative or realistic feedback because it is obvious that you
will never take it nor learn from it. so you lash out with curses and
other inferiority complex actions. and of course any complaints about
that roll off your back since you know you are correct and all those
others much be wrong. so the only way you can possibly work your way out
of your self-imposed hell is to actually listen to someone and prove
that you are not a coward. take one of my modules on cpan

http://search.cpan.org/~uri

and code review all or part of it here. point out code you don't like,
explain why you think it could be better, rewrite it better, whatever,
as long as it is constructive. no just cursing at me and my code with
nothing to back it up. this is a challenge any quality coder would have
no problem attempting. and to your surprise if you actually point out
something which could be better, i will patch it and release it. you
see, i am actually about the true quality of my work and not trying to
cut you down. your comments are rarely about the code and that is so
common with trolls.

r> I'm a homeless guy without a job. I have to pan for bucks to post
r> here...

i won't even consider any sympathy for that is most likely a lie or
worse a delusion. and even if it were true, that doesn't grant you
permission to troll here and act like you do. do you have sense of shame
at how you behave here? do you even have some respect for yourself
(unlike the disrespect you show others all the time)? do you care how
others perceive you or are you just trying to be as annoying as
possible? there are other very poor perl hacker wannabes out there and
you don't seen them acting out their angst here. did you ever think that
if you were more sociable and actually learned perl better and got along
with the perl community you could actually get a job and get off the
street? but that is moot as i don't care nor want to know your actual
situation. that point is just to show how self destructive you are. any
possiblity of someone in the perl community ever hiring you is now slim.

so do the challenge or you will be publically known as a typical perl
troll and chicken. i triple dog dare ya (that didn't work the last time
and i bet it won't this time. but i have to try!).

uri
 
R

robic0

r> Perhaps you could code the anti-robic0 module and post it to cpan.
r> I may look at that.

no, the challenge is to you to code review one of my cpan modules. you
are typical of trolls in that you are basically a bully and a
coward. classic cliche stereotype that is also too often true as in your
case. you prey upon newbies who don't know you and you antagonize the
regulars who haven't yet killfiled you. so in your own deluded world you
get little negative or realistic feedback because it is obvious that you
will never take it nor learn from it. so you lash out with curses and
other inferiority complex actions. and of course any complaints about
that roll off your back since you know you are correct and all those
others much be wrong. so the only way you can possibly work your way out
of your self-imposed hell is to actually listen to someone and prove
that you are not a coward. take one of my modules on cpan

http://search.cpan.org/~uri

and code review all or part of it here. point out code you don't like,
explain why you think it could be better, rewrite it better, whatever,
as long as it is constructive. no just cursing at me and my code with
nothing to back it up. this is a challenge any quality coder would have
no problem attempting. and to your surprise if you actually point out
something which could be better, i will patch it and release it. you
see, i am actually about the true quality of my work and not trying to
cut you down. your comments are rarely about the code and that is so
common with trolls.

r> I'm a homeless guy without a job. I have to pan for bucks to post
r> here...

i won't even consider any sympathy for that is most likely a lie or
worse a delusion. and even if it were true, that doesn't grant you
permission to troll here and act like you do. do you have sense of shame
at how you behave here? do you even have some respect for yourself
(unlike the disrespect you show others all the time)? do you care how
others perceive you or are you just trying to be as annoying as
possible? there are other very poor perl hacker wannabes out there and
you don't seen them acting out their angst here. did you ever think that
if you were more sociable and actually learned perl better and got along
with the perl community you could actually get a job and get off the
street? but that is moot as i don't care nor want to know your actual
situation. that point is just to show how self destructive you are. any
possiblity of someone in the perl community ever hiring you is now slim.

so do the challenge or you will be publically known as a typical perl
troll and chicken. i triple dog dare ya (that didn't work the last time
and i bet it won't this time. but i have to try!).

uri

I absolutely agree with everything you have said!
I have several problems (as do we all). I really don't mean to be a bully at all.
I'm trying to curb that as much as possible. I really have an alchol problem
though. Sometimes you have to read between the lines with me, I mean no harm
at all but this thing is an anchor on my neck. I'm harmess wouldn't hurt a fly.
I actually like all the regulars here.

I have one other problem since I was born and can't resolve it, never could, don't
think it may be possible, I'm so old, not enough time. When I write code that I'm
motivated to do, my mind goes into half here, half there world. The there world
I can't describe but my eyes roll back into my head for a half a second. If everything
is going at a constant velocity, the there mind pushes up things you can't believe.
I can't. I never could. I can't cross the curtain. For seconds, my fingers are typing
from there and I gain conciounce saying whoa, not knowing what that was. Those flashes
have been happening to me since I was born. That sub-mind was doing things I could not
in the wake world. Apparently, no one else around me could do either. So growing up
in a Catholic school, the Nuns used to call my mother and ask her why I could do
twelve numbered division in 1 second, in my head, in the third grade. No one was more
shocked than me. I only know there's something ominus on the other side. It apparenty
is just slightly cloaked, cloaked from my conciouness, but accessabe with a slight
brain chemistry change triggered by a concious thought. This is how I managed to survive
feast an famon in my life. That thing is always there. I don't know what it is, all I
know is I can call it up with a thought, somethimes I have to dwell and force. The
problem is that its fierce. When it gets me, its like liquid fire. I can only watch.
I have never understood what it is and it won't tell me. Its a thinly veiled curtain,
on the other side is a fire that burns so hot I don't hardly want to look. Its exteme
in intelligence what ever it is. I stay away from it with alchohol and drugs because
it reaks my body. It forces my body to hours and hours of creativity that it, my
body just can't keep up with. Its like the universe flashes on me in an instant.
I can't control it. Its very weird.

For these reasons, I don't care about any community or anything. I'm just trying to
stay alive with it since I was born. I have no notion of self esteme or world,
and it follows, others as well. My life is not the normal mind kind of life you would
expect...
 
R

robic0

Nah, my guess is Uri had also been procrastinating right up to the tax
deadline, and, like me, was looking for amusement in-between filling in
forms. Lucky for us, you are here to provide that amusement.


Without your post, I would never have seen his code, so, thank you. On
the other hand, let's agree that once the toes join the count, we are in
fubar territory ; -)


Oh no!

Sinan

Oh, back to normal. After u post a pice of work u think everybody
is somehow praising u.

For u sinon, keep ur plonk going, i'm releasing something else too
wouldn't want you, and man I really don't want your fucking opinion
or for you to even read a fuckin damned thing I post again. Please
motherfucker plonk me and please never ever fucking respond to one
of my posts again!

Thank u motherfucker!!!!
 
R

robic0

I absolutely agree with everything you have said!
I have several problems (as do we all). I really don't mean to be a bully at all.
I'm trying to curb that as much as possible. I really have an alchol problem
though. Sometimes you have to read between the lines with me, I mean no harm
at all but this thing is an anchor on my neck. I'm harmess wouldn't hurt a fly.
I actually like all the regulars here.

I have one other problem since I was born and can't resolve it, never could, don't
think it may be possible, I'm so old, not enough time. When I write code that I'm
motivated to do, my mind goes into half here, half there world. The there world
I can't describe but my eyes roll back into my head for a half a second. If everything
is going at a constant velocity, the there mind pushes up things you can't believe.
I can't. I never could. I can't cross the curtain. For seconds, my fingers are typing
from there and I gain conciounce saying whoa, not knowing what that was. Those flashes
have been happening to me since I was born. That sub-mind was doing things I could not
in the wake world. Apparently, no one else around me could do either. So growing up
in a Catholic school, the Nuns used to call my mother and ask her why I could do
twelve numbered division in 1 second, in my head, in the third grade. No one was more
shocked than me. I only know there's something ominus on the other side. It apparenty
is just slightly cloaked, cloaked from my conciouness, but accessabe with a slight
brain chemistry change triggered by a concious thought. This is how I managed to survive
feast an famon in my life. That thing is always there. I don't know what it is, all I
know is I can call it up with a thought, somethimes I have to dwell and force. The
problem is that its fierce. When it gets me, its like liquid fire. I can only watch.
I have never understood what it is and it won't tell me. Its a thinly veiled curtain,
on the other side is a fire that burns so hot I don't hardly want to look. Its exteme
in intelligence what ever it is. I stay away from it with alchohol and drugs because
it reaks my body. It forces my body to hours and hours of creativity that it, my
body just can't keep up with. Its like the universe flashes on me in an instant.
I can't control it. Its very weird.

For these reasons, I don't care about any community or anything. I'm just trying to
stay alive with it since I was born. I have no notion of self esteme or world,
and it follows, others as well. My life is not the normal mind kind of life you would
expect...

The foul lang comes from the conciouness trying to protect my self.
Most of the code, %90 of this one comes from the otherside of the veil,
when I've been paid, the code comes from across the curtain, my hands
cannot type as fast as the solution is read to me. Its very tiring.
My mind loses/sheds it fast by necisity. I have a broad background in
all sciences/anthropology. And a bs in mechanical engineering which I
shed along time ago, only keeping the physicss and math, shedding the
5 thermodynamics (and nuclear). Otherwise I'm homeless, my internal IQ
unmeasurable........
 
G

Guest

robic0 wrote:

: I actually like all the regulars here.

Must be a strange feeling of being liked this way.

: [...] I stay away from it with alchohol and drugs because
: it reaks my body. It forces my body to [...] hours of creativity that it, my
: body just can't keep up with. Its like the universe flashes on me in an
: instant.
: I can't control it. Its very weird.

Take on long distance running. It may help you. It will also keep you away
from a computer keyboard long enough as to allow you to rethink your stuff
before mindlessly keying it into heaps of 1s and 0s.

Oliver.
 
R

robic0

robic0 wrote:

: I actually like all the regulars here.

Must be a strange feeling of being liked this way.

: [...] I stay away from it with alchohol and drugs because
: it reaks my body. It forces my body to [...] hours of creativity that it, my
: body just can't keep up with. Its like the universe flashes on me in an
: instant.
: I can't control it. Its very weird.

Take on long distance running. It may help you. It will also keep you away
from a computer keyboard long enough as to allow you to rethink your stuff
before mindlessly keying it into heaps of 1s and 0s.

Oliver.
Yea, I would accept that as a premise. What the hell am I doing here anyway?
Believe me, I don't need an excuse to totally abandon this crap. Unfortunately,
for you, you don't know a rats ass about the code that was just posted!!!
 
R

robic0

On Sun, 16 Apr 2006 18:11:30 -0700, robic0 wrote:

[snip]

Well, I've waded through insults, mis-postes (Doktor Ruud) and phycobable.
Looks like the code is solid, even though incomplete. The code here is awsome.
I've posted it as a core parser. Its primitive, but its %85 complete of the 1.1
standard. As much as I've run through this code, I know the primitives are
accurate. This raw code is a godsend. This primitve form has numerous uses.
I posted it as a gift, and I get insults..... Nice fucking Perl community!!
This is an example of what Perl is all about!
 
J

John Bokma

Uri Guttman said:
at how you behave here? do you even have some respect for yourself

Do you? I often wonder what's worse: a troll I can ignore, or some
regulars with plenty of time on their hands to start again another long
"discussion" with a troll.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,967
Messages
2,570,148
Members
46,694
Latest member
LetaCadwal

Latest Threads

Top