Perl DBI/XML processing versus PHP?

surfivor

I may be involved in a data migration project involving databases and
creating XML feeds. Our site is PHP based, so I imagine the team might
suggest PHP, but I had a look at the PHP documentation for one of the
PEAR modules for creating XML and it didn't look like much. I've used
Perl's XML::Twig, and I have the impression that there is more Perl stuff
available and that it is better documented: I have a Perl DBI book, a
Perl XML book, a Perl data munging book and so on. A lot of
the PHP books seem to be mainly about building sites and connecting to
databases, but I haven't checked them all, whereas I have most of the Perl
books. One problem is that some of the Perl modules are hard to get on
ActiveState Perl for Windows, so I may have to explain to the team
why I think it should be done in Perl and why I need a Linux login to
do it, etc. I'm not that familiar with PHP, but am looking for
comments on this.
 
Jamie

Seems to me, perl DBI is a bit more intuitive. Last time I looked,
php didn't really have placeholders (well, it did, depending on which
version of php was available where)

This was a while ago, but in mysql, placeholders didn't help much. Other DBMs
though, like DB2 or postgresql, will use placeholders to cache a compiled SQL
statement (this happens server side). This can really boost performance if you
have a lot of identical INSERT or SELECT statements that vary only in their
input parameters. The DBM only has to inspect and parse the SQL once if it uses
placeholders. This is rather a big deal if you want performance out of other
DBMs.
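
A minimal sketch of the prepare-once, execute-many pattern described above. It assumes DBI with the DBD::SQLite driver and an in-memory database, chosen only so the example is self-contained; the table and column names are made up. SQLite will not show the server-side statement caching that DB2 or postgresql can offer, but the DBI calls look the same:

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is installed. An in-memory database keeps
# the sketch self-contained; swap the DSN for your real database.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical table for illustration only.
$dbh->do('CREATE TABLE feed_item (id INTEGER PRIMARY KEY, title TEXT)');

# Prepare once. Each "?" is a placeholder bound at execute time.
my $insert = $dbh->prepare('INSERT INTO feed_item (id, title) VALUES (?, ?)');

my @rows = (
    [ 1, 'First item' ],
    [ 2, q{O'Reilly "quoted" title} ],
    [ 3, q{x'); DROP TABLE feed_item; --} ],   # hostile-looking input
);

# Execute many times; values are passed as data, never spliced into the SQL.
$insert->execute(@$_) for @rows;

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM feed_item');
my ($title) = $dbh->selectrow_array(
    'SELECT title FROM feed_item WHERE id = ?', undef, 3);

print "rows: $count\n";
print "title 3: $title\n";
```

The third row is stored as an ordinary string rather than being run as SQL, which is the robustness benefit of placeholders independent of any performance gain.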

Trouble with PHP is that one version is incompatible with the next,
even along the same branches, 5.2 might not work with 5.1. It's sort
of hit-and-miss. On top of that, the php you use may have been compiled
with flags the server didn't use or have ini settings that vary. Each
"php platform" is "unique" with its own settings. Each version of
PHP is "unique" in that it may or may not work with your application.

All of this really makes a mess if you want to use other PHP scripts
from other packages. PHP is like having the rug ripped out from under
you at every turn.

PHP really isn't very good for larger projects. (but then, the same
has and can be said for perl)

Sometimes you can "throw an exception" in PHP, sometimes you can't, again
depending on the version and quirks of whoever installed PHP. With perl, on the other
hand, it's a pretty safe bet that eval { ... die "BOO"; } will work. This is
important if you'll be doing any transactions or need to do things
like break out of an XML parse operation from a callback, but catch
it in the caller space.
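
A rough sketch of that eval/die pattern in core Perl. The walk_items routine is a hypothetical stand-in for an event-driven parser that invokes a callback per record; it is not any real module's API. The callback dies to abort the walk, and the caller catches the error with eval:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an event-driven parser: it calls
# $on_item once per record and knows nothing about error handling.
sub walk_items {
    my ($items, $on_item) = @_;
    $on_item->($_) for @$items;
    return scalar @$items;
}

my @seen;

# eval BLOCK traps any die raised inside it, including from the callback.
my $ok = eval {
    walk_items([qw(a b STOP c)], sub {
        my ($item) = @_;
        die "stop requested at '$item'\n" if $item eq 'STOP';
        push @seen, $item;
    });
    1;   # reached only if nothing died
};

my $error = $ok ? '' : ($@ || 'unknown error');

print "seen: @seen\n";
print "error: $error";
```

With DBI the same shape would presumably wrap a transaction: commit at the end of the eval block, and roll back in the caller when the block did not return true.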

If you need to /consume/ XML, then you definitely do not want PHP, as
there really isn't the same flexibility with getting at the input
as it arrives. It wants to put everything into $_POST before your
script can have any say in the matter. This makes it quite difficult
to parse an input document "as it arrives" (there are other ways
around this in PHP, but none of them allow you direct access to the
wire the way perl does)

Perl on the other hand, allows you to get at the raw input data if
you need it.

I've been told that you can purchase extra tools to run PHP in a sort
of persistent environment, but in the free version it's impossible
to create, say, an XSLT transformer object and share it with new requests.

In PHP that data needs to be loaded each and every time, and the tree needs to
be built for each request. With perl (and FastCGI or mod_perl), you can
create a heavy object once and share it among requests. (There are pros and cons to
this, of course. You can also accidentally store other information and create
memory leaks if you're not careful.)

The "heavy object sharing" lends itself really well to XSLT processors
or DOM objects that don't change, load once and re-use over and over.
This can really be crucial with XML processing, as XSLT transformers
tend to be kind of expensive to create.
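
A small sketch of the load-once, reuse-many idea in core Perl, assuming a long-running process such as one started by FastCGI or mod_perl. The build_transformer routine is a made-up placeholder for an expensive constructor (for instance, parsing a stylesheet); it does no real XSLT work:

```perl
use strict;
use warnings;

my $build_count = 0;
my %cache;   # file-scoped, so it lives as long as the process does

# Hypothetical expensive constructor, standing in for building an
# XSLT transformer from a stylesheet.
sub build_transformer {
    my ($name) = @_;
    $build_count++;
    return sub { my ($text) = @_; return "[$name] " . uc $text };
}

# Return the cached object, building it only on first use.
sub transformer_for {
    my ($name) = @_;
    return $cache{$name} //= build_transformer($name);
}

# Simulate three requests handled by the same process.
my @out = map { transformer_for('feed.xsl')->($_) } qw(one two three);

print "$_\n" for @out;
print "built $build_count time(s)\n";
```

The same hash is also where the leak risk comes from: anything stored in it stays in memory until the process exits, so only objects that do not change per request belong there.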

PHP has the advantage that its "template" is fairly standardized (well,
most people have settled on "<?php ?>"), so you don't have a dozen different
overkill template engines.

Hopefully I've given you enough "perl firepower" but, in my experience, no
matter how you phrase the above points, people always assume perl is older
technology and therefore inferior. PHP is modern, popular and therefore
better:

"Don't try to confuse me with the facts!" - LOL


Jamie
 
Jerry Stuckle

Jamie said:
Seems to me, perl DBI is a bit more intuitive. Last time I looked,
php didn't really have placeholders (well, it did, depending on which
version of php was available where)

True, I also find perl DBI more intuitive. But both work well.
This was a while ago, but in mysql, placeholders didn't help much. Other DBMs
though, like DB2 or postgresql, will use placeholders to cache a compiled SQL
statement (this happens server side). This can really boost performance if you
have a lot of identical INSERT or SELECT statements that vary only in their
input parameters. The DBM only has to inspect and parse the SQL once if it uses
placeholders. This is rather a big deal if you want performance out of other
DBMs.

True, if you're running 1K inserts per second it can make a difference.
For most sites, perl or PHP isn't doing nearly that much. And if you need
the performance boost, a compiled language would provide a greater boost
than just using placeholders.
Trouble with PHP is that one version is incompatible with the next,
even along the same branches, 5.2 might not work with 5.1. It's sort
of hit-and-miss. On top of that, the php you use may have been compiled
with flags the server didn't use or have ini settings that vary. Each
"php platform" is "unique" with its own settings. Each version of
PHP is "unique" in that it may or may not work with your application.

Obviously you don't understand PHP very well. I've been programming in it
since the early 4.0.x releases and currently am at 5.2.1. I can't remember when
I've had to change any code for a new release. Use good programming
practices and upgrades are painless.

Now that doesn't mean I haven't rewritten code. But it's been to take
advantage of new features rather than because a new version required a
change.

This is not to say that sometimes changes may not be needed. The PHP
developers are doing their best to clean up what has been a mess of a
language - poorly planned and executed. It's changing, but sometimes
change is painful.

Fortunately perl was better planned and implemented.
All of this really makes a mess if you want to use other PHP scripts
from other packages. PHP is like having the rug ripped out from under
you at every turn.

Not at all. I use a lot of different packages. No problem.
PHP really isn't very good for larger projects. (but then, the same
has and can be said for perl)

Wrong on both counts. I've seen both used in some quite large projects.
One perl project I know about was over 250K LOC, for instance. And
that's perl code - not html.
Sometimes you can "throw an exception" in PHP, sometimes you can't, again
depending on the version and quirks of whoever installed PHP. With perl, on the other
hand, it's a pretty safe bet that eval { ... die "BOO"; } will work. This is
important if you'll be doing any transactions or need to do things
like break out of an XML parse operation from a callback, but catch
it in the caller space.

Yep, exception handling was added to PHP, so like with any new addition,
you can use it in later versions but not in earlier ones. Gee - you
know, the same thing is true in any product. It's hard to use something
before it was added.

Perl doesn't have this problem as much because it has been around longer
and is more stable. But I suspect the same was true with the early
versions of perl.
If you need to /consume/ XML, then you definitely do not want PHP, as
there really isn't the same flexibility with getting at the input
as it arrives. It wants to put everything into $_POST before your
script can have any say in the matter. This makes it quite difficult
to parse an input document "as it arrives" (there are other ways
around this in PHP, but none of them allow you direct access to the
wire the way perl does)

No, that's different implementations. But I've never seen this as a
problem in either PHP or perl. The input document should be arriving
pretty much all at once, anyway.
Perl on the other hand, allows you to get at the raw input data if
you need it.

Yep, it can be nice at times. But that's one reason there is no "best"
language.
I've been told that you can purchase extra tools to run PHP in a sort
of persistent environment, but in the free version it's impossible
to create, say, an XSLT transformer object and share it with new requests.

In PHP that data needs to be loaded each and every time, and the tree needs to
be built for each request. With perl (and FastCGI or mod_perl), you can
create a heavy object once and share it among requests. (There are pros and cons to
this, of course. You can also accidentally store other information and create
memory leaks if you're not careful.)

Yes, this is easier in perl. But as you also note - there are both pros
and cons.
The "heavy object sharing" lends itself really well to XSLT processors
or DOM objects that don't change, load once and re-use over and over.
This can really be crucial with XML processing, as XSLT transformers
tend to be kind of expensive to create.

True, but if you really have to, there are ways around it in PHP. But
if you really want speed in processing, I suggest you implement it in
C/C++. That is much faster.
PHP has the advantage that its "template" is fairly standardized (well,
most people have settled on "<?php ?>"), so you don't have a dozen different
overkill template engines.

Hopefully I've given you enough "perl firepower" but, in my experience, no
matter how you phrase the above points, people always assume perl is older
technology and therefore inferior. PHP is modern, popular and therefore
better:

"Don't try to confuse me with the facts!" - LOL


Jamie

OTOH, perl is older and therefore more stable and mature. Newer is not
always better. Each language needs to be evaluated in the context it
will be used. A language which fits well into one project may not fit
well into another.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
(e-mail address removed)
==================
 
Jamie

Jerry Stuckle said:
True, if you're running 1K inserts per second it can make a difference.
For most sites, perl or PHP isn't doing nearly that much. And if you need
the performance boost, a compiled language would provide a greater boost
than just using placeholders.

The thing is, this cache happens server-side (server relative to the DBM
server)

In the case of n-number of web server processes, it matters because you'll
likely have fewer DBM resources. Compiled web programs won't even help
you here as this stage happens on the database.

But, this is pretty much a moot point for mysql, unless mysql has implemented
placeholders and I haven't heard about it.

I will only add that placeholders also shield you from many SQL injection
bugs. Something that is nice to have around.
Obviously you don't understand PHP very well. I've been programming in it
since the early 4.0.x releases and currently am at 5.2.1. I can't remember when
I've had to change any code for a new release. Use good programming
practices and upgrades are painless.

It depends on who installed the PHP binary server-side. I've smacked my
head into this wall a number of times. PHP 5.n looked really promising,
I ported my stuff to it.. and soon discovered almost all the ISP's out
there are still using PHP 4... and continue to use PHP 4.
Now that doesn't mean I haven't rewritten code. But it's been to take
advantage of new features rather than because a new version required a
change.

Like, if they had compiled php without POSIX support, for example? That's
the kind of stuff I'm talking about. One place has posix, the other doesn't,
and you end up coding around whichever environment is available.

As far as actual code crashing, I've had numerous problems with PHP reference
handling. Looking in the apache error logs reveals a segfault, depending of
course on your particular copy of php. (PHP5 gets reference handling "right"
in my view; too bad it's flaky.)
This is not to say that sometimes changes may not be needed. The PHP
developers are doing their best to clean up what has been a mess of a
language - poorly planned and executed. It's changing, but sometimes
change is painful.

Oh, I agree. Not a slam against the PHP developers. PHP is great for small
things where you only need a quick mysql query or access to the include()
functions.

Heckuva lot easier than installing perl template modules.
Fortunately perl was better planned and implemented.

Yea, Larry Wall.. a linguist. You can tell by the way it was designed that
it actually /was/ designed.
Wrong on both counts. I've seen both used in some quite large projects.
One perl project I know about was over 250K LOC, for instance. And
that's perl code - not html.

SpamAssassin is another that is quite impressive. It's just that with perl (or
PHP) the rules aren't as "enforced". Perl6, I hear, is trying to address this.
Yep, exception handling was added to PHP, so like with any new addition,
you can use it in later versions but not in earlier ones. Gee - you
know, the same thing is true in any product. It's hard to use something
before it was added.

True. It's just that with the large installed base of PHP4, you'll find
this fact is a real problem.
No, that's different implementations. But I've never seen this as a
problem in either PHP or perl. The input document should be arriving
pretty much all at once, anyway.

As I recall (it's been a while), doing something like, say, a PUT handler
involved first the PHP engine accepting the request, spooling the data
to a temp file or memory or what have you, setting up its environment
and THEN calling your script.

Perl on the other hand, not being web centric, gives you the opportunity
to read in from standard input directly if you want, so you could
build an XML tree from the source document /as/ the document is being
posted.

Why not take advantage of the IO-bound time to build a data structure as data
arrives? Especially considering you were going to read it anyway?
(Caveats about DoS attacks, existing frameworks and the like assumed.)
True, but if you really have to, there are ways around it in PHP. But
if you really want speed in processing, I suggest you implement it in
C/C++. That is much faster.

C/C++ will always be faster than perl. (but you could just as well say
that if you need it fast, you should write it in assembly)

One reason for doing it in perl (or PHP) is that you don't need to compile
it for the platform.

Being able to store a tree in memory (with perl mod_perl/FCGI) is a balance
between cross-platform and performance. An option you don't really have
with PHP. (and actually, don't have with perl either, unless FCGI or mod_perl
is available)

Shared heavy objects are a significant issue with XSLT transformations.
OTOH, perl is older and therefore more stable and mature. Newer is not
always better. Each language needs to be evaluated in the context it
will be used. A language which fits well into one project may not fit
well into another.

In the context of XML and databases, I think perl is the winner. In the context
of popularity contests and quick projects, PHP is the winner.

Popularity is significant if you need to tie in with other products, as they
will most likely be PHP.

Jamie
 
Peter J. Holzer

Seems to me, perl DBI is a bit more intuitive. Last time I looked,
php didn't really have placeholders (well, it did, depending on which
version of php was available where)

This was awhile ago, but.. in mysql, placeholders didn't help much.

That depends on what they should help you with. They always help make
your code more robust and secure. "SQL injection" just isn't an issue
if you use placeholders. They may or may not help improve performance.
(Recent versions of MySQL do support placeholders, BTW, and DBD::mysql
can use them, though the default is still to interpolate them in the DBD.)

hp
 
Jerry Stuckle

Jamie said:
The thing is, this cache happens server-side (server relative to the DBM
server)

In the case of n-number of web server processes, it matters because you'll
likely have fewer DBM resources. Compiled web programs won't even help
you here as this stage happens on the database.

But, this is pretty much a moot point for mysql, unless mysql has implemented
placeholders and I haven't heard about it.

I will only add that placeholders also shield you from many SQL injection
bugs. Something that is nice to have around.

So this is a database issue and not a language issue then.
It depends on who installed the PHP binary server-side. I've smacked my
head into this wall a number of times. PHP 5.n looked really promising,
I ported my stuff to it.. and soon discovered almost all the ISP's out
there are still using PHP 4... and continue to use PHP 4.

Not all. There are a number which are using PHP 5. But I run my own
servers, so I use the version *I* want - not some semi-technical number
cruncher which has no idea what peoples' needs are.
Like, if they had compiled php without POSIX support, for example? That's
the kind of stuff I'm talking about. One place has posix, the other doesn't,
and you end up coding around whichever environment is available.

But this has nothing to do with the version of PHP - just the compile
options. You need posix support - go with a host which has it
installed. Hosts are a dime a dozen.
As far as actual code crashing, I've had numerous problems with PHP reference
handling. Looking in the apache error logs reveals a segfault, depending of
course on your particular copy of php. (PHP5 gets reference handling "right"
in my view; too bad it's flaky.)

Yes, unfortunately PHP has a bug now and then. But the PHP team is real
good at fixing them when they have the right information.
Oh, I agree. Not a slam against the PHP developers. PHP is great for small
things where you only need a quick mysql query or access to the include()
functions.

And it's good for large projects as well. I've seen some pretty big ones.
Heckuva lot easier than installing perl template modules.


Yea, Larry Wall.. a linguist. You can tell by the way it was designed that
it actually /was/ designed.


SpamAssassin is another that is quite impressive. It's just that with perl (or
PHP) the rules aren't as "enforced". Perl6, I hear, is trying to address this.

Yep, it is. But neither perl nor PHP were intended to be strictly typed
languages. That's both good and bad - depending on how you look at it.
True. It's just that with the large installed base of PHP4, you'll find
this fact is a real problem.

I don't find it to be a problem at all.
As I recall (it's been a while), doing something like, say, a PUT handler
involved first the PHP engine accepting the request, spooling the data
to a temp file or memory or what have you, setting up its environment
and THEN calling your script.

Perl on the other hand, not being web centric, gives you the opportunity
to read in from standard input directly if you want, so you could
build an XML tree from the source document /as/ the document is being
posted.

Why not take advantage of the IO-bound time to build a data structure as data
arrives? Especially considering you were going to read it anyway?
(Caveats about DoS attacks, existing frameworks and the like assumed.)

I've never found this to be a problem. Unless you're posting 10Mb
forms, any i/o time is minimal. And the time to process it is minimal.

But how often is PUT used, anyway? Not very often. I know I've never
used it in any language - and I don't know of many people who have. So
again - no biggie.

And while your thread is i/o bound, the server may be out doing
something else anyway.
C/C++ will always be faster than perl. (but you could just as well say
that if you need it fast, you should write it in assembly)

One reason for doing it in perl (or PHP) is that you don't need to compile
it for the platform.

Big deal. I've compiled C/C++ modules for multiple platforms before -
it's pretty simple if you write non-OS specific code.
Being able to store a tree in memory (with perl mod_perl/FCGI) is a balance
between cross-platform and performance. An option you don't really have
with PHP. (and actually, don't have with perl either, unless FCGI or mod_perl
is available)

How often do you need to store a tree in memory? Not often, I suspect.
Shared heavy objects are a significant issue with XSLT transformations.

Yep. But again - if you want more performance, use a compiled language.
In the context of XML and databases, I think perl is the winner. In the context
of popularity contests and quick projects, PHP is the winner.

I disagree with you here, also. PHP does quite well with both XML and
databases.
Popularity is significant if you need to tie in with other products, as they
will most likely be PHP.

Jamie

And it's popular because it's good.

Before you make such broad statements, I suggest you learn more about
the language. Your lack of knowledge of PHP's capabilities shows.

But I know you won't. You've made up your mind, and nobody's going to
confuse you with the facts.

As for perl - I'm not saying it's a bad language. I actually find it
the opposite. And I'll use the one more appropriate for the task at
hand. Sometimes it's perl, sometimes PHP.

 
