HTTP Filtering and Threads...

Dan

Hi,

This is my first program and my first question about Perl programming,
lol! Any kind of help is appreciated. :)

1) I have some Perl code that makes an HTTP request and saves the
response in a variable, and I want to extract the value of a specific
field. My code is more or less like this:

next unless /^<input name/i;
my ($name, $value) = $_ =~ /input name="(.*)Name" type=.*
value="(.*)">/i;
if ((length($value)) > 1){
$MiddleName = $value;
#Some Stuff Code...
print "$MiddleName";<br><br>
}

However, the HTTP request returns HTML that is more or less like
this:

#Some irrelevant HTML stuff...
<input name="$mdName" type="hidden" value="Silva">
#Some irrelevant HTML stuff...
<input name="Name" type="hidden" value="Silva">
<input name="mdName" type="hidden" value="Daniel">
#Some irrelevant HTML stuff...

The problem is that my code gets the value of "mdName", which is
"Daniel". I want it to get the value of "$mdName", which is "Silva", and
if that is missing (blank) I want the value of "Name", which in the
example is also "Silva". I never want the value of "mdName" ("Daniel"),
yet that is what I always get. :(

Obs.: I also tried (without success):

* my ($name, $value) = $_ =~ input name="\"\$mdName\" type=.*
value="(.*)">/i;

* my ($name, $value) = $_ =~ m/input name=\"\$mdName\" type=.*
value="(.*)">/i;

* my ($name, $value) = $_ =~ input name="\/$mdName\/" type=.*
value="(.*)">/i;

* my ($name, $value) = $_ =~ m/input name="\/$mdName\/" type=.*
value="(.*)">/i;


Can someone give me a snippet of code showing how to fix it? :)

2) In the same program I have a piece of code which lists all users and
loops over them, calling the function that gets detailed information
about each user (the code in question 1 is part of this function). The
snippet is like this:

# Some irrelevant code stuff...

(my $ruid, @userIDs) = &GetUserList($start, $end);
if ($userIDs[0] == -1) { exit(0); }

foreach $userID (@userIDs) {
&GetUserData($name, $middlename, $lname, $bdate);

print "$userID\t: $name, $middlename, $lname, $bdate";

# Some irrelevant code stuff...

}

# Some irrelevant code stuff...

The function GetUserData() is really slow: it makes an HTTP request and
parses some HTML, and the number of users is large. So I would like to
add thread support, so that I could have, for example, 8 instances of
this code running in parallel. :)

I looked at http://perldoc.perl.org/threads.html, but it didn't help
much. I believe I should add the thread support so that it works
directly with the foreach loop and GetUserData(), right?

However, I want to take care not to overwrite data (in C, when we deal
with threads, some unsafe functions can overwrite values, which is not
good)... and also make sure that each print comes out in the correct
sequence...

Can someone give me a snippet of code based on mine for that? I know
that reading the documentation is better, but the documentation didn't
help much; I appreciate practical examples...

3) Is Perl2exe (http://www.indigostar.com/perl2exe.htm) the best
option for converting Perl code to executables? Does it really work
well, even with complicated and sophisticated code (using threads, raw
sockets, Windows registry access, etc.)?

Well, that's my first code in Perl, so sorry for the ugly/bad code (and
also I'm not a programmer, just curious :). hehe

Thank you, and sorry for the number of (dumb and off-topic) questions.

Cheers,
 
Ben Morrow

Quoth Dan:
1) I have some Perl code that makes an HTTP request and saves the
response in a variable, and I want to extract the value of a specific
field. My code is more or less like this:

next unless /^<input name/i;

You are trying to parse HTML with regular expressions. This is a very
bad idea. I would strongly recommend using HTML::Parser, or another
module capable of actually parsing HTML.

my ($name, $value) = $_ =~ /input name="(.*)Name" type=.*
value="(.*)">/i;

This will fail because the * regex operator is 'greedy': it always takes
as much text as it can. This is why you are always getting the last
value in your example below: the first .* matches everything from the
first 'name="' all the way to the middle of the last <input>.

if ((length($value)) > 1){
$MiddleName = $value;
#Some Stuff Code...
print "$MiddleName";<br><br>
^^^^^^^^
This is not Perl. Please post the *actual* code you ran. It makes things
simpler :).
}

However, the HTTP request returns HTML that is more or less like
this:

#Some irrelevant HTML stuff...
<input name="$mdName" type="hidden" value="Silva">
#Some irrelevant HTML stuff...
<input name="Name" type="hidden" value="Silva">
<input name="mdName" type="hidden" value="Daniel">
#Some irrelevant HTML stuff...

The problem is that my code gets the value of "mdName", which is
"Daniel". I want it to get the value of "$mdName", which is "Silva", and
if that is missing (blank) I want the value of "Name", which in the
example is also "Silva". I never want the value of "mdName" ("Daniel"),
yet that is what I always get. :(

Obs.: I also tried (without success):

* my ($name, $value) = $_ =~ input name="\"\$mdName\" type=.*
value="(.*)">/i;

* my ($name, $value) = $_ =~ m/input name=\"\$mdName\" type=.*
value="(.*)">/i;

* my ($name, $value) = $_ =~ input name="\/$mdName\/" type=.*
value="(.*)">/i;

* my ($name, $value) = $_ =~ m/input name="\/$mdName\/" type=.*
value="(.*)">/i;

Uh, why? Don't just randomly try things hoping one will work; instead,
understand what is going wrong and fix it.
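
To make the failure concrete, here is a minimal sketch (not Dan's actual program) of matching the field name exactly, with the fallback he describes. It assumes the `$` in name="$mdName" is a literal character in the HTML; field_value() is a hypothetical helper, and the sample lines are copied from the question. Inside a regex a bare $mdName is interpolated as a Perl variable, so the name is kept in a single-quoted string and quoted with \Q...\E. A real HTML parser is still the more robust choice.

```perl
use strict;
use warnings;

# Sample response body, copied from the example in the question.
my $html = <<'END_HTML';
<input name="$mdName" type="hidden" value="Silva">
<input name="Name" type="hidden" value="Silva">
<input name="mdName" type="hidden" value="Daniel">
END_HTML

# field_value() is a hypothetical helper, not part of any module.
# \Q...\E quotes regex metacharacters in $field, so a '$' in the
# field name is matched literally. [^>]* cannot leave the tag and
# [^"]* cannot run past the closing quote of the value.
sub field_value {
    my ($text, $field) = @_;
    my ($value) = $text =~ /<input\s+name="\Q$field\E"[^>]*\svalue="([^"]*)"/i;
    return $value;    # undef if the field is not present
}

# Prefer "$mdName"; fall back to "Name" if it is missing or blank.
my $middle = field_value($html, '$mdName');
$middle = field_value($html, 'Name')
    unless defined $middle && length $middle;

print "$middle\n";    # prints "Silva"
```

Because the pattern requires name=" to be followed immediately by the quoted field name, asking for 'Name' can never pick up the "mdName" line.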

2) In the same program I have a piece of code which lists all users and
loops over them, calling the function that gets detailed information
about each user (the code in question 1 is part of this function). The
snippet is like this:

# Some irrelevant code stuff...

(my $ruid, @userIDs) = &GetUserList($start, $end);

Don't call subs with &. It was a Perl 4 practice, and has some strange
side-effects in Perl 5.

if ($userIDs[0] == -1) { exit(0); }

foreach $userID (@userIDs) {
&GetUserData($name, $middlename, $lname, $bdate);

Your sub GetUserData seems to be directly updating the variables passed
to it. This is a bad idea as it is not what someone reading the code
will expect. It would be better to return a list and call like

my ($name, $middlename, $lname, $bdate) = GetUserData;

Also, it seems to be getting the value of the user ID from a global
variable: again, it would be better to pass it to the function.

print "$userID\t: $name, $middlename, $lname, $bdate";

# Some irrelevant code stuff...

}

# Some irrelevant code stuff...
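
The two suggestions about GetUserData above (take the user ID as an argument, return the values as a list) can be sketched as follows. The %fake_users table and its contents are invented stand-ins so that the example runs without a network; the real sub would make the HTTP request instead.

```perl
use strict;
use warnings;

# Invented sample data standing in for the HTTP lookup (an assumption).
my %fake_users = (
    1 => [ 'Ann', 'B.', 'Cole', '1980-01-31' ],
    2 => [ 'Dan', 'E.', 'Ford', '1975-06-15' ],
);

# Takes the user ID as an argument and returns a list, instead of
# reading a global variable and writing into its own arguments.
sub GetUserData {
    my ($userID) = @_;
    my $row = $fake_users{$userID};
    return $row ? @$row : ();    # empty list for an unknown ID
}

foreach my $userID (sort { $a <=> $b } keys %fake_users) {
    my ($name, $middlename, $lname, $bdate) = GetUserData($userID);
    print "$userID\t: $name, $middlename, $lname, $bdate\n";
}
```

Declaring the loop variable and the results with my keeps them local to each iteration, which also matters later if the loop body is moved into a thread.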

The function GetUserData() is really slow: it makes an HTTP request and
parses some HTML, and the number of users is large. So I would like to
add thread support, so that I could have, for example, 8 instances of
this code running in parallel. :)

Note that this may well not make it run faster. Unless you have 8
processors (lucky you ;) ), it will just make things slower.

One thing that may be slowing things down is if you are fetching and
parsing the same page many times. You may want to look at the Memoize
module as an easy way of avoiding that.
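
A minimal sketch of the Memoize suggestion, assuming the slow step is a sub that takes a URL and returns the page text. fetch_page() and the URLs are hypothetical; here the sub only counts how often it really runs.

```perl
use strict;
use warnings;
use Memoize;

my $fetches = 0;

# fetch_page() is a hypothetical stand-in for the slow HTTP request.
sub fetch_page {
    my ($url) = @_;
    $fetches++;                      # count how often the real work runs
    return "<html>page for $url</html>";
}

memoize('fetch_page');    # repeat calls with the same URL hit the cache

for my $url (qw(/users/1 /users/2 /users/1 /users/1)) {
    my $page = fetch_page($url);     # scalar context every time
}

print "calls that did real work: $fetches\n";    # prints 2
```

This only helps if the same page really is requested more than once, and a cached page is never refreshed for the lifetime of the program.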

I looked at http://perldoc.perl.org/threads.html, but it didn't help
much. I believe I should add the thread support so that it works
directly with the foreach loop and GetUserData(), right?

The simplest way to multi-thread the above is something like

use threads;

foreach $userID (@userIDs) {
    async {
        my ($name, $middlename, $lname, $bdate) =
            GetUserData($userID);

        print "$userID\t: $name, $middlename, $lname, $bdate";

        # Some irrelevant code stuff...
    }
}

This will run each request in a new thread; but as you have identified,
the output will come out any which way. If you really want to use
threads, you want to use something like Thread::Queue to pass the
results back to the parent thread, which can then deal with printing
them.
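
A rough sketch of that Thread::Queue arrangement, assuming a perl built with thread support. It uses a fixed pool of 8 workers (the number from the question) rather than one thread per user; GetUserData() is a stub and the user IDs are invented so the example runs offline. The parent collects the results and prints them in the original order.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $WORKERS = 8;                 # number of parallel workers
my @userIDs = (101 .. 120);      # invented IDs, for illustration only

# Stub standing in for the slow HTTP request and HTML parsing.
sub GetUserData {
    my ($userID) = @_;
    return ("name$userID", "middle$userID", "last$userID", '1970-01-01');
}

my $jobs    = Thread::Queue->new;    # parent -> workers: user IDs
my $results = Thread::Queue->new;    # workers -> parent: [id, fields...]

# Each worker takes IDs off the job queue until it sees undef.
my @pool = map {
    threads->create(sub {
        while (defined(my $userID = $jobs->dequeue)) {
            $results->enqueue([ $userID, GetUserData($userID) ]);
        }
    });
} 1 .. $WORKERS;

$jobs->enqueue(@userIDs);
$jobs->enqueue(undef) for 1 .. $WORKERS;    # one stop marker per worker
$_->join for @pool;

# Results arrive in any order, so index them by user ID first.
my %row_for;
while (defined(my $item = $results->dequeue_nb)) {
    my ($userID, @fields) = @$item;
    $row_for{$userID} = \@fields;
}

# Only the parent prints, so the output follows the order of @userIDs.
for my $userID (@userIDs) {
    my ($name, $middlename, $lname, $bdate) = @{ $row_for{$userID} };
    print "$userID\t: $name, $middlename, $lname, $bdate\n";
}
```

Since no worker ever prints, there is nothing to lock; the two queues are the only data the threads have in common.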

However, I want to take care not to overwrite data (in C, when we deal
with threads, some unsafe functions can overwrite values, which is not
good)...

This is not an issue in Perl. Threads have completely separate
variables: threads in Perl are more like Unix' fork than like
traditional C threading.

3) Is Perl2exe (http://www.indigostar.com/perl2exe.htm) the best
option for converting Perl code to executables? Does it really work
well, even with complicated and sophisticated code (using threads, raw
sockets, Windows registry access, etc.)?

I've never used perl2exe (I understand it's not free?), but I have had
success with PAR, which you can install from CPAN.

Well, that's my first code in Perl, so sorry for the ugly/bad code (and
also I'm not a programmer, just curious :). hehe

That's fine: there's nothing wrong with writing bad code when you are
first learning :). The code you posted isn't half as bad as some we see
in this group, anyway...

Thank you, and sorry for the number of (dumb and off-topic) questions.

Not off-topic at all, and not dumb neither.

Ben
 
Dan

Hi Ben,

How are you?

First of all, thank you for the help and the fast reply. :)

You are trying to parse HTML with regular expressions. This is a very
bad idea. I would strongly recommend using HTML::Parser, or another
module capable of actually parsing HTML.

Hummm, thank you for the suggestion. I will look at it.

Anyway, I fixed the problem; it was so dumb. To fix it I just did this:

my ($value) = $_ =~ /input name="$mdName" type=.* value="(.*)">/i;

And all worked. My problem was:

- Using two variables to receive the results (my ($value, $name)...).

- Using the * regex operator, as you spotted. :)

Don't call subs with &. It was a Perl 4 practice, and has some strange side-effects in Perl 5.

Fixed. Thank you for the tip.

And what if someone tries to run this script on an old box with
Perl 4? Will it fail without the & before sub calls?

Also, it seems to be getting the value of the user ID from a global
variable: again, it would be better to pass it to the function.

OK, I fixed it. Now the variables are local: they are passed to the
functions as arguments and the values are returned from the functions. :)

Note that this may well not make it run faster. Unless you have 8
processors (lucky you ;) ), it will just make things slower.

One thing that may be slowing things down is if you are fetching and
parsing the same page many times. You may want to look at the Memoize
module as an easy way of avoiding that.

Hehehe, my speed problem is not really about processing power; it is
that I make only one connection at a time, so the whole job takes a
long time. So I would like to make several (6, 7, 8) connections in
parallel. I'm sure my machine with 1 processor will not let me down
with just 8 simultaneous connections. :)

The simplest way to multi-thread the above is something like

use threads;

foreach $userID (@userIDs) {
    async {
        my ($name, $middlename, $lname, $bdate) =
            GetUserData($userID);

        print "$userID\t: $name, $middlename, $lname, $bdate";

        # Some irrelevant code stuff...
    }
}

This will run each request in a new thread; but as you have identified,
the output will come out any which way.

Humm... but in this example, how do I define the number of threads (6, 7,
8) to spawn? :)

If you really want to use threads, you want to use something like Thread::Queue to pass the
results back to the parent thread, which can then deal with printing them.

This code appears veryyy simple, I loved it. :)

Unhappily it doesn't meet my needs; I will look into
Thread::Queue.

Wait, maybe I can use it. Do lock()/mutex() exist in Perl?

So I can do something like this:

use threads;

foreach $userID (@userIDs) {
async { # I need to learn define number of threads :)
my ($name, $middlename, $lname, $bdate) =
GetUserData($userID);
}


lock($name);
lock($middlename);
lock($lname);
lock($bdate);

# Some irrelevant code stuff...

print "$userID\t: $name, $middlename, $lname, $bdate";

unlock($name);
unlock($middlename);
unlock($lname);
unlock($bdate);

# Some irrelevant code stuff...

}

Is it possible? Or something equivalent?

In this way, I can grant the output will not be printed out of order,
but I can grant the value of variables will not be overwritten by
other threads before I manipulate and print it. :)

This is not an issue in Perl. Threads have completely separate
variables: threads in Perl are more like Unix' fork than like
traditional C threading.

Humm...

I've never used perl2exe (I understand it's not free?),

I don't know what the license is, but anybody can download it from:

http://www.indigostar.com/perl2exe.htm#Download

Maybe it's freeware?

but I have had success with PAR, which you can install from CPAN.

Nice, I had never seen it before. Really good. :)

I installed it via cpan, but it didn't install the "pp" binary used
to convert packages.

I tried via perl, as the documentation says:

perl -MPAR=packed.exe other.pl

But it doesn't work. Instead of generating packed.exe, it only executes
my Perl script (exactly as if I had called perl other.pl).

Well, anyway, Debian has a package (libpar-perl) with "pp"
included. :)

So I generated an executable with it:

pp -o packed.exe source.pl

Lol! My code in Perl is around 10k and the output .exe is around
2.7MB, insane. hehehe

Is there any optimization option for PAR? ;)

I believe it should have generated an executable for Windows (PE), right?
How do I choose whether the output is for Linux (ELF) or Windows (PE)?

I couldn't find it in the documentation....

My binary is definitely for Linux, as file says:

Other.exe: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
for GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/
Linux 2.4.1, stripped

And on Linux it runs well. If I copy it to Windows, as expected, it
doesn't work. :/

E:\>other.exe
Program too big to fit in memory

Any idea how to solve it?

That's fine: there's nothing wrong with writing bad code when you are
first learning :). The code you posted isn't half as bad as some we see
in this group, anyway...

Thank you! You are really friendly.

Not off-topic at all, and not dumb neither.

:)

Ben

Thank you again.

Cheers
 
Ben Morrow

Quoth Dan:
Ben Morrow said:
You are trying to parse HTML with regular expressions. This is a very
bad idea. I would strongly recommend using HTML::Parser, or another
module capable of actually parsing HTML.

Hummm, thank you for the suggestion. I will look at it.

Anyway, I fixed the problem; it was so dumb. To fix it I just did this:

my ($value) = $_ =~ /input name="$mdName" type=.* value="(.*)">/i;

And all worked. My problem was:

- Using two variables to receive the results (my ($value, $name)...).

Err... this works just fine. You obviously need the same number of ()
groups as you have variables, but even if you don't the extras just get
assigned undef.

- Using the * regex operator, as you spotted. :)

Yes, but what you have above will still fail. Say your input looks like
(please excuse the overlong line, it is necessary... :) )

<input name="foo" type=one value="bar"><input name="baz" type=two value="quux">
^----------------------------------------^

and you try the match with $mdName = 'foo'. The .* after type= will
match everything from the 'o' of 'one' to the 'o' of 'two', and $value
will be set to 'quux', not 'bar'. The only reason it is working at the
moment is because (by default) . does not match a newline, so the fact
your <input>s are on separate lines is keeping them apart.

Also, and more importantly, if the structure of the input changes at all
(say, the attributes appear in a different order), the pattern won't
match at all.
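
The one-line example above can be run directly. This sketch compares the greedy pattern from the post with a non-greedy version and with one built on negated character classes; the variable names other than $mdName are made up for the illustration. It demonstrates the greediness issue only and is not a substitute for a real HTML parser.

```perl
use strict;
use warnings;

# The single-line input from the example above.
my $line = '<input name="foo" type=one value="bar">'
         . '<input name="baz" type=two value="quux">';
my $mdName = 'foo';

# Greedy: both .* grab as much as they can, so the match runs on
# into the second tag and captures its value.
my ($greedy) = $line =~ /input name="$mdName" type=.* value="(.*)">/i;

# Non-greedy: .*? stops at the first point where the rest can match.
my ($lazy) = $line =~ /input name="$mdName" type=.*? value="(.*?)">/i;

# Negated classes: [^>]* cannot leave the tag, [^"]* cannot leave
# the quoted value.
my ($bounded) = $line =~ /input name="$mdName"[^>]* value="([^"]*)"/i;

print "greedy:  $greedy\n";     # quux
print "lazy:    $lazy\n";       # bar
print "bounded: $bounded\n";    # bar
```

All three patterns still depend on the attributes appearing in this order, which is the second weakness mentioned above.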

Don't call subs with &. It was a Perl 4 practice, and has some strange
side-effects in Perl 5.

Fixed. Thank you for the tip.

And what if someone tries to run this script on an old box with
Perl 4? Will it fail without the & before sub calls?

Err... I don't know. I've never had the pleasure of using Perl 4... In
any case, there's likely to be a lot more that causes it to fail than
that: for instance, if you're using threads you will require perl 5.8.
If you really need to write Perl 4 (and you almost certainly *don't*),
you need to write Perl 4 from the beginning.

[threading]

Note that this may well not make it run faster. Unless you have 8
processors (lucky you ;) ), it will just make things slower.

One thing that may be slowing things down is if you are fetching and
parsing the same page many times. You may want to look at the Memoize
module as an easy way of avoiding that.

Hehehe, my speed problem is not really about processing power; it is
that I make only one connection at a time, so the whole job takes a
long time. So I would like to make several (6, 7, 8) connections in
parallel. I'm sure my machine with 1 processor will not let me down
with just 8 simultaneous connections. :)

Sorry, yes of course, if you're blocking on the network, performing
several fetches in parallel can help. Stupid of me... :)

Humm... but in this example, how do I define the number of threads (6, 7,
8) to spawn? :)

You don't... it's only a trivial example :).

Wait, maybe I can use it. Do lock()/mutex() exist in Perl?

Err... yes. See threads::shared.

So I can do something like this:

use threads;

foreach $userID (@userIDs) {
async { # I need to learn define number of threads :)
my ($name, $middlename, $lname, $bdate) =
GetUserData($userID);
}

lock($name);
lock($middlename);
lock($lname);
lock($bdate);

# Some irrelevant code stuff...

print "$userID\t: $name, $middlename, $lname, $bdate";

unlock($name);
unlock($middlename);
unlock($lname);
unlock($bdate);

# Some irrelevant code stuff...

}

Is it possible? Or something equivalent?

This, as written, will not work at all. Perl threads are not like C
threads: variables are *not* shared between threads by default. So the
above will fail for four reasons:

You need to 'use threads::shared', to get the lock routine; and
there is no unlock, you just let the lock fall out of scope.

$name isn't in scope at the point where you try to lock it. It only
exists inside the async block.

Even if it were (if you moved the 'my $name' outside the async {}),
the new thread gets a completely new copy of the variable. If you
want the variable shared between threads, you need to say so: again,
see threads::shared. In fact, you can't even attempt to lock a
variable that isn't shared: there'd be no point.

If the variables *were* shared, you would need to lock them inside
the async as well (otherwise you'd just be printing blank values
from before they got set); in which case (assuming you managed to
avoid deadlock) only one thread would run at once.
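
A small sketch of the points above, assuming a perl built with thread support. The variable names are invented for the illustration. $private is an ordinary variable, so the thread changes only its own copy; $count is declared :shared and protected with lock(), which is released when the enclosing block ends (there is no unlock).

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $private = 0;          # ordinary variable: each thread gets its own copy
my $count :shared = 0;    # explicitly shared between all threads

# The assignment happens in the thread's copy, not in the parent's.
threads->create(sub { $private = 42 })->join;

# Four threads increment the shared counter 100 times each.
my @workers = map {
    threads->create(sub {
        for (1 .. 100) {
            lock($count);    # released automatically at the end of this block
            $count++;
        }
    });
} 1 .. 4;
$_->join for @workers;

print "private = $private\n";    # 0
print "count   = $count\n";      # 400
```

Keeping the locked block as short as possible matters: while one thread holds the lock, every other thread that wants it has to wait.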

In this way, I can grant the output will not be printed out of order,

(did you mean 'I cannot grant'?)

but I can grant the value of variables will not be overwritten by
other threads before I manipulate and print it. :)

You don't need to worry about this. Read perldoc perlthrtut, for
starters: Perl guarantees this for you, usually by giving each thread
its own copy of the variable.

You may want to try the Thread::Pool module from CPAN, which looks like
it does what you want.

Nice, I had never seen it before. Really good. :)

I installed it via cpan, but it didn't install the "pp" binary used
to convert packages.

Err... it seems that pp has been moved into the PAR-Packer dist.

I tried via perl, as the documentation says:

perl -MPAR=packed.exe other.pl

But it doesn't work. Instead of generating packed.exe, it only executes
my Perl script (exactly as if I had called perl other.pl).

Yes, that's correct. This syntax says you want to run other.pl using the
modules inside the PAR file packed.exe (which doesn't exist, so this has
no effect).

Well, anyway, Debian has a package (libpar-perl) with "pp"
included. :)

So I generated an executable with it:

pp -o packed.exe source.pl

Lol! My code in Perl is around 10k and the output .exe is around
2.7MB, insane. hehehe

Is there any optimization option for PAR? ;)

Read the FAQ... basically, no. Everything is already zipped, and the
reason the output is so large is because there's a lot of stuff to fit
in. I strongly suspect perl2exe produces files of a similar size.

However, if you know the machine you will be deploying to has
<something>, you can ask PAR to omit <something> from the archive: for
instance, if you know your target machine will have libperl.so/
perl58.dll installed, you can reduce the size of the executable by using
pp -d.

I believe it should have generated an executable for Windows (PE), right?
How do I choose whether the output is for Linux (ELF) or Windows (PE)?

You can't. pp will only generate archives for the arch it is running on,
so if you want a Win32 executable you have to create it on Win32. This
is possibly an advantage of perl2exe.

Ben
 
