URI::Find & Vertical Bar ('|') in URL

Rosina Bignall

I'm using URI::Find to get all the URLs in some text. Some of the URLs have
a vertical bar ('|') in them. They do work correctly in the browser, but
URI::Find cuts off the URL when it finds one. Is there a way to get
URI::Find to recognize the | as a valid character in the URL?

Thanks for the help!

Rosina
 
Paul Lalli

Rosina said:
I'm using URI::Find to get all the URLs in some text. Some of the URLs have
a vertical bar ('|') in them. They do work correctly in the browser, but
URI::Find cuts off the URL when it finds one.

That's because a vertical bar is not a valid URI character. Please see
http://www.ietf.org/rfc/rfc2396.txt
The fact that "the browser" (and which one, by the way?) allows it, or
even guesses what you meant by automatically converting it to its hex
representation, does not mean that a module which does follow the
specifications should allow it.
Is there a way to get URI::Find to recognize the | as a valid character in the URL?

I have no idea. I've never used the module before. From a cursory
examination of its documentation, perhaps you can subclass it and use
the decruft and cruft_set methods? I don't know; that's a guess.

The correct solution, however, is to fix whatever is generating these
URLs to make it generate valid URLs.
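For illustration, here is a minimal sketch of the hex conversion mentioned above: each '|' is rewritten as %7C, its percent-encoded form, so the URL no longer contains the raw character. The sample URL is made up, and this hand-rolled substitution only handles '|'; it is not meant as a general URL escaper.

```perl
use strict;
use warnings;

# Hypothetical URL containing raw vertical bars.
my $url = 'http://example.com/report?cols=a|b|c';

# Replace every '|' with '%' followed by its two-digit uppercase hex code.
(my $fixed = $url) =~ s/(\|)/sprintf('%%%02X', ord $1)/ge;

print "$fixed\n";    # http://example.com/report?cols=a%7Cb%7Cc
```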

Paul Lalli
 
Rosina Bignall

While I agree that these URLs should be generated correctly, unfortunately
I have no control over how they are generated; they come from another
system altogether. I will suggest a change in the URL format, but I doubt
the service will change it at one user's request, especially since the
links can be clicked in the mail reader and open the browser correctly.
KMail recognizes the URLs and passes the whole thing to Firefox, which
also accepts them as is. I suspect IE does as well (as a Linux user I
haven't tried it, but I'm sure there are plenty of IE users for whom it
works fine).

Thanks for the pointer to cruft_set and decruft. Subclassing the module
and overriding uric_set does the trick:

package MyURIFind;

use URI::Find;
@MyURIFind::ISA = qw( URI::Find );

# The characters allowed in a URL, with \| appended so '|' is accepted too.
sub uric_set {
    return '\;\/\?\:\@\&\=\+\$\,\[\]A-Za-z0-9\-_\.\!\~\*\'\(\)%\|';
}
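Why appending \| helps can be seen by using the same string as a regex character class. This sketch assumes, as a simplification, that URI::Find turns uric_set into something like [set]+ after the scheme; the pattern below is a toy for illustration, not URI::Find's actual regex, and the sample text is made up. Only core Perl is needed.

```perl
use strict;
use warnings;

# Character set from the subclass above, without and with the trailing \|.
my $base     = '\;\/\?\:\@\&\=\+\$\,\[\]A-Za-z0-9\-_\.\!\~\*\'\(\)%';
my $with_bar = $base . '\|';

# Hypothetical sample text containing a URL with a '|' in it.
my $text = 'See http://example.com/a|b for details';

# Toy pattern: scheme, colon, then a run of allowed characters.
my ($short) = $text =~ m{(http:[$base]+)};       # stops at the '|'
my ($full)  = $text =~ m{(http:[$with_bar]+)};   # keeps going past it

print "$short\n$full\n";
```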

So, now my question is, is there a way to just add characters to the
original uric_set, something like

sub uric_set {
    my $uric_set = URI::Find->uric_set;
    $uric_set .= '\|';
    return $uric_set;
}

Obviously, this doesn't work as is and needs somehow to properly reference
the original uric_set from the URI::Find parent class. I'm sure this is a
really stupid question and comes out of my lack of knowledge about Perl,
but my searching hasn't revealed the answer...

Thanks
Rosina
 
Paul Lalli

Rosina said:
So, now my question is, is there a way to just add characters to the
original uric_set, something like

sub uric_set {
    my $uric_set = URI::Find->uric_set;
    $uric_set .= '\|';
    return $uric_set;
}

Obviously, this doesn't work as is

That's not at all obvious. I would have expected that to work. Can
you be more specific about how it "doesn't work"? What errors does it
generate?
and needs somehow to properly reference
the original uric_set from the URI::Find parent class.

Without looking at the source of the original module, I'd simply try:

sub uric_set {
    my $obj = shift;
    my $uric_set = $obj->SUPER::uric_set();
    $uric_set .= '\|';
    return $uric_set;
}
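The SUPER:: approach can be tried without URI::Find installed. In this sketch, Toy::Finder and My::Finder are hypothetical names: Toy::Finder stands in for URI::Find and builds its pattern by calling $self->uric_set, which is presumably why overriding uric_set in a subclass has any effect on the real module. My::Finder then extends the parent's set through SUPER:: instead of copying it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for URI::Find, so this runs with core Perl only.
package Toy::Finder;

sub new { my $class = shift; return bless {}, $class }

# A deliberately shortened character set, for readability.
sub uric_set { return 'A-Za-z0-9\/\.\:' }

# The parent asks the object for its set, so a subclass override is respected.
sub pattern {
    my $self = shift;
    my $set  = $self->uric_set;
    return qr{(http:[$set]+)};
}

# Subclass that extends, rather than replaces, the parent's set.
package My::Finder;
@My::Finder::ISA = ('Toy::Finder');

sub uric_set {
    my $self     = shift;
    my $uric_set = $self->SUPER::uric_set();
    $uric_set .= '\|';
    return $uric_set;
}

package main;

my $text      = 'Go to http://example.com/a|b now';
my $parent_re = Toy::Finder->new->pattern;
my $child_re  = My::Finder->new->pattern;

my ($plain)    = $text =~ $parent_re;    # stops at the '|'
my ($extended) = $text =~ $child_re;     # keeps the '|b'
print "$plain\n$extended\n";
```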

Paul Lalli
 
Rosina Bignall

Paul said:
That's not at all obvious. I would have expected that to work. Can
you be more specific about how it "doesn't work"? What errors does it
generate?

I sincerely apologize. I must have had something else messed up when I
thought it wasn't working. You are right, it does indeed work. Thanks for
all your help!!

Rosina
http://RosinaBignall.com
 
John Bokma

Rosina Bignall said:
So, now my question is, is there a way to just add characters to the
original uric_set, something like

sub uric_set {
    my $uric_set = URI::Find->uric_set;
    $uric_set .= '\|';
    return $uric_set;
}


http://search.cpan.org/src/ROSCH/URI-Find-0.16/lib/URI/Find.pm

shows that:
my($uricSet) = $URI::uric;

sub uric_set {
    return $URI::uric . '\|';    # your characters here
}

Or maybe better:

sub uric_set {
    my $uric_set = shift->SUPER::uric_set;
    $uric_set .= '\|';    # extend it
    return $uric_set;
}

All untested, and off the top of my head.
 
grocery_stocker

John said:
http://search.cpan.org/src/ROSCH/URI-Find-0.16/lib/URI/Find.pm

shows that:
my($uricSet) = $URI::uric;

sub uric_set {
    return $URI::uric . '\|';    # your characters here
}

Or maybe better:

sub uric_set {
    my $uric_set = shift->SUPER::uric_set;
    $uric_set .= '\|';    # extend it
    return $uric_set;
}

All untested, and off the top of my head.

I'm not seeing how something like:

sub uric_set {
    my $uric_set = shift->SUPER::uric_set;
    $uric_set .= '\|';    # extend it
    return $uric_set;
}

could be better than:

my($uricSet) = $URI::uric;

sub uric_set {
    return $URI::uric . '\|';    # your characters here
}


Chad
 
John Bokma

grocery_stocker said:
I'm not seeing how something like:

sub uric_set {
    my $uric_set = shift->SUPER::uric_set;
    $uric_set .= '\|';    # extend it
    return $uric_set;
}

could be better than:

my($uricSet) = $URI::uric;

That bit is not needed; it's just what URI::Find itself does.

sub uric_set {
    return $URI::uric . '\|';    # your characters here
}

The latter relies on what URI considers the set; the former relies on what
the module you directly subclass thinks it is.
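To make the distinction concrete, this sketch uses hypothetical stand-ins: $Toy::URI::uric plays the role of $URI::uric, and Toy::Finder plays URI::Find. Toy::Finder is deliberately made to return a narrower set of its own (no '%'), to show what would happen if the parent class ever diverged from the package variable. The subclass built on the package variable ignores the parent's choice; the one built on SUPER:: follows it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for URI's package variable.
package Toy::URI;
our $uric = 'A-Za-z0-9\/\.\:%';

# Hypothetical stand-in for URI::Find, with its own (narrower) idea of the set.
package Toy::Finder;
sub new      { my $class = shift; return bless {}, $class }
sub uric_set { return 'A-Za-z0-9\/\.\:' }

# Variant 1: build on the package variable, bypassing the parent class.
package Finder::FromVar;
@Finder::FromVar::ISA = ('Toy::Finder');
sub uric_set { return $Toy::URI::uric . '\|' }

# Variant 2: build on whatever the direct parent class returns.
package Finder::FromSuper;
@Finder::FromSuper::ISA = ('Toy::Finder');
sub uric_set { my $self = shift; return $self->SUPER::uric_set() . '\|' }

package main;

my $from_var   = Finder::FromVar->new->uric_set;     # still contains '%'
my $from_super = Finder::FromSuper->new->uric_set;   # follows Toy::Finder
print "$from_var\n$from_super\n";
```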
 
Rosina Bignall

John Bokma wrote:

Or maybe better:

sub uric_set {
    my $uric_set = shift->SUPER::uric_set;
    $uric_set .= '\|';    # extend it
    return $uric_set;
}

All untested, and off the top of my head.

Thanks! Yes, it works perfectly.

Rosina
 
