Strange behavior with strings...

zeroaffinity

I have an array of strings. Each string (called $line) has various html
formatting removed (with s//) to leave a substring that is basically a
concatenated name-value pair.

I'll show the code in a sec, but here is what is strange. When I print
the string and try to append a character, it actually PREpends the
character and overwrites the first character of the string in the
process.

My code:
1 ($name,$value) = split('#',$line);
2 print $value . "*\n";

So $line had a # in it that was a delimiter. I just split it up and
attempted to print the string. This is the output...

*une 6, 2006

If I change line 2 to this: print $value . "**\n"; then I
get the following output:

**ne 6, 2006

The clincher. If I swap $value for $name, this problem goes away. In
fact, the results would be Date* and Date** in the cases above,
respectively. It seems like the data in $value is affecting the
behavior.

What could be causing this?
 
it_says_BALLS_on_your forehead

Could you post all of the code, and also, try printing $line before you
split. It'll help debug.
 
Guest

: I'll show the code in a sec

Yes, but that's not enough. Please show the data as well.

: When I print
: the string and try to append a character, it actually PREpends the
: character and overwrites the first character of the string in the
: process.

No, I can't believe it.

: My code:
: 1 ($name,$value) = split('#',$line);
: 2 print $value . "*\n";

And this is my code. Note that warnings and strictures are enabled; they
help you make your code bullet-proof.

#!/usr/bin/perl
use strict;
use warnings;
my $line="nonsense,#June 12, 2006";
my ($name,$value)=split('#',$line);
print $value."*\n";

And I get this result:

June 12, 2006*

As expected.

: What could be causing this?

Show us your $line - not the one you retype in your editor window, but
the real one copied from the source (and, please! no copying and pasting
by mouse!). I assume you have some unorthodox sequence of \n and/or \r
somewhere in your data.
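Following Oliver's request, one way to see what is really in $line, without retyping it, is to have Perl print it with control characters spelled out. A minimal sketch; show_invisible is a hypothetical helper name, and the sample line is made up to mimic a line that still carries a carriage return:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $s with CR, LF and tab spelled out,
# and any other control character shown as a \x{..} escape.
sub show_invisible {
    my ($s) = @_;
    my %escape = ("\r" => '\r', "\n" => '\n', "\t" => '\t');
    $s =~ s/([\x00-\x1f\x7f])/exists $escape{$1} ? $escape{$1} : sprintf('\x{%02x}', ord $1)/ge;
    return $s;
}

# Made-up sample ending in a carriage return, as a CRLF line would
# after split(/\n/, ...).
my $line = "Publication Date:&nbsp;June 6, 2006\r";
print show_invisible($line), "\n";
```

Printed this way, a stray \r at the end of the line is impossible to miss.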

Oliver.
 
zeroaffinity

Thanks for looking at this. Here is more of the code.

#Get the web content
$html = get($url);

#Break it up into lines
@html = split(/\n/,$html);

#Operate on each line
foreach $line (@html) {
    $count++;

    #Yum
    chomp($line);

    #Clean up the formatting
    $line =~ s/\'/\\\'/g;

    #There is some IF-logic here to only do something
    #with lines containing keywords of interest

        #Get rid of <A HREF...>
        $line =~ s/<a(\s*|%|&|_|\.|\?|=|"|\w*|[0-9]*)*>//ig;

        #Blank space is separating the pair. Replace it
        #with something that's a single char
        $line =~ s/:&nbsp;/#/ig;

        #Now split on that char...
        ($name,$value) = split('#',$line);

        #Print the name
        print $name . "**\n";

        #Print the value
        print $value . "**\n";

    #End IF-logic
}
===== end of code =====
This prints the following:

Date**
**ne 6, 2006
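A minimal reproduction of this output, assuming (as the thread goes on to confirm) that the page was served with CRLF line endings and split on \n only, so every element of @html still ends in \r. split_pair is a hypothetical helper that mirrors the posted steps, and the page text is made up:

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the posted steps: chomp, then turn
# ":&nbsp;" into "#" and split on "#".
sub split_pair {
    my ($line) = @_;
    chomp $line;              # strips a trailing "\n" only; a "\r" survives
    $line =~ s/:&nbsp;/#/ig;
    return split /#/, $line;
}

# Made-up page text served with CRLF line endings.
my $html = "Publication Date:&nbsp;June 6, 2006\r\n";

for my $line (split /\n/, $html) {    # each element keeps its trailing "\r"
    my ($name, $value) = split_pair($line);
    # $value is "June 6, 2006\r". On a terminal the "\r" moves the cursor
    # back to column 0, so "**" overwrites "Ju" and you see "**ne 6, 2006".
    print $value . "**\n";
}
```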
 
zeroaffinity

Okay. Here you go, Oliver. To test it, run from the command line like
so ...

C:\Perl\examples>DataGrabber.pl http://www.dailyreportonline.com/Public_Notice/Consumer_Alerts/new_listCA.asp

=== Code ===
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

print "Content-type: text/html\n\n";
my($httpData);
my($line);
my($url);
my($html);
my($count);
my(@html);
my($prepend_address);
my($name,$value);

$url = $ARGV[0];

if (!defined($url) || $url eq "") {
    print "You need to provide a URL.\n";
    exit;
}


$html = get($url);
die "Couldn't fetch $url\n" unless defined $html;
@html = split(/\n/, $html);

if ($html =~ /<frame/i) {
    print "Frames! Can't do it. Get the URL for the specific frame containing data you want.\n";
    exit;
}



$count = 0;
foreach $line (@html) {
    $count++;
    chomp($line);
    $line =~ s/\'/\\\'/g;

    #Below is a filter for a specific web site, the Fulton County Daily Report.
    #I need to make this block executable based on command line parameters since
    #other web pages would require different filtering.
    if (
        ($line =~ /individual_SQL/i) ||
        ($line =~ /Publication Date/i) ||
        ($line =~ /Auction Date/i) ||
        ($line =~ /Deed Book/i) ||
        ($line =~ /Original Mortgage/i) ||
        ($line =~ /Borrower/i) ||
        ($line =~ /Lender/i) ||
        ($line =~ /Contact/i))
    {
        if ($line =~ /individual_SQL/i)
        { $prepend_address = 1; }
        else { $prepend_address = 0; }

        $line =~ s/<table(\s*|=|"|\w*|[0-9]*)*>//ig;           #Gets the <table...
        $line =~ s/<a(\s*|%|&|_|\.|\?|=|"|\w*|[0-9]*)*>//ig;   #Gets the <a link...
        $line =~ s/<td width="[0-9]+">//ig;                    #Gets the <td width...>
        $line =~ s/<\w*>//ig;                                  #Gets the <whatever>
        $line =~ s/<\/\w*>//ig;                                #Gets the </whatever>
        if ($prepend_address == 1)
        { $line = "Address#" . $line; }
        else { $line =~ s/:&nbsp;/#/ig; }
        ($name, $value) = split('#', $line);
        print $name . "**\n";
        print $value . "**\n";
    }
    $prepend_address = 0;
}


if ($count == 0) {
    print "I couldn't find anything here: $url \n";
}
 
zeroaffinity

Yep, you're right. I added one more substitution line to globally
replace all \r characters with nothing. When I saw the "\r" comment in
your post, I knew exactly what dumb thing I had been tangling with for
the past 2 hours.

I should have guessed it, since it's pretty obvious now: I was printing
after a \r, which returns to the start of the current line without moving
on to the next one (that's the purpose of \n), so I was overwriting my
string from the beginning.
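A sketch of the fix described above, assuming the same kind of made-up CRLF page text. Option 1 deletes every \r, as the poster did; option 2 is an alternative that never keeps the \r in the first place, by splitting on \r?\n instead of \n:

```perl
use strict;
use warnings;

# Made-up page text with CRLF line endings.
my $html = "Publication Date:&nbsp;June 6, 2006\r\nLender:&nbsp;Some Bank\r\n";

# Option 1 (what the poster did): delete every "\r" from each line.
my @lines1 = split /\n/, $html;
s/\r//g for @lines1;          # foreach aliases $_, so the array is modified

# Option 2: split on an optional "\r" followed by "\n".
my @lines2 = split /\r?\n/, $html;

print "$_**\n" for @lines2;   # "**" now lands at the end of each line
```

Option 2 keeps the cleanup in one place, next to the split that creates the lines.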

Thanks.
 
it_says_BALLS_on_your forehead

: Thanks for looking at this. Here is more of the code.
: [...]
: #Now split on that char...
: ($name,$value) = split('#',$line);
:
: #Print the name
: print $name . "**\n";
:
: #Print the value
: print $value . "**\n";
:
: This prints the following:
:
: Date**
: **ne 6, 2006


This is the only relevant piece for now. Can you reproduce the 'strange
behavior' with only the code below? Also, can you provide real data?

my $line = q{<you paste a line here>};

my ( $name, $value ) = split('#', $line);
print $name . "**\n";
print $value . "**\n";


Here is a sample from me:

use strict; use warnings;

my $line = 'name#bob';

my ($name, $value) = split( '#', $line );

print $name . "**\n";
print $value . "**\n";

__OUTPUT__
name**
bob**


(Also, you should favor printing a list over concatenation in this
instance, i.e.:
print $name, "**\n";
print $value, "**\n";
)
 
J. Gleixner

: #Get rid of <A HREF...>
: $line =~ s/<a(\s*|%|&|_|\.|\?|=|"|\w*|[0-9]*)*>//ig;
:
: #Blank space is separating the pair. Replace it
: #with something that's a single char
: $line =~ s/:&nbsp;/#/ig;

You do know that there are modules to parse HTML, don't you?
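Short of switching to an HTML parser, the replace-with-# then split step can be collapsed into a single split on the separator itself, with a limit of 2 so that a value containing # (or a second separator) is not truncated. A sketch with a made-up sample line:

```perl
use strict;
use warnings;

my $line = "Lender:&nbsp;Bank #42:&nbsp;Main Branch";   # made-up sample

# Split on the ":&nbsp;" separator directly (case-insensitive, matching the
# /i in the original substitution), producing at most two fields.
my ($name, $value) = split /:&nbsp;/i, $line, 2;

print "$name**\n$value**\n";
```

With the original two-step approach, this sample would yield a value of just "Bank ", because the # inside the value is also treated as a delimiter.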
 
hymie!

In our last episode, the evil Dr. Lacto had captured our hero,
: My code:
: 1 ($name,$value) = split('#',$line);
: 2 print $value . "*\n";
:
: So $line had a # in it that was a delimiter. I just split it up and
: attempted to print the string. This is the output...
:
: *une 6, 2006
: What could be causing this?

Your data isn't chomp'd.
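Strictly speaking, the posted code does call chomp; the catch is that chomp removes only a trailing $/ (by default "\n"), so a carriage return left over from CRLF line endings survives it. A small demonstration with a made-up line, alongside one common idiom for stripping any trailing CR/LF run:

```perl
use strict;
use warnings;

my $line = "June 6, 2006\r\n";     # made-up CRLF-terminated line

my $chomped = $line;
chomp $chomped;                    # removes the "\n" but leaves the "\r"

(my $stripped = $line) =~ s/[\r\n]+\z//;   # removes any trailing CR/LF run

printf "chomp left %d chars, s/// left %d chars\n",
    length $chomped, length $stripped;
```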

hymie! http://www.smart.net/~hymowitz (e-mail address removed)
===============================================================================
 
