Any idea why this code doesn't remove all the blank lines?


Jack Wang

This is the code I've written so far.

#!/usr/bin/perl
my $result = "";
while (<>){
if (/---START---/../--END\s---/){
$result.=$_;
}
}
$text="";
$result=~m/^---START---(.*)--END\s---$/s;
$text.=$1;
$text =~ s/\n+/\n/g;
print $text;

This is the text that it should handle (shortened, ........ represents
more data).

---START---

1342A 1O B10/B11
1003 1O B45/Z46
1094 1O F39/F40
1416 1O G37/G38
1007 1O Z33/A34
..........................

.............................
.............................
.....stuff here..........
.....................

4105 4L F31/F32
.......................
......................

--END ---


I want to extract the data between ---START--- and --END ---,
removing any blank lines. However, the program above
outputs everything correctly except that it leaves a blank line at the top,
and I can't figure out why. Thanks for any help!
 

John W. Krahn

Jack said:
This is the code I've written so far.

#!/usr/bin/perl

use warnings;
use strict;
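my $result = "";
while (<>) {
    if (/---START---/ .. /--END\s---/) {
        next unless /\S/;                        # skip blank lines
        next if /---START---/ || /--END\s---/;   # skip the marker lines
        $result .= $_;
    }
}
# The marker lines and blank lines are already gone at this point, so the
# m/^---START---(.*)--END\s---$/s match and the s/\n+/\n/g cleanup are not needed.
print $result;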



John
 

xhoster

$result=~m/^---START---(.*)--END\s---$/s;
$text.=$1;
$text =~ s/\n+/\n/g;
....

However, the program above
outputs everything correctly except that it leaves a blank line at the top,
and I can't figure out why. Thanks for any help!

You get a blank line either when there are two \n in a row, or when
the string has a single \n at the beginning. Your substitution handles
the first case, but not the second.

Either don't capture them in the first place:

$result=~m/^---START---\n*(.*)--END\s---$/s;

Or remove it explicitly afterwards:

$text =~ s/\n+/\n/g;
$text =~ s/^\n+//;
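
The two fixes can be compared side by side. This is only a minimal sketch: the $result string is a hypothetical, shortened stand-in for the real input, and $skip_first and $strip_after are illustrative names that do not appear in the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the text collected by the while loop.
my $result = "---START---\n\n1342A 1O B10/B11\n1003 1O B45/Z46\n\n4105 4L F31/F32\n\n--END ---\n";

# Fix 1: let \n* swallow the newlines after ---START--- before capturing.
my ($captured) = $result =~ m/^---START---\n*(.*)--END\s---$/s;
( my $skip_first = $captured ) =~ s/\n+/\n/g;

# Fix 2: capture everything, squeeze newline runs, then strip the leading ones.
my ($raw) = $result =~ m/^---START---(.*)--END\s---$/s;
( my $strip_after = $raw ) =~ s/\n+/\n/g;
$strip_after =~ s/^\n+//;

# Both strings now hold the three data lines with no blank lines.
print $skip_first;
```

On this sample both variables end up with the same text, so either fix can be used.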

Xho

 

Uri Guttman

JWK> use warnings;
JWK> use strict;

JWK> next unless /\S/;
JWK> next if /---START---/ || /--END\s---/;

you can use the return value of .. to eliminate the redundancy of those
regexes:

if ( my $range_num = /---START---/ .. /--END\s---/ ) {

    next if $range_num == 1 || $range_num =~ /e/i ;
}


i would even drop the block:

my $range_num = /---START---/ .. /--END\s---/ ;
next unless $range_num ;
next if $range_num == 1 || $range_num =~ /e/i ;
next unless /\S/ ;
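
To see what the scalar-context .. operator actually returns, here is a small sketch. The @lines array is hypothetical sample input and @seen is just an illustrative name for collecting the values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input, one element per line.
my @lines = ( "junk", "---START---", "", "1342A 1O B10/B11", "--END ---", "junk" );

my @seen;
for (@lines) {
    # In scalar context, .. returns "" outside the range, a sequence
    # number inside it, and the sequence number with "E0" appended on
    # the line that ends the range.
    my $range_num = /---START---/ .. /--END\s---/;
    push @seen, $range_num;
}

print join( ",", map { "[$_]" } @seen ), "\n";
# prints [],[1],[2],[3],[4E0],[]
```

That is why testing for 1 catches the start marker and testing for an "E" catches the end marker.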

but my favorite way is so much faster and shorter (untested):

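use File::Slurp ;

my $text = read_file( \*STDIN ) ;
# match in scalar context so /g advances one block per iteration
# (a list-context match returns every capture at once and the loop never ends);
# .+? is non-greedy so each START/END block is handled separately
while( $text =~ m/^---START---(.+?)--END\s---$/msg ) {

    my $result = $1 ;

    # do newline and other cleanup here

    $result =~ tr/\n//s ;    # squeeze runs of newlines into one
    $result =~ s/\A\n// ;    # drop the newline left over after ---START---

    print $result ;
}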

can't get much simpler than that.

uri
 

Martijn Lievaart

I want to extract the data between ---START--- and --END ---,
removing any blank lines. However, the program above
outputs everything correctly except that it leaves a blank line at the top,
and I can't figure out why. Thanks for any help!

Because you ask it to?

Your problem can be shortened to:
$ perl -e '$t="\ntest\n\ntest\n"; $t=~ s/^\n+/\n/g; print "t=$t\n"'

This does exactly the same thing: it leaves the first empty line. Why?
Because you replace the newline there with a newline.

Try:
$ perl -e '$t="\ntest\n\ntest\n"; $t=~ s/\n+/x/g; print "t=$t\n"'

And you'll see what I mean.

You probably want to add:
$text =~ s/^\n//;
to achieve what you want.

Some stylistic issues:
#!/usr/bin/perl

use strict;
use warnings;
my $result = "";
while (<>) {
    if (/---START---/ .. /--END\s---/) {
        $result .= $_;
    }
}

Indentation helps for readability.
$text="";
$result=~m/^---START---(.*)--END\s---$/s;
$text.=$1;

Useless use of concatenation. Change to:

$result=~m/^---START---(.*)--END\s---$/s;
my $text = $1;
$text =~ s/\n+/\n/g;
print $text;

HTH,
M4
 

Mario D'Alessio

Try this:

while (<>)
{
    #
    # Grab the lines between these two lines (exclusive)
    #
    my $sequence = /---START---/.../--END\s---/;
    next unless $sequence > 1;      # Excludes left-hand pattern
    next if $sequence =~ /E0$/;     # Excludes right-hand pattern

    next if /^\s*$/;                # Skip blank lines
    print;
}
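
The loop above uses the three-dot form of the range operator. With the markers in this thread, .. and ... behave the same, because no line matches both patterns. The difference only shows when one line can match both ends. The sketch below assumes a hypothetical input where the same "---" line is used as both start and end marker; @two and @three are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the same marker opens and closes the block.
my @lines = ( "---", "a", "b", "---", "c" );

my ( @two, @three );
for (@lines) {
    # Two dots: the end pattern is tested on the same line that started
    # the range, so the range opens and closes on each marker line.
    push @two, $_ if /^---$/ .. /^---$/;

    # Three dots: the end pattern is first tested on the following line,
    # so the range runs from one marker to the next.
    push @three, $_ if /^---$/ ... /^---$/;
}

print "two-dot:   @two\n";      # prints two-dot:   --- ---
print "three-dot: @three\n";    # prints three-dot: --- a b ---
```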
 
