Formatting data

robic0

This has nothing to do with specifications.
It's not a what-if game, it's accident prevention.
Anything involving a loop controlled by a counter
should never have a single breakout value,
especially when it grows a variable in each pass.
Unless it is absolutely known that this condition
will occur every time, it should not be done.

until (tr/\n/\n/ == 8);

will hang the program if the number of newlines in the
record is 7 or more (you add 1, plus the eor pickup).
Looking at the randomness of the OP's data sample makes
this extremely likely to happen. Lines containing only
whitespace don't count as the \n\n eor in this case ($/ = "").
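
To make the arithmetic concrete, here is a minimal sketch. The sub name passes_until_eight and the cap of 20 passes are my own assumptions, not part of the posted code. It replays the append-then-test loop on a paragraph-mode record, but gives up after a fixed number of passes instead of hanging.

```perl
use strict;
use warnings;

# Replay the "append a newline, then test for exactly 8" loop on one record.
# Returns the number of passes needed, or undef if the cap is reached first.
# $max_passes is an illustrative safety cap, not part of the original code.
sub passes_until_eight {
    my ( $record, $max_passes ) = @_;
    $max_passes = 20 unless defined $max_passes;
    my $passes = 0;
    while ( $passes < $max_passes ) {
        $record .= "\n";
        $passes++;
        return $passes if ( $record =~ tr/\n// ) == 8;
    }
    return;
}

# In paragraph mode a record of N lines arrives with N + 1 newlines
# (one per line plus the blank-line terminator).
my $six_lines   = ( "line\n" x 6 ) . "\n";
my $seven_lines = ( "line\n" x 7 ) . "\n";

for my $rec ( $six_lines, $seven_lines ) {
    my $passes = passes_until_eight($rec);
    print defined $passes
        ? "reached 8 newlines after $passes pass(es)\n"
        : "never equals 8: the uncapped loop would run forever\n";
}
```

The six-line record gets there in one pass; the seven-line record is already at 9 after the first append, so an equality test can never succeed.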

It's obvious his data is poorly generated and unformatted,
so he will miss some labels because of garbled information.
BUT that's no reason to NOT throw in a RANGE check in the
loop so the machine doesn't crash is it? Yeah, he didn't
say "I don't want my fucking machine to crash!", but I'm
just taking a wild assed guess he doesn't.

The OP sounds like he doesn't want his printer to "bomb"
when printing labels. Adding a few lines of error checking
will not only prevent the OS from crashing (so then he
can feel comfortable using the program in batch mode) but
also flag the rejects to a file to be fixed later. I don't think
that the OP is looking for a speed label printer.
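
A rough sketch of what that error checking could look like, assuming an 8-line label as implied by the "== 8" test. The sub names, the return convention and the in-memory filehandles are all made up for illustration; the OP's real program and file layout are not shown in the thread.

```perl
use strict;
use warnings;

# Pad a record to exactly $want newlines, or report it as a reject.
# Returns (1, $padded) on success and (0, $record) when the record is
# already too long to fit on one label.
sub pad_or_reject {
    my ( $record, $want ) = @_;
    $record =~ s/\n+\z/\n/;    # collapse the trailing blank line(s) to one newline
    my $have = ( $record =~ tr/\n// );
    return ( 0, $record ) if $have > $want;
    return ( 1, $record . "\n" x ( $want - $have ) );
}

# Route each paragraph-mode record to the label stream or to a reject log.
sub sort_records {
    my ( $in, $labels, $rejects ) = @_;
    local $/ = "";             # paragraph mode, as in the thread
    while ( my $record = <$in> ) {
        my ( $ok, $text ) = pad_or_reject( $record, 8 );
        print { $ok ? $labels : $rejects } $text;
    }
    return;
}

# Demo on in-memory filehandles: one short record, one oversized record.
my $input = ( "ok\n" x 5 ) . "\n" . ( "too long\n" x 9 ) . "\n";
my ( $good, $bad ) = ( '', '' );
open my $in,      '<', \$input or die "cannot open input: $!";
open my $labels,  '>', \$good  or die "cannot open label buffer: $!";
open my $rejects, '>', \$bad   or die "cannot open reject buffer: $!";
sort_records( $in, $labels, $rejects );
close $_ for $in, $labels, $rejects;

printf "%d label line(s), %d rejected line(s)\n",
    ( $good =~ tr/\n// ), ( $bad =~ tr/\n// );
```

The point is only that an oversized record gets set aside instead of being padded forever; what to do with the reject file afterwards is up to the OP.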

I also don't like the use of $/="" here given that
the natural eor is \n in this case, and without
parametric checking you have made the looping dependent
upon an exact count of CRLFs contained in the data record,
a record that is obviously flawed, that has NO guarantee
to contain anything. If the data weren't flawed and he
could control it, he would already have done it when the
data was generated, so why would he need a Perl program?

There's no guarantee of "\n\n" using $/="". It only takes
7 consecutive non-blank lines to send this code into a loop
that expands $_ at 100k/second until the OS crashes.

Might as well have:

$cnt = 9;
while (1) {
    last if ($cnt == 8);
    $_ .= "\n";
    $cnt++;
}

When trying to format ascii/unicode text from a file, you can't
rely on anything in that file to be in a guaranteed
order or contain guaranteed text. Triggers can be used to line
up a frame. This positively cannot be done by the use of counting
newlines or blank lines as the sole mechanism.

So basically I would not recommend this method at all.
There's nothing about it that would inspire confidence
if this were production code. If the DATA file is consistent
in its formation and only sometimes jerky in known intervals
and the OP just wanted a 1-time squirt to print some labels,
then yeah, the code is about as primitive and raw a method as you
can use.

No matter what the usage, ">=" avoids a nasty out-of-memory
crash that could corrupt open files (filesystem), registry,
etc. For this reason, I would "overrule" the design specifications,
("==") or suggest he find someone else that has liability insurance
for these cases.
 
robic0

[--snip--]
I am not the one who is stupid.

Some enjoy moving into depth of response, enjoy
writing page after page of possibilities. That is not
my style and a violation of my personal rule,

"Stay strictly within parameters, nothing more, nothing less."

Incidentally, your comment about " >= 8 ": that will get you
into trouble. The author never states any records are over
eight lines in length.

" >= 8 "
The greater-than is just a breakout mechanism in case there
are 7 or more newlines in the record.
It does not send the record to the printer; instead it just
lines up the next record. It doesn't violate anything
in the OP's spec. When the potential for damage is so
big and the possibility that the data could be in this form is big,
what the hell, throw in an extra key in the code....

In this case, it's a Phalanx, radar controlled, 2,000 rounds/sec,
armor piercing, 50mm Gatling gun.... for the price of a
sling-shot. Gives a little more bang for the buck than the
mini-gun.
 
eeb4u

robic0 said:
As soon as you have 7 or more newlines in the data (you add 1,
plus the eor has one)
without having an end of record $/ = "" (blank line),
the tr/// count in until ($_ =~ tr/\n/\n/ == 8) will already be 9,
which puts it into an endless loop adding \n to $_ and eventually
crashing the OS. So that's why I recommended a ">= 8" for
safety. Also, when using a blank (\n\n) eor, a line with
just spaces is not counted as an eor. Good to be a little
bulletproof.


$/ = "";

while (<DATA>)
{
    my $dd = 0;
    do {
        $_ = "$_\n";
        $dd = $_ =~ tr/\n/\n/;
        print $dd,"\n";
    } until ($dd >= 10);
    print $_;
}


__DATA__
"************** 123456 ** B-009"
"ip address"
"SERVER 1"
"some server info"
"******************************"
" "
" "

----------------------
output:

9
10
"************** 123456 ** B-009"
"ip address"
"SERVER 1"
"some server info"
"******************************"
" "
" "

I thought I posted a reply from home, but maybe not. Pure gibberish or
not, the solution came from the original responder, David Filmer. I
used his suggestion and modified it to fit my requirements. I thank
all who replied with helpful comments.

Mike D
 
robic0

I thought I posted a reply from home, but maybe not. Pure gibberish or
not, the solution came from the original responder, David Filmer. I
used his suggestion and modified it to fit my requirements. I thank
all who replied with helpful comments.

Mike D

Hey Mike,
yeah, I didn't see that. His is the preferred way.
It keys on the start block via =~ /^\** /, takes everything up to the next one,
prints up to 7 lines, and pads if there are fewer. Overall, the printer won't
bomb and your OS won't crash.

/^\*+ .*(?:[0-9]{3})?$/

gluck
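
For anyone reading later, a rough sketch of that trigger-based approach. This is not David Filmer's actual code. The header test (a run of asterisks followed by a space), the 8-line frame height and the sub name frame_labels are assumptions based on the sample data and the "== 8" test in this thread.

```perl
use strict;
use warnings;

# Split lines into label frames, starting a new frame at every header line.
# Frames longer than $height are truncated, shorter ones are padded.
sub frame_labels {
    my ( $lines, $height ) = @_;
    my ( @frames, $current );
    for my $line (@$lines) {
        if ( $line =~ /^\*+ / ) {    # trigger: start of a new label
            push @frames, $current if $current;
            $current = [];
        }
        next unless $current;        # skip junk before the first header
        next unless $line =~ /\S/;   # blank lines carry no information
        push @$current, $line if @$current < $height;    # excess lines are dropped
    }
    push @frames, $current if $current;

    # Pad every frame to exactly $height lines.
    push @$_, ('') x ( $height - @$_ ) for @frames;
    return @frames;
}

my @lines = (
    'stray line before any header',
    '************** 123456 ** B-009',
    'ip address',
    'SERVER 1',
    '',
    '************** 654321 ** B-010',
    ( ('detail line') x 10 ),
);
my @frames = frame_labels( \@lines, 8 );
print join( "\n", @$_ ), "\n" for @frames;
```

Because the frame boundary comes from the header line and not from a newline count, a short, long or garbled record can only damage its own label. Dropping the excess lines silently is a choice made here for brevity; logging them would be safer.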
 
foo bar baz qux

A PG clone with a large vocabulary!

Let's test this theory out using the PG index:

Love of inappropriate benchmarking? ... yes.[1]
Inability to write grammatical English? ... no.
Long paranoid digressions? ... no.
Mutual love of Xah? ... not observed (yet?)

I think we can skip the rest and pronounce RG to be nearly normal.


[1] I'm assuming this printer operates at less than 10^6 lpm.
 
Tad McClellan

foo bar baz qux said:
Let's test this theory out using the PG index:


You forgot a couple more.

Love of inappropriate benchmarking? ... yes.[1]
Inability to write grammatical English? ... no.


Uses Archie Bunkerisms with wild abandon? ... no

eg:

respectfully => respectively

bald face liar => bold faced liar

just deserves => just desserts

All that, while still claiming to be an English Professor? ... no

Long paranoid digressions? ... no.
Mutual love of Xah? ... not observed (yet?)


It is hot for robic0 too.

I think we can skip the rest and pronounce RG to be nearly normal.


Right.
 
Richard Gration

Love of inappropriate benchmarking? ... yes.[1]

Now, see, I learned something. I understand now why the benchmarking I
showed was inappropriate. Thank you.

I had a very clear reason for twitting DF in my post: As stated I
think in this case PG's code is superior. No big deal in a 5 line
throwaway posted to usenet, but DF was copping such an attitude that I
felt compelled to point it out (and by inference the inappropriate
attitude). It's a shame I randomly decided to benchmark because it allowed
to ignore what I thought was the more important aspect of my post: PG's
code is more *succinct*, regardless of speed. The sequence of events was
roughly:

1. DF posts some code which, while functional, would not win any beauty
contests.

2. PG posts code which I consider way more elegant.

3. DF posts saying the code is broken because he didn't run it[1] and
didn't parse it properly by sight.

4. DF gets called on this by PG (I think the expression is "bare faced
liar", at least here in England)

5. DF chokes on his humble pie.

It was at this point I jumped in because *in*this*particular*case* the
vitriol aimed at PG was unwarranted. There is generally too much invective
in this NG, and not just towards PG. I see why it comes about, but after
all's said and done, it only serves to lower the signal to noise ratio. I
know there are other, valued, contributors here who agree. They are
conspicuous by their absence on the polemic threads.

I know people are trying to fight the good fight and protect
the traditions of usenet against the influx of boorish ignorance, but ...
it doesn't seem to be working. Maybe it will in the long run, maybe one
day the seemingly infinite supply of cluebaits will have been exhausted
and politeness will prevail on usenet. I'm not holding my breath.
During the meanwhile, I think the best defence is a sense of humour, sadly
something not nearly as evident here as the willingness to sharpen knives ...

Rich

[1] I too couldn't believe that it would work at first sight, so I did
run it, then I went digging through perldoc perlvar to find out why it
*does* work.
 
Anno Siegel

Richard Gration said:
Love of inappropriate benchmarking? ... yes.[1]

Now, see, I learned something. I understand now why the benchmarking I
showed was inappropriate. Thank you.

I had a very clear reason for twitting DF in my post: As stated I
think in this case PG's code is superior. No big deal in a 5 line
throwaway posted to usenet, but DF was copping such an attitude that I
felt compelled to point it out (and by inference the inappropriate
attitude). It's a shame I randomly decided to benchmark because it allowed
to ignore what I thought was the more important aspect of my post: PG's
code is more *succinct*, regardless of speed.

However, it isn't quite correct either. Here it is again:

$/ = "";

while (<DATA>)
{
    do
        { $_ = "$_\n"; }
    until ($_ =~ tr/\n/\n/ == 8);
    print $_;
}

Reading in paragraph mode assumes that each block is indeed terminated
by an empty line. If a data block happens to have 8 data lines from the
start, the program will read it, and the next paragraph until an empty
line is found. It will then go on and add another empty line to that
over-full paragraph because the "do { ... } until ..." executes at least
once, even if the condition is already true on entry.

The latter problem (but not the former) is avoided by this variant:

$/ = '';
print $_ . "\n" x ( 8 - tr/\n//) while <DATA>;

This will pass paragraphs of 8 or more lines unchanged.
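
Spelled out as a sub, the same padding idea looks roughly like this. The name pad_record is only for illustration. The one-liner relies on the repetition operator returning an empty string for a zero or negative count; as far as I know, newer perls emit a "Negative repeat count does nothing" warning for that under "use warnings", so this sketch clamps the count at zero.

```perl
use strict;
use warnings;

# Pad a paragraph-mode record with newlines until it holds $want of them.
# Records that already have $want or more newlines pass through unchanged.
sub pad_record {
    my ( $record, $want ) = @_;
    my $have    = ( $record =~ tr/\n// );    # tr with no replacement only counts
    my $missing = $want - $have;
    $missing = 0 if $missing < 0;            # clamp so 'x' never sees a negative count
    return $record . "\n" x $missing;
}

for my $lines ( 3, 7, 9 ) {
    my $record = ( "line\n" x $lines ) . "\n";    # N lines plus the blank terminator
    my $padded = pad_record( $record, 8 );
    printf "%d newlines in, %d newlines out\n",
        ( $record =~ tr/\n// ), ( $padded =~ tr/\n// );
}
```

There is no loop at all here, so there is nothing to run away: the number of newlines to add is computed once from the current count.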

Anno
 
robic0

Love of inappropriate benchmarking? ... yes.[1]

Now, see, I learned something. I understand now why the benchmarking I
showed was inappropriate. Thank you.

I had a very clear reason for twitting DF in my post: As stated I
think in this case PG's code is superior.
That code is broken. It's proven broken. Read the rest of the
thread, cumquat!!!
 
usenet

Richard said:
I had a very clear reason for twitting DF in my post: As stated I
think in this case PG's code is superior. No big deal in a 5 line
throwaway posted to usenet, but DF was copping such an attitude that I
felt compelled to point it out (and by inference the inappropriate
attitude).
What attitude? The post you replied to (http://tinyurl.com/btgpk) was
one in which I admitted that I was wrong and that PG was right. My
reference to "when pigs fly" was not regarding her code in this
particular post (which I admitted was good) but to her absolutely CRAP
code which she has posted in innumerable other threads (Google her
handle to find MANY examples!). But, in this case, I admitted that
PG's code was functional, and that my criticism of her code had been
out of line. This was an APOLOGY, you dimwit!
1. DF posts some code which, while functional, would not win any beauty contests.

And DF (me) said he didn't like it.
2. PG posts code which I consider way more elegant.
3. DF posts saying the code is broken because he didn't run it and
didn't parse it properly by sight.
4. DF gets called on this by PG (I think the expression is "bare faced
liar", at least here in England)
5. DF chokes on his humble pie.

It was at this point I jumped in because *in*this*particular*case* the
vitriol aimed at PG was unwarranted.

DF (me) posted a _retraction_. I'm sorry you are too dim-witted to see
it as such. It was not "vitriol" - it was an admission of error on my
part. If you want to call it "choking on my humble pie," that's your
prerogative (you can post whatever vitriol you wish). I consider it
simply a correction of an innocent mistake - setting the record
straight and retracting my incorrect comments.
There is generally too much invective

Ah, there's that vocabulary again.
it only serves to lower the signal to noise ratio.

And it does math and electrical engineering as well.
maybe one day the seemingly infinite supply of cluebaits will have been
exhausted and politeness will prevail on usenet. I'm not holding my breath.
During the meanwhile, I think the best defense is a sense of humor, sadly
something not nearly as evident here as the willingness to sharpen knives ...

"cluebaits"? "During the meanwhile"? Well, I'm hardly one to pick on
one for vocabulary (or even grammar). But I think I've learned my
lesson - if I overlook something and post an incorrect reply, don't
bother to post a correction. This dimwit (Richard Gration) may not
consider it polite.

So, Richard Gration, you dimwit, please killfile me NOW so you won't be
exposed to any of my future rude corrections or apologies.
 
robic0

On Mon, 28 Nov 2005 11:44:02 +0000, Richard Gration

OK, so let's see what we learned from the OP's thread:

1. Perl Gerl's code started off as an original, working
model that benched very well.
2. Perl Gerl's code was proven very dangerous and
discarded by the OP.
3. David Filmer's code, posted before PG's code
with a different approach, proved very accurate but
slower (the one the OP went with).
4. "Superior" is a word reserved for historians.

Well boys and gurls, that's what we learnt from this
thread...... may it rest in peace (RIP)
 
