[newbie] chomp acting weird (or me not understanding how it works??)

Hendrik Maryns

Hi,

I'm reading lines from an input file which contains on every line a word,
a description of it, and the word again. I want to reconstruct the
sentence these words formed. "LET" is the sign for punctuation, thus
means the end of the sentence is reached. When I use chomp like this:

#add this when testing
open(INFILE, "test.txt");
#

do{
my @zinwoorden;
chomp($lijn=<INFILE>);
while ($lijn!~/LET/){
$lijnnr++;
push(@zinwoorden,$lijn);
$lijn=<INBESTAND>;
} #enduntil
for (@zinwoorden){
s/(\w+).*/$1/;
}
my $zin=join (" ", @zinwoorden);
print "$lijnnr \n$zin \n";
} until eof(INFILE);

it doesn't work, just prints 0.

If I do this, on the other hand:

do{
my @zinwoorden;
$lijn=<INFILE>;
while ($lijn!~/LET/){
$lijnnr++;
push(@zinwoorden,$lijn);
$lijn=<INBESTAND>;
} #enduntil
for (@zinwoorden){
s/(\w+).*\n/$1/; # <-- workaround
}
my $zin=join (" ", @zinwoorden);
print "$lijnnr \n$zin \n";
} until eof(INFILE);

it works. I don't understand this. I'd like to use chomp, as it is
safer, in case the \n would, by some strange event, not be there...
I tried to change $/="\n", but as I don't really know what that does,
I'm a bit afraid there, it didn't work anyway.

An easy example of INFILE would be:

## start of test.txt
dames N(soort,mv,basis) dame
en VG(neven) en
heren N(soort,mv,basis) heer
meneer N(soort,ev,basis,zijd,stan) meneer
de LID(bep,stan,rest) de
rector N(soort,ev,basis,zijd,stan) rector
een LID(onbep,stan,agr) een
hele ADJ(prenom,basis,met-e,stan) heel
mooie ADJ(prenom,basis,met-e,stan) mooi
avond N(soort,ev,basis,zijd,stan) avond
en VG(neven) en
een LID(onbep,stan,agr) een
hartelijk ADJ(vrij,basis,zonder) hartelijk
welkom N(soort,ev,basis,onz,stan) welkom
aan VZ(init) aan
alle VNW(onbep,det,stan,prenom,met-e,agr) al
aanwezigen ADJ(nom,basis,met-e,mv-n) aanwezig
in VZ(init) in
de LID(bep,stan,rest) de
zaal N(soort,ev,basis,zijd,stan) zaal
.. LET() .
## end of test.txt
and a few more of these sentences would follow.

Any comment on my code welcome too!

Cheers, Hendrik
 
Chris Mattern

Hendrik said:
Hi,

I'm reading lines from an input file which contains on every line a word,
a description of it, and the word again. I want to reconstruct the
sentence these words formed. "LET" is the sign for punctuation, thus
means the end of the sentence is reached. When I use chomp like this:

#add this when testing
open(INFILE, "test.txt");
#

do{
my @zinwoorden;
chomp($lijn=<INFILE>);
while ($lijn!~/LET/){
$lijnnr++;
push(@zinwoorden,$lijn);
$lijn=<INBESTAND>;
} #enduntil
for (@zinwoorden){
s/(\w+).*/$1/;
}
my $zin=join (" ", @zinwoorden);
print "$lijnnr \n$zin \n";
} until eof(INFILE);

it doesn't work, just prints 0.

You attempted to translate this code and then posted it without
running the translation--the inner loop still reads from
INBESTAND instead of INFILE. Post problem code you have
actually run (and cut and paste it so you don't have typos).
In any case, your problem is that you forgot the chomp for
reading INFILE/INBESTAND in the inner loop.
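
To make the effect of the missing chomp visible: in a Perl regex, "." does not match a newline by default, so s/(\w+).*/$1/ leaves the trailing "\n" in place on any line that was not chomped. A minimal sketch, using two sample lines held in variables (an assumption for illustration) rather than the real file:

```perl
use strict;
use warnings;

# Two sample lines as they would come from <INFILE>, newline included.
my $chomped   = "dames N(soort,mv,basis) dame\n";
my $unchomped = "en VG(neven) en\n";

chomp $chomped;                 # only this one loses its "\n"

for ($chomped, $unchomped) {
    s/(\w+).*/$1/;              # "." stops at "\n", so the newline survives
}

# $chomped is now "dames"; $unchomped is still "en\n"
print "[$chomped] [$unchomped]\n";
```

This is why the version with \n written into the substitution appeared to work: it removed by hand what chomp would have removed.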


--
Christopher Mattern

"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"
 
A. Sinan Unur

Hi,

I'm reading lines from an input file which contains on every line a
word, a description of it, and the word again. I want to reconstruct
the sentence these words formed.

Since I do not speak the language in which these words are supposed to
form a sentence, I cannot figure out what your program is supposed to
do.
"LET" is the sign for punctuation, thus means the end of the sentence
is reached. When I use chomp like this:

use strict;
use warnings;
#add this when testing
open(INFILE, "test.txt");

You should always, with no exceptions, check if open succeeded.
do{
my @zinwoorden;
chomp($lijn=<INFILE>);

The canonical way to read a file line by line until the end is to use:

while(my $lijn = <INFILE>) {
chomp $lijn;


}
while ($lijn!~/LET/){

What do you think this does? Why are you not using an if statement here?

The rest snipped.

Please read the posting guidelines and post code others can run by
copying and pasting in an editor. The guidelines explain how to do that.

#! /usr/bin/perl

use strict;
use warnings;

my @words;

while(<DATA>) {
chomp;
next unless $_;
last if 0 <= index $_, 'LET';
if( /^(\w+)\s+/ ) {
push @words, $1;
}
}

print "@words.\n";

__DATA__
dames N(soort,mv,basis) dame
en VG(neven) en
heren N(soort,mv,basis) heer
meneer N(soort,ev,basis,zijd,stan) meneer
de LID(bep,stan,rest) de
rector N(soort,ev,basis,zijd,stan) rector
een LID(onbep,stan,agr) een
hele ADJ(prenom,basis,met-e,stan) heel
mooie ADJ(prenom,basis,met-e,stan) mooi
avond N(soort,ev,basis,zijd,stan) avond
en VG(neven) en
een LID(onbep,stan,agr) een
hartelijk ADJ(vrij,basis,zonder) hartelijk
welkom N(soort,ev,basis,onz,stan) welkom
aan VZ(init) aan
alle VNW(onbep,det,stan,prenom,met-e,agr) al
aanwezigen ADJ(nom,basis,met-e,mv-n) aanwezig
in VZ(init) in
de LID(bep,stan,rest) de
zaal N(soort,ev,basis,zijd,stan) zaal
.. LET() .
 
A. Sinan Unur

Ok, I humbly beg for forgiveness on my bare knees for not testing
before I posted. And promise solemnly to never do it again.

That attitude will not get you anywhere.
So here is a second try, and I did test those two, trimmed some lines
of it too.

You might want to go and read the posting guidelines as well.
You really don't have to care about the meaning of the
words, it's just that in the first script, there is a \n at the end of
each word, whereas in the second, there isn't.

You have only posted one set of data that consists of lines that seem to
have three components: a word, some grammatical classification of the
word, and the plural of the word. Then, a newline terminates the line.
What I see and what you wrote above are not compatible.

Remember that thing I said about posting a short but complete script
that others can run just by pasting it in their favorite editor?

Please do that!
I didn't understand your explanations in the two answers up to now, so
could you please be more specific,
No.

I really only started with Perl last week.

All the more reason to follow advice that serves to help you help others
help you.
Attachment decoded: untitled-2.txt
--------------070504050109010601000209
dames N(soort,mv,basis) dame
en VG(neven) en

There is no need to post attachments.

Sinan.
 
Hendrik Maryns

A. Sinan Unur wrote:
Ok, I'll give it a last try.
Since I do not speak the language in which these words are supposed to
form a sentence, I cannot figure out what your program is supposed to
do.

Basically, just taking the first word of each line, and joining them all
into a sentence, in the same order.
use strict;
use warnings;

Yes, I read about that and am using it now, thanks.

You should always, with no exceptions, check if open succeeded.
I normally do, but forgot here.
The canonical way to read a file line by line until the end is to use:

while(my $lijn = <INFILE>) {
chomp $lijn;


}
I know that, but I didn't see a way to do this _and_ do the other stuff
I want to do with the reconstructed sentence afterwards, so I tried it
another way.
What do you think this does? Why are you not using an if statement here?

Because of the way I'm reading the file: read a line until you encounter
punctuation, which is indicated by "LET" occurring in the line. This
_does_ work fine by the way.
The rest snipped.

Please read the posting guidelines and post code others can run by
copying and pasting in an editor. The guidelines explain how to do that.
Ok, and where can I find these guidelines, please?

<snipped code>
This indeed does exactly what I was looking for! (And you don't even
need chomp here ;-)
My problem still remains: how can I do this, sentence per sentence (as,
in the real DATA, more of those follow each other), and manipulate each
line of the data in another way after each sentence.
But don't bother answering, I'll try for myself some more.

So, thank you,
Hendrik

PS: the start of my other post was meant ironically, but I suppose your
answer was too :p
 
Michele Dondi

I'm reading lines from an input file which contains on every line a word,
a description of it, and the word again. I want to reconstruct the
sentence these words formed. "LET" is the sign for punctuation, thus
means the end of the sentence is reached. When I use chomp like this:

#add this when testing
open(INFILE, "test.txt");

Please, _don't_! Even if this is meant to be only a minimal example
typing something like

open my $in, '<', 'test.txt' or die $!

takes only a few more keystrokes.
[snip all code]
I tried to change $/="\n", but as I don't really know what that does,

perldoc perlvar
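
As a short illustration of what perlvar says about $/: it is the input record separator, "\n" by default, and chomp removes one trailing copy of whatever $/ currently holds. Setting $/ = "\n" therefore changes nothing. A sketch with made-up strings (the "|" separator is purely hypothetical):

```perl
use strict;
use warnings;

my $line = "dames N(soort,mv,basis) dame\n";
my $removed = chomp $line;          # chomp returns the number of characters removed
my $again   = chomp $line;          # nothing left to remove this time

my $record = "zaal|";
{
    local $/ = "|";                 # temporary, hypothetical separator
    chomp $record;                  # now strips "|" instead of "\n"
}

print "$removed $again [$line] [$record]\n";
```

Using local keeps the change to $/ confined to the enclosing block, so the rest of the program still reads and chomps newline-terminated lines.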
An easy example of INFILE would be:

## start of test.txt
dames N(soort,mv,basis) dame
en VG(neven) en
heren N(soort,mv,basis) heer [snip]
. LET() .
## end of test.txt

Please note that this doesn't _strictly_ match the description above
(not that it matters, IMHO). Also, it _seems_ that '.' in the first
field ends a sentence just as much as 'LET()' in the second one does.
Any comment on my code welcome too!

Personally I found it overly clumsy and hard to follow, and being
particularly lazy in this moment I didn't even try to understand it or
to test it either.

All in all, if I understand correctly the description of your task
(and if you described it correctly in the first place!) I suppose you
may want something along these lines:

#!/usr/bin/perl -ln

use strict;
use warnings;

push our @wordz, (split)[0];
print "@wordz" and @wordz=()
if $wordz[-1] eq '.';

__END__

(Of course most probably you do _not_ want that whitespace before the
period... ;-)


Michele
 
A. Sinan Unur

A. Sinan Unur wrote:

Because of the way I'm reading the file: read a line until you
encounter punctuation, which is indicated by "LET" occurring in the
line. This _does_ work fine by the way.

You must have an interesting definition of 'fine'. The body of the
while statement above will either not be executed or will be executed
exactly once.

Also, the data you posted does not match the verbal description you give
above. Looking at the data, I see no lines terminated by LET but rather
a sequence of lines terminated by a line containing LET. Why would you
lie to people whose help you are asking for?
Ok, and where can I find these guidelines, please?

They are posted here several times every month. You did read the FAQ
list and lurk a little before posting, didn't you?

Even if you hadn't, there is no excuse for asking _this_ question.

Even the most dimwitted among us are able to type

comp.lang.perl.misc guidelines

in the searchbox at http://www.google.com/ .
This indeed does exactly what I was looking for! (And you don't even
need chomp here ;-)
My problem still remains: how can I do this, sentence per sentence
(as, in the real DATA, more of those follow each other), and
manipulate each line of the data in another way after each sentence.

So, you did not show real data and therefore wasted our time? That's
nice.
But don't bother answering,
OK.

PS: the start of my other post was meant ironically, but I suppose your
answer was too :p

ITYM sarcastic rather than ironic.

Sinan.
 
Hendrik Maryns

Michele Dondi wrote:
Please, _don't_! Even if this is meant to be only a minimal example
typing something like

open my $in, '<', 'test.txt' or die $!

takes only a few more keystrokes.

Uh, ok, but as I don't even understand what this does, you can't expect
me to type it... I'll study the docs.

[snip all code]

I tried to change $/="\n", but as I don't really know what that does,


perldoc perlvar

An easy example of INFILE would be:

## start of test.txt
dames N(soort,mv,basis) dame
en VG(neven) en
heren N(soort,mv,basis) heer
[snip]

. LET() .
## end of test.txt


Please note that this doesn't _strictly_ match the description above
(not that it matters, IMHO). Also, it _seems_ that '.' in the first
field ends a sentence just as much as 'LET()' in the second one does.
It does, but sometimes, it is not '.' but '?' or '...', whereas there
will always be 'LET()' in the second column.

Any comment on my code welcome too!


Personally I found it overly clumsy and hard to follow, and being
particularly lazy in this moment I didn't even try to understand it or
to test it either.

All in all, if I understand correctly the description of your task
(and if you described it correctly in the first place!) I suppose you
may want something along these lines:

#!/usr/bin/perl -ln

use strict;
use warnings;

push our @wordz, (split)[0];
print "@wordz" and @wordz=()
if $wordz[-1] eq '.';

__END__

(Of course most probably you do _not_ want that whitespace before the
period... ;-)
I'll try this out, thanks.

H.
 
Hendrik Maryns

A. Sinan Unur wrote:
You must have an interesting definition of 'fine'. The body of the
while statement above will either not be executed or will be executed
exactly once.

If I execute my testfile, I get the whole sentence (i.e. the first word
of every line) as output, so I deduce from that it does iterate it
several times??
I see no other loop in there which could cause this.
Also, the data you posted does not match the verbal description you give
above. Looking at the data, I see no lines terminated by LET but rather
a sequence of lines terminated by a line containing LET. Why would you
lie to people whose help you are asking for?

Sorry, if you misunderstand me. I meant: a line, containing 'LET'. Of
course, as I answered to Michele, you could use '.' as an
end-of-sentence token, but others are possible too, that's why I use
'LET', which is there in any case.
They are posted here several times every month. You did read the FAQ
list and lurk a little before posting, didn't you?

Even if you hadn't, there is no excuse for asking _this_ question.

Even the most dimwitted among us are able to type

comp.lang.perl.misc guidelines

in the searchbox at http://www.google.com/ .

You're right about that.
So, you did not show real data and therefore wasted our time? That's
nice.

I did, I only took an excerpt, so as to not waste bandwidth on
irrelevant text.

Whatever, you can put me in your killfile now.
 
Arndt Jonasson

A. Sinan Unur said:
That attitude will not get you anywhere.

It tells me that he is acknowledging the advice about testing before
posting. I think that's fine. A "solemn promise" on Usenet is worth
the paper it's written on, but so what.
 
Michele Dondi

#add this when testing
open(INFILE, "test.txt");
[snip]
open my $in, '<', 'test.txt' or die $!

takes only a few more keystrokes.

Uh, ok, but as I don't even understand what this does, you can't expect
me to type it... I'll study the docs.

It doesn't do anything much different from what your statement did.
Only it uses a lexical filehandle, that you can use later just like
any other filehandle e.g. like

while (<$in>) { #...

and it separately specifies the open() mode, i.e. 'for reading', even
if this wouldn't be strictly necessary (but IMHO is a good practice).
Oh, and last but not least it prints a minimal but descriptive error
message if anything goes wrong.
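
To make the above concrete, here is a minimal sketch of the three-argument open with a lexical filehandle. So that it runs on its own, it opens an in-memory string (the $text variable is an assumption standing in for test.txt); with a real file you would pass 'test.txt' as the third argument instead of \$text.

```perl
use strict;
use warnings;

# Stand-in for the contents of test.txt (two lines from the sample data).
my $text = "dames N(soort,mv,basis) dame\nen VG(neven) en\n";

# Lexical handle, explicit read mode, and an error message if open fails.
open my $in, '<', \$text or die "Cannot open input: $!";

my @first_words;
while (my $line = <$in>) {
    chomp $line;                                 # chomp every line that is read
    push @first_words, (split ' ', $line)[0];    # keep only the first field
}
close $in;

print "@first_words\n";
```

Because $in is an ordinary lexical variable, it goes out of scope like any other and can be passed to subroutines, which a bareword handle such as INFILE cannot do as cleanly.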
It does, but sometimes, it is not '.' but '?' or '...', whereas there
will always be 'LET()' in the second column.

Hmm, this is one of those cases in which I would like to have Perl6's
junctions ready. (Yes: I know there are already suitable modules,
etc.)

Alternatively you can use a regex, e.g.

/^(?:\.|\Q...\E|\?)$/

but I agree that it's better to watch the second field. So a possible
solution can be


#!/usr/bin/perl -ln

use strict;
use warnings;

our @w;
my ($w,$c)=split;
print("@w$w"), @w=(), next
if $c eq 'LET()';
push @w, $w;

__END__


Please note that, just like the similar code pasted in my
previous post, this is meant to be a minimal working example. Unless
you need only a quick hack, in a realistic app you'd probably
explicitly write the loop and add bells and whistles...
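
One way such an explicit loop might look is sketched here. It is only an illustration under stated assumptions: the input is an in-memory string rather than test.txt, the closing "? LET() ?" line of the second sentence is invented for the demo, and the names @sentences and @words are hypothetical.

```perl
use strict;
use warnings;

# Sample input with two sentences; the second terminator line is made up.
my $text = <<'END';
dames N(soort,mv,basis) dame
en VG(neven) en
heren N(soort,mv,basis) heer
. LET() .
hartelijk ADJ(vrij,basis,zonder) hartelijk
welkom N(soort,ev,basis,onz,stan) welkom
? LET() ?
END

open my $in, '<', \$text or die "Cannot open input: $!";

my (@sentences, @words);
while (my $line = <$in>) {
    chomp $line;
    my ($word, $class) = split ' ', $line;
    next unless defined $class;         # skip blank or malformed lines
    if ($class eq 'LET()') {
        # Punctuation closes the sentence; attach it without a space.
        push @sentences, "@words$word";
        @words = ();
        next;
    }
    push @words, $word;
}
close $in;

print "$_\n" for @sentences;
```

Collecting the finished sentences in an array, instead of printing them immediately, leaves room to do further per-sentence work after each one is complete.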


Michele
 
Arndt Jonasson

Hendrik Maryns said:
# test.pl (not on Unix, so no /user/bin/perl?)
use warnings;
use strict;
open(INFILE, "test.txt")||die("open didn't succeed");
my @zinwoorden;
do{
chomp(my $lijn=<INFILE>);
while ($lijn!~/LET/){
push(@zinwoorden,$lijn);
$lijn=<INFILE>;
} #enduntil
for (@zinwoorden){
s/(\w+).*/$1/;
}
my $zin=join (" ", @zinwoorden);
print "$zin \n";
} until eof(INFILE);

As someone already told you in the first response to your original
question, you need to chomp all lines you read:

while ($lijn!~/LET/){
push(@zinwoorden,$lijn);
chomp($lijn=<INFILE>);
} #enduntil
 
Hendrik Maryns

Michele Dondi wrote:
#add this when testing
open(INFILE, "test.txt");
[snip]
open my $in, '<', 'test.txt' or die $!

takes only a few more keystrokes.

Uh, ok, but as I don't even understand what this does, you can't expect
me to type it... I'll study the docs.


It doesn't do anything much different from what your statement did.
Only it uses a lexical filehandle, that you can use later just like
any other filehandle e.g. like

while (<$in>) { #...

and it separately specifies the open() mode, i.e. 'for reading', even
if this wouldn't be strictly necessary (but IMHO is a good practice).
Oh, and last but not least it prints a minimal but descriptive error
message if anything goes wrong.

Interesting & very useful, thanks.
Hmm, this is one of those cases in which I would like to have Perl6's
junctions ready. (Yes: I know there are already suitable modules,
etc.)

Alternatively you can use a regex, e.g.

/^(?:\.|\Q...\E|\?)$/

but I agree that it's better to watch the second field. So a possible
solution can be


#!/usr/bin/perl -ln

use strict;
use warnings;

our @w;
my ($w,$c)=split;
print("@w$w"), @w=(), next
if $c eq 'LET()';
push @w, $w;

__END__


Please note that, just like the similar code pasted in my
previous post, this is meant to be a minimal working example. Unless
you need only a quick hack, in a realistic app you'd probably
explicitly write the loop and add bells and whistles...

I think I will, as I don't think my professor will believe I wrote this
myself...

Thanks!
H.
 
Hendrik Maryns

Arndt Jonasson wrote:
As someone already told you in the first response to your original
question, you need to chomp all lines you read:

while ($lijn!~/LET/){
push(@zinwoorden,$lijn);
chomp($lijn=<INFILE>);
} #enduntil

How stupid of me, of course!
I'm ashamed to have bothered you for such a triviality, sorry for that.
Thanks, H.
 
Hendrik Maryns

Michele Dondi wrote:
So this was homework...

Even more: part of the exam :-(

while generally hackers and experienced users
spot it at a glance, I don't think it's so bad to post questions about
it, provided that you point out so in order to allow people to reply
taking it into account.

Yes, you're right I should've mentioned, but I did try myself first and
was actually asking about something that didn't work. I really didn't
count on someone making it for me here. This was only a tiny bit of it.
Also, this depends on your professor's point of view, but I see
nothing wrong a priori in asking for help say here. Just make sure you
understand all of the suggestions you've been given and use this
knowledge, and anything else you may learn in the meantime to cook up
your own solution.

That's exactly what I'm doing! (Not that I understand all the
suggestions yet...)

Cheers, H.
 
