Giving back

Default

Hi, I'm totally new to programming and I've been reading the ng for a week or so now.
The things I've read in here have been very helpful.

I just finished this little program that fixes letter case; I'm not sure if it's bulletproof yet.
Anyhow, it was kind of a pain, as I've never dealt with RegEx before, so if this is useful to anyone,
please by all means have it.

If there are errors please let me know. Thanks everybody you all are very cool.

#!
use strict;
use warnings;
print "\n" . " " . "="x78 . "\n";
print " Reads a text file and capitalizes the first letter of each sentence.\n\n";
print "\tUSAGE:\t perl 15_1.plx <inputfile> <outputfile> [options]\n";
print "\tNOTE:\t wildcards are not allowed.\n";
print "\tOPTIONS: -l Lowercases everything before doing the capitalization.\n";
print "\t\t -d Display work being done on screen.\n";
print "\n\tEXAMPLE: perl 15_1.plx fixcase.txt casefixed.txt\n";
print " " . "="x78 . "\n\n";
our $lines = "";
our $options1 = "";
our $options2 = "";
our $inputfile = shift || die "\n";
our $outputfile = shift || die "\n";
$options1 = shift || $options1 eq "";
$options2 = shift || $options2 eq "";
$options1 =~ tr/A-Z/a-z/;
$options2 =~ tr/A-Z/a-z/;
open_files($inputfile, $outputfile);
while (our $line = <INPUT>)
{
$lines = "$lines $line";
}
print "$lines\n\n" if ($options1 eq "-d" || $options2 eq "-d");
$lines =~ s/(\w+)/\L$1/g if ($options1 eq "-l" || $options2 eq "-l");
print "$lines\n\n" if ($options1 eq "-d" || $options2 eq "-d");
$lines =~ s/(\w)/\u$1/;
print "$lines\n\n" if ($options1 eq "-d" || $options2 eq "-d");
$lines =~ s/([\.\?\!]+\s+)(\w+)/$1\u\L$2/g;
print "$lines\n\n" if ($options1 eq "-d" || $options2 eq "-d");
print OUTPUT $lines;
print "\nDone.\n";
close (INPUT) || die "Can't close $inputfile $!\n";
close (OUTPUT) || die "Can't close $outputfile $!\n";
#Subroutines#
sub open_files
{
open(INPUT, $inputfile) || die "\nError:\nCould not open $inputfile $!\n";
if (-e $outputfile)
{
die "\nError:\nFile already exists.\t$outputfile\n";
}
else
{
open(OUTPUT,">$outputfile") || die "Could not create $outputfile $!\n";
}
}
 
Tad McClellan

[Please limit your line lengths to the conventional 70-72 characters]


I just finished this little program that fixes letter case, im not
sure if its bulletproof yet.


Using tr/// for letter case will never be bulletproof, it is not
the Right Tool for the job. It does not respect locales (perllocale.pod).

You should use uc()/lc() etc... instead, because they _do_
respect locales.
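To see why tr/A-Z/a-z/ is not bulletproof, here is a small sketch comparing it with lc() on a word that starts with an accented capital. Tad's point is about locales; to keep the demo independent of whatever locales a given machine has installed, this sketch swaps in Perl's `unicode_strings` feature (Perl 5.12+) as a stand-in. The failure of tr/// is the same either way: it only knows the 26 ASCII letters.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode case rules for lc/uc (stand-in for a locale)

my $word = "\x{C9}COLE";         # "ÉCOLE": \x{C9} is a capital E with acute accent

(my $via_tr = $word) =~ tr/A-Z/a-z/;   # maps only the 26 ASCII capitals
my $via_lc = lc $word;                 # lc() also lowercases the accented letter

print 'tr/// left the accent alone: ', ($via_tr eq "\x{C9}cole" ? 'yes' : 'no'), "\n";
print 'lc() lowercased it too:      ', ($via_lc eq "\x{E9}cole" ? 'yes' : 'no'), "\n";
```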

Anyhow it was kind of a pain as I've never delt with RegEx before,
so if this is usefull to anyone
please by all means have it.


If you had used the Right Tool, you wouldn't even have had to mess
with regexes you know. :)
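As a sketch of that idea: lc() replaces the s/(\w+)/\L$1/g step outright, and the rest of the program's work fits in a small function. The name `fix_case` and its second argument (standing in for the -l option) are made up for illustration. One deliberate difference from the original: the second substitution here only capitalizes the first letter after . ? or ! and leaves the rest of that word alone, where the original's \u\L$2 also lowercased it.

```perl
use strict;
use warnings;

# Hypothetical function wrapping the program's core transformation.
# $lowercase_first plays the role of the -l option.
sub fix_case {
    my ($text, $lowercase_first) = @_;
    $text = lc $text if $lowercase_first;    # -l: lc() instead of s/(\w+)/\L$1/g
    $text =~ s/(\w)/\u$1/;                   # capitalize the first letter of the text
    $text =~ s/([.?!]+\s+)(\w)/$1\u$2/g;     # and the first letter after . ? or !
    return $text;
}

print fix_case('hello there. how ARE you? fine!  thanks.', 1), "\n";
# Hello there. How are you? Fine!  Thanks.
```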

print "\n" . " " . "="x78 . "\n";
print " Reads a text file and capitalizes the first letter of each sentence.\n\n";
print "\tUSAGE:\t perl 15_1.plx <inputfile> <outputfile> [options]\n";
print "\tNOTE:\t wildcards are not allowed.\n";
print "\tOPTIONS: -l Lowercases everything before doing the capitalization.\n";
print "\t\t -d Display work being done on screen.\n";
print "\n\tEXAMPLE: perl 15_1.plx fixcase.txt casefixed.txt\n";
print " " . "="x78 . "\n\n";


I like code that looks like the output that it is making, so I would
replace all of those prints with:

-------------------------
print "\n ", '='x78, "\n";
print <<ENDTEXT; # a "here document"
Reads a text file and capitalizes the first letter of each sentence.

USAGE:\t perl 15_1.plx <inputfile> <outputfile> [options]
NOTE:\t wildcards are not allowed.
OPTIONS: -l Lowercases everything before doing the capitalization.
-d Display work being done on screen.

EXAMPLE: perl 15_1.plx fixcase.txt casefixed.txt
ENDTEXT
print ' ', '='x78, "\n";
-------------------------

See the "Quote and Quote-like Operators" section in perlop.pod.

our $lines = "";
our $options1 = "";
our $options2 = "";


You should always prefer lexical (my) variables over dynamic (our)
variables, except when you can't.

And you can, so these should all be my() instead of our().
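A small illustration of the difference, assuming the script runs in package main: an `our` variable is a package global that anything can reach by its full name, while a `my` variable exists only inside its lexical scope. The variable names and the helper `peek` are made up for the demo.

```perl
use strict;
use warnings;

our $shared  = 'package variable';   # an alias for the global $main::shared
my  $private = 'lexical variable';   # visible only in this file's scope

# Look a package variable up by name (symbolic reference, demo only).
sub peek {
    my ($name) = @_;
    no strict 'refs';
    my $value = ${"main::$name"};
    return defined $value ? $value : '(no such package variable)';
}

print 'main::shared  -> ', peek('shared'),  "\n";   # package variable
print 'main::private -> ', peek('private'), "\n";   # (no such package variable)
```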

$options1 = shift || $options1 eq "";


I'm pretty sure that that does not do what you think it does.

What are you trying to accomplish there?

I think you might have been trying for:

$options1 = shift || "";
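What the original line actually does: `eq` binds more tightly than `||`, so it parses as `shift || ($options1 eq "")`. When no option is given, the comparison is true, and `$options1` ends up as 1 instead of the empty string. A sketch that shows this (the sub names are made up, and they shift from a copied argument list rather than @ARGV):

```perl
use strict;
use warnings;

sub original_style {
    my @args = @_;
    my $opt  = "";
    $opt = shift(@args) || $opt eq "";   # parsed as: shift(@args) || ($opt eq "")
    return $opt;
}

sub fixed_style {
    my @args = @_;
    my $opt  = shift(@args) || "";       # fall back to the empty string
    return $opt;
}

print "original, no arg: '", original_style(), "'\n";   # '1'
print "fixed, no arg:    '", fixed_style(),    "'\n";   # ''
print "with -d:          '", original_style('-d'), "' and '", fixed_style('-d'), "'\n";
```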

$options1 =~ tr/A-Z/a-z/;

my $options1 = lc(shift || ''); # fetch arg (or '' if none) and lower case it

open_files($inputfile, $outputfile);


Here you act like you are passing arguments to the function...

sub open_files
{
open(INPUT, $inputfile) || die "\nError:\nCould not open $inputfile $!\n";
if (-e $outputfile)


.... but the function uses global variables rather than its arguments.


sub open_files
{
my($inputfile, $outputfile) = @_; # copy the arguments

# same code here as you had before
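Taking Tad's fix one step further (a change the thread itself does not make): a version of open_files that uses its arguments, calls three-argument open(), and returns lexical filehandles instead of setting the barewords INPUT and OUTPUT. A minimal sketch; the error messages follow the original.

```perl
use strict;
use warnings;

# Opens the input for reading and creates the output, refusing to
# overwrite an existing file. Returns the two filehandles.
sub open_files {
    my ($inputfile, $outputfile) = @_;   # copy the arguments

    open(my $in, '<', $inputfile)
        or die "\nError:\nCould not open $inputfile: $!\n";

    die "\nError:\nFile already exists.\t$outputfile\n" if -e $outputfile;

    open(my $out, '>', $outputfile)
        or die "Could not create $outputfile: $!\n";

    return ($in, $out);
}

# Example call, with the names taken from the command line as before:
# my ($in, $out) = open_files($inputfile, $outputfile);
```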
 
Tore Aursand

print "\n" . " " . "="x78 . "\n";
print " Reads a text file and capitalizes the first letter of each sentence.\n\n";
print "\tUSAGE:\t perl 15_1.plx <inputfile> <outputfile> [options]\n";
print "\tNOTE:\t wildcards are not allowed.\n";
print "\tOPTIONS: -l Lowercases everything before doing the capitalization.\n";
print "\t\t -d Display work being done on screen.\n";
print "\n\tEXAMPLE: perl 15_1.plx fixcase.txt casefixed.txt\n";
print " " . "="x78 . "\n\n";

Calling print() this many times isn't very nice, so please don't use it this much,
although there's nothing wrong with the code above. I would rather have put
this text in a subroutine and output it using only one (or just a few)
print() statements:

sub display_usage {
print qq|Your
text
here|;
}
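One way to combine this with Tad's here-document idea: a single print() inside the subroutine. The spacing below is chosen to reproduce the original banner; `display_usage` is Tore's name, while the `$rule` variable and the END_USAGE terminator are arbitrary choices for this sketch.

```perl
use strict;
use warnings;

# Prints the whole usage screen with one print() of a here-document.
# <<"END_USAGE" interpolates, so \t and $rule work as in a "..." string.
sub display_usage {
    my $rule = '=' x 78;
    print <<"END_USAGE";

 $rule
 Reads a text file and capitalizes the first letter of each sentence.

\tUSAGE:\t perl 15_1.plx <inputfile> <outputfile> [options]
\tNOTE:\t wildcards are not allowed.
\tOPTIONS: -l Lowercases everything before doing the capitalization.
\t\t -d Display work being done on screen.

\tEXAMPLE: perl 15_1.plx fixcase.txt casefixed.txt
 $rule

END_USAGE
}

display_usage();
```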
our $lines = "";

Are you sure you want to use 'our' instead of 'my'? 'perldoc -f our' and
'perldoc -f my' might be of help.
our $options1 = "";
our $options2 = "";
$options1 = shift || $options1 eq "";
$options2 = shift || $options2 eq "";

This could have been written like this:

my $options1 = shift || '';
my $options2 = shift || '';
$options1 =~ tr/A-Z/a-z/;
$options2 =~ tr/A-Z/a-z/;

'tr' _is_ nice, but don't use it to lowercase/uppercase strings, as it
doesn't take the locale into account. uc() and lc() do that:

perldoc -f lc
perldoc -f uc
perldoc perllocale
while (our $line = <INPUT>)
{
$lines = "$lines $line";
}

Personally, I prefer this style:

while ( <INPUT> ) {
$lines .= ' ' . $_;
}

Could easily have been written on one line, too.
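For example, the statement-modifier form of while still assigns each line to `$_` automatically, so the loop fits on one line. The sub name `read_all` and the in-memory filehandle are only there to make the sketch runnable on its own.

```perl
use strict;
use warnings;

# Reads every line from $fh, prefixing each with a space as the original did.
sub read_all {
    my ($fh) = @_;
    my $lines = '';
    $lines .= ' ' . $_ while <$fh>;   # one-line version of the while loop
    return $lines;
}

open(my $in, '<', \"First line.\nsecond line.\n") or die "Cannot open: $!";
print read_all($in);
```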
 
Ben Morrow

(e-mail address removed) (Tad McClellan) wrote in message
[Please limit your line lengths to the conventional 70-72 characters]

Really Tad,

[Please use a news reader from this century]..

Tad appears to use slrn, the latest release of which was made on
2003-08-25.
This convention arose from punchcards which had, if I recall, 80
columns, 8 of which were special? I'd like to believe we've progressed
a bit since then.

The convention arises from those of us who use 80x25 consoles or
terminal windows, to allow for a few levels of '> > ' before lines
wrap.

*PLONK*
 
Alan J. Flavell

(e-mail address removed) (Sara) wrote:
[Please use a news reader from this century]..
The convention arises from those of us who use 80x25 consoles or
terminal windows, to allow for a few levels of '> > ' before lines
wrap.

It doesn't *matter* where it comes from, or how relevant it is to
newly-arrived wannabe-newsreaders: it's part of the accepted
netiquette (and of the draft USEFOR interworking specification, the
closest thing we currently have to a grandson-of-RFC1036): it's a
courtesy to other users to follow that netiquette.

If "Sara" wants to draft great-grandson-of-RFC1036, in which posting
to usenet is done according to some other set of interworking rules,
she'd have every right to do so, or at least to try to - on the
appropriate forum or group (if it's on usenet, it'll have "news." in
its top level hierarchy).

Meantime, it's appropriate to follow the accepted conventions; not to
unilaterally try to impose one's own. I'd recommend a somewhat-
conservative application of the USEFOR draft.

You too? I had already quietly done that on reading Sara's posting,
but since you thought it worth discussing, I decided to chip in
anyway. Ho hum.

all the best

NOTE: The extreme irritation caused to other readers by such
violations is not to be underestimated; however, enforcement of
such rules is more a matter of sensible design or of social
pressure (whose effectiveness should not be underestimated, even
though it cannot be prescribed).

Wise words, indeed.
 
James Willmore

On 14 Nov 2003 06:49:22 -0800
(e-mail address removed) (Tad McClellan) wrote in message
[Please limit your line lengths to the conventional 70-72
characters]
Really Tad,

[Please use a news reader from this century]..

This convention arose from punchcards which had, if I recall, 80
columns, 8 of which were special? I'd like to believe we've
progressed a bit since then.

I know traintracks use their width as a legacy handed down from
Roman Wagons (at least that's the conventional wisdom), but let's
TRY to be a little more progressive in our industry?

I work on a S390 and use the ISPF editor to edit job cards. The ISPF
editor only shows 72 characters at a time and makes for really hard
reading when you have to use PF11 to get a lousy 8 characters.

In fact, COBOL (which has been around for a good many years and is
still in use today) *requires* that you not exceed 72 characters for
your code. Maybe this is where the 8 "special" characters come from?
Or maybe it's because COBOL programmers don't want to keep using that
darn PF11 key on a regular basis to read what they coded? Maybe it's
a combination of both.

Bottom line is this - _we_ don't want to keep scrolling back and forth
to read a message and ask that when you post you limit each line to 72
characters.

To further show *why* we ask this, pull up a few posts off Google.
Some are nothing more than one *huge* line. If the person making the
post had followed the 72 character rule, then the post they made would
be more legible.

But hey, what do I know, right? :)

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
Every improvement in communication makes the bore more terrible.
-- Frank Moore Colby
 
Tad McClellan

Sara said:
[Please limit your line lengths to the conventional 70-72 characters]

[Please use a news reader from this century]..


What makes you think that my newsreader is not from this century?

This convention arose from

[snip]


Doesn't matter where it came from, the fact remains that
it _is_ the convention.

If that is what folks like, then why not simply give it to them?

Be nice.

Socially unacceptable behavior may lead to the OP getting fewer
answers to future questions, and I didn't want that to happen.

I do not have that same conviction regarding your postings.

So long.
 
Tad McClellan

James Willmore said:
In fact, COBOL (which has been around for a good many years and is
still in use today) *requires* that you not exceed 72 characters for
your code. Maybe this is where the 8 "special" characters comes from?


No, it is so that it will still fit in an 80-column terminal after
being quoted a couple of times.
 
Sam Holden

[Please limit your line lengths to the conventional 70-72 characters]
.
.
.

Really Tad,

[Please use a news reader from this century]..

This convention arose from punchcards which had, if I recall, 80
columns, 8 of which were special? I'd like to believe we've progressed
a bit since then.

No, but a significant number of people use terminals which are around 80
characters wide (mine are 84, for example, since that allows two of them
to fit side by side on my small monitor with a readable font). The 70-72
comes from leaving room for a few levels of quoting.
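The arithmetic can be checked with the core Text::Wrap module: wrap a paragraph at 72 characters, quote it three times with "> " (two columns per level), and the longest line is still within 80 columns. A sketch; the sample paragraph is adapted from Ben's explanation earlier in the thread.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Text::Wrap keeps every line shorter than $columns,
# so 73 gives lines of at most 72 characters.
$Text::Wrap::columns = 73;

my $para = 'The convention arises from those of us who use 80x25 consoles or '
         . 'terminal windows, to allow for a few levels of quoting before '
         . 'lines wrap.';
my $body = wrap('', '', $para) . "\n";

# Quote it three times the "> " way and measure the longest line.
my $quoted = $body;
$quoted =~ s/^/> /mg for 1 .. 3;

my ($longest) = sort { $b <=> $a } map { length } split /\n/, $quoted;
print $quoted;
print "longest quoted line: $longest characters\n";
```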
I know traintracks use their width as a legacy handed down from Roman
Wagons (at least that's the conventional wisdom), but let's TRY to be
a little more progressive in our industry?

You can post so that your posts are hard to read for a large number of
people. After all they really don't mind not reading your posts.
 
James Willmore

In fact, COBOL (which has been around for a good many years and is
still in use today) *requires* that you not exceed 72 characters
for your code. Maybe this is where the 8 "special" characters
comes from?


No, it is so that it will still fit in an 80-column terminal after
being quoted a couple of times.

<ot>
Are you referring to columns 73-80 (inclusive) in COBOL? They are
optional in COBOL (now). ISPF will only do 72 columns in length. In
the days of keypunch cards, 73-80 is where the program name would go,
so if you dropped the cards, you at least had a fighting chance to get
them back in some order. IBM mainframes (S390, 370, etc.) will only
do jobnames in 8 alphanumeric characters, AFAIK. So it only makes
sense that ISPF only shows columns 1-72, especially since columns
73-80 are optional now and keypunch cards are a thing of the past.

I should have not even mentioned ISPF or COBOL at all. Sorry :-(
</ot>

Basically, I was trying to support the idea of having posters post in
the length of 72 columns. I can't stand scrolling back and forth to
read a message and it pisses me off when someone tries to justify
posting any old way they please.

Sorry for the confusion.

--
Jim

a fortune quote ...
You might have mail
 
Tad McClellan

James Willmore said:
No, it is so that it will still fit in an 80-column terminal after
being quoted a couple of times.

<ot>
Are you referring to columns 73-80 (inclusive) in COBOL?


No, I'm referring to the most common character width of terminals.

Basically, I was trying to support the idea of having posters post in
the length of 72 columns.


Yes, I know.

I can't stand scrolling back and forth to
read a message


Me either, so I don't read it at all (usually).

and it pisses me off when someone trys to justify
posting any old way they please.


Me too, so I killfile them (always):


% whiners
Score:: -9998
[snip about 40 addresses]
 
Trent Curry

Ben said:
(e-mail address removed) (Tad McClellan) wrote in message
[Please limit your line lengths to the conventional 70-72
characters]

Really Tad,

[Please use a news reader from this century]..

Tad appears to use slrn, the latest release of which was made on
2003-08-25.
This convention arose from punchcards which had, if I recall, 80
columns, 8 of which were special? I'd like to believe we've
progressed
a bit since then.

The convention arises from those of us who use 80x25 consoles or
terminal windows, to allow for a few levels of '> > ' before lines
wrap.

Just a little question, why do newer readers use > > > instead of >>> which
gives more room for quoted text? It seems to me more and more readers use
this for now instead of the spaced out version. Just wondering.

--
Trent Curry

perl -e
'($s=qq/e29716770256864702379602c6275605/)=~s!([0-9a-f]{2})!pack("h2",$1)!eg
;print(reverse("$s")."\n");'
 
Gunnar Hjalmarsson

Trent said:
Just a little question, why do newer readers use > > > instead of
more readers use this for now instead of the spaced out version.
Just wondering.

Is that so? In that case, fewer and fewer developers read and/or care
to comply with applicable standards, such as RFC2646. Sad!
 
Alan J. Flavell

Is that so? In that case, fewer and fewer developers read and/or care
to comply with applicable standards, such as RFC2646. Sad!

I'm getting increasingly impatient with myself for following-up this
off-topic thread, but ...

Automatic re-flowing of text is surely only permissible according to
this RFC when the item's content type has been suitably declared in
the headers? That means, first and foremost, conformance with MIME
specifications. See e.g. the USEFOR draft, section 3.1.2.2, keeping in
mind of course that USEFOR is still just a draft of a best-practice
recommendation. But it's the nearest that we have to a viable spec,
since rfc1036 is positively geriatric but its successors never quite
made it to official status.

The people who were making allegations about other contributors'
software being outdated, were themselves posting in Stone-age formats
which pre-dated MIME, and thus their postings were required to be
treated as literally text/plain US-ASCII in the meaning of RFC2046
(i.e. format=fixed in the terms of RFC2646).

But I doubt that they are sufficiently well-informed to appreciate
how silly that makes them look. Now I really must stop doing this.

all the best
 
Trent Curry

Alan said:
I'm getting increasingly impatient with myself for following-up this
off-topic thread, but ...

Automatic re-flowing of text is surely only permissible according to
this RFC when the item's content type has been suitably declared in
the headers? That means, first and foremost, conformance with MIME
specifications. See e.g the USEFOR draft, section 3.1.2.2, keeping in
mind of course that USEFOR is still just a draft of a best-practice
recommendation. But it's the nearest that we have to a viable spec,
since rfc1036 is positively geriatric but its successors never quite
made it to official status.

The people who were making allegations about other contributors'
software being outdated, were themselves posting in Stone-age formats
which pre-dated MIME, and thus their postings were were required to be
treated as literally text/plain US-ASCII in the meaning of RFC2046
(i.e format=fixed in the terms of RFC2646).

But I doubt that they are sufficiently well-informed to appreciate
how silly that makes them look. Now I really must stop doing this.

all the best

OK, I understand this more and more doesn't belong here, but I do recall
seeing once or twice a regular saying "anything having to do with usenet is
on topic" or to that effect. I'll be quick anyway, though. RFC aside,
compare the following:

[Example 1 - condensed way]

---------1---------2---------3---------4---------5---------6---------7--
Ok this is level 1 quoted text that can be a long line la la la la

Un quoted.

[Example 2 - standard way]

---------1---------2---------3---------4---------5---------6---------7--
Ok this is level 1 quoted text that can be a long line la la la la

Un quoted.

[End examples]

This should show my main point: that the condensed form allows more text
in a line in deep quoted text than the standard spaced-out version.

That's all I'm saying. I'm not saying we should eschew one for the other,
just making a point from a formatting perspective :)

--
Trent Curry

 
Gunnar Hjalmarsson

Purl said:
Instead of constantly bickering about how others post, which wastes
time of readers, why don't you boys behave as I do?

My behavior is highly logical and considerate of readers. I simply
reformat articles to be acceptable, to be neat and tidy, as I have
done with your article.

If top posting, I simple cut and paste "stuff" where it should be.
In the case of Randal, Uri, Abigail and others, I remove their
unacceptable quoting style and replace it with standard issue.

I do so, too. But I do think this group would be a better place if
there was no need to do it.
 
Jacob Heider

Alan J. Flavell wrote:

[Example 1 - condensed way]

---------1---------2---------3---------4---------5---------6---------7--
Ok this is level 1 quoted text that can be a long line la la la la

Un quoted.

[Example 2 - standard way]

---------1---------2---------3---------4---------5---------6---------7--
Ok this is level 1 quoted text that can be a long line la la la la

Un quoted.

[End examples]

This should should my main point, that the condensed form allows more text
in a line in deep quoted text than the standard spaced-out version.

That's all I'm saying. I'm not saying we should eschew one for the other,
just making a point from a formatting perspective :)

I suspect the main reason for the standard style over the condensed one is
that the rule for quoting then changes from:

prepend "> " to all lines from original message

which is simple to implement, to

if the first character of a line is ">", prepend a ">";
else, prepend a "> "

which is slightly more time- and resource-consuming to implement. The moral
is that sometimes investing a little more effort gets you something a
little more usable, I guess.
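Jacob's two rules, sketched in Perl with made-up function names. `quote_standard` always prepends "> "; `quote_condensed` prepends only ">" when a line already starts with one, so each extra level of nesting costs one column instead of two.

```perl
use strict;
use warnings;

# Standard rule: prepend "> " to every line.
sub quote_standard {
    my ($text) = @_;
    $text =~ s/^/> /mg;
    return $text;
}

# Condensed rule: a line already starting with ">" gets one more ">",
# any other line gets "> ".
sub quote_condensed {
    my ($text) = @_;
    $text =~ s/^(>?)/$1 ? ">>" : "> "/mge;
    return $text;
}

my $post = "Un quoted.\n";
my ($std, $cond) = ($post, $post);
for (1 .. 3) {
    $std  = quote_standard($std);
    $cond = quote_condensed($cond);
}
print $std;    # > > > Un quoted.
print $cond;   # >>> Un quoted.
```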

Jacob
 
