Why is this sub removing newlines??

Rainer Weikusat

Janek Schleicher said:
On 06.12.2013 at 15:29, Rainer Weikusat wrote:

So, you also prefer to write
s/\r?\n$// instead of oversimplifying chomp; ?

Since you apparently missed this: While there's doubtlessly many a
developer who is convinced to have invented something comparable to 'the
wheel', ie, a basic design which will remain in use for a few thousand
years, as soon as he managed to tack three lines of code together doing
something other than 'crash immediately', possibly even more so on CPAN,
using this comparison is either a case of hubris bordering on serious
megalomania or just someone babbling along without spending much effort
on thinking about what he's actually saying, not the least because this
simile is actually wrong: Wheels come in many different kinds and even a
seriously reality-impaired mathead should have noticed the difference
between, say, tanks, push chairs, racing cars and pottery wheels. "But
can't you see they're all round and rotate!" isn't much of a similarity at
the technical level.
 
Charles DeRykus

Sorry, more redemption is needed.

But I'm not quite ready to declare it unredeemable:

$string =~ s/ ^\s+ | \s+(?=\n)$ | \s*[^\n\S]+$ //gx;


[ depending on flavor of white space you want ]
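
For anyone who wants to check the pattern above against a few inputs, here is a minimal sketch. The helper name trim_keep_nl and the sample strings are made up for illustration; only the substitution itself comes from the post.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution quoted above.
sub trim_keep_nl {
    my ($string) = @_;
    # 1st alternative: leading whitespace
    # 2nd alternative: trailing whitespace that is followed by a final "\n"
    # 3rd alternative: trailing whitespace ending in a non-newline blank
    $string =~ s/ ^\s+ | \s+(?=\n)$ | \s*[^\n\S]+$ //gx;
    return $string;
}

# "  foo  \n"   becomes "foo\n"  (newline kept)
# "  foo  "     becomes "foo"    (no newline added)
# "foo \n\n\n"  becomes "foo\n"  (extra newlines count as whitespace)
# "a \n "       becomes "a"      (the newline is lost in this case)
```

The last case is worth noting: when blanks follow the newline, the third alternative swallows the newline along with them.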
 
Rainer Weikusat

Rainer Weikusat said:
Charles DeRykus said:
34:44 AM UTC-8, Ben Bacarisse wrote:

On Thursday, December 5, 2013 1:56:46 PM UTC-8, John Black wrote:

keep this in mind - I had wanted that trim function to not strip the
newlines (and not add any either if there wasn't one). Should not be

Another option: a regex that'd handle any trailing newline:
$string =~ s/ ^\s+ | \s+(?=\n|)$ //gx;

Surely this strips the newline?

Indeed. I was slipping off the end... I think, hope a redemptive tweak will do it:
$string =~ s/ ^\s+ | \s++(?=\n) //gx;

Sorry, more redemption is needed.

But I'm not quite ready to declare it unredeemable:

$string =~ s/ ^\s+ | \s+(?=\n)$ | \s*[^\n\S]+$ //gx;

I may be missing something here, but what about

s/\s+?(?=\n)?$//

One thing I missed was a trailing newline without other whitespace in
front of it. Making this

s/\s*?(?=\n)?$//;

instead works with that as well (although this should surely be called a
questionable construct, given the number of ?s ...).
 
John Black

I've been using \s as a shortcut for spaces or tabs.

See also perlrecharclass, look for [[:blank:]] and \h.

Thanks. Looks like what I really wanted in most cases was \h. [[:blank:]] sounds like it
would work too, but it's just too bulky to put into regexes since it can easily be avoided with
\h.
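
A minimal sketch of a trim built on \h, assuming Perl 5.10 or later (where \h is available) and input that is processed one line at a time. The sub name trim_blank is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: strip horizontal whitespace at both ends of a line
# and leave a trailing "\n" alone. \h matches spaces and tabs but not "\n".
sub trim_blank {
    my ($line) = @_;
    $line =~ s/^\h+//;   # leading spaces/tabs
    $line =~ s/\h+$//;   # trailing spaces/tabs; $ also matches just before a final "\n"
    return $line;
}

# "  foo \t\n" becomes "foo\n", "foo  " becomes "foo", "\n" stays "\n"
```

Since \h can never consume the newline, there is no need for a lookahead or a capture to put it back.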

John Black
 
Charles DeRykus

[remove whitespace at end of string but keep \n if it is there]
$string =~ s/ ^\s+ | \s+(?=\n)$ | \s*[^\n\S]+$ //gx;

[this also deals with whitespace at the beginning]
s/\s*?(?=\n)?$//;

Maybe logically simpler:

s/\s*?(\n)?$/$1/;

(this will likely result in a warning when there's no newline at the
end of the line and runtime warnings are enabled).


Hm, as you note though, doesn't handle initial w/s and coughs an
"uninitialized" warning if no ending \n. However, you could take a cough
suppressant:

s{\s*?(\n)?$}{$1 // ''}e;


But, more significantly, doesn't handle multiple ending newlines, eg,
"foo \n\n\n"
[which of course may not be an issue for the OP]
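
A small sketch to see both points, assuming Perl 5.10 or later for the // operator. The sub name rtrim_keep_nl is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the /e variant above.
sub rtrim_keep_nl {
    my ($s) = @_;
    # $1 is undef when the string has no trailing "\n"; // '' avoids the warning
    $s =~ s{\s*?(\n)?$}{$1 // ''}e;
    return $s;
}

# "foo  \n"     becomes "foo\n"
# "foo  "       becomes "foo"      (no "uninitialized" warning)
# "foo \n\n\n"  becomes "foo\n\n"  (two newlines survive, not one)
```

In the last case the lazy \s*? stops as soon as (\n)?$ can match, which is one newline short of the end, so the captured newline is written back in front of the final one.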
 
Rainer Weikusat

Charles DeRykus said:
Rainer Weikusat said:
Charles DeRykus <[email protected]> writes:

[remove whitespace at end of string but keep \n if it is there]
$string =~ s/ ^\s+ | \s+(?=\n)$ | \s*[^\n\S]+$ //gx;

[this also deals with whitespace at the beginning]
s/\s*?(?=\n)?$//;

Maybe logically simpler:

s/\s*?(\n)?$/$1/;

(this will likely result in a warning when there's no newline at the
end of the line and runtime warnings are enabled).


Hm, as you note though, doesn't handle initial w/s and coughs an
"uninitialized" warning if no ending \n. However, you could take a
cough suppressant:

s{\s*?(\n)?$}{$1 // ''}e;

It wasn't supposed to handle initial whitespace because that's not
really related to the \n-issue (also true for the first) ...
But, more significantly, doesn't handle multiple ending newlines, eg,
"foo \n\n\n"
[which of course may not be an issue for the OP]

... and it certainly wasn't supposed to do that, either: When processing
something line-by-line which I assumed to be the case here, "foo \n\n\n"
will be the three lines

"foo \n"
"\n"
"\n"

and assuming that handling "foo \n bla\n \n" should result in
"foo\n bla\n \n", ie the purpose is to remove leading whitespace at
the beginning of a multi-line text but not leading whitespace on the
individual lines seems rather bizarre to me. Or that processing
"a \n " should remove the \n given that newlines are not supposed to
be removed. And what about " a b \n bbb\n\n c \n"?
 
Rainer Weikusat

Ben Morrow said:
Quoth Rainer Weikusat said:
Rainer Weikusat said:
Charles DeRykus <[email protected]> writes:

[remove whitespace at end of string but keep \n if it is there]
$string =~ s/ ^\s+ | \s+(?=\n)$ | \s*[^\n\S]+$ //gx;

[this also deals with whitespace at the beginning]
s/\s*?(?=\n)?$//;

Maybe logically simpler:

s/\s*?(\n)?$/$1/;

If you're willing to rely on \s*? finding all the whitespace (it does,
because the 'start earlier in the string' rule trumps the 'match
minimally' rule, but IMHO it's confusing), you just need

s/\s*?$//

What's confusing here is that $ matches two different things depending
on the context: Apparently, if it is preceded by \s*?, it matches
immediately before \n at the end of the line and if that is \s*, it
matches after the \n. But that's certainly good to know.
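
A minimal sketch of the difference, with made-up sub names. Without /m, $ can match either at the very end of the string or just before a final newline; as far as I can tell the quantifier only decides which of the two positions is reached first.

```perl
use strict;
use warnings;

# Lazy: \s*? grows one character at a time and stops at the first place
# where $ succeeds, which is just before the final "\n".
sub strip_lazy {
    my ($s) = @_;
    $s =~ s/\s*?$//;
    return $s;
}

# Greedy: \s* first grabs everything including the "\n", and $ then
# succeeds at the very end of the string, so the newline is removed too.
sub strip_greedy {
    my ($s) = @_;
    $s =~ s/\s*$//;
    return $s;
}

# strip_lazy("foo  \n")   gives "foo\n"
# strip_greedy("foo  \n") gives "foo"
```

So $ itself behaves the same in both cases; the two patterns simply arrive at different positions where it is allowed to match.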
 
Charles DeRykus

Charles DeRykus said:
[remove whitespace at end of string but keep \n if it is there]

$string =~ s/ ^\s+ | \s+(?=\n)$ | \s*[^\n\S]+$ //gx;
...

It wasn't supposed to handle initial whitespace because that's not
really related to the \n-issue (also true for the first) ...

Yes, I was over-generalizing. But, imo, a one-liner handles both goals
without being rocket science[1].

But, more significantly, doesn't handle multiple ending newlines, eg,
"foo \n\n\n"
[which of course may not be an issue for the OP]

... and it certainly wasn't supposed to do that, either: When processing
something line-by-line which I assumed to be the case here, "foo \n\n\n"
will be the three lines

That wasn't really specified though. The goal was to remove
whitespace from the beginning and end of "strings" [rather than just
well-behaved lines] without clobbering a trailing newline.

"foo \n"
"\n"
"\n"
and assuming that handling "foo \n bla\n \n" should result in
"foo\n bla\n \n", ie the purpose is to remove leading whitespace at
the beginning of a multi-line text but not leading whitespace on the
individual lines seems rather bizarre to me. Or that processing
"a \n " should remove the \n given that newlines are not supposed to
be removed. And what about " a b \n bbb\n\n c \n"?

Yes, agreed. Without more certainty about original intent, it becomes
bizarre. But less bizarrely, a string might easily have multiple
newlines on the end with the reasonable goal of removing all but the
final one.
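
One possible sketch for that last goal (drop all trailing whitespace, but keep a single final newline if the string ended in one). This is only an illustration under that assumption, not something proposed earlier in the thread, and the sub name is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove all trailing whitespace, then put back one
# "\n" if (and only if) the string ended with a newline.
sub rtrim_one_nl {
    my ($s) = @_;
    # (\n?) always participates in the match, so $1 is never undef;
    # \z anchors at the true end of the string, unlike $.
    $s =~ s/\s*?(\n?)\z/$1/;
    return $s;
}

# "foo \n\n\n" becomes "foo\n"
# "foo \n \n"  becomes "foo\n"
# "foo  "      becomes "foo"
```

Putting the ? inside the capture rather than after it means no warning is raised when there is no newline, so the /e and // workaround is not needed here.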
 
