split inconsistency- why?

Sara

split /,/,'cat,mouse,,eel,fish';

yields
0 'cat'
1 'mouse'
2 ''
3 'eel'
4 'fish'

Fine. Likewise:

split /,/,',,,cat,,mouse,eel,fish';

yields
0 ''
1 ''
2 ''
3 'cat'
4 ''
5 'mouse'
6 'eel'
7 'fish'

Everybody is happy. But

split /,/,'cat,mouse,eel,fish,,,';

yields
0 'cat'
1 'mouse'
2 'eel'
3 'fish'

Huh? Where did the trailing items go?

I work around this inconsistency by adding in "placeholders" like:

s/,,/,#,/g; s/,,/,#,/g;

then do the split, then remove the #'s. What a treat. Thanks Larry!

But I can't help but wonder- what programming advantage does this
offer and why was split designed to ignore some split candidates such
as these trailing items? And why only omit trailing items, and not
leading? Was there some presumption made about leading ones being more
meaningful than trailing? Very odd presumption if so!

To the programmer, it would be easier to make split consistent, and in
those cases when the programmer doesn't want empty trailing items he
can easily prepare the scalar to get rid of them:

s/,+$//;

which is a lot easier than identifying a unique placeholder,
inserting it, splitting, then removing it.

Perl is pretty much self-consistent, in fact this is one of very few
cases I've encountered which lacks consistency. I'd be interested
though in hearing the arguments on why this was a beneficial language
design choice?
 
Thomas Kratz

Sara said:
split /,/,'cat,mouse,,eel,fish';

yields
0 'cat'
1 'mouse'
2 ''
3 'eel'
4 'fish'

Fine. Likewise:

split /,/,',,,cat,,mouse,eel,fish';

yields
0 ''
1 ''
2 ''

one '' too many
3 'cat'
4 ''
5 'mouse'
6 'eel'
7 'fish'

Everybody is happy. But

split /,/,'cat,mouse,eel,fish,,,';

yields
0 'cat'
1 'mouse'
2 'eel'
3 'fish'

Huh? Where did the trailing items go?

perldoc -f split

Look for the description of LIMIT.

I work around this inconsistency by adding in "placeholders" like:

s/,,/,#,/g; s/,,/,#,/g;

then do the split, then remove the #'s. What a treat. Thanks Larry!

That is uncalled for. You just have to read the docs.

[rest snipped]

Thomas
 
John W. Krahn

Sara said:
split /,/,'cat,mouse,,eel,fish';

yields
0 'cat'
1 'mouse'
2 ''
3 'eel'
4 'fish'

Fine. Likewise:

split /,/,',,,cat,,mouse,eel,fish';

yields
0 ''
1 ''
2 ''
3 'cat'
4 ''
5 'mouse'
6 'eel'
7 'fish'

Everybody is happy. But

split /,/,'cat,mouse,eel,fish,,,';

yields
0 'cat'
1 'mouse'
2 'eel'
3 'fish'

Huh? Where did the trailing items go?

If you had read the documentation for split you would see that that
is the defined behavior for zero, one or two argument versions of split.

perldoc -f split
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split Splits a string into a list of strings and returns that
list. By default, empty leading fields are preserved,
and empty trailing ones are deleted.


You will also see that there is a third argument "LIMIT" which will solve
your problem.
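
A minimal sketch of what that third argument does with the three strings from the original post. The array names are made up for illustration; the only thing taken from the docs is that a negative LIMIT keeps trailing empty fields.

```perl
use strict;
use warnings;

# The three strings from the original post, split with LIMIT = -1.
# A negative LIMIT keeps trailing empty fields.
my @middle   = split /,/, 'cat,mouse,,eel,fish',    -1;
my @leading  = split /,/, ',,,cat,,mouse,eel,fish', -1;
my @trailing = split /,/, 'cat,mouse,eel,fish,,,',  -1;

# Same string as @trailing, but with the default (no LIMIT).
my @default  = split /,/, 'cat,mouse,eel,fish,,,';

print scalar(@middle),   "\n";   # 5
print scalar(@leading),  "\n";   # 8
print scalar(@trailing), "\n";   # 7 (three trailing '' kept)
print scalar(@default),  "\n";   # 4 (trailing '' dropped)
```

With the -1 in place no placeholder characters are needed, so there is nothing to insert before the split or strip out afterwards.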



John
 
Gunnar Hjalmarsson

Sara said:
split /,/,'cat,mouse,eel,fish,,,';

yields
0 'cat'
1 'mouse'
2 'eel'
3 'fish'

Huh? Where did the trailing items go?

Inconsistency or not, it's documented in the first para in

perldoc -f split

I work around this inconsistency by adding in "placeholders" like:

s/,,/,#,/g; s/,,/,#,/g;

then do the split, then remove the #'s. What a treat. Thanks Larry!

Use grep() to exclude any empty fields:

grep length, split /,/,',,,cat,,mouse,eel,fish';

To the programmer, it would be easier to make split consistent,

Are you suggesting a change of this well documented behaviour? Seems
not advisable to me.
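
For anyone following along, the grep suggestion above spelled out as a small sketch. The array names are invented here; note that it throws away every empty field, leading and embedded ones included, so it only fits when field positions do not matter.

```perl
use strict;
use warnings;

# Split first, then keep only the fields with non-zero length.
my @fields   = split /,/, ',,,cat,,mouse,eel,fish';
my @nonempty = grep { length } @fields;

print scalar(@fields),   "\n";   # 8
print "@nonempty\n";             # cat mouse eel fish
```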
 
Jeff 'japhy' Pinyan

But I can't help but wonder- what programming advantage does this
offer and why was split designed to ignore some split candidates such
as these trailing items? And why only omit trailing items, and not
leading? Was there some presumption made about leading ones being more
meaningful than trailing? Very odd presumption if so! [snip]
Perl is pretty much self-consistent, in fact this is one of very few
cases I've encountered which lacks consistency. I'd be interested
though in hearing the arguments on why this was a beneficial language
design choice?

You seem to be very argumentative, or at least passionate, about this
issue, but as has been explained to you, you did not research the problem
at all. The answer was simply in the documentation of the function you
are using. Please try not to be so inflammatory in the future.

--
Jeff "japhy" Pinyan % How can we ever be the sold short or
RPI Acacia Brother #734 % the cheated, we who for every service
RPI Corporation Secretary % have long ago been overpaid?
http://japhy.perlmonk.org/ %
http://www.perlmonks.org/ % -- Meister Eckhart
 
Tad McClellan

Sara said:
split /,/,'cat,mouse,,eel,fish';

Huh? Where did the trailing items go?


Right where the contract (documentation) says they will go.

Why are you acting so surprised?

I work around this inconsistency by adding in "placeholders" like:

s/,,/,#,/g; s/,,/,#,/g;

then do the split, then remove the #'s. What a treat. Thanks Larry!


If you sign a contract without reading it, you deserve any pain
that you get.

The corollary then is to read it before you sign it
(ie. read the docs for a function before you use the function)
 
Sara

John W. Krahn said:
If you had read the documentation for split you would see that that
is the defined behavior for zero, one or two argument versions of split.

perldoc -f split
split /PATTERN/,EXPR,LIMIT
split /PATTERN/,EXPR
split /PATTERN/
split Splits a string into a list of strings and returns that
list. By default, empty leading fields are preserved,
and empty trailing ones are deleted.


You will also see that there is a third argument "LIMIT" which will solve
your problem.



John

Another incorrect presumption. I did read the documentation. I'm
talking about the choice of the default operation of split. I'm not
suggesting that it didn't work with additional arguments (which I
predict few if any of my programmers would recognize). I'm suggesting
it was a poor language design choice, and asking what advantage(s)
that choice has for the programmer. An entirely different question.

Sometimes the discussions here are not unlike the fundies' argument
against evolution. You miss the whole point, because you believe Perl
is a divine doctrine that can't be questioned. It shouldn't upset you
because someone questions a language design choice.
 
Sara

Tad McClellan said:
Right where the contract (documentation) says they will go.

Why are you acting so surprised?




If you sign a contract without reading it, you deserve any pain
that you get.

The corollary then is to read it before you sign it
(ie. read the docs for a function before you use the function)

Contract huh? With my legal-eagle hat on, a contract requires a
"meeting of the minds". I never agreed to this design choice, so there
is no contract, at least not with me.

I have yet to hear any valid reasons why this was a good design
choice. Just a lot of the usual robotic "read the docs" jabber. Tell
ya what-

"read the post"
 
Uri Guttman

S> I have yet to hear any valid reasons why this was a good design
S> choice. Just a lot of the usual robotic "read the docs" jabber. Tell
S> ya what-

learn awk. split's default behavior follows awk's.

and awk was a major influence on perl's design (hashes, -p loop, scalar
range, etc.)

uri
 
Tad McClellan

Sara said:
Contract huh? With my legal-eagle hat on, a contract requires a
"meeting of the minds".


If you agree to call the function, then there _has_ been a
meeting of the minds.

If you didn't want done what the function does, then you
wouldn't call that function.

I never agreed to this design choice,


When you call the function, you are agreeing that it will do what
its docs say it will do.

so there
is no contract, at least not with me.

I have yet to hear any valid reasons why this was a good design
choice.


Pass a -1 as a 3rd argument, and that design choice will become moot.

Just a lot of the usual robotic "read the docs" jabber.


The docs tell you how to disable the design choice that
you object to. Simply disable it and move on, no biggie.

Tell
ya what-

"read the post"


Tell you what, meet the killfile.
 
Peter Hickman

Sara said:
Everybody is happy. But

split /,/,'cat,mouse,eel,fish,,,';

yields
0 'cat'
1 'mouse'
2 'eel'
3 'fish'

Huh? Where did the trailing items go?

If you had read the documentation as you have claimed, then why are you asking
where the trailing items went? You would know. Are you trolling?

I work around this inconsistency by adding in "placeholders" like:

s/,,/,#,/g; s/,,/,#,/g;

then do the split, then remove the #'s. What a treat. Thanks Larry!

Larry created Perl just to piss you off.

But I can't help but wonder- what programming advantage does this
offer and why was split designed to ignore some split candidates such
as these trailing items? And why only omit trailing items, and not
leading? Was there some presumption made about leading ones being more
meaningful than trailing? Very odd presumption if so!

Because if you are processing several lines from a comma-separated file, for
example, then trailing blank cells are of no interest, but the leading blank cells
are, because they help line up data with the previous line. Gosh, that was hard.
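
To make the column-alignment point concrete, here is a small sketch. The header and both data rows are invented for illustration; they are not from any real file.

```perl
use strict;
use warnings;

# Hypothetical rows of a comma-separated file.
my @header = split /,/, 'name,colour,legs';
my @row_a  = split /,/, ',,4';      # leading empties are kept
my @row_b  = split /,/, 'cat,,';    # trailing empties are dropped

# The '4' still sits under the third column heading.
print "$header[2] = $row_a[2]\n";   # legs = 4

# The second row shrinks to the one cell that has data.
print scalar(@row_b), "\n";         # 1
```
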
To the programmer, it would be easier to make split consistent, and in
those cases when the programmer doesn't want empty trailing items he
can easily prepare the scalar to get rid of them:

If most people require 'keep the leading, lose the trailing' then it would be
better if this were the default behaviour rather than everyone having to write
two lines of code where they could write one. Only academics think the
programmer should be a slave to the language; Larry likes to make the language
work for the programmer.

By the way, you are the first 'programmer' (I give you the benefit of the doubt
here) that has had an issue with the behaviour of split. The rest of us find it
very useful.

Perl is pretty much self-consistent, in fact this is one of very few
cases I've encountered which lacks consistency. I'd be interested
though in hearing the arguments on why this was a beneficial language
design choice?

It is really, really useful. Boy, are you going to have fun with OOPerl.
 
Glenn Jackman

At 2004-08-10 11:06AM said:
S> I have yet to hear any valid reasons why this was a good design
S> choice. Just a lot of the usual robotic "read the docs" jabber. Tell
S> ya what-

learn awk. split's default behavior follows awk's.

Does it? a2p begs to differ:

echo '
BEGIN {
s="this,is,a,comma,delimited,string,,"
split(s,a,/,/)
for (n in a) {print n ": " a[n]}
exit
}
' | a2p -

produces:
....
$S = 'this,is,a,comma,delimited,string,,';
@a = split(/,/, $S, 9999);
foreach $n ($[ .. $#a) {
print $n . ': ' . $a[$n];
}

Not Perl's default split behaviour.
 
ctcgag

....

Another incorrect presumption. I did read the documentation.

Bullshit. Had you read the documentation, you would not be asking where
the trailing items went, you would know where they went. Furthermore, you
would have used the proper solution (the limit argument) rather than that
god-awful hack you included in your first post.

I'm
talking about the choice of the default operation of split.

No, you are lying through your teeth in an effort to save face.

Xho
 
Brian McCauley

Bullshit. Had you read the documentation, you would not be asking where
the trailing items went, you would know where they went. Furthermore, you
would have used the proper solution (the limit argument) rather than that
god-awful hack you included in your first post.

That's rather irrefutable isn't it?

By all means question the reasoning behind the design decision. I happen to
agree that the default operation of split is counterintuitive and not
what I'd have chosen.

Do not claim to have read the documentation when you did not know how to
select the alternate behaviour.

If you make a dumb mistake because you didn't read the documentation
properly, people here will deride you. Join in, play along, deride
yourself. Do not attempt to defend your error or deny that you made the
mistake. This will only increase the derision.
 
Uri Guttman

S> I have yet to hear any valid reasons why this was a good design
S> choice. Just a lot of the usual robotic "read the docs" jabber. Tell
S> ya what-
GJ> Does it? a2p begs to differ:

GJ> echo '
GJ> BEGIN {
GJ> s="this,is,a,comma,delimited,string,,"
GJ> split(s,a,/,/)
GJ> for (n in a) {print n ": " a[n]}
GJ> exit
GJ> }
GJ> ' | a2p -

GJ> produces:
GJ> ...
GJ> $S = 'this,is,a,comma,delimited,string,,';
GJ> @a = split(/,/, $S, 9999);
GJ> foreach $n ($[ .. $#a) {
GJ> print $n . ': ' . $a[$n];
GJ> }

GJ> Not Perl's default split behaviour.

you seem to be right there. my serious awk days are many moons ago. but
i did test awk (gawk really) just now and it does trim leading fields
which is what split ' ' does too. perldoc -f split does say that
split ' ' emulates awk and deletes leading empty fields.
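
a quick sketch of that special case, for comparison with an ordinary one-space pattern. the sample line and the array names are made up for illustration.

```perl
use strict;
use warnings;

my $line = '  cat mouse  eel ';

# The literal string ' ' is the awk-style special case: leading
# whitespace is skipped and runs of whitespace count as one separator.
my @awk_style = split ' ', $line;

# The regex / / is an ordinary pattern: every single space separates,
# so leading and embedded empty fields show up (trailing ones do not).
my @regex = split / /, $line;

print scalar(@awk_style), "\n";   # 3
print scalar(@regex),     "\n";   # 6
```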

in any case the OP wanted to know why the default is to delete trailing
empty fields and few addressed the why and most said rtfm. so there was
a major communication breakdown here and moronzilla wasn't involved for
a change. the docs don't address the why of this (other than the ' '
emulation of awk).

uri
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to Tad McClellan]

If you agree to call the function, then there _has_ been a
meeting of the minds.

If you didn't want done what the function does, then you
wouldn't call that function.

I think what you say does not make a lot of sense. It assumes that
the documentation is correct and complete, and one is able to grok
what the documentation says. Neither is true in the case of split().

During one phase of cleanup of Perl REx' semantic I managed to remove
about a dozen different quirks of split(). The remaining ones could
not be removed without a major change of semantic.

I tried to fix them nevertheless (by introduction of a pragma which
would make split() documentable and grokable), but this patch was not
accepted by the pumpking of the time... Moreover, later it turned
out that I missed several more quirks...

Hope this helps,
Ilya
 
Alan J. Flavell

I think what you say does not make a lot of sense.

That depends on whether you are trying to design the language, or to
use it.

During one phase of cleanup of Perl REx' semantic I managed to remove
about a dozen different quirks of split(). The remaining ones could
not be removed without a major change of semantic.

Nevertheless, if the function is doing what it currently says on the
tin, then any considerations of future cleanups are ultra vires, as
far as a /user/ of the language is concerned.

No matter that you /could/ see a way to make it better/simpler/
more rational in the future. With which I wouldn't presume to argue;
but meantime, the users have what they're offered, with its relevant
documentation, no?
 
Yechie Labay

guys, cut the bullshit. if you don't want to use Perl, don't; just don't nag
the others who do... it looks like your understanding of design in general is
very poor. remember, having written some scripts does not make you a designer...
 
Joe Smith

Sara said:
I'm talking about the choice of the default operation of split. I'm not
suggesting that it didn't work with additional arguments (which I
predict few if any of my programmers would recognize). I'm suggesting
it was a poor language design choice, and asking what advantage(s)
that choice has for the programmer. An entirely different question.

You've got to remember that Perl was designed to make life easier for
programmers processing data files. The documented behavior of split()
is very handy, especially when writing one-liners. Perl was not
designed to be a strict, straitjacket type of language.

-Joe
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to Alan J. Flavell]

That depends on whether you are trying to design the language, or to
use it.

Do not think so: split() is not very usable if what you want is that
it does *exactly* what you want.

Nevertheless, if the function is doing what it currently says on the
tin, then any considerations of future cleanups are ultra vires, as
far as a /user/ of the language is concerned.

Do you claim that you, as a /user/, know exactly what split() will do
in each situation?

Do you claim that you, as a /user/, will be able to extract from the
docs (without experiments!) what split() will do in a particular
situation?

All I wanted was a way to *force* split() to behave without any quirk,
so that this particular mode of operation may be explained in one
short paragraph. If you want 'diagonality', it may be much easier to
implement it in your code, than to understand/guess which quirks
happen in which situation.

No matter that you /could/ see a way to make it better/simpler/
more rational in the future. With which I wouldn't presume to argue;
but meantime, the users have what they're offered, with its relevant
documentation, no?

No. As I said, at some moment I updated the docs so that they
reflected my "current" understanding of the behaviour of split().
Unfortunately, even after reading/cleaning the source code, I could
not recognize all the quirks (boundary conditions!); so the
documentation is not fully "relevant".

Hope this helps,
Ilya
 
