Best way to check that you are at the beginning (the end) of an iterable?

Laurent

Hi there,

What is the simplest way to check that you are at the beginning or at the end of an iterable? I'm using enumerate with Python 3.2 (see below) but I'm wondering if there would be a better way.


l = ['a', 'b', 'a', 'c']

for pos, i in enumerate(l):
    if pos == 0:
        print("head =", i)
    else:
        print(i)


I know that Python is not exactly a functional language but wouldn't something like "ishead()" or "istail()" be useful?
 
Cameron Simpson

| What is the simplest way to check that you are at the beginning or at
| the end of an iterable? I'm using enumerate with Python 3.2 (see below)
| but I'm wondering if there would be a better way.
|
| l = ['a', 'b', 'a', 'c']
|
| for pos, i in enumerate(l):
|     if pos == 0:
|         print("head =", i)
|     else:
|         print(i)
|
| I know that Python is not exactly a functional language but wouldn't
| something like "ishead()" or "istail()" be useful?

There are a few reasons these do not exist out of the box (quite aside
from how easy it is to do on the occasions you actually want it).
Tackling ishead and istail in turn...

The "ishead()" would need to be a top level function (like "len()")
because if it were an iterator method, every iterator would need to have
it implemented; currently the number of methods needed to roll your own
iterator is just two (__iter__ and next; the latter is spelled __next__
in Python 3). ishead() could be done as a top
level function, though it would add the storage cost of an additional
state value to every iterator (i.e. a "first" boolean or equivalent). So
you'd be proposing more memory cost and possibly a retrospective code
change for all the existing planetwide code, for a minor convenience. As
you note, enumerate gets you a pos value, and it is easy enough to write
a for loop like this:

first = True
for i in iterable_thing:
    if first:
        print("head =", i)
    else:
        print(i)
    first = False

Your istail() is much worse.

A generator would need to do lookahead to answer istail() in the general
case. Consider iterating over the lines in a file, or better still the
lines coming from a pipeline. Or iterating over packets received on a
network connection. You can't answer "istail()" there until you have
seen the next line/packet (or EOF/connection close). And that may be an
arbitrary amount of time in the future. You're going to stall your whole
program for such a question?

You can do this easily enough for yourself as an itertools-like thing:
write a wrapper generator that answers ishead() and istail() for
arbitrary iterators. Completely untested example code:

class BoundSensitiveIterator(object):
    def __init__(self, subiter):
        self.sofar = 0
        # iter() lets us wrap any iterable, not just an iterator
        self.subiter = iter(subiter)
        # (): nothing read ahead; (x,): x read ahead; None: exhausted
        self.pending = ()
    def __iter__(self):
        return self
    def __next__(self):
        if self.pending is None:
            raise StopIteration
        if self.pending:
            nxt = self.pending[0]
            self.pending = ()
        else:
            nxt = next(self.subiter)
        # count only items actually handed out
        self.sofar += 1
        return nxt
    next = __next__  # Python 2 spelling
    def ishead(self):
        # maybe <= 1, depending on what you want it to mean
        return self.sofar == 1
    def istail(self):
        if self.pending is None:
            return True
        if self.pending:
            return False
        try:
            nxt = next(self.subiter)
        except StopIteration:
            self.pending = None
            return True
        else:
            self.pending = (nxt,)
            return False

I = BoundSensitiveIterator(other_iterable)
for n in I:
    print(n, "ishead =", I.ishead(), "istail =", I.istail())

You can see it adds some performance and storage overhead, and of course
may stall if you ever ask istail() of an "on demand" iterable.
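
The same lookahead idea can also be written as a plain generator. The following is only a minimal sketch: mark_ends is a hypothetical name, not something in the standard library, and it assumes you are happy to receive the two flags alongside each item instead of calling methods on the iterator. It reads one item ahead, so it has the same stalling behaviour on "on demand" sources as described above.

```python
def mark_ends(iterable):
    """Yield (is_first, is_last, item) for each item of iterable.

    Hypothetical helper: it reads one item ahead, so it blocks until the
    next item (or the end of the input) is known.
    """
    it = iter(iterable)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    is_first = True
    for upcoming in it:
        # 'current' is not the last item, because 'upcoming' exists
        yield is_first, False, current
        is_first = False
        current = upcoming
    # the source is exhausted, so 'current' is the last item
    yield is_first, True, current


for is_first, is_last, item in mark_ends(['a', 'b', 'a', 'c']):
    print(item, "ishead =", is_first, "istail =", is_last)
```

A single-element input is reported as both first and last, and an empty input yields nothing at all.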

About the only time I do this is my personal "the()" convenience
function:

def the(list, context=None):
    ''' Returns the first element of an iterable, but requires there to be
        exactly one.
    '''
    icontext = "expected exactly one value"
    if context is not None:
        icontext = icontext + " for " + context

    first = True
    for elem in list:
        if first:
            it = elem
            first = False
        else:
            raise IndexError("%s: got more than one element (%s, %s, ...)"
                             % (icontext, it, elem))

    if first:
        raise IndexError("%s: got no elements" % icontext)

    return it

Which I use as a definite article in places where an iterable _should_
yield exactly one result (eg SQL SELECTs that _ought_ to get exactly
one hit). I can see I wrote that a long time ago - it could do with some
style fixes. And a code scan shows it sees little use:)

Cheers,
--
Cameron Simpson <[email protected]> DoD#743
http://www.cskk.ezoshosting.com/cs/

Electronic cardboard blurs the line between printed objects and the virtual
world. - overhead by WIRED at the Intelligent Printing conference Oct2006
 
Laurent

I totally understand the performance issue that a hypothetical "istail" would bring, even if I think it would just be the programmer's responsibility not to use it when it's not certain that an end can be detected.
But I don't see why *adding* something like "ishead" would be so bad (at worst by using a boolean somewhere as you mentioned).

Anyway I was just asking if there is something better than enumerate. So the answer is no? The fact that I have to create a tuple with an incrementing integer for something as simple as checking that I'm at the head just sounds awfully unpythonic to me.
 
Tim Chase

> Anyway I was just asking if there is something better than
> enumerate. So the answer is no? The fact that I have to create
> a tuple with an incrementing integer for something as simple
> as checking that I'm at the head just sounds awfully
> unpythonic to me.

I've made various generators that are roughly (modulo
edge-condition & error checking) something like

def with_prev(it):
    prev = None
    for i in it:
        yield prev, i
        prev = i

def with_next(it):
    it = iter(it)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input yields nothing
    for i in it:
        yield prev, i
        prev = i
    yield prev, None

which can then be used something like your original

for cur, nxt in with_next(iterable):
    if nxt is None:
        do_something_with_last(cur)
    else:
        do_regular_stuff_with_non_last(cur)

for prev, cur in with_prev(iterable):
    if prev is None:
        do_something_with_first(cur)
    else:
        do_something_with_others(cur)

If your iterable can return None, you could create a custom
object to signal the non-condition:

NO_ITEM = object()

and then use NO_ITEM in place of "None" in the above code.
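
As a rough sketch of that sentinel variant (the missing parameter and the sample data are assumptions added for illustration, not part of the generators above), the check then uses "is" against the unique object, so a None in the data is no longer mistaken for "no item":

```python
NO_ITEM = object()  # unique sentinel; compare with "is", never "=="

def with_next(iterable, missing=NO_ITEM):
    """Yield (current, following) pairs; the last pair has following=missing."""
    it = iter(iterable)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input yields nothing
    for item in it:
        yield prev, item
        prev = item
    yield prev, missing

values = [1, None, 3]   # None is legitimate data here
for cur, nxt in with_next(values):
    if nxt is NO_ITEM:
        print("last:", cur)
    else:
        print("not last:", cur)
```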

-tkc
 
Cameron Simpson

| I totally understand the performance issue that an hypothetical
| "istail" would bring, even if I think it would just be the programmer's
| responsibility not to use it when it's not certain that an end can
| be detected.

The trouble with these things is that their presence leads to stallable
code, often in libraries. Let the programmer write code dependent on
istail() without thinking of the stall case (or even the gratuitous
execution case, as in a generator with side effects in calling .next())
and have that buried in a utilities function.

Facilities like feof() in C and eof in Pascal already lead to lots of
code that runs happily with flat files and behaves badly in interactive
or piped input. It is _so_ easy to adopt a style like:

while not eof(filehandle):
    line = filehandle.nextline()
    ...

that it is often thought that having offered the eof() function is a
design error. (Of course in the example above the usual Python idiom
would win out from existing habit, but there are plenty of other
situations where it would just be _easy_ to rely on istail() in whatever
form.)
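
For comparison, the usual Python idiom never asks "is this the end?" in advance; it simply tries to read and lets the iterator report exhaustion. A small sketch, with io.StringIO standing in for a real file or pipe (that stand-in is an assumption made so the example is self-contained):

```python
import io

# io.StringIO stands in for a real file, pipe or socket wrapper here.
filehandle = io.StringIO("first line\nsecond line\n")

lines = []
for line in filehandle:          # the loop ends when the source reports EOF
    lines.append(line.rstrip("\n"))

print(lines)
```

On an interactive or piped source this loop still waits for each line, but only when it actually wants the next one, never just to answer a question about the end.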

| But I don't see why *adding* something like "ishead" would be so bad
| (at worse by using a boolean somewhere as you mentioned).

It is not awful, but as remarked:
- extra storage cost to _every_ iterable, for a rarely used facility
- extra runtime cost to maintain the state
- _retroactive_ burden on _every_ iterator implementation presently
  existing; every iterator suddenly needs to implement and offer this
  extra facility to be of general purpose use
- it is easy to provide the facility on the rare occasions when it is
  needed

Personally, I think point 3 above is the killer and 1 and 2 are serious
counter arguments.

| Anyway I was just asking if there is something better than enumerate. So
| the answer is no? The fact that I have to create a tuple with an
| incrementing integer for something as simple as checking that I'm at
| the head just sounds awfully unpythonic to me.

You can just use a boolean if you like. I have plenty of loops like:

first = True
for i in iterable:
    if first:
        blah ...
    ...
    first = False

Cheap and easy. Cheers,
 
Miki Tebeka

I guess enumerate is the best way to check for the first item. Note that if someone passes you the iterator as an argument, you have no way of checking whether items have already been consumed from it.

istail can be implemented using itertools.chain, see https://gist.github.com/1202260
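
To illustrate the caveat about already-consumed iterators, here is a small sketch (describe is a hypothetical helper, named only for this example). The function believes it is looking at the head, but the caller has already taken an item:

```python
def describe(it):
    """Report the first item this function sees; hypothetical helper."""
    for pos, item in enumerate(it):
        if pos == 0:
            return "head = %r" % (item,)
    return "empty"

letters = iter(['a', 'b', 'a', 'c'])
next(letters)              # the caller consumes 'a' before handing it over
print(describe(letters))   # reports 'b' as the head
```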
 
Steven D'Aprano

Laurent said:
Hi there,

What is the simplest way to check that you are at the beginning or at the
end of an iterable?


I don't think this question is meaningful. There are basically two
fundamental types of iterables, sequences and iterators.

Sequences have random access and a length, so if the "start" and "end" of
the sequence is important to you, just use indexing:

beginning = sequence[0]
end = sequence[-1]
for i, x in enumerate(sequence):
    if i == 0: print("at the beginning")
    elif i == len(sequence)-1: print("at the end")
    print(x)


Iterators don't have random access, and in general they don't have a
beginning or an end. There may not be any internal sequence to speak of:
the iterator might be getting data from a hardware device that provides
values continuously, or some other series of values without a well-defined
beginning or end. Example:

def time():
    from time import asctime
    while True:
        yield asctime()

it = time()

What would it even mean to say that I am at the beginning or end of it?

Iterators have no memory, so in one sense you are *always* at the beginning
of the iterator: next() always returns the next item, and the previous item
is lost forever. So the answer to the question "Am I at the beginning of an
iterator?" is always "You are now".

For sequences, the question is best handled differently. For iterators, the
question doesn't make sense in general. If you need an iterator that can
report its internal state, write your own:

import random, time
class MyIter(object):
    def __init__(self):
        self.start = True
        self.end = False
    def __next__(self):
        if self.start:
            self.start = False
        if self.end:
            raise StopIteration
        if random.random() < 0.01:
            self.end = True
        return time.asctime()
    def __iter__(self):
        return self
 
Laurent

Yes of course the use of a boolean variable is obvious but I'm mixing Python code with HTML using Mako templates. In Mako, for code readability reasons, I try to stick to simple "for" and "if" constructions, and I try to avoid variable declarations inside the HTML, that's all. Thanks anyway.
 
Laurent

> I don't think this question is meaningful. There are basically two
> fundamental types of iterables, sequences and iterators.
>
> Sequences have random access and a length, so if the "start" and "end" of
> the sequence is important to you, just use indexing:
>
> beginning = sequence[0]
> end = sequence[-1]
> for i, x in enumerate(sequence):
>     if i == 0: print("at the beginning")
>     elif i == len(sequence)-1: print("at the end")
>     print(x)
>
>
> Iterators don't have random access, and in general they don't have a
> beginning or an end. There may not be any internal sequence to speak of:
> the iterator might be getting data from a hardware device that provides
> values continuously, or some other series of values without a well-defined
> beginning or end.

Maybe I should have said "best way to check that you didn't start the iteration process yet" but you see what I mean.

Well I guess I have to unlearn my bad lisp/scheme habits...
 
Laurent

Yes, I was just hoping for something already included that I didn't know about (I'm new to Python).
 
Terry Reedy

> It is not awful, but as remarked:
> - extra storage cost to _every_ iterable, for a rarely used facility
> - extra runtime cost to maintain the state
> - _retroactive_ burden on _every_ iterator implementation presently
>   existing; every iterator suddenly needs to implement and offer this
>   extra facility to be of general purpose use
> - it is easy to provide the facility on the rare occasions when it is
>   needed
>
> Personally, I think point 3 above is the killer and 1 and 2 are serious
> counter arguments.

The iterator protocol is intentionally as simple as sensibly possible.

> | Anyway I was just asking if there is something better than enumerate. So
> | the answer is no? The fact that I have to create a tuple with an
> | incrementing integer for something as simple as checking that I'm at
> | the head just sounds awfully unpythonic to me.
>
> You can just use a boolean if you like. I have plenty of loops like:
>
> first = True
> for i in iterable:
>     if first:
>         blah ...
>     ...
>     first = False
>
> Cheap and easy. Cheers,

Or grab and process the first item separately from the rest.

it = iter(iterable)
try:
    first = next(it)
    <process first item>
except StopIteration:
    raise ValueError("Empty iterable not allowed")
for i in it:
    <process non-first item>
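
A runnable sketch of that pattern, with the placeholders filled in by made-up processing steps (join_with_header and what it does to each item are assumptions for illustration only):

```python
def join_with_header(iterable):
    """Hypothetical example: treat the first item specially, then the rest."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("Empty iterable not allowed")
    parts = ["head = %s" % first]      # process first item
    for item in it:                    # 'it' resumes after the first item
        parts.append(str(item))        # process non-first item
    return parts

print(join_with_header(['a', 'b', 'a', 'c']))
```

No flag or counter is needed here: the same iterator object is shared by the next() call and the for loop, so the loop only ever sees the non-first items.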
 
Terry Reedy

> I don't think this question is meaningful. There are basically two
> fundamental types of iterables, sequences and iterators.

And non-sequence iterables like set and dict.

> Sequences have random access and a length, so if the "start" and "end" of
> the sequence is important to you, just use indexing:
>
> beginning = sequence[0]
> end = sequence[-1]
> for i, x in enumerate(sequence):
>     if i == 0: print("at the beginning")
>     elif i == len(sequence)-1: print("at the end")
>     print(x)

And finite non-sequences can be turned into sequences with list(iterable).
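
A minimal sketch of that conversion, assuming the input is finite and small enough to hold in memory (flag_ends is a hypothetical name used only here):

```python
def flag_ends(iterable):
    """Hypothetical helper: materialise a finite iterable, then index it."""
    sequence = list(iterable)          # only safe for finite iterables
    last = len(sequence) - 1
    return [(i == 0, i == last, x) for i, x in enumerate(sequence)]

# Works for generators, dicts and sets alike; note that set order is arbitrary.
print(flag_ends(iter('abc')))
```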
 
Chris Rebert

> I guess enumerate is the best way to check for the first item. Note that if someone passes you the iterator as an argument, you have no way of checking whether items have already been consumed from it.
>
> istail can be implemented using itertools.chain, see https://gist.github.com/1202260

For the archives, if Gist ever goes down:

from itertools import chain

def istail(it):
    '''Check whether the iterator has exactly one element left.
    Return True/False and an equivalent iterator.'''
    try:
        i = next(it)
    except StopIteration:
        return False, it

    try:
        j = next(it)
        return False, chain([i, j], it)
    except StopIteration:
        return True, chain([i], it)


t, it = istail(iter([]))
print(t, list(it))
t, it = istail(iter([1]))
print(t, list(it))
t, it = istail(iter([1, 2]))
print(t, list(it))
 
Chris Torek

> Facilities like feof() in C and eof in Pascal already lead to lots of
> code that runs happily with flat files and behaves badly in interactive
> or piped input. It is _so_ easy to adopt a style like:
>
> while not eof(filehandle):
>     line = filehandle.nextline()
>     ...

Minor but important point here: eof() in Pascal is predictive (uses
a "crystal ball" to peer into the future to see whether EOF is
about to occur -- which really means, reads ahead, causing that
interactivity problem you mentioned), but feof() in C is "post-dictive".
The feof(stream) function returns a false value if the stream has
not yet encountered an EOF, but your very next attempt to read from
it may (or may not) immediately encounter that EOF.

Thus, feof() in C is sort of (but not really) useless. (The actual
use cases are to distinguish between "EOF" and "error" after a
failed read from a stream -- since C lacks exceptions, getc() just
returns EOF to indicate "failed to get a character due to end of
file or error" -- or in some more obscure cases, such as the
nonstandard getw(), to distinguish between a valid -1 value and
having encountered an EOF. The companion ferror() function tells
you whether an earlier EOF value was due to an error.)
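
Python's own file objects follow the same after-the-fact convention: read() returns an empty result only once the end has actually been reached, so the loop reads first and tests afterwards. A small sketch, with io.BytesIO standing in for a real stream (an assumption made to keep the example self-contained):

```python
import io

stream = io.BytesIO(b"abcdef")   # stand-in for a real file or pipe

chunks = []
while True:
    chunk = stream.read(4)       # try to read first...
    if not chunk:                # ...an empty result reports EOF afterwards
        break
    chunks.append(chunk)

print(chunks)
```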
 
Cameron Simpson

| In article <[email protected]>
| >Facilities like feof() in C and eof in Pascal already lead to lots of
| >code that runs happily with flat files and behaves badly in interactive
| >or piped input. It is _so_ easy to adopt a style like:
| >
| > while not eof(filehandle):
| >     line = filehandle.nextline()
| >     ...
|
| Minor but important point here: eof() in Pascal is predictive (uses
| a "crystal ball" to peer into the future to see whether EOF is
| about to occur -- which really means, reads ahead, causing that
| interactivity problem you mentioned), but feof() in C is "post-dictive".
| The feof(stream) function returns a false value if the stream has
| not yet encountered an EOF, but your very next attempt to read from
| it may (or may not) immediately encounter that EOF.

Thanks. I had forgotten this nuance. Cheers,
 
Peter Otten

Cameron said:
About the only time I do this is my personal "the()" convenience
function:

def the(list, context=None):
    ''' Returns the first element of an iterable, but requires there to be
        exactly one.
    '''
    icontext = "expected exactly one value"
    if context is not None:
        icontext = icontext + " for " + context

    first = True
    for elem in list:
        if first:
            it = elem
            first = False
        else:
            raise IndexError("%s: got more than one element (%s, %s, ...)"
                             % (icontext, it, elem))

    if first:
        raise IndexError("%s: got no elements" % icontext)

    return it

Which I use as a definite article in places where an iterable should
yield exactly one result (eg SQL SELECTs that ought to get exactly
one hit). I can see I wrote that a long time ago - it could do with some
style fixes. And a code scan shows it sees little use:)

A lightweight alternative to that is unpacking:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
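
As a rough sketch of the unpacking approach (the name the is reused from the quoted post; the body is an assumption about how unpacking would be applied, not code from the thread), a one-element target makes Python itself enforce "exactly one":

```python
def the(iterable):
    """Return the only element of iterable, or raise ValueError."""
    (only,) = iterable      # fails unless exactly one value is produced
    return only

print(the(["hit"]))
```

Unlike the longer version, this raises ValueError rather than IndexError, and the error message is whatever the interpreter supplies.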
 
