generate tuples from sequence

W

Will McGugan

Hi,

I'd like a generator that takes a sequence and yields tuples containing
n items of the sqeuence, but ignoring the 'odd' items. For example

take_group(range(9), 3) -> (0,1,2) (3,4,5) (6,7,8)

This is what I came up with..

def take_group(gen, count):
i=iter(gen)
while True:
yield tuple([i.next() for _ in xrange(count)])

Is this the most efficient solution?

Regards,

Will McGugan
 
W

Will McGugan

Will said:
Hi,

I'd like a generator that takes a sequence and yields tuples containing
n items of the sqeuence, but ignoring the 'odd' items. For example

Forgot to add, for my purposes I will always have a sequence with a
multiple of n items.


Will
 
N

Neil Cerutti

Hi,

I'd like a generator that takes a sequence and yields tuples containing
n items of the sqeuence, but ignoring the 'odd' items. For example

take_group(range(9), 3) -> (0,1,2) (3,4,5) (6,7,8)

This is what I came up with..

def take_group(gen, count):
i=iter(gen)
while True:
yield tuple([i.next() for _ in xrange(count)])

Is this the most efficient solution?

This is starting to seem like an FAQ. ;)

The Python library contains a recipe for this in the itertools
recipes in the documentation (5.16.3).

def grouper(n, iterable, padvalue=None):
"grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

It's more general and cryptic than what you asked for, though.
 
P

Peter Otten

Alan said:
Peter Otten said:
I like
items = range(9)
N = 3
zip(*[iter(items)]*N)
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]

Except that it is considered implementation dependent:
http://mail.python.org/pipermail/python-list/2005-February/307550.html

Use itertools.izip() then instead of zip(). I think that the occurence of
this idiom on the examples page counts as a blessing :)

But still, to help my lack of fantasy -- what would a sane zip()
implementation look like that does not guarantee the above output?

Peter
 
D

Duncan Booth

Peter Otten said:
But still, to help my lack of fantasy -- what would a sane zip()
implementation look like that does not guarantee the above output?
Hypothetically?

The code in zip which builds the result tuples looks (ignoring error
handling) like:

// inside a loop building each result element...
PyObject *next = PyTuple_New(itemsize);

for (j = 0; j < itemsize; j++) {
PyObject *it = PyTuple_GET_ITEM(itlist, j);
PyObject *item = PyIter_Next(it);
PyTuple_SET_ITEM(next, j, item);
}

For fixed size tuples you can create them using PyTuple_Pack. So imagine
some world where the following code works faster for small tuples:

// Outside the loop building the result list:
PyObject *a, *b, *c;
if (itemsize >= 1 && itemsize <= 3) a = PyTuple_GET_ITEM(...);
if (itemsize >= 2 && itemsize <= 3) b = PyTuple_GET_ITEM(...);
if (itemsize == 3) c = PyTuple_GET_ITEM(...);
...

// inside the result list loop:
PyObject *next;
if (itemsize==1) {
next = PyTuple_Pack(1,
PyIter_Next(a));
} else if (itemsize==2) {
next = PyTuple_Pack(2,
PyIter_Next(a),
PyIter_Next(b));
} else if (itemsize==2) {
next = PyTuple_Pack(3,
PyIter_Next(a),
PyIter_Next(b),
PyIter_Next(c));
} else {
next = PyTuple_New(itemsize);

for (j = 0; j < itemsize; j++) {
PyObject *it = PyTuple_GET_ITEM(itlist, j);
PyObject *item = PyIter_Next(it);
PyTuple_SET_ITEM(next, j, item);
}
}

If compiled on a system where the stack grows downwards (as it often does)
the C compiler is very likely to evaluate function arguments in reverse
order.

(BTW, this also assumes that it's an implementation which uses exceptions
or something for error handling otherwise you probably can't get it right,
but maybe something like IronPython could end up with code like this.)

Or maybe if someone added PyTuple_Pack1, PyTuple_Pack2, PyTuple_Pack3
functions which grab their memory off a separate free list for each tuple
length. That might speed up the time to create the tuples as you might be
able to just reset the content not rebuild the object each time. Again that
could make code like the above run more quickly.
 
P

Peter Otten

Duncan said:
Hypothetically?

The code in zip which builds the result tuples looks (ignoring error
handling) like:

// inside a loop building each result element...
PyObject *next = PyTuple_New(itemsize);

for (j = 0; j < itemsize; j++) {
PyObject *it = PyTuple_GET_ITEM(itlist, j);
PyObject *item = PyIter_Next(it);
PyTuple_SET_ITEM(next, j, item);
}

For fixed size tuples you can create them using PyTuple_Pack. So imagine
some world where the following code works faster for small tuples:

// Outside the loop building the result list:
PyObject *a, *b, *c;
if (itemsize >= 1 && itemsize <= 3) a = PyTuple_GET_ITEM(...);
if (itemsize >= 2 && itemsize <= 3) b = PyTuple_GET_ITEM(...);
if (itemsize == 3) c = PyTuple_GET_ITEM(...);
...

// inside the result list loop:
PyObject *next;
if (itemsize==1) {
next = PyTuple_Pack(1,
PyIter_Next(a));
} else if (itemsize==2) {
next = PyTuple_Pack(2,
PyIter_Next(a),
PyIter_Next(b));
} else if (itemsize==2) {
next = PyTuple_Pack(3,
PyIter_Next(a),
PyIter_Next(b),
PyIter_Next(c));
} else {
next = PyTuple_New(itemsize);

for (j = 0; j < itemsize; j++) {
PyObject *it = PyTuple_GET_ITEM(itlist, j);
PyObject *item = PyIter_Next(it);
PyTuple_SET_ITEM(next, j, item);
}
}

If compiled on a system where the stack grows downwards (as it often does)
the C compiler is very likely to evaluate function arguments in reverse
order.

(BTW, this also assumes that it's an implementation which uses exceptions
or something for error handling otherwise you probably can't get it right,
but maybe something like IronPython could end up with code like this.)

Or maybe if someone added PyTuple_Pack1, PyTuple_Pack2, PyTuple_Pack3
functions which grab their memory off a separate free list for each tuple
length. That might speed up the time to create the tuples as you might be
able to just reset the content not rebuild the object each time. Again
that could make code like the above run more quickly.

Special-casing small tuples meets my sanity criterion :)

Let's see if I understand the above: In C a call

f(g(), g())

may result in machine code equivalent to either

x = g()
y = g()
f(x, y)

or

y = g()
x = g()
f(x, y)

Is that it?

Peter
 
D

Duncan Booth

Peter Otten said:
Let's see if I understand the above: In C a call

f(g(), g())

may result in machine code equivalent to either

x = g()
y = g()
f(x, y)

or

y = g()
x = g()
f(x, y)

Is that it?

Yes, or changing one of the calls to h() and compiling with
"cl -Fat.asm -Ox -c t.c":

------ t.c --------
extern int f(int a, int b);
extern int g();
extern int h();

int main(int argc, char **argv) {
return f(g(), h());
}
-------------------

The output file:
------- t.asm -----
; Listing generated by Microsoft (R) Optimizing Compiler Version 13.10.3077

TITLE t.c
.386P
include listing.inc
if @Version gt 510
..model FLAT
else
_TEXT SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT ENDS
_DATA SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA ENDS
CONST SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST ENDS
_BSS SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS ENDS
$$SYMBOLS SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS ENDS
_TLS SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS ENDS
FLAT GROUP _DATA, CONST, _BSS
ASSUME CS: FLAT, DS: FLAT, SS: FLAT
endif

INCLUDELIB LIBC
INCLUDELIB OLDNAMES

PUBLIC _main
EXTRN _f:NEAR
EXTRN _g:NEAR
EXTRN _h:NEAR
; Function compile flags: /Ogty
_TEXT SEGMENT
_argc$ = 8 ; size = 4
_argv$ = 12 ; size = 4
_main PROC NEAR
; File c:\temp\t.c
; Line 6
call _h
push eax
call _g
push eax
call _f
add esp, 8
; Line 7
ret 0
_main ENDP
_TEXT ENDS
END
-------------------------
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,968
Messages
2,570,150
Members
46,697
Latest member
AugustNabo

Latest Threads

Top