list.pop(0) vs. collections.dequeue

Arnaud Delobelle · Jan 27, 2010

Steve Howell said:
[...]

My algorithm does exactly N pops and roughly N list accesses, so I
would be going from N*N + N to N + N log N if switched to blist.

Click to expand...

Can you post your algorithm? Â It would be interesting to have a concrete
use case to base this discussion on.

Click to expand...

I just realized you meant the Python code itself. It is here:

https://bitbucket.org/showell/shpaml_website/src/tip/shpaml.py

Hi - I didn't have the time to look at it before today. You scan
through a list called prefix_lines, and you use pop(0) as a means to
keep track of your position in this list. It seems to me that it would
be more effective to explicitely keep track of it - and it would remove
the need to the numerous copies of sublists of indent_lines that you
have to create.

I have rewritten part of the three relevant functions (html_block_tag,
get_indented_block and recurse, within indent_lines) to show you what I
mean (see below). I have tried to keep the changes to a minimum - you
can see that the code is no more complex like this. The advantage is
that only one list is created, prefix_lines, and there is no need to
mutate it or copy parts of it during the algorithm. I have not had the
time to test it though (only on one of the examples on the examples
webpage).

Code follows:

[...]

def html_block_tag(output, block, start, end, recurse):
append = output.append
prefix, tag = block[start]
if RAW_HTML.regex.match(tag):
append(prefix + tag)
recurse(block, start + 1, end)
else:
start_tag, end_tag = apply_jquery_sugar(tag)
append(prefix + start_tag)
recurse(block, start + 1, end)
append(prefix + end_tag)

[...]

def get_indented_block(prefix_lines, start, end):
prefix, line = prefix_lines[start]
len_prefix = len(prefix)
i = start + 1
while i < end:
new_prefix, line = prefix_lines
if line and len(new_prefix) <= len_prefix:
break
i += 1
while i-1 > start and prefix_lines[i-1][1] == '':
i -= 1
return i

[...]

def indent_lines(lines,
output,
branch_method,
leaf_method,
pass_syntax,
flush_left_syntax,
flush_left_empty_line,
indentation_method,
get_block,
):
append = output.append
def recurse(prefix_lines, start, end):
while start < end:
prefix, line = prefix_lines[start]
if line == '':
start += 1
append('')
else:
block_end = get_block(prefix_lines, start, end)
if block_end == start + 1:
start += 1
if line == pass_syntax:
pass
elif line.startswith(flush_left_syntax):
append(line[len(flush_left_syntax):])
elif line.startswith(flush_left_empty_line):
append('')
else:
append(prefix + leaf_method(line))
else:
branch_method(output, prefix_lines, start, block_end, recurse)
start = block_end
return
prefix_lines = map(indentation_method, lines)
recurse(prefix_lines, 0, len(prefix_lines))

Steve Howell · Jan 27, 2010

Steve Howell said:
Steve Howell said:

[...]
My algorithm does exactly N pops and roughly N list accesses, so I
would be going from N*N + N to N + N log N if switched to blist.
Can you post your algorithm? It would be interesting to have a concrete
use case to base this discussion on.

Click to expand...

Click to expand...

I just realized you meant the Python code itself. It is here:

Click to expand...

https://bitbucket.org/showell/shpaml_website/src/tip/shpaml.py

Click to expand...

Hi - I didn't have the time to look at it before today. You scan
through a list called prefix_lines, and you use pop(0) as a means to
keep track of your position in this list. It seems to me that it would
be more effective to explicitely keep track of it - and it would remove
the need to the numerous copies of sublists of indent_lines that you
have to create.

I have rewritten part of the three relevant functions (html_block_tag,
get_indented_block and recurse, within indent_lines) to show you what I
mean (see below). I have tried to keep the changes to a minimum - you
can see that the code is no more complex like this. The advantage is
that only one list is created, prefix_lines, and there is no need to
mutate it or copy parts of it during the algorithm. I have not had the
time to test it though (only on one of the examples on the examples
webpage).

Code follows:

[...]

def html_block_tag(output, block, start, end, recurse):
append = output.append
prefix, tag = block[start]
if RAW_HTML.regex.match(tag):
append(prefix + tag)
recurse(block, start + 1, end)
else:
start_tag, end_tag = apply_jquery_sugar(tag)
append(prefix + start_tag)
recurse(block, start + 1, end)
append(prefix + end_tag)

[...]

def get_indented_block(prefix_lines, start, end):
prefix, line = prefix_lines[start]
len_prefix = len(prefix)
i = start + 1
while i < end:
new_prefix, line = prefix_lines
if line and len(new_prefix) <= len_prefix:
break
i += 1
while i-1 > start and prefix_lines[i-1][1] == '':
i -= 1
return i

[...]

def indent_lines(lines,
output,
branch_method,
leaf_method,
pass_syntax,
flush_left_syntax,
flush_left_empty_line,
indentation_method,
get_block,
):
append = output.append
def recurse(prefix_lines, start, end):
while start < end:
prefix, line = prefix_lines[start]
if line == '':
start += 1
append('')
else:
block_end = get_block(prefix_lines, start, end)
if block_end == start + 1:
start += 1
if line == pass_syntax:
pass
elif line.startswith(flush_left_syntax):
append(line[len(flush_left_syntax):])
elif line.startswith(flush_left_empty_line):
append('')
else:
append(prefix + leaf_method(line))
else:
branch_method(output, prefix_lines, start, block_end, recurse)
start = block_end
return
prefix_lines = map(indentation_method, lines)
recurse(prefix_lines, 0, len(prefix_lines))

I think your rewrite makes sense for the way that Python is
implemented today.

Instead of mutating the list as you consume elements, you propose to
advance the "start" variable, which is essentially a pointer.

There are only two disadvantage of the "start" approach that I can
think of:

1) You have to pass around two parameters where before there was
only one. (Aside: for stylistic concerns, would you bundle them in a
tuple, since they kind of go hand in hand?)
2) You hold on to memory from the elements longer.

Pushing this complexity down into CPython of course would have similar
disadvantages:

1) you have to store two pointers where before there was only one
2) you hold on to memory from pointers to orphaned elements longer

Disadvantage #1 is more than offset by the fact that you would have
had to waste memory with the "start" in the Python program.

Disadvantage #2 is offset by the fact that you would have been holding
on to the elements themselves in the Python program.

Admittedly this a pretty narrow context where pushing the logic down
into CPython gains overall simplicity and performance. Disadvantage
#1 is the sticky point when you consider the patch in the broader
context of all programs.

Thanks for looking at the code, by the way.

python -- question with list.pop(0)	0	Jun 7, 2004
Calculate rang and derang of ordering of subsets	0	Feb 6, 2024
Calculate rang and derang of ordering of subsets	1	Feb 6, 2024
Performance of list vs. set equality operations	19	Apr 6, 2010
Running Scripts vs Interactive mode	8	Feb 23, 2011
An empty initializer is invalid for an array with unspecified bound	0	Jul 1, 2020
List comprehension vs filter()	6	Apr 20, 2011
Lowest Value in List	5	Oct 2, 2013

list.pop(0) vs. collections.dequeue

Arnaud Delobelle

Steve Howell

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads