list.pop(0) vs. collections.dequeue

A

Arnaud Delobelle

Steve Howell said:
[...]
My algorithm does exactly N pops and roughly N list accesses, so I
would be going from N*N + N to N + N log N if switched to blist.

Can you post your algorithm?  It would be interesting to have a concrete
use case to base this discussion on.

I just realized you meant the Python code itself. It is here:

https://bitbucket.org/showell/shpaml_website/src/tip/shpaml.py

Hi - I didn't have the time to look at it before today. You scan
through a list called prefix_lines, and you use pop(0) as a means to
keep track of your position in this list. It seems to me that it would
be more effective to explicitely keep track of it - and it would remove
the need to the numerous copies of sublists of indent_lines that you
have to create.

I have rewritten part of the three relevant functions (html_block_tag,
get_indented_block and recurse, within indent_lines) to show you what I
mean (see below). I have tried to keep the changes to a minimum - you
can see that the code is no more complex like this. The advantage is
that only one list is created, prefix_lines, and there is no need to
mutate it or copy parts of it during the algorithm. I have not had the
time to test it though (only on one of the examples on the examples
webpage).

Code follows:

[...]

def html_block_tag(output, block, start, end, recurse):
append = output.append
prefix, tag = block[start]
if RAW_HTML.regex.match(tag):
append(prefix + tag)
recurse(block, start + 1, end)
else:
start_tag, end_tag = apply_jquery_sugar(tag)
append(prefix + start_tag)
recurse(block, start + 1, end)
append(prefix + end_tag)

[...]

def get_indented_block(prefix_lines, start, end):
prefix, line = prefix_lines[start]
len_prefix = len(prefix)
i = start + 1
while i < end:
new_prefix, line = prefix_lines
if line and len(new_prefix) <= len_prefix:
break
i += 1
while i-1 > start and prefix_lines[i-1][1] == '':
i -= 1
return i

[...]

def indent_lines(lines,
output,
branch_method,
leaf_method,
pass_syntax,
flush_left_syntax,
flush_left_empty_line,
indentation_method,
get_block,
):
append = output.append
def recurse(prefix_lines, start, end):
while start < end:
prefix, line = prefix_lines[start]
if line == '':
start += 1
append('')
else:
block_end = get_block(prefix_lines, start, end)
if block_end == start + 1:
start += 1
if line == pass_syntax:
pass
elif line.startswith(flush_left_syntax):
append(line[len(flush_left_syntax):])
elif line.startswith(flush_left_empty_line):
append('')
else:
append(prefix + leaf_method(line))
else:
branch_method(output, prefix_lines, start, block_end, recurse)
start = block_end
return
prefix_lines = map(indentation_method, lines)
recurse(prefix_lines, 0, len(prefix_lines))
 
S

Steve Howell

Steve Howell said:
[...]
My algorithm does exactly N pops and roughly N list accesses, so I
would be going from N*N + N to N + N log N if switched to blist.
Can you post your algorithm?  It would be interesting to have a concrete
use case to base this discussion on.
I just realized you meant the Python code itself.  It is here:

Hi - I didn't have the time to look at it before today.  You scan
through a list called prefix_lines, and you use pop(0) as a means to
keep track of your position in this list.  It seems to me that it would
be more effective to explicitely keep track of it - and it would remove
the need to the numerous copies of sublists of indent_lines that you
have to create.

I have rewritten part of the three relevant functions (html_block_tag,
get_indented_block and recurse, within indent_lines) to show you what I
mean (see below).  I have tried to keep the changes to a minimum - you
can see that the code is no more complex like this.  The advantage is
that only one list is created, prefix_lines, and there is no need to
mutate it or copy parts of it during the algorithm.  I have not had the
time to test it though (only on one of the examples on the examples
webpage).

Code follows:

[...]

def html_block_tag(output, block, start, end, recurse):
    append = output.append
    prefix, tag = block[start]
    if RAW_HTML.regex.match(tag):
        append(prefix + tag)
        recurse(block, start + 1, end)
    else:
        start_tag, end_tag = apply_jquery_sugar(tag)
        append(prefix + start_tag)
        recurse(block, start + 1, end)
        append(prefix + end_tag)

[...]

def get_indented_block(prefix_lines, start, end):
    prefix, line = prefix_lines[start]
    len_prefix = len(prefix)
    i = start + 1
    while i < end:
        new_prefix, line = prefix_lines
        if line and len(new_prefix) <= len_prefix:
            break
        i += 1
    while i-1 > start and prefix_lines[i-1][1] == '':
        i -= 1
    return i

[...]

def indent_lines(lines,
            output,
            branch_method,
            leaf_method,
            pass_syntax,
            flush_left_syntax,
            flush_left_empty_line,
            indentation_method,
            get_block,
            ):
    append = output.append
    def recurse(prefix_lines, start, end):
        while start < end:
            prefix, line = prefix_lines[start]
            if line == '':
                start += 1
                append('')
            else:
                block_end = get_block(prefix_lines, start, end)
                if block_end == start + 1:
                    start += 1
                    if line == pass_syntax:
                        pass
                    elif line.startswith(flush_left_syntax):
                        append(line[len(flush_left_syntax):])
                    elif line.startswith(flush_left_empty_line):
                        append('')
                    else:
                        append(prefix + leaf_method(line))
                else:
                    branch_method(output, prefix_lines, start, block_end, recurse)
                    start = block_end
        return
    prefix_lines = map(indentation_method, lines)
    recurse(prefix_lines, 0, len(prefix_lines))


I think your rewrite makes sense for the way that Python is
implemented today.

Instead of mutating the list as you consume elements, you propose to
advance the "start" variable, which is essentially a pointer.

There are only two disadvantage of the "start" approach that I can
think of:

1) You have to pass around two parameters where before there was
only one. (Aside: for stylistic concerns, would you bundle them in a
tuple, since they kind of go hand in hand?)
2) You hold on to memory from the elements longer.

Pushing this complexity down into CPython of course would have similar
disadvantages:

1) you have to store two pointers where before there was only one
2) you hold on to memory from pointers to orphaned elements longer

Disadvantage #1 is more than offset by the fact that you would have
had to waste memory with the "start" in the Python program.

Disadvantage #2 is offset by the fact that you would have been holding
on to the elements themselves in the Python program.

Admittedly this a pretty narrow context where pushing the logic down
into CPython gains overall simplicity and performance. Disadvantage
#1 is the sticky point when you consider the patch in the broader
context of all programs.

Thanks for looking at the code, by the way.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,176
Messages
2,570,950
Members
47,503
Latest member
supremedee

Latest Threads

Top