S
Suresh Pillai
I am performing simulations on networks (graphs). I have a question on
speed of execution (assuming very ample memory for now). I simplify the
details of my simulation below, as the question I ask applies more
generally than my specific case. I would greatly appreciate general
feedback in terms of computing and of course considerations specific to
implementation in Python.
The nodes in my network may be ON or OFF. The network starts off with
all nodes in the OFF state. I loop through the nodes. For each node
that is OFF, I consider some probability of it turning ON based on the
states of its neighbours. I MUST GO THROUGH ALL NODES BEFORE DECIDING
WHICH ONES TO TURN ON.
So my question is whether it is faster to
1. loop through a list of ALL nodes and check for OFF nodes using ifs
or to
2. loop through a container of OFF nodes and remove from this when they
turn ON
The second would result in looping through less nodes, especially as the
simulation progresses, but how does the cost of removal compare with
cheap ifs and would the extra memory usage affect performance.
I an appreciate that the cost of the if check, the number of nodes, and
the type of container I use will come into the answer.
In my case, the ifs are cheap boolean queries (whether the node is ON or
OFF). The number of nodes is very large: millions for sure, maybe tens
of millions. If considering (2), take note of my BOLD text above, which
means I can't remove nodes as I iterate through them in the main loop.
I naturally started coding with (2), but couldn't decide on the best data
structure for python. A set seemed ideal for speedy removal, but then I
can't iterate through them with out popping. An ordered list? Some
creative solution with numpy arrays?
There is also the complication that since we are in interpreted python,
what is theoretically the best data structure may not in reality be
optimal unless it is a default system object or coded externally in a
compiled module.
Of course, I will start experimenting to see what the execution
difference is, but I would appreciate some suggestions from others re
which is best and also on best data structure for (2).
I'm not a newbie, so you can get technical with me python-wise and
algorithm wise. I realise it is a 'basic' question, but it is something
that I have always wondered about (cheap ifs versus extra structure) and
with the number of nodes I am considering, it actually becomes an issue.
Many Thanks,
Suresh
speed of execution (assuming very ample memory for now). I simplify the
details of my simulation below, as the question I ask applies more
generally than my specific case. I would greatly appreciate general
feedback in terms of computing and of course considerations specific to
implementation in Python.
The nodes in my network may be ON or OFF. The network starts off with
all nodes in the OFF state. I loop through the nodes. For each node
that is OFF, I consider some probability of it turning ON based on the
states of its neighbours. I MUST GO THROUGH ALL NODES BEFORE DECIDING
WHICH ONES TO TURN ON.
So my question is whether it is faster to
1. loop through a list of ALL nodes and check for OFF nodes using ifs
or to
2. loop through a container of OFF nodes and remove from this when they
turn ON
The second would result in looping through less nodes, especially as the
simulation progresses, but how does the cost of removal compare with
cheap ifs and would the extra memory usage affect performance.
I an appreciate that the cost of the if check, the number of nodes, and
the type of container I use will come into the answer.
In my case, the ifs are cheap boolean queries (whether the node is ON or
OFF). The number of nodes is very large: millions for sure, maybe tens
of millions. If considering (2), take note of my BOLD text above, which
means I can't remove nodes as I iterate through them in the main loop.
I naturally started coding with (2), but couldn't decide on the best data
structure for python. A set seemed ideal for speedy removal, but then I
can't iterate through them with out popping. An ordered list? Some
creative solution with numpy arrays?
There is also the complication that since we are in interpreted python,
what is theoretically the best data structure may not in reality be
optimal unless it is a default system object or coded externally in a
compiled module.
Of course, I will start experimenting to see what the execution
difference is, but I would appreciate some suggestions from others re
which is best and also on best data structure for (2).
I'm not a newbie, so you can get technical with me python-wise and
algorithm wise. I realise it is a 'basic' question, but it is something
that I have always wondered about (cheap ifs versus extra structure) and
with the number of nodes I am considering, it actually becomes an issue.
Many Thanks,
Suresh