feature requests

macker

Hi, hope this is the right group for this:

I miss two basic (IMO) features in parallel processing:

1. make `threading.Thread.start()` return `self`

I'd like to be able to `workers = [Thread(params).start() for params in whatever]`. Right now, it's 5 ugly, menial lines:

workers = []
for params in whatever:
    thread = threading.Thread(params)
    thread.start()
    workers.append(thread)

2. make multiprocessing pools (incl. ThreadPool) limit the size of their internal queues

As it is now, the queue will greedily consume its entire input, and if the input is large and the pool workers are slow in consuming it, this blows up RAM. I'd like to be able to `pool = Pool(4, max_qsize=1000)`. Same with the output queue (finished tasks).

Or does anyone know of a way to achieve this?
 
Chris Angelico

I'd like to be able to `workers = [Thread(params).start() for params in whatever]`. Right now, it's 5 ugly, menial lines:

workers = []
for params in whatever:
    thread = threading.Thread(params)
    thread.start()
    workers.append(thread)

You could shorten this by iterating twice, if that helps:

workers = [Thread(params).start() for params in whatever]
for thrd in workers: thrd.start()

ChrisA
 
Tim Chase

workers = []
for params in whatever:
    thread = threading.Thread(params)
    thread.start()
    workers.append(thread)

You could shorten this by iterating twice, if that helps:

workers = [Thread(params).start() for params in whatever]
for thrd in workers: thrd.start()

Do you mean

workers = [Thread(params) for params in whatever]
for thrd in workers: thrd.start()

? ("Thread(params)" vs. "Thread(params).start()" in your list comp)

-tkc
 
Chris Angelico

Do you mean

workers = [Thread(params) for params in whatever]
for thrd in workers: thrd.start()

? ("Thread(params)" vs. "Thread(params).start()" in your list comp)

Whoops, copy/paste fail. Yes, that's what I meant.

Thanks for catching!

ChrisA
 
Ethan Furman

Hi, hope this is the right group for this:

I miss two basic (IMO) features in parallel processing:

1. make `threading.Thread.start()` return `self`

I'd like to be able to `workers = [Thread(params).start() for params in whatever]`. Right now, it's 5 ugly, menial lines:

workers = []
for params in whatever:
    thread = threading.Thread(params)
    thread.start()
    workers.append(thread)

Ugly, menial lines are a clue that a function to hide it could be useful.

2. make multiprocessing pools (incl. ThreadPool) limit the size of their internal queues

As it is now, the queue will greedily consume its entire input, and if the input is large and the pool workers are slow in consuming it, this blows up RAM. I'd like to be able to `pool = Pool(4, max_qsize=1000)`. Same with the output queue (finished tasks).

Have you verified that this is a problem in Python?

Or does anyone know of a way to achieve this?

You could try subclassing.
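
Short of subclassing, one way to get a similar effect from outside the pool is to throttle the input iterable with a semaphore. The sketch below is only an illustration: `bounded_imap` and `max_pending` are hypothetical names, not part of the stdlib, and it assumes that `imap` pulls items from its input lazily in the pool's internal feeder thread, which matches CPython's implementation as far as I know but is not a documented guarantee.

```python
import threading
from multiprocessing.pool import ThreadPool


def bounded_imap(pool, func, iterable, max_pending=1000):
    """Yield func(item) in input order with at most max_pending items in flight.

    Hypothetical helper, not a stdlib API. "In flight" means pulled from the
    input but not yet handed back to the caller.
    """
    slots = threading.BoundedSemaphore(max_pending)

    def throttled():
        for item in iterable:
            # Blocks the pool's feeder thread once max_pending items are
            # waiting, so the input is no longer consumed greedily.
            slots.acquire()
            yield item

    for result in pool.imap(func, throttled()):
        # A result has been produced; free a slot before handing it over.
        slots.release()
        yield result


with ThreadPool(4) as pool:
    squares = list(bounded_imap(pool, lambda x: x * x, range(20), max_pending=5))
```

Note that the caller must keep consuming results: if iteration is abandoned midway, the feeder thread can stay blocked on the semaphore, so this is a workaround under stated assumptions rather than a replacement for a real `max_qsize` option.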
 
macker

Ugly, menial lines are a clue that a function to hide it could be useful.

Or a clue to add a trivial change elsewhere (hint for Ethan: `return self` at the end of `Thread.start()`).

Have you verified that this is a problem in Python?

?

You could try subclassing.

I could try many things. What this thread is about is trying to fix it on stdlib level, so that people don't have to reinvent the wheel every time.

Thanks to Chris for his suggestion. Ethan, please stay away from this thread.

-macker
 
Ethan Furman

Or a clue to add a trivial change elsewhere (hint for Ethan: `return self` at the end of `Thread.start()`).

I'm aware that would solve your issue. I'm also aware that Python rarely does a 'return self' at the end of methods.
Since that probably isn't going to change, a helper function is probably your best way forward.
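
A minimal sketch of what such a helper might look like. The name `start_thread` and its signature are illustrative assumptions, not anything in the `threading` module; it only wraps the documented `Thread(target=..., args=..., kwargs=...)` constructor and `start()`.

```python
import threading


def start_thread(target, *args, **kwargs):
    """Create a Thread running target(*args, **kwargs), start it, return it.

    Hypothetical helper; Thread.start() itself returns None.
    """
    thread = threading.Thread(target=target, args=args, kwargs=kwargs)
    thread.start()
    return thread


# With the helper, the five-line loop collapses to one comprehension.
results = []
workers = [start_thread(results.append, n) for n in range(5)]
for worker in workers:
    worker.join()
```

This keeps `Thread.start()` returning `None` while still giving a one-line way to build the list of started workers.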


You stated it "would blow up RAM" -- have you actually tested this, or are you making assumptions based on experience
from other languages, or assumptions based on nothing at all?

I could try many things. What this thread is about is trying to fix it on stdlib level, so that people don't have to reinvent the wheel every time.

Did you really expect your idea to just sail through with no opposition, no counter-ideas, no reasons why it might not,
or would not, work?

Thanks to Chris for his suggestion. Ethan, please stay away from this thread.

Wow, you're rude.
 
Terry Reedy

I'm aware that would solve your issue. I'm also aware that Python
rarely does a 'return self' at the end of methods.

Not returning self is a basic design principle of Python since its
beginning. (I am not aware of any exceptions and would regard one as
possibly a mistake.) Guido is aware that not doing so prevents chaining
of mutation methods. He thinks it very important that people know and
remember the difference between a method that mutates self and one that
does not. Otherwise, one could write 'b = a.sort()' and not know
(remember) that b is just an alias for a. He must have seen this type of
error, especially in beginner code, in other languages before designing
Python.

Since that probably isn't going to change,

as it would only make things worse.

Note that some mutation methods also return something useful other than
default None. Examples are mylist.pop() and iterator.__next__ (usually
accessed by next(iterator))*. So it is impossible for all mutation
methods to just 'return self'.

* iterator.__next__ is a generalized specialization of list.pop. It can
only return the 'first' item, but can do so with any iterable, including
those that are not ordered and those that represent virtual rather than
concrete collections.
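
The convention described above can be checked directly with built-in types. This is just a small illustration using `list.sort()`, `sorted()`, `list.pop()` and `next()`; the variable names are arbitrary.

```python
a = [3, 1, 2]

# list.sort() mutates the list in place and returns None, so it cannot
# be mistaken for an operation that produces a new list.
sort_result = a.sort()

# sorted() leaves its argument alone and returns a new list instead.
original = [3, 1, 2]
b = sorted(original)

# Some mutating methods return something useful other than self:
# pop() removes and returns the last item, next() advances an iterator.
last = a.pop()
it = iter(a)
first = next(it)
```

Here `sort_result` is `None`, `b` is a separate list from `original`, and `last` and `first` hold the items that `pop()` and `next()` returned.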
 
