creating size-limited tar files

Dave Angel

2012/11/14 Dave Angel said:
Ok this is all very nice, but:

[andrea@andreacrotti tar_baller]$ time python2 test_pipe.py > /dev/null

real 0m21.215s
user 0m0.750s
sys 0m1.703s

[andrea@andreacrotti tar_baller]$ time ls -lR /home/andrea | cat > /dev/null

real 0m0.986s
user 0m0.413s
sys 0m0.600s

<snip>


So apparently it's way slower than the plain shell pipeline. Is this normal?

I'm not sure how this timing relates to the thread, but what it mainly
shows is that starting up the Python interpreter takes quite a while,
compared to not starting it up.


Well, it's related because my program has to be as fast as possible, so
in theory I thought that using Python pipes would be better because I
can easily get the PID of the first process.

But if it's that slow then it's not worth it, and I don't think it's the
Python interpreter, because it stays many times slower even when I
change the size of the input.

Well, as I said, I don't see how the particular timing has anything to
do with the rest of the thread. If you want to do an ls within a Python
program, go ahead. But if all you need can be done with ls itself, then
it'll be slower to launch python just to run it.

Your first timing runs python, which runs two new shells, ls, and cat.
Your second timing runs ls and cat.

So the difference is starting up python, plus starting the shell two
extra times.
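To put a number on the interpreter-startup part of that difference, here is a minimal sketch. It assumes Python 3.5+ (the thread used python2), and `time_command` is just a helper name made up for this post. It only times an interpreter that does nothing; the shell launches are not measured.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Return the wall-clock seconds needed to run argv to completion."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


# Launch a second interpreter that runs an empty program, so the
# measured time is essentially process creation plus interpreter startup.
startup = time_command([sys.executable, "-c", "pass"])
print("interpreter startup: %.3fs" % startup)
```

Whatever this prints on your machine is the fixed overhead you pay before your script does any work.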

I'd also be curious if you flushed the system buffers before each
timing, as the second test could be running entirely in system memory.
And no, I don't know offhand how to flush them in Linux, just that
without it, your timings are not at all repeatable. Note the two
identical runs here.

davea@think:~/temppython$ time ls -lR ~ | cat > /dev/null

real 0m0.164s
user 0m0.020s
sys 0m0.000s
davea@think:~/temppython$ time ls -lR ~ | cat > /dev/null

real 0m0.018s
user 0m0.000s
sys 0m0.010s

Real time goes down by about 90%, while user time drops to zero.
And on the third and subsequent runs, sys time goes to zero as well.
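The same warm-up effect can be watched from Python by timing one command several times in a row. This is a rough sketch, assuming Python 3.5+; `time_runs` is a hypothetical helper name, and the demo lists the current directory only to keep it cheap. As far as I know, on Linux the page cache can be dropped as root by writing to /proc/sys/vm/drop_caches, but that is not done here.

```python
import subprocess
import time


def time_runs(argv, runs=3):
    """Run argv `runs` times and return the wall-clock time of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


# Swap in ["ls", "-lR", "/path/to/home"] to mirror the runs above.
for number, seconds in enumerate(time_runs(["ls", "-l", "."]), start=1):
    print("run %d: %.4fs" % (number, seconds))
```

If the first run is much slower than the rest, you are mostly measuring the cache, not the command.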
 
Andrea Crotti

Right, I didn't think about that.
Anyway, the only thing I wanted to understand is whether using the pipes
in subprocess is exactly the same as using a Linux pipe or not.

And any idea on how to run it in RAM?
Maybe if I create a pipe in tmpfs it might already work, what do you think?
 
Dave Angel

<SNIP>
Anyway, the only thing I wanted to understand is whether using the pipes
in subprocess is exactly the same as using a Linux pipe or not.

It's not the same thing, but you can usually assume it's close. Other
effects will probably dominate any differences.
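For comparison, here is a sketch of the shell pipeline built directly with subprocess and no shell in between, which also gives you the PID of the first process. The contents of test_pipe.py were snipped, so this is not that script; `run_pipeline` is a name made up for illustration, and the demo assumes `ls` and `cat` are on the PATH. As far as I understand, on POSIX systems subprocess.PIPE is backed by an ordinary OS pipe, so when one child's stdout is handed straight to the next child's stdin the data does not pass through Python.

```python
import subprocess


def run_pipeline(producer_argv, consumer_argv):
    """Run `producer | consumer` without a shell.

    Returns (producer PID, bytes written by the consumer).
    """
    producer = subprocess.Popen(producer_argv, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_argv, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the read end, so the producer receives SIGPIPE
    # if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return producer.pid, output


pid, listing = run_pipeline(["ls", "-l", "."], ["cat"])
print("producer PID:", pid, "bytes piped:", len(listing))
```

Here only the final output is read back into Python; if you redirect the consumer's stdout to a file or /dev/null instead, Python never touches the data at all.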

And any idea on how to run it in RAM?
Maybe if I create a pipe in tmpfs it might already work, what do you think?

In a good virtual-memory OS, such as Linux, there's very little
predictable difference between running in RAM (which is to say reading
and writing to the swap file) or reading and writing to a file you
specify. In fact, writing to a file can frequently be quicker, if it's
sequential.

Why? Linux is using any given piece of physical RAM to map a file, or
an allocated buffer, or shared memory, or nearly anything. About the
only special cases are the kinds of memory that have to be locked into
RAM for hardware reasons.

Linux decides which pieces to keep in memory, whether it calls it
caching, swapping, memory mapping, or whatever. And frequently,
attempts to "beat the system" produce counterintuitive results.

If in doubt, measure. But choose your measures carefully, because lots
more things will change the measurement than you might expect.
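One way to measure the tmpfs question is sketched below, under some assumptions: /dev/shm is taken to be a tmpfs mount (common on Linux, but check your system), `time_sequential_write` is a made-up helper name, and Python 3 is assumed. The write is flushed to the OS but not fsync'ed, so for the regular temp directory this mostly measures the page cache, which is rather the point.

```python
import os
import tempfile
import time


def time_sequential_write(directory, size=8 * 1024 * 1024, repeats=5):
    """Return the best-of-`repeats` seconds to write `size` bytes
    sequentially to a temporary file created in `directory`."""
    payload = b"x" * size
    best = float("inf")
    for _ in range(repeats):
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            start = time.perf_counter()
            handle.write(payload)
            handle.flush()  # hand the data to the OS; no fsync
            elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


# Assumption: /dev/shm is tmpfs; it is skipped if missing or not writable.
for directory in (tempfile.gettempdir(), "/dev/shm"):
    if os.path.isdir(directory) and os.access(directory, os.W_OK):
        print(directory, "%.4fs" % time_sequential_write(directory))
```

Taking the best of several repeats reduces noise from whatever else the machine is doing, but it will not tell you how the two behave under memory pressure.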
 
