shutil.copyfile is incomplete (truncated)

R

Rob Schneider

Yep, there's the problem! See my previous post for details. Change this to:



f.close()



and you should be sorted.



ChrisA

Slapping forehead ... hard. Thanks!
 
M

Mark Lawrence

Slapping forehead ... hard. Thanks!

a) We've all done it :)
b) The print function/statement or Python's interactive mode are awesome
in situations like this.
 
R

Roy Smith

Rob Schneider said:
Source (correct one) is 47,970 bytes. Target after copy of 45,056 bytes.
I've tried changing what gets written to change the file size. It is usually
this sort of difference.

The file system is Mac OS Extended Journaled (default as out of the box).

Is it always the tail end of the file that gets truncated, or is it
missing (or mutating) data in the middle of the file? I'm just grasping
at straws here, but maybe it's somehow messing up line endings (turning
CRLF pairs into just LF), or using some other kind of encoding for
unicode characters?

If you compare the files with cmp, does it say:

$ cmp original truncated
cmp: EOF on truncated

that's what I would expect if it's a strict truncation. If it says
anything else, you've got a data munging problem.

What I would normally do around this time is run a system call trace on
the process to watch all the descriptor related (i.e. open, create,
write) system calls. On OSX, that means dtruss. Unfortunately, I'm
not that familiar with the OSX variant so I can't give you specific
advice about which options to use.

When you can see the system calls, you know exactly what your process is
doing. You should be able to see the output file being opened and a
descriptor returned, then find all the write() calls to that descriptor.
You'll also be able to find any other system calls on that pathname
after the descriptor is closed.

Please report back what you find!

Oh, another trick you might want to try is making the output file path
/dev/stdout and redirecting the output into a file with the shell. See
if that makes any difference. Or, try something like (assuming the -o
option to your script sets the output filename):

python my_prog.py -o /dev/stdout | dd bs=1 of=xxx

That will do a couple of things. First, dd will report how many bytes
it read and wrote, so you can see if that's the correct number. Also,
since your process will no longer be writing to a real file, if anything
is doing something weird like a seek() after you're done writing, that
will fail since you can't seek() on a pipe.
 
T

Terry Jan Reedy

Hold it one moment... You're not actually calling close. The file's
still open. Is that a copy/paste problem, or is that your actual code?

In Python, a function call ALWAYS has parentheses after it. Evaluating
a function's name like that returns the function (or method) object,
which you then do nothing with. (You could assign it someplace, for
instance, and call it later.) Try adding empty parens:

f.close()

and see if that solves the problem. Alternatively, look into the
'with' statement and the block syntax that it can give to I/O
operations.

I say *definitely* use a 'with' statement. Part of its purpose is to
avoid close bugs.
 
R

Roy Smith

Steven D'Aprano said:
But note that "done" in this case means "the file system thinks it is
done", not *actually* done. Hard drives, especially the cheaper ones,
lie. They can say the file is written when in fact the data is still in
the hard drive's internal cache and not written to the disk platter.
Also, in my experience, hardware RAID controllers will eat your data, and
then your brains when you try to diagnose the problem.

I would consider the chance that the disk may be faulty, or the file
system is corrupt. Does the problem go away if you write to a different
file system or a different disk?

It is *possible* that this is the problem, but it's really way far out
on the long tail of possibilities. If the file system were corrupted or
the disk faulty, the odds are you would be seeing all sorts of other
problems. And this would not be anywhere near as repeatable as the OP
is describing.

Think horses, not zebras.
 
R

Roy Smith

Cameron Simpson said:
45046 is exactly 11 * 4096. I'd say your I/O is using 4KB blocks,
and the last partial block (to make it up to 47970) didn't get
written (at the OS level).

Yeah, this sounds like a good diagnosis.

BTW, the dtruss command I recommended in my earlier post would confirm
this. But, to be honest, it's such a likely scenario that it hardly
needs confirmation.
 
8

88888 Dihedral

Steven D'Apranoæ–¼ 2013å¹´4月12日星期五UTC+8上åˆ8時06分21秒寫é“:
But note that "done" in this case means "the file system thinks it is

done", not *actually* done. Hard drives, especially the cheaper ones,

lie. They can say the file is written when in fact the data is still in

the hard drive's internal cache and not written to the disk platter.

Also, in my experience, hardware RAID controllers will eat your data, and

then your brains when you try to diagnose the problem.
Don't you model this as a non-blocking operation in
your program?

I would consider the chance that the disk may be faulty, or the file

system is corrupt. Does the problem go away if you write to a different

file system or a different disk?


Back-ups and read-back verifications are important for
those who care.
 
N

Nobody

But note that "done" in this case means "the file system thinks it is
done", not *actually* done. Hard drives, especially the cheaper ones,
lie. They can say the file is written when in fact the data is still in
the hard drive's internal cache and not written to the disk platter.
Also, in my experience, hardware RAID controllers will eat your data, and
then your brains when you try to diagnose the problem.

None of which is likely to be relevant here, as any subsequent access to
the file will reference the in-memory copy; the disk will only get
involved if the data has already been flushed from the OS' cache and has
to be read back in from disk.

write(), close(), etc return once the data has been written to the
OS' disk cache. At that point, the OS usually won't have even started
sending the data to the drive, let alone waited for the drive to report
(or claim) that the data has been written to the physical disk.

If you want to wait for the data written to be written to the physical
disk (in order to obtain specific behaviour with respect to an unclean
shutdown), use f.flush() followed by os.fsync(f.fileno()).

But most of the time, there's no point. If you actually care about what
happens in the event of an unclean shutdown, you typically also need to
sync the directory, otherwise the file's contents will get sync'd but the
file's very existence might not be.
 
C

Chris Angelico

But most of the time, there's no point. If you actually care about what
happens in the event of an unclean shutdown, you typically also need to
sync the directory, otherwise the file's contents will get sync'd but the
file's very existence might not be.

Or just store your content in a PostgreSQL database, and let it worry
about all the platform-specific details of how to fsync reliably.

ChrisA
 
S

Steven D'Aprano

None of which is likely to be relevant here,

Since we've actually identified the bug (the OP was using file.close
without actually calling it), that's certainly the case :)


[...]
If you want to wait for the data written to be written to the physical
disk (in order to obtain specific behaviour with respect to an unclean
shutdown), use f.flush() followed by os.fsync(f.fileno()).

If only it were that simple. It has been documented that some disks will
lie, even when told to sync. When I say "some", I mean *most*. There's
probably nothing you can do about it, apart from not using that model or
brand of disk, so you have to just live with the risk.

http://queue.acm.org/detail.cfm?id=2367378

USB sticks are especially nasty. I've got quite a few USB thumb drives
where the "write" light keeps flickering for anything up to five minutes
after the OS reports that the drive has been unmounted and is safe to
unplug. I corrupted the data on these quite a few times until I noticed
the light. And let's not even mention the drives that have no light at
all...

But my favourite example of lying hard drives of all time is this:

http://blog.jitbit.com/2011/04/chinese-magic-drive.html

I want one of those!
 
C

Chris Angelico

If only it were that simple. It has been documented that some disks will
lie, even when told to sync. When I say "some", I mean *most*. There's
probably nothing you can do about it, apart from not using that model or
brand of disk, so you have to just live with the risk.

It's often close to that simple. With most hard disks, you can make
them 100% reliable, but you may have to check some disk parameters (on
Linux, that's just a matter of writing to something in /proc
somewhere, don't remember the details but it's easy to check). The
worst offenders I've met are SSDs...
USB sticks are especially nasty. I've got quite a few USB thumb drives
where the "write" light keeps flickering for anything up to five minutes
after the OS reports that the drive has been unmounted and is safe to
unplug. I corrupted the data on these quite a few times until I noticed
the light. And let's not even mention the drives that have no light at
all...

.... but you've met worse.
But my favourite example of lying hard drives of all time is this:

http://blog.jitbit.com/2011/04/chinese-magic-drive.html

I want one of those!

Awesome! It's the new version of DoubleSpace / DriveSpace!

http://en.wikipedia.org/wiki/DriveSpace

(And its problems, according to that Wikipedia article, actually had
the same root cause - write caching that the user wasn't aware of.
Great.)

ChrisA
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,141
Messages
2,570,813
Members
47,357
Latest member
sitele8746

Latest Threads

Top