reading files with error

M

Maurice LING

Hi,

I'm trying to read some files (video files) that seems to have some
errors in it. Basically, I cannot copy it out of discs as that gives me
an error message but I can still play the file using a media player like
VLC or QuickTime. I understand that copying a file will also invoke
checking routines as well, and I am guessing that the file may have some
parity-bit error or something.

Am I able to use Python to force read the entire file (full length)?
That is, do not check the read for errors. I know that this is insideous
in many uses but in media files, it may only result in a skipped frame
or something. What I've done is something like this:

What I've noticed is this:
1. sometimes it (Python) only reads to roughly the point of initial copy
error (I try to take note of how far drag-and-drop copying proceeds
before it fails)
2. sometimes it is able to read pass the drag-and-drop copying
fail-point but may or may not be full length.

What determines if Python is able to make it pass drag-and-drop copying
fail-point?

Is there anyway to do what I want, force read full length?

Thanks and cheers
Maurice
 
J

jepler

I have written a program to do something similar. My strategy is:
* use os.read() to read 512 bytes at a time
* when os.read fails, seek to the next multiple of 512 bytes
and write '\0' * 512 to the output
I notice this code doesn't deal properly with short reads, but in my experience
they never happen (because the disk error makes an entire block unreadable, and
a block is never less than 512 bytes)

I use this code on a unix-style system.

def dd(src, target, bs=512):
src = os.open(src, os.O_RDONLY)
if os.path.exists(target):
target = os.open(target, os.O_WRONLY | os.O_APPEND, 0600)
existing = os.lseek(target, 0, SEEK_END)
else:
existing = 0
target = os.open(target, os.O_WRONLY | os.O_CREAT, 0600)

total = os.lseek(src, 0, SEEK_END) / bs
os.lseek(src, existing, SEEK_SET)
os.lseek(target, existing, SEEK_SET)

if existing: print "starting at", existing
i = existing / bs
f = 0
lastrem = -1

last = start = time.time()
while 1:
try:
block = os.read(src, bs)
except os.error, detail:
if detail.errno == errno.EIO:
block = "\0" * bs
os.lseek(src, (i+1) * bs, SEEK_SET)
f = f + 1
else:
raise
if block == "": break

i = i + 1
os.write(target, block)

now = time.time()
if i % 1000 or now - last < 1: continue
last = now

frac = i * 1. / total
rem = int((now-start) * (1-frac) / frac)
if rem < 60 or abs(rem - lastrem) > .5:
rm, rs = divmod(rem, 60)
lastrem = rem
spd = i * 512. / (now - start) / 1024 / 1024
sys.stderr.write("%8d %8d %8d %3.1f%% %6d:%02d %6.1fMB/s\r"
% (i, f, i-f, i * 100. / total, rm, rs, spd))
sys.stderr.write("\n")

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFDLMm0Jd01MZaTXX0RAoyQAKCgrED02MfBBBxGGjB66XR0PtkPUwCeJZaj
qnHVJVnl3zAfuMrOXAiFzd8=
=vBK6
-----END PGP SIGNATURE-----
 
M

Maurice Ling

I have written a program to do something similar. My strategy is:
* use os.read() to read 512 bytes at a time
* when os.read fails, seek to the next multiple of 512 bytes
and write '\0' * 512 to the output
I notice this code doesn't deal properly with short reads, but in my experience
they never happen (because the disk error makes an entire block unreadable, and
a block is never less than 512 bytes)

I use this code on a unix-style system.

def dd(src, target, bs=512):
src = os.open(src, os.O_RDONLY)
if os.path.exists(target):
target = os.open(target, os.O_WRONLY | os.O_APPEND, 0600)
existing = os.lseek(target, 0, SEEK_END)
else:
existing = 0
target = os.open(target, os.O_WRONLY | os.O_CREAT, 0600)

total = os.lseek(src, 0, SEEK_END) / bs
os.lseek(src, existing, SEEK_SET)
os.lseek(target, existing, SEEK_SET)

if existing: print "starting at", existing
i = existing / bs
f = 0
lastrem = -1

last = start = time.time()
while 1:
try:
block = os.read(src, bs)
except os.error, detail:
if detail.errno == errno.EIO:
block = "\0" * bs
os.lseek(src, (i+1) * bs, SEEK_SET)
f = f + 1
else:
raise
if block == "": break

i = i + 1
os.write(target, block)

now = time.time()
if i % 1000 or now - last < 1: continue
last = now

frac = i * 1. / total
rem = int((now-start) * (1-frac) / frac)
if rem < 60 or abs(rem - lastrem) > .5:
rm, rs = divmod(rem, 60)
lastrem = rem
spd = i * 512. / (now - start) / 1024 / 1024
sys.stderr.write("%8d %8d %8d %3.1f%% %6d:%02d %6.1fMB/s\r"
% (i, f, i-f, i * 100. / total, rm, rs, spd))
sys.stderr.write("\n")
Sorry but what are SEEK_END and SEEK_SET?

Maurice

--
Maurice Han Tong LING, BSc(Hons)(Biochem), AdvDipComp, CPT, SSN, FIFA,
MASBMB, MAMBIS, MACM
Doctor of Philosophy (Science) Candidate, The University of Melbourne
mobile: +61 4 22781753, +65 96669233
mailing address: Department of Zoology, The University of Melbourne
Royal Parade, Parkville, Victoria 3010, Australia
residence: 9/41 Dover Street, Flemington, Victoria 3031, Australia
resume: http://maurice.vodien.com/maurice_resume.pdf
www: http://www.geocities.com/beldin79/
 
J

jepler

Sorry but what are SEEK_END and SEEK_SET?

Oops, that's what I get for snipping a part of a larger program.

SEEK_SET = 0
SEEK_END = 2

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFDLYPxJd01MZaTXX0RArHTAKCbHRfGu/Bf7A5sopPXudMrKcZnuQCgjS72
uol/6PiY+7GKCnURTEJ05pE=
=fXAP
-----END PGP SIGNATURE-----
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,264
Messages
2,571,319
Members
48,003
Latest member
coldDuece

Latest Threads

Top