Phoe6
Hi all,
I had a filesystem crash, and when I recovered the data
the files had random names and no extensions. I decided to write a
script to determine each file's type and create a new file with the
matching extension.
---
Method 1:
# File extension utility.
import os
import mimetypes
import shutil

def main():
    for root, dirs, files in os.walk(r'C:\Senthil\test'):
        for each in files:
            fname = os.path.join(root, each)
            print fname
            mtype, entype = mimetypes.guess_type(fname)
            fext = mimetypes.guess_extension(mtype)
            if fext is not None:
                try:
                    # Copy to a new file that carries the guessed extension.
                    newname = fname + fext
                    print newname
                    shutil.copyfile(fname, newname)
                except (IOError, os.error), why:
                    print "Can't copy %s to %s: %s" % (fname, newname, str(why))

if __name__ == "__main__":
    main()
----
The problem I faced with this script is that if the filename has no
extension, mimetypes.guess_type(filename) fails.
How do I get around this problem?
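A small check may make the failure easier to see. It is written for Python 3 rather than the Python 2 of the scripts in this post, and guess_from_name is a hypothetical helper, not part of the original script. It relies on the fact that mimetypes works from the file name only and never opens the file, so a name with no suffix gives it nothing to map:

```python
import mimetypes

def guess_from_name(filename):
    """Return (mime_type, extension) guessed from the file name alone."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        # No recognisable suffix in the name, so there is nothing to look up.
        return None, None
    return mime_type, mimetypes.guess_extension(mime_type)

# A name with a known suffix is resolved; a random recovered name is not.
print(guess_from_name("report.pdf"))
print(guess_from_name("a1b2c3"))
```

Guarding against a None type before calling guess_extension avoids the error, but it does not recover the extension; for that the file contents have to be inspected.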
Since it was a Linux box, I tried using the file command to get the work done.
----
Method 2:
import os
import shutil
import re

def detext(filename):
    # Ask the file command what kind of data the file holds.
    cin, cout, cerr = os.popen3('file ' + filename)
    fileoutput = cout.read()
    rtf = re.compile('Rich Text Format data')
    doc = re.compile('Microsoft Office Document')
    pdf = re.compile('PDF')
    if rtf.search(fileoutput) is not None:
        shutil.copyfile(filename, filename + '.rtf')
    if doc.search(fileoutput) is not None:
        shutil.copyfile(filename, filename + '.doc')
    if pdf.search(fileoutput) is not None:
        shutil.copyfile(filename, filename + '.pdf')

def main():
    for root, dirs, files in os.walk(os.getcwd()):
        for each in files:
            fname = os.path.join(root, each)
            detext(fname)

if __name__ == '__main__':
    main()
----
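One fragile spot in Method 2 is that the file name is pasted into a shell command string, which breaks on names containing spaces or shell characters. A minimal Python 3 sketch of the same idea follows, assuming the file command is installed and supports --brief. The names describe, extension_for and PATTERNS are illustrative and not from the original script; the three patterns are the ones detext searches for.

```python
import subprocess

# Substrings of the `file` output mapped to extensions (same three as detext).
PATTERNS = [
    ("Rich Text Format data", ".rtf"),
    ("Microsoft Office Document", ".doc"),
    ("PDF", ".pdf"),
]

def describe(path):
    """Run `file --brief` on path, passed as a single argument (no shell)."""
    result = subprocess.run(
        ["file", "--brief", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def extension_for(description):
    """Return the extension for the first matching pattern, or None."""
    for needle, ext in PATTERNS:
        if needle in description:
            return ext
    return None
```

Calling extension_for(describe(fname)) inside the os.walk loop would then give the extension to append, or None when nothing matches.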
The problem with using file, however, was that it recognized both .xls (MS Excel)
and .doc (MS Word) files as Microsoft Word Document only. I need to separate
the .xls and .doc files, and I don't know if file will be helpful here.
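One possible way to tell the two apart, offered as a heuristic sketch rather than a tested solution: legacy .doc and .xls files share the same OLE2 container format, which would explain why file reports them identically, but they store their data in differently named streams inside it. The Python 3 function below assumes that Word files contain a stream named "WordDocument" and Excel files one named "Workbook", with names stored as UTF-16LE, and simply scans the raw bytes for them. sniff_office_extension is a hypothetical name, and older Excel files may use a different stream name that this sketch does not handle.

```python
# Signature found at the start of every OLE2 compound file.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")

def _stream_name(name):
    # Assumption: directory entry names are stored as UTF-16LE text.
    return name.encode("utf-16-le")

def sniff_office_extension(data):
    """Heuristic: guess '.doc' or '.xls' from raw OLE2 bytes, else None."""
    if not data.startswith(OLE2_MAGIC):
        return None
    if _stream_name("WordDocument") in data:
        return ".doc"
    if _stream_name("Workbook") in data:
        return ".xls"
    return None
```

It could be applied to a recovered file with sniff_office_extension(open(fname, "rb").read()); a proper OLE2 parser would be more reliable than this byte scan.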
--
If the first approach of mimetypes works, it would be great!
Has anyone faced this problem? How did you solve it?
thanks,
Senthil
http://phoe6.livejournal.com