Chris said:
I have a rather large (100+ MB) FASTA file from which I need to
access records in a random order.
I just came across this thread today and I don't understand why you are
trying to reinvent the wheel instead of using Biopython which already
has a solution to this problem, among others.
But actually I usually use formatdb, which comes with NCBI-BLAST to
create blastdb files that can also be used for BLAST.
mh5@ecs4a /data/blastdb/Users/mh5
$ python
Python 2.3.3 (#1, Jan 20 2004, 17:39:36) [C] on osf1V5
Type "help", "copyright", "credits" or "license" for more information.('lcl|004/04/m00404.peptide.faa ENSMUSG00000022297 peptide', 'MERSPFLLACILLPLVRGHSLFTCEPITVPRCMKMTYNMTFFPNLMGHYDQGIAAVEMGHFLHLANLECSPNIEMFLCQAFIPTCTEQIHVVLPCRKLCEKIVSDCKKLMDTFGIRWPEELECNRLPHCDDTVPVTSHPHTELSGPQKKSDQVPRDIGFWCPKHLRTSGDQGYRFLGIEQCAPPCPNMYFKSDELDFAKSFIGIVSIFCLCATLFTFLTFLIDVRRFRYPERPIIYYSVCYSIVSLMYFVGFLLGNSTACNKADEKLELGDTVVLGSKNKACSVVFMFLYFFTMAGTVWWVILTITWFLAAGRKWSCEAIEQKAVWFHAVAWGAPGFLTVMLLAMNKVEGDNISGVCFVGLYDLDASRYFVLLPLCLCVFVGLSLLLAGIISLNHVRQVIQHDGRNQEKLKKFMIRIGVFSGLYLVPLVTLLGCYVYELVNRITWEMTWFSDHCHQYRIPCPYQANPKARPELALFMIKYLMTLIVGISAVFWVGSKKTCTEWAGFFKRNRKRDPISESRRVLQESCEFFLKHNSKVKHKKKHGAPGPHRLKVISKSMGTSTGATTNHGTSAMAIADHDYLGQETSTEVHTSPEASVKEGRADRANTPSAKDRDCGESAGPSSKLSGNRNGRESRAGGLKERSNGSEGAPSEGRVSPKSSVPETGLIDCSTSQAASSPEPTSLKGSTSLPVHSASRARKEQGAGSHSDA')
tools2 has this in it:
class LightIterator(object):
def __init__(self, handle):
self._handle = handle
self._defline = None
def __iter__(self):
return self
def next(self):
lines = []
defline_old = self._defline
while 1:
line = self._handle.readline()
if not line:
if not defline_old and not lines:
raise StopIteration
if defline_old:
self._defline = None
break
elif line[0] == '>':
self._defline = line[1:].rstrip()
if defline_old or lines:
break
else:
defline_old = self._defline
else:
lines.append(line.rstrip())
return defline_old, ''.join(lines)
blastdb.py:
#!/usr/bin/env python
from __future__ import division
__version__ = "$Revision: 1.3 $"
"""
blastdb.py
access blastdb files
Copyright 2005 Michael Hoffman
License: GPL
"""
import os
import sys
try:
from poly import NamedTemporaryFile #
http://www.ebi.ac.uk/~hoffman/software/poly/
except ImportError:
from tempfile import NamedTemporaryFile
FASTACMD_CMDLINE = "fastacmd -d %s -s %s -o %s"
class Database(object):
def __init__(self, filename):
self.filename = filename
def fetch_to_file(self, query, filename):
status = os.system(FASTACMD_CMDLINE % (self.filename, query, filename))
if status:
raise RuntimeError, "fastacmd returned %d" % os.WEXITSTATUS(status)
def fetch_to_tempfile(self, query):
temp_file = NamedTemporaryFile()
self.fetch_to_file(query, temp_file.name)
return temp_file