Ian Bicking
Paul said: I just looked at c2; it has about 30k pages (I'd call this
medium-sized) and finds incoming links pretty fast. Is it using MoinMoin?
It doesn't look like other MoinMoin wikis that I know of. I'd like to
think it's not finding those incoming links by scanning 30k separate
files in the file system.
c2 is the Original Wiki, i.e., the first one ever, and the system that
coined the term. It's written in Perl. It's definitely not an
advanced Wiki, and it has generally relied on social rather than technical
solutions to problems. Which might be a Wiki principle in itself.
While I believe it used full-text searches for things like backlinks in
the past, it now seems to use some kind of index.
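An index for that doesn't have to be anything elaborate: a map from each page to the set of pages that mention it, updated whenever a page is saved, would do. A minimal sketch in Python (a generic illustration only, not c2's actual code; the WikiWord regex and the function names are my own assumptions, and removing stale links on edit is left out):

from collections import defaultdict
import re

WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")  # CamelCase page names

backlinks = defaultdict(set)  # target page -> pages that link to it

def index_page(name, text):
    # Re-run this whenever a page is saved.
    for target in WIKI_WORD.findall(text):
        backlinks[target].add(name)

def incoming_links(target):
    return sorted(backlinks[target])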
Sometimes I think a wiki could get by with just a few large files.
Have one file containing all the wiki pages. When someone adds or
updates a page, append the page contents to the end of the big file.
That might also be a good time to pre-render it, and put the rendered
version in the big file as well. Also, take note of the byte position
in the big file (e.g. with ftell()) where the page starts. Remember
that location in an in-memory structure (a Python dict) indexed on the
page name. Append the same info to a second file, and store the location
of that entry in the in-memory structure as well. If there was already a
dict entry for that page, record a link to the old offset in the second
file, so the previous revisions of a page can be found by following the
links backwards through the second file. Finally, on restart, scan the
second file to rebuild the in-memory structure.
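Roughly, in Python (the file layout, the JSON journal format, and the class name here are just illustrative choices, not a finished implementation, and pre-rendering is left out):

import json
import os

class BigFilePages:
    # One big data file holds every revision of every page, appended in
    # order. A second, smaller "journal" file holds one record per save:
    # the page name, where its text starts in the data file, and a pointer
    # to the previous journal record for the same page.

    def __init__(self, data_path, journal_path):
        self.data_path = data_path
        self.journal_path = journal_path
        self.latest = {}  # page name -> offset of its newest journal record
        self._rebuild()

    def _rebuild(self):
        # On restart, scan the journal once to rebuild the in-memory index.
        if not os.path.exists(self.journal_path):
            return
        with open(self.journal_path, "rb") as journal:
            pos = 0
            for line in journal:
                self.latest[json.loads(line)["name"]] = pos
                pos += len(line)

    def save(self, name, text):
        body = text.encode("utf-8")
        # Append the page text to the big data file, noting where it starts.
        with open(self.data_path, "ab") as data:
            offset = data.tell()
            data.write(body)
        # Append a journal record pointing at that offset and at the previous
        # journal record for this page, if any (this chains old revisions).
        record = {"name": name, "offset": offset, "length": len(body),
                  "prev": self.latest.get(name)}
        with open(self.journal_path, "ab") as journal:
            journal_pos = journal.tell()
            journal.write(json.dumps(record).encode("utf-8") + b"\n")
        self.latest[name] = journal_pos

    def load(self, name, journal_pos=None):
        # Read a journal record, then seek into the data file for the text.
        pos = self.latest[name] if journal_pos is None else journal_pos
        with open(self.journal_path, "rb") as journal:
            journal.seek(pos)
            record = json.loads(journal.readline())
        with open(self.data_path, "rb") as data:
            data.seek(record["offset"])
            return data.read(record["length"]).decode("utf-8"), record["prev"]

    def history(self, name):
        # Walk the 'prev' pointers backwards to visit older revisions.
        pos = self.latest.get(name)
        while pos is not None:
            text, pos = self.load(name, pos)
            yield text

Every write is a pure append, so the worst a crash can do is leave a truncated last record, and the restart cost is one scan of the small journal rather than of the big data file.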
That sounds like you'd be implementing your own filesystem.
If you are just trying to avoid too many files in a directory, another
option is to put files in subdirectories keyed on a hash of the page name,
e.g. taking the first couple of hex digits of a stable hash as the bucket
(hashlib rather than the built-in hash(), which varies between runs):

import hashlib
import os

digest = hashlib.sha1(page_name.encode('utf-8')).hexdigest()
filename = os.path.join(digest[:2], page_name)  # 256 possible buckets