OT: MoinMoin and Mediawiki?


Paul Rubin

I need to set up a wiki for a small group. I've played with MoinMoin
a little bit and it's reasonably straightforward to set up, but
limited in capabilities and uses BogusMarkupConventions. I want to
use it anyway because I need something running right away and I don't
want to spend a whole lot of time messing with it.

In the larger world, though, there's currently One True wiki package,
namely Mediawiki (used by Wikipedia). Mediawiki is written in PHP and
is far more complex than MoinMoin, plus it's database backed, meaning
you have to run an SQL server as well as the wiki software itself
(MoinMoin just uses the file system). Plus, I'll guess that it really
needs mod_php, while MoinMoin runs tolerably as a set of cgi's, at
least when traffic is low. I'll say that I haven't actually looked at
the Mediawiki code, though I guess I should do so.

What I'm getting at is I might like to install MoinMoin now and
migrate to Mediawiki sometime later. Anyone have any thoughts about
whether that's a crazy plan? Should I just bite the bullet and run
Mediawiki from the beginning? Is anyone here actually running
Mediawiki who can say just how big a hassle it is?

There are actually two wikis I want to run, one of which I need
immediately, but which will be private, low traffic and stay that way.
The other one will be public and is planned to grow to medium size (a few
thousand active users), but I don't need it immediately. I
definitely want the second one to eventually run Mediawiki. I can
probably keep the first one on MoinMoin indefinitely, but that would
mean I'm eventually running two separate wiki packages, which gets
confusing.
 

Brion Vibber

Paul said:
> Mediawiki is written in PHP and
> is far more complex than MoinMoin, plus it's database backed, meaning
> you have to run an SQL server as well as the wiki software itself
> (MoinMoin just uses the file system). Plus, I'll guess that it really
> needs mod_php, while MoinMoin runs tolerably as a set of cgi's, at
> least when traffic is low.

MediaWiki should run with PHP configured in CGI handler mode, but these
days mod_php has got its claws just about everywhere anyway. If you
control your own server and don't have multi-user security worries,
mod_php is simple enough to install and will probably perform better.

For performance I also highly recommend using Turck MMCache or
equivalent PHP bytecode cache extension. Unlike Python, saving compiled
bytecode is not the default behavior of PHP, and for non-trivial scripts
compilation eats up a lot of runtime.

> I'll say that I haven't actually looked at
> the Mediawiki code, though I guess I should do so.

Cover your eyes...! it _is_ PHP after all. ;)

> What I'm getting at is I might like to install MoinMoin now and
> migrate to Mediawiki sometime later. Anyone have any thoughts about
> whether that's a crazy plan? Should I just bite the bullet and run
> Mediawiki from the beginning? Is anyone here actually running
> Mediawiki who can say just how big a hassle it is?

I would generally recommend you just start with MediaWiki if you intend
to use it. To migrate a non-tiny site later you'll need to work out a
migration script to import your data in some way (some people have asked
about this in the past, I don't know if anyone's ever completed one or
made it public).

On the other hand if you _do_ write a MoinMoin-to-MediaWiki conversion
script (or vice-versa!) we'd love to include it in the MediaWiki
distribution.

-- brion vibber (brion @ pobox.com)
 

Eric Pederson

Paul said:
> What I'm getting at is I might like to install MoinMoin now and
> migrate to Mediawiki sometime later. Anyone have any thoughts about
> whether that's a crazy plan?


Disclaimer: I am neither using MoinMoin nor MediaWiki, and don't really have your answer.
From what I read, Mediawiki stores each page as "wikitext" in a MySQL database; wikitext "is a mixture of content, markup, and metadata."

It seems essentially what you'd need for migration is a mapping function and I do not know how complex the mapping between the systems would be. I could imagine migrating from Moinmoin to Mediawiki via a script looping through the Moinmoin files in a directory, modifying a copy of each, and storing them in MySQL.
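That loop can be sketched in a few lines. Everything below is hypothetical: SQLite stands in as the database (the real script would talk to MySQL), the `cur` table with `title` and `text` columns only loosely imitates what MediaWiki uses, and `convert_markup` is a stub for the actual markup mapping, which is the hard part:

```python
import sqlite3  # stand-in for a MySQL driver in this sketch


def convert_markup(text):
    # Stub for the MoinMoin -> MediaWiki markup mapping; a real converter
    # would handle headings, lists, tables, macros, and so on.
    return text.replace('["', '[[').replace('"]', ']]')


def migrate(moin_pages, db):
    # moin_pages: dict of page name -> raw MoinMoin wikitext, standing in
    # for a walk over MoinMoin's per-page data directory.
    db.execute("CREATE TABLE IF NOT EXISTS cur (title TEXT PRIMARY KEY, text TEXT)")
    with db:  # one transaction for the whole import
        for title, text in moin_pages.items():
            db.execute("INSERT INTO cur (title, text) VALUES (?, ?)",
                       (title, convert_markup(text)))


db = sqlite3.connect(":memory:")
migrate({"FrontPage": 'See ["Other page"] for details.'}, db)
```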

I suspect it's less painful to just start with the wiki you want to end up with, but if you're going to migrate between the two, won't Python come in handy! ;-)



Eric Pederson
http://www.songzilla.blogspot.com
:::::::::::::::::::::::::::::::::::
domainNot="@something.com"
domainIs=domainNot.replace("s","z")
ePrefix="".join([chr(ord(x)+1) for x in "do"])
mailMeAt=ePrefix+domainIs
:::::::::::::::::::::::::::::::::::
 

Paul Rubin

Brion Vibber said:
> MediaWiki should run with PHP configured in CGI handler mode, but
> these days mod_php has got its claws just about everywhere anyway. If
> you control your own server and don't have multi-user security
> worries, mod_php is simple enough to install and will probably perform
> better.

Thanks, yes, I could run a special apache instance with mod_php
installed. I'm pretty good with apache. I have no MySQL admin
experience but I suppose enough people are using MySQL that the
installation procedures and docs are pretty well developed and I can
follow the instructions.

What I'm wondering is just how big an adventure I'd be setting off on,
simply to get MediaWiki itself installed, configured, and running.
Any thoughts about that?

> For performance I also highly recommend using Turck MMCache or
> equivalent PHP bytecode cache extension. Unlike Python, saving
> compiled bytecode is not the default behavior of PHP, and for
> non-trivial scripts compilation eats up a lot of runtime.

Hmm, that's something I could deal with later, I guess. Is that
similar to what Zend does?

> Cover your eyes...! it _is_ PHP after all. ;)

Heehee. I like PHP just fine for small projects. I just cringe at
the notion of something as complex as MediaWiki being written in PHP
and am constantly, involuntarily thinking about how I would do it in
Python. I can't help myself. Without looking at even a line of
MediaWiki's code, I already want to do a total rewrite ;-).

> I would generally recommend you just start with MediaWiki if you
> intend to use it. To migrate a non-tiny site later you'll need to work
> out a migration script to import your data in some way (some people
> have asked about this in the past, I don't know if anyone's ever
> completed one or made it public).

You're probably right, I'll download MediaWiki and see about
installing it. I have tons of server disk space, though the CPU has
been getting a bit overloaded lately.

> On the other hand if you _do_ write a MoinMoin-to-MediaWiki
> conversion script (or vice-versa!) we'd love to include it in the
> MediaWiki distribution.

I think a rough approximation would be pretty easy to do. Trying to
get every detail right would be very difficult. If I do something like
that, I'll likely go for the rough approximation.
 

Alexander Schremmer

Paul said:
> I need to set up a wiki for a small group. I've played with MoinMoin
> a little bit and it's reasonably straightforward to set up, but
> limited in capabilities and uses BogusMarkupConventions.

At which point do you see limitations?
And what of the markup don't you like?

> In the larger world, though, there's currently One True wiki package,
> namely Mediawiki (used by Wikipedia).

It is just very famous because of Wikipedia IMHO.

> Mediawiki is written in PHP and
> is far more complex than MoinMoin, plus it's database backed, meaning
> you have to run an SQL server as well as the wiki software itself
> (MoinMoin just uses the file system).

Having a DBMS backend is good in your opinion? It has some severe
disadvantages: it is not easy to scale (you would need to set up DBMS
replication), there are two potential points of failure, a more complex
setup, bigger memory requirements, etc.

> Plus, I'll guess that it really
> needs mod_php, while MoinMoin runs tolerably as a set of cgi's, at
> least when traffic is low.

Both should run fine in CGI mode I guess.

> What I'm getting at is I might like to install MoinMoin now and
> migrate to Mediawiki sometime later. Anyone have any thoughts about
> whether that's a crazy plan?

If you really want to use the wiki for content, you have to agree on a
markup style. You could use an independent one (like reStructuredText) and
hope that MediaWiki supports it (MoinMoin does). Or you end up writing
complex migration scripts just for the markup.

Finding some script which just imports the data files into the DB should be
rather easy.

> Should I just bite the bullet and run Mediawiki from the beginning?

IMHO you should not bite it but choose MoinMoin. :)

> Is anyone here actually running Mediawiki who can say just how
> big a hassle it is?

A few months ago I tried to install it. I got it running. But I did not
like the necessary complex administration tasks.

> The other one will be public and is planned to grow to medium size (a few
> thousand active users), but I don't need it immediately. I
> definitely want the second one to eventually run Mediawiki. I can
> probably keep the first one on MoinMoin indefinitely, but that would
> mean I'm eventually running two separate wiki packages, which gets
> confusing.

There are even MoinMoin sites that are as big as that. Maybe you should
rethink your prejudice and re-evaluate MoinMoin.

Kind regards,
Alexander
 

Paul Rubin

Alexander Schremmer said:
> At which point do you see limitations?

It doesn't have features that MW has, like user pages, lists of
incoming links to wiki pages, automatic discussion links for every
wiki page, automatic update notification for specific pages of your
choice, support for managing image uploads and embedding them
into wiki pages, etc. etc.

> And what of the markup don't you like?

The BogusMixedCaseLinkNames. I'd rather have ordinary words with
spaces between them, like we use in ordinary writing.

> It is just very famous because of Wikipedia IMHO.

Well, it's gotten a lot more development attention because of that
same Wikipedia.

> Having a DBMS backend is good in your opinion? It has some severe
> disadvantages: it is not easy to scale (you would need to set up DBMS
> replication), there are two potential points of failure, a more complex
> setup, bigger memory requirements, etc.

I didn't say that it was good, in fact I was listing it as a
disadvantage there. I think for a small wiki like I was discussing,
it's just an extra administrative hassle. For a large wiki though,
MoinMoin's approach is completely unworkable and MoinMoin's
documentation actually says so. First of all MoinMoin uses a separate
subdirectory for every page, and all those subdirs are in a flat top
level directory, so if you have 100,000 wiki pages, the top level
directory has that many subdirs. Most file systems are not built to
handle such large directories with any reasonable speed. (Also, every
revision has its own file in the subdir. Lots of Wikipedia pages have
thousands of revisions.) Second, DBMSs have indexes and
transactions that make it simple to have features like "what links
here". Yes, you could do something like that in MoinMoin with
additional files, but then you'd have to update multiple files when
you commit a change, which can leave stuff inconsistent if there's a
crash partway through the update (I wonder just how crash-resilient
MoinMoin is right now, even). The DBMS can also handle stuff like
replication automatically.
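The indexing point can be illustrated with SQLite standing in for the wiki's DBMS (the table and column names are made up for the example): an indexed link table, updated in the same transaction as the page save, turns "what links here" into a cheap index lookup instead of a site-wide scan.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE links (src TEXT, dst TEXT)")
db.execute("CREATE INDEX links_dst ON links (dst)")  # makes the lookup an index scan

# Updating the link table inside the same transaction as the page save
# means a crash can never leave the index out of step with the pages.
with db:
    db.executemany("INSERT INTO links VALUES (?, ?)",
                   [("HomePage", "Projects"),
                    ("News", "Projects"),
                    ("Projects", "News")])

incoming = [src for (src,) in db.execute(
    "SELECT src FROM links WHERE dst = ? ORDER BY src", ("Projects",))]
print(incoming)  # ['HomePage', 'News']
```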

> If you really want to use the wiki for content, you have to agree on
> a markup style. You could use an independent one (like
> reStructuredText) and hope that MediaWiki supports it (MoinMoin
> does). Or you end up writing complex migration scripts just for the
> markup.

I looked at reStructuredText once and hated it. MediaWiki's markup
language has bogosities just like anything else, but for the most part
it's not too bad. Anyway, lots more people are used to it than any
other Wiki markup language, just because of Wikipedia's popularity.

> A few months ago I tried to install it. I got it running. But I did not
> like the necessary complex administration tasks.

I'm not too surprised. That's why MoinMoin was the first one I tried.

> There are even MoinMoin sites that are as big as that. Maybe you should
> rethink your prejudice and re-evaluate MoinMoin.

I don't doubt there are MoinMoin sites that size, but with that large
a user base, I want something nicer looking than those
StupidMixedCasePageNames.
 

Richie Hindle

[Paul]
> [MoinMoin] doesn't have [...] automatic update notification for
> specific pages of your choice

Yes it does. See http://entrian.com/sbwiki for example - register there
and you'll see in your preferences "Subscribed wiki pages (one regex per
line)".

> The BogusMixedCaseLinkNames. I'd rather have ordinary words with
> spaces between them, like we use in ordinary writing.

MoinMoin has an option to display WikiWords with spaces between them
(albeit still capitalised), again in the user preferences.
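That display option presumably boils down to something like the following one-liner (a guess at the idea, not MoinMoin's actual code):

```python
import re


def display_name(wiki_word):
    # Insert a space at each lowercase->uppercase boundary,
    # e.g. "WikiWord" -> "Wiki Word"; capitalisation is kept.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", wiki_word)


print(display_name("BogusMixedCaseLinkNames"))  # Bogus Mixed Case Link Names
```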

I'm not saying that MoinMoin is better than MediaWiki, just that it really
does have some of the features you say it doesn't (perhaps you've been
looking at an old version).
 

Paul Rubin

Richie Hindle said:
>> [MoinMoin] doesn't have [...] automatic update notification for
>> specific pages of your choice
>
> Yes it does. See http://entrian.com/sbwiki for example - register there
> and you'll see in your preferences "Subscribed wiki pages (one regex per
> line)"

Oh interesting, thanks.

> MoinMoin has an option to display WikiWords with spaces between them
> (albeit still capitalised), again in the user preferences.

Oh good, that's cool too, though this goes to show that the MoinMoin
documentation could stand a lot of improvement (this holds in other
areas as well). I don't understand why using spaces isn't the default.
Who except for demented Java programmers really likes this MixedCase
nonsense? Is there a way to turn it off altogether, so you can use
mixed case words on wiki pages without automatically generating a
link?

> I'm not saying that MoinMoin is better than MediaWiki, just that it really
> does have some of the features you say it doesn't (perhaps you've been
> looking at an old version).

I'm not saying MoinMoin is "worse" than MediaWiki, just as I wouldn't
say a rowboat is worse than an aircraft carrier. MediaWiki is the
only choice that makes any sense for large wikis (say > 20k pages).
For small ones, MoinMoin is easier to operate in many ways, which is a
big plus.

It would be nice if MoinMoin changed its markup language to be closer
to MediaWiki's. I edit MediaWiki pages all the time and I hate having
to switch back and forth between languages (actually that's another
reason I've reacted not-so-well to MoinMoin's language). I think I
saw something in the MoinMoin docs saying that some standard was being
developed that would look like MediaWiki, and that MoinMoin would
support the standard, so I guess this will work out in the long run.
 

Robin Becker

Paul said:
> Thanks, yes, I could run a special apache instance with mod_php
> installed. I'm pretty good with apache. I have no MySQL admin
> experience but I suppose enough people are using MySQL that the
> installation procedures and docs are pretty well developed and I can
> follow the instructions.
>
> What I'm wondering is just how big an adventure I'd be setting off on,
> simply to get MediaWiki itself installed, configured, and running.
> Any thoughts about that?
> [...]

A few months ago I tried and failed to get squirrelmail/php to run with Apache2
and FreeBSD 4.9. It seems that php prefers the old-style apache 1.3 work flow. I
got some help from the php guys, but not enough. I suppose I could have run a
separate apache13 server, but that seems like a cop-out to me. We don't want to
maintain an extra set of configs, etc.

Mailman, moinmoin and others work fine with apache2, maybe because they use a cgi
style interface. I would stick with a pythonic solution unless there's a good
reason not to.
-too old to learn a new language-ly yrs-
Robin Becker
 

Paul Rubin

Robin Becker said:
> A few months ago I tried and failed to get squirrelmail/php to run
> with Apache2 and FreeBSD 4.9. It seems that php prefers the old-style
> apache 1.3 work flow. I got some help from the php guys, but not
> enough. I suppose I could have run a separate apache13 server, but
> that seems like a cop-out to me. We don't want to maintain an extra
> set of configs, etc.

I think mod_php doesn't play nice with apache2 but am not aware of any
cgi interoperability problems. I ran squirrelmail/php as an apache
1.3 cgi a while back (low enough traffic to not get killed by cgi
overhead), if that helps. Note that squirrelmail itself has a bunch
of security bugs that the maintainers refuse to acknowledge as bugs.
Anyway, I'm still using apache 1.3 (haven't had a reason to modernize)
so I can run mod_php if I need to.

> Mailman, moinmoin and others work fine with apache2, maybe because they
> use a cgi style interface. I would stick with a pythonic solution
> unless there's a good reason not to.

Yes, I certainly prefer Python to PHP.
 

Alexander Schremmer

Paul said:
> It doesn't have features that MW has, like user pages,

It does.

> lists of incoming links to wiki pages,

It does.

> automatic discussion links for every wiki page,

Can be easily obtained manually. Just include another page where discussion
is going on. You can even protect it with different ACL config etc.

> automatic update notification for specific pages of your
> choice,

It does.

> support for managing image uploads and embedding them
> into wiki pages,

It does.

> The BogusMixedCaseLinkNames. I'd rather have ordinary words with
> spaces between them, like we use in ordinary writing.

Of course it does. It does not even require [[ but [" at the beginning of
the link which might be more intuitive to the writer.

> Well, it's gotten a lot more development attention because of that
> same Wikipedia.

Yeah, but you know: Too many cooks spoil the broth.

> I didn't say that it was good, in fact I was listing it as a
> disadvantage there. I think for a small wiki like I was discussing,
> it's just an extra administrative hassle. For a large wiki though,
> MoinMoin's approach is completely unworkable and MoinMoin's
> documentation actually says so. First of all MoinMoin uses a separate
> subdirectory for every page, and all those subdirs are in a flat top
> level directory, so if you have 100,000 wiki pages, the top level
> directory has that many subdirs. Most file systems are not built to
> handle such large directories with any reasonable speed.

But many others are.

> (Also, every revision has its own file in the subdir. Lots of Wikipedia pages have
> thousands of revisions.)

Yeah, same point. But if you have a working B+-tree (or similar)
implementation in your FS, then it should not be a problem. Personally, I
just see another problem at this point: paths might get very long if your
page names are long. This might result in problems on Windows. But you
have to use very long page names to hit the 256 char limit.

> The DBMS can also handle stuff like replication automatically.

But this is non-trivial.

> I don't doubt there are MoinMoin sites that size, but with that large
> a user base, I want something nicer looking than those
> StupidMixedCasePageNames.

See above. Just because most people still work with CamelCase does not
mean it is the only solution. ["Stupid mixed case page names"] is a
completely valid link in MoinMoin and has been one for years.

Kind regards,
Alexander
 

Paul Rubin

Alexander Schremmer said:

[user pages]

Oops, correct; however, they're not anything like MW's, which are
almost like an internal email system inside the wiki. You can sign
any comment with ~~~ or ~~~~ and it generates a link back to your
userpage, and anyone who clicks the link can leave you a message by
clicking "+" at the top of your user page or any section of it.
You're then notified automatically, at the top of every page you visit,
that someone has left you a new message.

(MoinMoin also gives no apparent way to edit individual sections of pages,
which is a big help when editing large MediaWiki pages.)

[incoming links]

Huh? I don't see those. How does it store them in a way that's resilient
across crashes? Or does it just get wedged if there's a crash?

> Can be easily obtained manually. Just include another page where
> discussion is going on. You can even protect it with different ACL
> config etc.

If you have to do it manually, that's very much reduced usability
compared to how WM does it. With WM, the link is automatically
generated and always in the same place on the page.

[update notification]

OK. I didn't find this but will take your word for it.

[image uploads]

Um, how? I see you can attach files to pages and link to them, but I
mean having version history and a discussion section like regular wiki
pages, etc. All I found with MoinMoin attached files was that if you
wanted to change one, you had to delete the old one and replace it
with a new one.

>> The BogusMixedCaseLinkNames. I'd rather have ordinary words with
>> spaces between them, like we use in ordinary writing.
>
> Of course it does. It does not even require [[ but [" at the beginning of
> the link which might be more intuitive to the writer.

Hmm, maybe. [[ is certainly easier to type. It would also help to
have that navigation field on every page, where you could type the
name of a page you wanted to jump to.

> Yeah, but you know: Too many cooks spoil the broth.

I don't know what the WM code is like and I can certainly believe that
it's an awful mess (in PHP even). However it seems to me that they're
doing a good job of usability engineering on the front side, and the
page layout is much more attractive than MoinMoin's. That's not to
knock MoinMoin, but Wikipedia is one of the top few hundred sites
on the web, and it's getting a LOT of attention to detail.

> But many others are.

Better to just use a database and be done with it than depend on magic
properties of the file system. Or else go really berserk and replace
it with something more performance-intensive.

> But this is non-trivial.

Well yes, of course, that's why leaving it to a DBMS is helpful.

> See above. Just because most people still work with CamelCase does not
> mean it is the only solution. ["Stupid mixed case page names"] is a
> completely valid link in MoinMoin and has been one for years.

Well, they should fix the defaults. Can you set it to not automatically
convert CamelCase words into links?
 

Peter Maas

Alexander said:
> Having a DBMS backend is good in your opinion? It has some severe
> disadvantages: it is not easy to scale (you would need to set up DBMS
> replication), there are two potential points of failure, a more complex
> setup, bigger memory requirements, etc.

So nobody should use DBMS backends, right? Except those poor guys
who need transactions, consistency rules, indexes, a standard query
language, ... ;)

What do you mean by scaling? If you multiply CPU and RAM by 8, a good
DBMS delivers nearly eightfold performance. If you distribute a DBMS on
an 8-computer cluster, the DBMS will scale. If you distribute a DBMS
on 8 arbitrary unrelated computers, you have 8 DBMSs which need to
synchronize their data by replication. Do applications with file-based
persistence scale better?

Since files need no setup beyond creation, every setup is complex
compared to files ;) See e.g. the setup of a PostgreSQL DBMS:

./configure
gmake
su
gmake install
adduser postgres
mkdir /usr/local/pgsql/data
chown postgres /usr/local/pgsql/data
su - postgres
/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
/usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data >logfile 2>&1 &
/usr/local/pgsql/bin/createdb test
/usr/local/pgsql/bin/psql test

The first four lines are the same for every source-based distribution;
only 8 lines are PostgreSQL-specific. I don't think this is too
complex.
 

Brion Vibber

Paul said:
> I think mod_php doesn't play nice with apache2 but am not aware of any
> cgi interoperability problems.

Generally it's recommended to configure apache2 in the child-process
mode (e.g. the way that 1.3 works) when using PHP, as many library
modules are alleged not to be threadsafe. Some Linux distributions
ship this way as standard.

-- brion vibber (brion @ pobox.com)
 

David M. Cooke

Paul Rubin said:
> Huh? I don't see those. How does it store them in a way that's resilient
> across crashes? Or does it just get wedged if there's a crash?

Most Wiki implementations (MoinMoin included) have this, by using a
search. Usually, following the original Wiki (http://c2.com/cgi/wiki)
model, you get at it by clicking on the title of the page.

Searching instead of indexing makes it very resilient :)
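That search-based approach amounts to a full-text scan over every page, something like this sketch (the page store is a plain dict here; MoinMoin's real implementation differs):

```python
def backlinks(pages, target):
    # pages: dict of page name -> raw wikitext. There is no index to
    # maintain, which is why a crash can never leave the backlink data
    # stale -- at the cost of reading every page on each query.
    return sorted(name for name, text in pages.items()
                  if target in text and name != target)


pages = {
    "FrontPage": "See RecentChanges and HelpContents.",
    "RecentChanges": "...",
    "HelpContents": "Back to FrontPage. See RecentChanges.",
}
print(backlinks(pages, "RecentChanges"))  # ['FrontPage', 'HelpContents']
```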
 

Paul Rubin

David M. Cooke said:
...
> Most Wiki implementations (MoinMoin included) have this, by using a
> search. Usually, following the original Wiki (http://c2.com/cgi/wiki)
> model, you get at it by clicking on the title of the page.
>
> Searching instead of indexing makes it very resilient :)

How does it do that? It has to scan every page in the entire wiki?!
That's totally impractical for a large wiki.
 

Robin Becker

Brion said:
> Generally it's recommended to configure apache2 in the child-process
> mode (e.g. the way that 1.3 works) when using PHP, as many library
> modules are alleged not to be threadsafe. Some Linux distributions
> ship this way as standard.
> [...]
Unfortunately mod_python3 seems to need exactly the opposite, i.e. apache2
with threads. However, I originally tried to get php going with apache2
in the standard mode and still had problems.
 

Alexander Schremmer

On 11 Jan 2005 21:24:51 -0800, Paul Rubin wrote:

[backlinks]

> How does it do that? It has to scan every page in the entire wiki?!
> That's totally impractical for a large wiki.

So you want to say that c2 is not a large wiki? :)

Kind regards,
Alexander
 

Paul Rubin

Alexander Schremmer said:
> So you want to say that c2 is not a large wiki? :)

I don't know how big c2 is. My idea of a large wiki is Wikipedia.
My guess is that c2 is smaller than that.
 

Paul Rubin

Paul Rubin said:
> I don't know how big c2 is. My idea of a large wiki is Wikipedia.
> My guess is that c2 is smaller than that.

I just looked at c2; it has about 30k pages (I'd call this medium
sized) and finds incoming links pretty fast. Is it using MoinMoin?
It doesn't look like other MoinMoin wikis that I know of. I'd like to
think it's not finding those incoming links by scanning 30k separate
files in the file system.

Sometimes I think a wiki could get by with just a few large files.
Have one file containing all the wiki pages. When someone adds or
updates a page, append the page contents to the end of the big file.
That might also be a good time to pre-render it, and put the rendered
version in the big file as well. Also, take note of the byte position
in the big file (e.g. with ftell()) where the page starts. Remember
that location in an in-memory structure (Python dict) indexed on the
page name. Also, append the info to a second file. Find the location
of that entry and store it in the in-memory structure as well. Also,
if there was already a dict entry for that page, record a link to the
old offset in the 2nd file. That means the previous revisions of a
file can be found by following the links backwards through the 2nd
file. Finally, on restart, scan the 2nd file to rebuild the in-memory
structure.
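A toy version of that scheme, with in-memory buffers standing in for the two files (the class and the JSON record layout are mine, purely illustrative):

```python
import io
import json


class AppendStore:
    def __init__(self):
        self.data = io.BytesIO()   # the "big file" of page contents
        self.index = io.BytesIO()  # the second file of index records
        self.pages = {}            # page name -> offset of latest index record

    def save(self, name, text):
        body = text.encode()
        data_pos = self.data.seek(0, io.SEEK_END)  # append; note the offset
        self.data.write(body)
        record = {"name": name, "pos": data_pos, "len": len(body),
                  "prev": self.pages.get(name)}    # back-link to old revision
        index_pos = self.index.seek(0, io.SEEK_END)
        self.index.write(json.dumps(record).encode() + b"\n")
        self.pages[name] = index_pos

    def load(self, name, back=0):
        # back=0 is the current revision, back=1 the one before it, etc.,
        # found by following the prev links through the index file.
        pos = self.pages[name]
        for _ in range(back + 1):
            self.index.seek(pos)
            record = json.loads(self.index.readline())
            pos = record["prev"]
        self.data.seek(record["pos"])
        return self.data.read(record["len"]).decode()

    def rebuild(self):
        # On restart: one sequential scan of the index file
        # recreates the in-memory dict.
        self.pages = {}
        self.index.seek(0)
        offset = 0
        for line in self.index:
            self.pages[json.loads(line)["name"]] = offset
            offset += len(line)
```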

With a Wikipedia-sized wiki, the in-memory structure will be a few
hundred MB and the 2nd file might be a few GB. On current 64-bit
PC's, neither of these is a big deal. The 1st file might be several
TB, which might not be so great; a better strategy is needed, left as
an exercise (various straightforward approaches suggest themselves).
Also, the active pages should be cached in ram. For a small wiki (up
to 1-2 GB) that's no big deal, just let the OS kernel do it or use
some LRU scheme in the application. For a large wiki, the cache and
possibly the page store might be spread across multiple servers using
some pseudo-RDMA scheme.

I think the MediaWiki software is sort of barely able to support
Wikipedia right now, but it's pushing its scaling limits. Within a
year or two, if the limits can be removed, Wikipedia is likely to
reach at least 10 times its present size and 1000 times its traffic
volume. So the question of how to implement big, high-traffic wikis
has been on my mind lately. Specifically I ask myself how Google
would do it. I think it's quite feasible to write Wiki software that
can handle this amount of load, but none of the current stuff can
really do it.
 
