Jack said:
> I have tens of millions (could be more) of documents in files. Each of them
> has other properties in separate files. I need to check if they exist,
> update and merge properties, etc. And this is not a one-time job. Because
> of the quantity of the files, I think querying and updating a database
> will take a long time...
>
And I think you are wrong. But of course the only way to find out who's
right and who's wrong is to do some experiments and get some benchmark
timings.
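For what it's worth, a throwaway benchmark along the following lines would
settle it quickly. This is only a sketch: the table name, the key format and
the scale are made up, and SQLite is used simply because it ships with Python.

    import random
    import sqlite3
    import time

    N = 100000  # scale this up towards the real document count

    conn = sqlite3.connect("bench.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (doc_id TEXT PRIMARY KEY, props TEXT)")

    # Bulk-load some fake documents with a made-up property string.
    start = time.time()
    conn.executemany("INSERT OR REPLACE INTO docs VALUES (?, ?)",
                     (("doc%08d" % i, "colour=red;size=%d" % i) for i in range(N)))
    conn.commit()
    print("load:    %.2fs" % (time.time() - start))

    # Time random existence checks against the indexed primary key,
    # the operation you expect to be too slow.
    start = time.time()
    for _ in range(10000):
        key = "doc%08d" % random.randrange(N)
        conn.execute("SELECT 1 FROM docs WHERE doc_id = ?", (key,)).fetchone()
    print("lookups: %.2fs" % (time.time() - start))

    conn.close()

Run that at a realistic scale on your own hardware and you have a real number
instead of a guess.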
All I *would* say is that it's unwise to commit to a memory-only
architecture when all you have are assumptions about the limitations of
the alternatives, and your problem might in any case grow to exceed the
memory limits of a 32-bit machine.
Swapping might, depending on access patterns, cause your performance to
take a real nose-dive. Then where do you go? Much better to architect
the application so that you anticipate exceeding memory limits from the
start, I'd hazard.
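Concretely, a disk-backed key/value table covers the operations you describe
(existence checks, property merges) without assuming everything fits in RAM.
The sketch below is purely my own invention for illustration; the schema (a
doc_id key with a pickled property dict) is an assumption, not a recommendation.

    import pickle
    import sqlite3

    conn = sqlite3.connect("properties.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS props (doc_id TEXT PRIMARY KEY, data BLOB)")

    def exists(doc_id):
        """Cheap existence check: one indexed lookup, no table scan."""
        return conn.execute("SELECT 1 FROM props WHERE doc_id = ?",
                            (doc_id,)).fetchone() is not None

    def merge(doc_id, new_props):
        """Merge a dict of new properties into whatever is already stored."""
        row = conn.execute("SELECT data FROM props WHERE doc_id = ?",
                           (doc_id,)).fetchone()
        current = pickle.loads(row[0]) if row else {}
        current.update(new_props)
        conn.execute("INSERT OR REPLACE INTO props VALUES (?, ?)",
                     (doc_id, pickle.dumps(current)))

    merge("doc00000001", {"title": "example", "size": 1234})
    print(exists("doc00000001"))   # True
    conn.commit()
    conn.close()

Whether you keep SQLite or move to something bigger later, the point is that
the application never assumes the whole data set is resident in memory.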
> Let's say I want to do something a search engine needs to do in terms of
> the amount of data to be processed on a server. I doubt any serious search
> engine would use a database for indexing and searching. A hash table is
> what I need, not powerful queries.
>
You might be surprised. Google, for example, use a widely distributed
and highly redundant storage system, but they certainly don't keep the
whole Internet in memory.
Perhaps you need to explain the problem in more detail if you still need
help.
regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd
http://www.holdenweb.com
Skype: holdenweb
http://del.icio.us/steve.holden