User said:
Don't make XML files that are 250MB in size.
I didn't create the file. It contains about 100,000 records, which I
import into my program, and everything works. Unfortunately, several records
in the file have errors that I want to correct. I don't want to write
additional code just to correct imported data; I would rather make a few
changes in the source file. Of course I could write code for editing imported
data, but I don't need that functionality except for correcting these
errors. I also have no access to the editor that exported the XML file.
Use vim, the improved vi editor. I have edited such
large XML files with vi several times ....
Thanks! I've tried it and it's a good solution for me.
With this configuration:
- set enc=utf-8 (UTF-8 encoding)
- set undolevels=-1 (maybe vim is faster with this ...)
the timings for individual editing tasks in gvim are:
- opening the 250MB XML file: 15 seconds
- searching for a word (case-sensitive): up to 20 seconds (depending on its
position in the file)
In my opinion it could be better, because Total Commander's default
viewer, for example, takes only 2 seconds!
But it is acceptable, because I only want to make a few dozen
changes.
- going to a specific line of the file by entering its line number or by
dragging the vertical scrollbar with the mouse: veeeery slow, so don't do this!
- making small changes (for example inserting and deleting some lines of
text; writing something): smooth
- writing changes to the file (for example once all changes are done): 15
seconds
I have an Athlon 2500 with 1GB of RAM. gvim uses only 300MB, so 512MB of RAM was
still free.
... and you hardly
notice the difference between 10 MB and 200 MB files.
Current versions of vim (when configured properly)
can also edit any UTF-8 characters, for example Japanese.
I do notice the difference between searches that take 2 seconds and 20
seconds.
But you are right that "making small changes (for example
inserting and deleting some lines of text; writing something)" is very fast.
Another alternative is a stream editor -- the Unix tool "sed" or
something equivalent. Downside of that is that it isn't interactive; you
have to essentially write a program that tells it how to find the points
you want changed and what you want done with them.
I would prefer something interactive, because every change will be different
.... I don't want to write a program every time ...
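For anyone who does go the stream-editor route, here is a minimal sketch of the idea in Python rather than sed. It copies the file line by line and applies one regular-expression substitution, so only a single line is in memory at a time. The function name `stream_replace`, the file names, and the `<city>` records are made up for illustration; nothing here comes from the actual 250MB file.

```python
import re
import tempfile
from pathlib import Path


def stream_replace(src, dst, pattern, replacement, encoding="utf-8"):
    """Copy src to dst line by line, applying one regex substitution.

    Only one line is held in memory at a time, so the file size does not
    matter much. Returns the number of substitutions made.
    """
    regex = re.compile(pattern)
    total = 0
    # newline="" keeps the original line endings untouched.
    with open(src, encoding=encoding, newline="") as fin, \
         open(dst, "w", encoding=encoding, newline="") as fout:
        for line in fin:
            new_line, n = regex.subn(replacement, line)
            total += n
            fout.write(new_line)
    return total


if __name__ == "__main__":
    # Hypothetical sample data, only to exercise the function.
    workdir = Path(tempfile.mkdtemp())
    src = workdir / "export.xml"
    dst = workdir / "export.fixed.xml"
    src.write_text(
        "<records>\n"
        "<record><city>Warszawa</city></record>\n"
        "<record><city>Krakow</city></record>\n"
        "</records>\n",
        encoding="utf-8",
    )
    count = stream_replace(
        src, dst, r"<city>Warszawa</city>", "<city>Warsaw</city>"
    )
    print(count, "substitution(s) written to", dst)
```

This only works when the text to change sits on one line and can be described by a pattern, which is exactly the non-interactive downside mentioned above. Writing to a separate output file leaves the original untouched until the result has been checked.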
Or find/write a tool that will handle your document in chunks, either
text-based or SAX-based. Again, that presumes that what you're doing
divides up nicely.
Unfortunately I can't find such a tool ...
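If no ready-made tool turns up, a SAX-based one can be fairly small. The sketch below uses Python's standard `xml.sax` module: a filter sits between the parser and an `XMLGenerator`, buffers the text of one element, and swaps it when it equals a known wrong value. The class name `FixElementText`, the helper `fix_stream`, and the `<city>` sample records are invented for illustration. It assumes the target element contains only text (no child elements), and a SAX round trip does not reproduce the input byte for byte (comments, for instance, are not passed through by this handler chain).

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class FixElementText(XMLFilterBase):
    """SAX filter that replaces the text of one element when it matches.

    Text inside the target element is buffered, because a parser may
    deliver it in several characters() calls.
    Assumption: the target element holds text only, no child elements.
    """

    def __init__(self, parent, element, wrong, right):
        super().__init__(parent)
        self.element, self.wrong, self.right = element, wrong, right
        self._buffer = None  # list of text pieces while inside the element
        self.fixed = 0       # number of values replaced so far

    def startElement(self, name, attrs):
        if name == self.element:
            self._buffer = []
        super().startElement(name, attrs)

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)   # hold back until endElement
        else:
            super().characters(content)    # pass everything else through

    def endElement(self, name):
        if name == self.element and self._buffer is not None:
            text = "".join(self._buffer)
            if text == self.wrong:
                text = self.right
                self.fixed += 1
            super().characters(text)
            self._buffer = None
        super().endElement(name)


def fix_stream(source, out, element, wrong, right):
    """Parse source incrementally and write the corrected XML to out."""
    flt = FixElementText(xml.sax.make_parser(), element, wrong, right)
    flt.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    flt.parse(source)
    return flt.fixed


if __name__ == "__main__":
    # Hypothetical sample data standing in for the real export.
    sample = io.BytesIO(
        b"<records><record><city>Warszawa</city></record>"
        b"<record><city>Krakow</city></record></records>"
    )
    result = io.StringIO()
    n = fix_stream(sample, result, "city", "Warszawa", "Warsaw")
    print(n, "value(s) fixed")
    print(result.getvalue())
```

For a real file, `source` would be the file name and `out` a file opened for writing in UTF-8. The parser hands over the document in small events, so memory use stays low regardless of file size, but as noted this presumes the corrections divide up nicely by element.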
If you're on Windows you could try TextPad (you can get a full-featured
evaluation version to test) or EmEditor (free standard version with
most features).
Here are the statistics with the default configuration:
- opening the 250MB XML file: 70 seconds
- searching for a word at the end of the file: 45 seconds
- dragging the vertical scrollbar with the mouse: smooth
- making small changes (for example inserting and deleting some lines of
text; writing something): sometimes 0.5 seconds, sometimes 30 seconds
:((
30 seconds is long, but maybe it will be acceptable for someone ...
- writing changes to the file (for example once all changes are done): not
tested
P.S. Sorry for errors, my English isn't good.