(e-mail address removed)>, (e-mail address removed)
says...
[ ... ]
That depends on how you represent a position in the text.
With the gap buffer, it would be an offset; iterating on the text is
increasing the offset (minus the gap handling) With scratch buffer, it
would be a buffer and an offset in the buffer; iterating on the text
is end-buffer/next-buffer.
Example:
Step 1: Initial state
Buffer="This is a text"
Scratch=""
Text=[ptr=buffer+0,size=14]
Step2: Inserting " nice " before text
Buffer="This is a text"
Scratch=" nice"
Text=[buffer+0,9],[scratch+0,5],[buffer+9,5]
It is true that then going at a specif index in the text is in o(B)
where (B is the number of element in the list) but I expect M would be
small and if movements are small, it is negligible.
Hmm...you haven't specified what "M" is supposed to represent here,
so I'm not sure what you're trying to say. In the end, however, I
think we're left with a fairly simple situation: what you're calling
"negligible" here will usually be larger than what you're objecting
to in the case of a split buffer.
True. But mitigated by the fact that with the gap buffer, there is a
copy that requires moving things in main memory and I would expect it
to be at each move since the cache will likely be flushed, given the
interaction time.
First of all, I doubt that, at least with a modern processor. Most
processors now have at least a megabyte of cache, and five to ten
megabytes is probably closer to typical. Unless you have something in
the background that's really churning through a _lot_ of data, the
two cache lines used by a split buffer are fairly likely to survive a
task switch.
Second, even if we assume that all the data involved has to be read
from main memory, we're still stuck with the fact that a split buffer
is going to deal with two contiguous areas, where chained buffers
plus change buffers is going to deal with a larger number of disjoint
areas of memory, and (ultimately) is going to simply deal with more
total data -- virtually all the data being read with a split buffer
is the actual data itself, whereas the chained buffers plus change
buffer involves things like buffers containing not only the raw data,
but also pointers to other buffers.
[ ... ]
IMO, line representation is another issue. With gap buffer, the
position of char are moving while they don't with tree representation.
I have a hard time believing that this means much, at least under
most typical circumstances. If you really think you're going to see a
'goto line N' command often enough to bother, it's pretty easy to
optimize for it reasonably well in a split buffer. Simply put, you
create an index, but instead of storing line positions, you store
line sizes. To navigate to a particular line, you add up the sizes of
lines up to that one, and if that's greater than the current cursor
position, you add the gap size. This is still linear, but on the
number of lines rather than the number of characters.
With the tree representation, it really does change -- every time you
consolidate trees. From what you've said elsethread (and I'd tend to
agree) you're going to have to do that fairly frequently to avoid
other problems -- which means you're going to be updating your line
position data fairly frequently as well.
It is also true of many buffers. Isn't it ?
Yes and no. Yes, writes have short latency and reads have long
latency in both cases. The difference is that a multiple buffer setup
does more reading to avoid doing writes -- "penny wise and pound
foolish" as the old saying goes.
The same comment as above apply since it is the same argument.
I mentioned that the contiguous buffer would be reformed upon writing;
in most cases, I expect saves happens fairly often, in which case you
are in the same situation as with the gap buffer..
You're in the same situation -- but only by doing some arbitrary
amount of extra work that you're currently ignoring. To an extent,
that's fair -- the extra work can be done asynchronously. To another
extent, it's completely unfair. To make your design competitive, you
just about _have_ to do it asynchronously. That, in turn, means you
have to write the whole thing to be thread safe.
[ ... ]
But to have something practical (I usually write quite a lot of char
when I write), I guess the gap size would exceed a cache entry and you
loose this advantage around the gap where most of the operation occur
and which, in definitive, is what is displayed to the screen.
Not so. When you're inserting text, a gap buffer is dealing with
exactly one cache line, covering the space immediately after the
cursor. You only start to use a different cache line when the user
has inserted text that completely fills the current cache line.
In the end, you're always stuck with a few simple facts: a split
buffer spends most of its time dealing with one contiguous chunk of
memory. A multiple buffer/change buffer setup uses a bare minimum of
two separate areas of memory, _and_ it almost inevitably deals with
more raw data representing the same amount of real (user-visible)
data. That means it has to read more data to do its job.
I would not expect that in in multiple-core environment: write is
expensive because of synchronisation of memory and R/W requires two
bus access which is nowadays the bottleneck.
Write is expensive when and if you need synchronization. As noted
above, that's probably important for the design you're advocating,
but (at most) is much less so for the split-buffer design.
[ ... ]
I think you will agree that, given the power of computers, even
with a very poorly designed or over-consumptive editor, the user
experience will be the same so I don't think this is an issue here.
I agree that fast enough hardware can overcome a poor design, but I
don't think that's a good reason to create a poor design.