Looking for a design pattern

M

Michael Doubez

[ ... ]
That has the sake of simplicity but if you maintain a list of
modification (like rope or another list structure with modifications
stored in a scratch buffer) until next write to file then insertion
and deletion seems cheap to me.
I have never tried to write a text editor but this approach appeals
me more than gap buffer.

I can't quite understand why. Offhand, I can only think of one
operation (pasting a large chunk of text) that it _might_ make faster
(if you could just link in the pasted block instead of copying it).
In general, it sounds to me like a lose-lose situation -- slower
operation, and more complex implementation.

How do you come up with slower operation ?

In the cursor gap case, I have a contiguous representation (except the
gap) requiring a copy at each move.

In the scratch buffer case, I have a tree structure with no cost of
insertion or deletion except maintaining the tree and perhaps some
merge logic to avoid too much fragmentation.
The only of those that's of any concern at all is moving between
marks. To move between marks, you copy all the text between the marks
from one side of the gap to the other. With marks more than a few
tens of megabytes apart, it's going to be difficult to implement that
in the 100 ms (or so) to be perceived as "instant". A couple hundred
ms isn't too terrible, but much beyond that can start to bother the
user.

I don't speak about user experience. The sheer raw power of computer
makes everything possible (once the text is in memory). I agree that I
will always type more slowly than any processor and in the case you
mention, the editor can always display the text before copying.

The rest are simply too small to care about. Figure a line is <80
characters, usually being moved around in the cache. Such a movement
is usually going to be in the sub-microsecond range.


The gap is always at the cursor, from the time the file is loaded.

My point here is that a lot of editing is reading/moving hence a lot
of useless copy occur; while this would not be normally a concern, it
bothers me somewhat - perhaps a belief I have to get rid of. All the
most since moving/writing are usually well defined phase of editing
(like in vi).
In the end editing/word processing is interactive -- a (very) soft
real-time process. In most cases, what matters is fast _enough_
response; faster doesn't hurt, but rarely matters much either.

But implementation is done only once and the added complexity is
really small. IMO it wouldn't even be noticed on a budget.

My question is regarding the design, not the user experience.
 
J

James Kanze

[ ... ]
That has the sake of simplicity but if you maintain a list
of modification (like rope or another list structure with
modifications stored in a scratch buffer) until next write
to file then insertion and deletion seems cheap to me.
I have never tried to write a text editor but this approach
appeals me more than gap buffer.
I can't quite understand why.

There's a quality rope class available on the internet. I don't
know of a readily available gap buffer class.
Offhand, I can only think of one operation (pasting a large
chunk of text) that it _might_ make faster (if you could just
link in the pasted block instead of copying it). In general,
it sounds to me like a lose-lose situation -- slower
operation, and more complex implementation.

Implementing a gap buffer isn't that trivial, either.
The only of those that's of any concern at all is moving
between marks. To move between marks, you copy all the text
between the marks from one side of the gap to the other. With
marks more than a few tens of megabytes apart, it's going to
be difficult to implement that in the 100 ms (or so) to be
perceived as "instant". A couple hundred ms isn't too
terrible, but much beyond that can start to bother the user.

Just out of curiousity, I ran a couple of tests on my machine,
with a buffer containing 10 million characters, over a total
buffer size of 12,5 million characters. I get the following
times:

Move the cursor from the position 10 to 10 before the end, and
back, alternately, in a gap buffer:
11.2 ms per move with char
28,0 ms per move with wchar_t
Insert or delete at the position 10, alternatively ("flat"
buffer:
6.9 ms per operation with char
16.8 ms per operation with wchar_t
Those aren't what I'd consider prohibitive times. And the
simplest implementation is clearly the flat buffer, which is in
practice probably faster than the gap buffer for the most
frequent operations, and acceptably fast for the least frequent.

(Admittedly, this is a machine which was installed only a few
weeks ago: A PC running under Ubuntu 9.04 (Linux 2.6), with
3.2GB real memory and four Intel Xeon processor cores at
2.00GHz.)

[...]
The gap is always at the cursor, from the time the file is
loaded.

And what do you do if there are several cursors? (A frequent
case, at least in the code I work on.)
In the end editing/word processing is interactive -- a (very)
soft real-time process. In most cases, what matters is fast
_enough_ response; faster doesn't hurt, but rarely matters
much either.

Agreed. And since just using an std::vector in the most obvious
manner seems fast enough, and is surely the simplest solution...
 
A

Alf P. Steinbach

* James Kanze:
[ ... ]
That has the sake of simplicity but if you maintain a list
of modification (like rope or another list structure with
modifications stored in a scratch buffer) until next write
to file then insertion and deletion seems cheap to me.
I have never tried to write a text editor but this approach
appeals me more than gap buffer.
I can't quite understand why.

There's a quality rope class available on the internet. I don't
know of a readily available gap buffer class.
Offhand, I can only think of one operation (pasting a large
chunk of text) that it _might_ make faster (if you could just
link in the pasted block instead of copying it). In general,
it sounds to me like a lose-lose situation -- slower
operation, and more complex implementation.

Implementing a gap buffer isn't that trivial, either.
The only of those that's of any concern at all is moving
between marks. To move between marks, you copy all the text
between the marks from one side of the gap to the other. With
marks more than a few tens of megabytes apart, it's going to
be difficult to implement that in the 100 ms (or so) to be
perceived as "instant". A couple hundred ms isn't too
terrible, but much beyond that can start to bother the user.

Just out of curiousity, I ran a couple of tests on my machine,
with a buffer containing 10 million characters, over a total
buffer size of 12,5 million characters. I get the following
times:

Move the cursor from the position 10 to 10 before the end, and
back, alternately, in a gap buffer:
11.2 ms per move with char
28,0 ms per move with wchar_t

This is a bad implementation.

The operation involves copying some 20 bytes.

You should not get into millisecond range for that.


Cheers & hth.,

- Alf
 
J

Jorgen Grahn

Alf P. Steinbach a écrit : ....

I had a look on wikipedia ... as clear as the bottom of my socks. But it
looks interesting. Definitely will get a look at "The Craft of Text
Editing" (if I can find a working link).

If you can't, drop me a line. I have a copy of it.

/Jorgen
 
J

Jorgen Grahn

I cannot believe :

1. I got an answer in a couple of minutes,
2. You managed to decipher the description of my issue,

Well, it /is/ the first pattern in the GoF book, and the example they
use is characters in a word processor ...
3. There is actually a pattern very well detailed, and with a very
intuitive name

Thanks Alf !

There's also a flyweight utility in Boost, but I don't know how
relevant it is.

/Jorgen
 
M

Michael Doubez

* James Kanze: [snip]
Just out of curiousity, I ran a couple of tests on my machine,
with a buffer containing 10 million characters, over a total
buffer size of 12,5 million characters.  I get the following
times:
Move the cursor from the position 10 to 10 before the end, and
back, alternately, in a gap buffer:
    11.2 ms per move with char
    28,0 ms per move with wchar_t

This is a bad implementation.

The operation involves copying some 20 bytes.

You should not get into millisecond range for that.

I think you didn't read correctly the test: he moved from index 10 to
index size-10 back and forth. Which means a copy of 10Mb - 10b per
move.
 
A

Alf P. Steinbach

* Michael Doubez:
* James Kanze: [snip]
Just out of curiousity, I ran a couple of tests on my machine,
with a buffer containing 10 million characters, over a total
buffer size of 12,5 million characters. I get the following
times:
Move the cursor from the position 10 to 10 before the end, and
back, alternately, in a gap buffer:
11.2 ms per move with char
28,0 ms per move with wchar_t
This is a bad implementation.

The operation involves copying some 20 bytes.

You should not get into millisecond range for that.

I think you didn't read correctly the test:

I did.

he moved from index 10 to
index size-10 back and forth.
Yes.


Which means a copy of 10Mb - 10b per move.

No.



Cheers & hth.,

- Alf
 
J

Jerry Coffin

[ ... ]
How do you come up with slower operation ?

Consider, just for example, displaying the current "screen" of
information. With a split buffer, you have one possible
discontinuity: at the cursor. When the cursor is visible, you grab
one contiguous block from the beginning of the screen to the
beginning of the gap, and another contiguous block from the end of
the gap to the end of the screen. Pretty trivial really.

With a list structure with modifications stored in a scratch buffer,
you need to grab the base data from the (noncontiguous) list, and
then grab the changes from the scratch buffer, apply the changes from
the scratch buffer to the base data, so you have a contiguous buffer
holding the current version of the text for display.

From a purely algorithmic viewpoint, that's clearly more work. From a
viewpoint of locality of reference, it appears likely that you'd
refer to more different blocks of memory as well.
In the cursor gap case, I have a contiguous representation (except the
gap) requiring a copy at each move.

Yes, but the amount of copying is trivial, and most of the time
involved can't really be avoided anyway. Just for example, assume you
move right one word. You have (nearly) no choice but to read bytes
until you get to a byte that's whitespace. Doing a copy just means
that as you read each byte, you also put it into a write buffer.

You can only avoid reading data if you have some pointer directly to
the next place to go -- e.g. if you start with a vector of linked
list of lines, then it's simple and straightforward to move by lines
without reading the data in each line. Absent that, you need to read
the data to find where you're going -- and once you've read the data
into cache, writing is usually almost free, since it can be done
asynchronously (i.e. you put the data into a write queue and move
on).

We're left with only a few rather cases where the copying matters: 1)
move to beginning or end of file, 2) move to a mark.

As James Kanze already pointed out, even those are pretty easy to
handle fast enough, even for fairly outrageous file sizes (~10 MiB).
In the scratch buffer case, I have a tree structure with no cost of
insertion or deletion except maintaining the tree and perhaps some
merge logic to avoid too much fragmentation.

But, 1) unless you have a pointer from one word/line/whatever to the
next, you still need to read through the data to find that next word,
line, or whatever, and 2) while you're doing it, you have to apply
the changes from the scratch buffer. The only part you're avoiding is
the writing -- which, as already noted, is the cheapest part of the
whole operation in most cases.

[ ... ]
My point here is that a lot of editing is reading/moving hence a
lot of useless copy occur; while this would not be normally a
concern, it
bothers me somewhat - perhaps a belief I have to get rid of. All the
most since moving/writing are usually well defined phase of editing
(like in vi).

I think the problem is that you're more or less ignoring a few basic
facts, such as:
1) Locality of reference matters -- even a few references to main
memory are expensive compared to lots of references to cache.
2) In small chunks, reading is much more expensive than writing.
But implementation is done only once and the added complexity is
really small. IMO it wouldn't even be noticed on a budget.

My question is regarding the design, not the user experience.

I think for something like a word processor, attempting to design
without taking the user experience into account is a poor idea.
 
J

Jerry Coffin

* James Kanze: [...]
It's a *simple* structure that yields O(1) insertion and
deletion at the cursor position and O(1) random access read
and write, where the former is the same as with a linked list,
and the latter is much better than a linked list.
But nobody uses a linked list.
Not sure about that.

OK. I've never seen it:).

As I recall, in _Software Tools_, they say they tried it, but never
got it working quite right, and eventually discarded the idea. I
suspect with a higher-level interface (e.g. std::list) you could make
it work a bit more easily, but I'm still not at all sure anybody has
done so (or that it's a particularly good idea).

[ ... ]
In a modern editor, you do have several cursors. More
precisely, a modern editor will separate the model (the buffer)
and the view (the window), with the possibility of having
several views in the same model. Logically, the cursor is part
of the view, although it can be complicated, since it is also
used by the model in most commands. (I'd still keep it in the
view, and only pass it down to the model when a command which
needed it, like insert, was issued. Maybe that's how emacs and
others using a cursor gap buffer work today: they only move the
cursor in the buffer when needed---when a view passes them a
command with the cursor in it.)

I doubt that part of emacs gets updated a lot, so it wouldn't
surprise me if its implementation today is similar to 15-20 years
ago.

If I was doing it, I'd probably have each window keep track of its
cursor position, and the buffer manager would keep the gap at the
position of the current view's cursor.
 
J

Jerry Coffin

[ ... ]
Implementing a gap buffer isn't that trivial, either.

It's hardly what I'd call tremendously difficult. Way back when, I
wrote a split-buffer editor in Turbo Pascal 2.0. As I recall, it was
reasonably complete and usable in something like two weeks (and that,
of course, with quite a minimal standard library by modern
standards).
Just out of curiousity, I ran a couple of tests on my machine,
with a buffer containing 10 million characters, over a total
buffer size of 12,5 million characters. I get the following
times:

Move the cursor from the position 10 to 10 before the end, and
back, alternately, in a gap buffer:
11.2 ms per move with char
28,0 ms per move with wchar_t
Insert or delete at the position 10, alternatively ("flat"
buffer:
6.9 ms per operation with char
16.8 ms per operation with wchar_t
Those aren't what I'd consider prohibitive times. And the
simplest implementation is clearly the flat buffer, which is in
practice probably faster than the gap buffer for the most
frequent operations, and acceptably fast for the least frequent.

You could be right, but I'm a bit less convinced. About the only
operation for which it's faster is moving the cursor, but it's not
going to be a lot faster even for that. To move by increments of
something like a word or a line, you still have to read through the
data to find the next whitespace or newline, so you haven't saved
much even at best.

The only places I can see it saving a substantial amount of time
would be moving to an explicit position (e.g. a mark, or beginning or
end of file).
(Admittedly, this is a machine which was installed only a few
weeks ago: A PC running under Ubuntu 9.04 (Linux 2.6), with
3.2GB real memory and four Intel Xeon processor cores at
2.00GHz.)

For a job like this, CPU speed won't usually make a lot of
difference; the bottleneck will almost certainly be main-memory
bandwidth.
Agreed. And since just using an std::vector in the most obvious
manner seems fast enough, and is surely the simplest solution...

Regardless of which implementation I was going to use, I'd create a
small buffer manager class to give a reasonable interface -- and I'd
expect that buffer manager to take a matter of hours to implement for
either a plain vector or a split buffer. In fact, most of the
implementation for either one is so obvious that the limitation for a
lot of it is likely to be typing speed...
 
A

Alf P. Steinbach

* I V:
Could you explain? The discussions of gap buffers that I've seen use a
simple array with a gap in it, which would indeed have the
characteristics James and Micheal note. You're evidently talking about a
somewhat more sophisticated implementation (here, and above where you
mentioned gap buffers that deal efficiently with moving to beginning and
end of a document). Could you explain the implementation you're thinking
of? One improvement that comes to mind is using a circular buffer,
although I think this still requires copying half the size of the buffer
in the worst case (when moving the gap from the middle of the buffer to
one of the ends).

Yes. :)

The reason that circular buffer is almost required and is implicitly assumed is
that moving from one end to the other is a frequent operation.

Anyway, the data structure supports that operation, so from the point of view of
discussing data structures that operation is O(1), regardless of implementations
that manage to do it inefficiently.


Cheers,

- Alf

PS: This sort of connects with Andrei's recently proposed "ranges", because
there's much that suddenly becomes possible when there's maximum one active
cursor/pointer/iterator. E.g. consider the problem of deleting a node in a
singly linked list when all you have is a pointer directly to that node. With
suitable assumptions that's an O(1) operation for the general case. <g>
 
A

Alf P. Steinbach

* I V:
Could you explain? The discussions of gap buffers that I've seen use a
simple array with a gap in it, which would indeed have the
characteristics James and Micheal note. You're evidently talking about a
somewhat more sophisticated implementation (here, and above where you
mentioned gap buffers that deal efficiently with moving to beginning and
end of a document). Could you explain the implementation you're thinking
of? One improvement that comes to mind is using a circular buffer,
although I think this still requires copying half the size of the buffer
in the worst case (when moving the gap from the middle of the buffer to
one of the ends).

Yes. :)

The reason that circular buffer is almost required and is implicitly assumed is
that moving from one end to the other is a frequent operation.

Anyway, the data structure supports that operation, so from the point of view of
discussing data structures that operation is O(1), regardless of implementations
that manage to do it inefficiently.


Cheers,

- Alf

PS: This sort of connects with Andrei's recently proposed "ranges", because
there's much that suddenly becomes possible when there's maximum one active
cursor/pointer/iterator. E.g. consider the problem of deleting a node in a
singly linked list when all you have is a pointer directly to that node. With
suitable assumptions that's an O(1) operation for the general case. <g>
 
D

Doctor Duggar

Alf said:
Yes. :)

The reason that circular buffer is almost required and is implicitly assumed is
that moving from one end to the other is a frequent operation.

Anyway, the data structure supports that operation, so from the point of view of
discussing data structures that operation is O(1), regardless of implementations
that manage to do it inefficiently.

Ok, I don't understand the point of all this cat and mouse? It
seems to be impeding discussion more than helping it. Fine, you
are considering a circular gap buffer. So just pretend that all
the "end of buffer" points were being made about "middle of
buffer" instead. Now we are back to the simple point that move
is O(n) for such moves.

Now what? Perhaps we have in mind another "better" "gap buffer"
that optimizes for the "middle of buffer" moves too because they
are a common case. At what point are these souped-up "gap buffer"
implementations no longer a "gap buffer" in any reasonable sense?
Eventually we are talking about things that probably have other
other names such as unrolled linked list, b-tree, rope, skip
list, etc.

KHD
 
A

Alf P. Steinbach

* Doctor Duggar:
Ok, I don't understand the point of all this cat and mouse?

That may be because the play you see is only in your mind, Keith.

It
seems to be impeding discussion more than helping it. Fine, you
are considering a circular gap buffer. So just pretend that all
the "end of buffer" points were being made about "middle of
buffer" instead. Now we are back to the simple point that move
is O(n) for such moves.

That point has not been raised, so you're not back to it.

Moving to the ends have been discussed because those are common operations in
editors, and as it happens also in other uses of this data structure.

Regarding these operations in editors, most editors have shortcut keys for them,
e.g. [Ctrl Home] and [Ctrl End], while few if any have shortcut keys to go to
the middle of the text.

Now what? Perhaps we have in mind another "better" "gap buffer"
that optimizes for the "middle of buffer" moves too because they
are a common case.

No improvement of the basic cursor gap buffer has been discussed.

The only issue that has apparently entered the discussion has been the incorrect
idea that that a flawed, (relatively speaking) inefficient implementation says
something about the efficiency the data structure allows for certain operations.

At what point are these souped-up "gap buffer"
implementations no longer a "gap buffer" in any reasonable sense?

You'd have to provide detailed descriptions of your proposed data structures,
and for some of your creations your question might be practically decidable,
while for other of your creations your question might be practically undecidable.

However, if you plan on actually describing what you have in mind, please do so
over in e.g. [comp.programming].

Eventually we are talking about things that probably have other
other names such as unrolled linked list, b-tree, rope, skip
list, etc.

The "we", presumably referring to an assumed participation of others in your
proposed off-topic discussion, sounds a little odd. :)


Cheers & hth.,

- Alf
 
D

Doctor Duggar

Alf said:
* Doctor Duggar:


That may be because the play you see is only in your mind, Keith.

Ok, I guess the confusion shown by at least three other readers
(James, Michael, IV) was just a figment of my imagination and had
nothing to do with your answers.
That point has not been raised, so you're not back to it.

I think if some others had known that you had in mind a circular
buffer, they would have raised their algorithmic points differently.
That's what I'm trying to point out to you. In other words, simply
mentioning "circular buffer" in any of these posts

http://groups.google.com/group/comp.lang.c++/msg/22a77eed7c1c1dba
http://groups.google.com/group/comp.lang.c++/msg/90118a99ea35238c
http://groups.google.com/group/comp.lang.c++/msg/db656bf160333225

(especially the last two) would have prevented and/or ended the
confusion. And the last one is very much, in my opinion, a example
of cat and mouse games that add nothing to the discussion and
usually add further to the confusion.
Moving to the ends have been discussed because those are common operations in
editors, and as it happens also in other uses of this data structure.

Regarding these operations in editors, most editors have shortcut keys for them,
e.g. [Ctrl Home] and [Ctrl End], while few if any have shortcut keys to go to
the middle of the text.

Maybe. Though, I think some others were also just trying to raise
points about the structure in general (for example because a move
from "mark to mark" was brought up) not just about the convenient
move to begin/end.
No improvement of the basic cursor gap buffer has been discussed.

Ok, so a circular buffer is the "basic cursor gap buffer"? Then
judging from the confusion it seems some others mistakenly thought
two linear buffers was the "basic cursor gap buffer".
The only issue that has apparently entered the discussion has been the incorrect
idea that that a flawed, (relatively speaking) inefficient implementation says
something about the efficiency the data structure allows for certain operations.
At what point are these souped-up "gap buffer"
implementations no longer a "gap buffer" in any reasonable sense?

You'd have to provide detailed descriptions of your proposed data structures,
and for some of your creations your question might be practically decidable,
while for other of your creations your question might be practically undecidable.

However, if you plan on actually describing what you have in mind, please do so
over in e.g. [comp.programming].

So discussing general algorithms is on-topic when you are doing it
(now for 8 posts) and off-topic if I want to discuss them? Or why
didn't you move your responses re "cursor gap" algorithms to another
forum as you suggest I do?
The "we", presumably referring to an assumed participation of others in your
proposed off-topic discussion, sounds a little odd. :)

Given that you have already been participating in a non-C++
specific cursor gap discussion for several posts, how is your
suggestion and comment not hypocritical?

KHD
 
M

Michael Doubez

[ ... ]
How do you come up with slower operation ?

Consider, just for example, displaying the current "screen" of
information. With a split buffer, you have one possible
discontinuity: at the cursor. When the cursor is visible, you grab
one contiguous block from the beginning of the screen to the
beginning of the gap, and another contiguous block from the end of
the gap to the end of the screen. Pretty trivial really.

With a list structure with modifications stored in a scratch buffer,
you need to grab the base data from the (noncontiguous) list, and
then grab the changes from the scratch buffer, apply the changes from
the scratch buffer to the base data, so you have a contiguous buffer
holding the current version of the text for display.

That depends on how you represent a position in the text.
With the gap buffer, it would be an offset; iterating on the text is
increasing the offset (minus the gap handling) With scratch buffer, it
would be a buffer and an offset in the buffer; iterating on the text
is end-buffer/next-buffer.

Example:

Step 1: Initial state
Buffer="This is a text"
Scratch=""
Text=[ptr=buffer+0,size=14]

Step2: Inserting " nice " before text
Buffer="This is a text"
Scratch=" nice"
Text=[buffer+0,9],[scratch+0,5],[buffer+9,5]

It is true that then going at a specif index in the text is in o(B)
where (B is the number of element in the list) but I expect M would be
small and if movements are small, it is negligible.
From a purely algorithmic viewpoint, that's clearly more work. From a
viewpoint of locality of reference, it appears likely that you'd
refer to more different blocks of memory as well.

True. But mitigated by the fact that with the gap buffer, there is a
copy that requires moving things in main memory and I would expect it
to be at each move since the cache will likely be flushed, given the
interaction time.
Yes, but the amount of copying is trivial, and most of the time
involved can't really be avoided anyway. Just for example, assume you
move right one word. You have (nearly) no choice but to read bytes
until you get to a byte that's whitespace. Doing a copy just means
that as you read each byte, you also put it into a write buffer.

I agree that individual move are trivial, even give moves.
You can only avoid reading data if you have some pointer directly to
the next place to go -- e.g. if you start with a vector of linked
list of lines, then it's simple and straightforward to move by lines
without reading the data in each line.

IMO, line representation is another issue. With gap buffer, the
position of char are moving while they don't with tree representation.
Absent that, you need to read
the data to find where you're going -- and once you've read the data
into cache, writing is usually almost free, since it can be done
asynchronously (i.e. you put the data into a write queue and move
on).

It is also true of many buffers. Isn't it ?
We're left with only a few rather cases where the copying matters: 1)
move to beginning or end of file, 2) move to a mark.

As James Kanze already pointed out, even those are pretty easy to
handle fast enough, even for fairly outrageous file sizes (~10 MiB).

I agree with that.
But, 1) unless you have a pointer from one word/line/whatever to the
next, you still need to read through the data to find that next word,
line, or whatever, and 2) while you're doing it, you have to apply
the changes from the scratch buffer. The only part you're avoiding is
the writing -- which, as already noted, is the cheapest part of the
whole operation in most cases.

The same comment as above apply since it is the same argument.
I mentioned that the contiguous buffer would be reformed upon writing;
in most cases, I expect saves happens fairly often, in which case you
are in the same situation as with the gap buffer..
[ ... ]
My point here is that a lot of editing is reading/moving hence a
lot of useless copy occur; while this would not be normally a
concern, it
bothers me somewhat - perhaps a belief I have to get rid of. All the
most since moving/writing are usually well defined phase of editing
(like in vi).

I think the problem is that you're more or less ignoring a few basic
facts, such as:
1) Locality of reference matters -- even a few references to main
memory are expensive compared to lots of references to cache.

But to have something practical (I usually write quite a lot of char
when I write), I guess the gap size would exceed a cache entry and you
loose this advantage around the gap where most of the operation occur
and which, in definitive, is what is displayed to the screen.
2) In small chunks, reading is much more expensive than writing.

I would not expect that in in multiple-core environment: write is
expensive because of synchronisation of memory and R/W requires two
bus access which is nowadays the bottleneck.
I think for something like a word processor, attempting to design
without taking the user experience into account is a poor idea.

I think you will agree that, given the power of computers, even with a
very poorly designed or over-consumptive editor, the user experience
will be the same so I don't think this is an issue here.
 
J

James Kanze

* James Kanze:
* James Kanze: [...]
It is, in fact, O(n)
when you go to the end of the file, which is a frequent
operation (since in many cases, that's where you're inserting
text).
Above is incorrect. Cursor gap is O(1) for moving to other
end. With any reasonable implementation, since going to the
ends are frequent operations.
It's certainly O(n) with the classical implementation of
cursor gap. Say, the one in emacs (or at least, the one in
emacs 15-20 years ago, which was the last time I looked at
the source).
You might as well say that quicksort is O(n^3) average time
because some novice has managed to implement something he
calls quicksort with cubic average time.

I wouldn't call the implementors of emacs novices. I'm talking
of actual implementations that I've seen, in real code.
[...]
I've never seen or made a cursor gap structure that
supported multiple cursors. Possibly it can be done for
small number of cursors. I'm not sure.
In a modern editor, you do have several cursors.
Yeah, but except for collaborative editors like the Mac one,
only one at a time.

Emacs supports opening windows on different terminals, at least
under X. (That's probably the biggest single feature of emacs I
miss in vim.) I don't know how the implement it, however; I
suspect that they distinguish between the cursor in the model,
and the one in the view, and only update the one in the model
when necessary.
The cursors and information about which one's active is to me
obviously part of the model used to present the GUI (note: it
will be backed by other models).

Different views have different cursors, and several can be
visible (with the cursor materialized) at the same time. (But
maybe this is just an artifact of the way the non-active windows
are updated, when they are updated.)
 
J

James Kanze

[ ... ]
In a modern editor, you do have several cursors. More
precisely, a modern editor will separate the model (the
buffer) and the view (the window), with the possibility of
having several views in the same model. Logically, the
cursor is part of the view, although it can be complicated,
since it is also used by the model in most commands. (I'd
still keep it in the view, and only pass it down to the
model when a command which needed it, like insert, was
issued. Maybe that's how emacs and others using a cursor
gap buffer work today: they only move the cursor in the
buffer when needed---when a view passes them a command with
the cursor in it.)
I doubt that part of emacs gets updated a lot, so it wouldn't
surprise me if its implementation today is similar to 15-20
years ago.

Me neither. If it ain't broken, don't fix it.
If I was doing it, I'd probably have each window keep track of
its cursor position, and the buffer manager would keep the gap
at the position of the current view's cursor.

That's roughly Alf's suggestion, and it sounds likely to me,
except... emacs supports windows open on different X terminals,
and there's one active view per terminal. (I think---I don't have
the possibility of trying it at present.)
 
J

James Kanze

[ ... ]
In the cursor gap case, I have a contiguous representation
(except the gap) requiring a copy at each move.
Yes, but the amount of copying is trivial, and most of the
time involved can't really be avoided anyway. Just for
example, assume you move right one word. You have (nearly) no
choice but to read bytes until you get to a byte that's
whitespace. Doing a copy just means that as you read each
byte, you also put it into a write buffer.
You can only avoid reading data if you have some pointer
directly to the next place to go -- e.g. if you start with a
vector of linked list of lines, then it's simple and
straightforward to move by lines without reading the data in
each line. Absent that, you need to read the data to find
where you're going -- and once you've read the data into
cache, writing is usually almost free, since it can be done
asynchronously (i.e. you put the data into a write queue and
move on).
We're left with only a few rather cases where the copying
matters: 1) move to beginning or end of file, 2) move to a
mark.

Don't forget changing the active window or pane. If the cursor
is maintained as part of the buffer, then each time the active
view changes, it has to be updated.
As James Kanze already pointed out, even those are pretty easy
to handle fast enough, even for fairly outrageous file sizes
(~10 MiB).

And as I pointed out, just naïvely using a flat buffer is even
faster:). All of these strategies were developed back at a
time when machines were a lot, lot slower.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,159
Messages
2,570,879
Members
47,414
Latest member
GayleWedel

Latest Threads

Top