Text editor data structures

S

SD

I am thinking about writing a text editor in C for unix sometime soon.
I am just doing this to learn more about C. I want to write something
like ed.c, a simple line editor. What types of data structures would be
appropriate? I am thinking about using a linked list, but I am also
wondering whether a tree would be useful. Please give me your
ideas...thanks, tilak
 
K

Karthik Kumar

SD said:
I am thinking about writing a text editor in C for unix sometime soon.
I am just doing this to learn more about C. I want to write something
like ed.c, a simple line editor. What types of data structures would be
appropriate? I am thinking about using a linked list, but I am also
wondering whether a tree would be useful. Please give me your
ideas...thanks, tilak

In this n.g. , we primarily the C programming language, as prescribed
by the standard.

<OT>
Having said that, if you are starting on writing a text editor, get a
copy of 'Design Patterns' - Erich Gamma et al. That should help you a
lot.
</OT>
 
G

gooch

SD said:
I am thinking about writing a text editor in C for unix sometime soon.
I am just doing this to learn more about C. I want to write something
like ed.c, a simple line editor. What types of data structures would be
appropriate? I am thinking about using a linked list, but I am also
wondering whether a tree would be useful. Please give me your
ideas...thanks, tilak

Asking what data structure to use witout any context of what exactly
you want to use it for is not likely to get you a very good answer. You
probably need to spend a little time doing some design and determine
what types of things you need different data structures to do. Do you
need sorting capabilities in a given time? Will you only need to look
through values sequencially? Many other questions will need to be
answered before you can decide on a data structure. The way your
question is currently written no one here has any idea what you really
want to use the structures for.
 
M

Malcolm

SD said:
I am thinking about writing a text editor in C for unix sometime soon.
I am just doing this to learn more about C. I want to write something
like ed.c, a simple line editor. What types of data structures would be
appropriate? I am thinking about using a linked list, but I am also
wondering whether a tree would be useful. Please give me your
ideas...thanks, tilak
Normally a test editor is buitl around linked lists.

typedef struct line
{
struct line *next;
struct line *prev;
char *text;
} LINE;

define a "line" as any block of test terminated by an "\n" or end of file.

This structure means that insertions, deletions, and block moves can be
implemented quite efficiently. A modern computer is actually so fast that
you can probably move up an entire text file by one byte each time the user
presses a key, but there is no point throwing away CPU cycles.

You will need to call malloc() to grab new blocks, and also to allocate the
text. You might want to allocate lines in units of 100 or so to avoid
contuinal calls to realloc(). This would require a "capacity" member of the
LINE structure.

You probably want LINE *start and LINE *tail to be global variables, for
convenience, and also a global insertion point.

The main problem is displaying the text. You can use printf(), but such an
editor is extremely awkward to use. Really you need to look a a library such
as curses which allows to to place text at will on the screen.
 
K

Keith Thompson

Malcolm said:
Normally a test editor is buitl around linked lists.

typedef struct line
{
struct line *next;
struct line *prev;
char *text;
} LINE;

define a "line" as any block of test terminated by an "\n" or end of file.
[...]

Is that true? I haven't looked at the internals of any text editors
lately, but I think a lot of them store buffers of characters, not
lists of lines. Some may also use temporary files to avoid using too
much memory when editing large files. One disadvantage of a linked
list of lines is that accessing any given line requires an O(n)
search.

There are a number of open-source text editors. GNU has an "ed"
implementation, for example.
 
M

Malcolm

Keith Thompson said:
Is that true? I haven't looked at the internals of any text editors
lately, but I think a lot of them store buffers of characters, not
lists of lines.
It's ages since I've been in the text-editor writing business, and things
may have moved on. The central problem is that the user expects to be able
to insert text interactively at any point in the document, and on a slow
computer it isn't possible to shift the entire string up by one byte if the
user types text in at the beginning. So some sort of indexing system is
necessary.

A linked list is the most obvious solution, but the newline-terminated line
doesn't have to be your unit. It's just the easiest thing to choose if you
are doing jobs which are inherently line-based.

However there might be a better way of doing things. One very important
feature is the "undo". If you provide this then you do need an undo buffer,
which you can combine with an edit buffer, so it might make sense to abandon
linked lists for some sort of batch update system. If anyone has experience
of writing a modern text editor, then thoughts are welcome.
 
N

Novitas

Much of this question is somewhat off-topic, since it deals more with
software design than the C language in particular.

For a beginner's exercise, using linked lists where each element is a
line is probably the wisest approach. However if you want to be a bit
more aggressive and make the editor a more "industrial grade", then I
would suggest...

1. The principal data structure would be the "buffer."

2. Buffers would be represented as an in-memory header containing
things like a buffer name, an associated filename (if any), root
pointer to a "section tree", a line count, maybe a character descriptor
of some kind, etc...

3. The tree structure would also be in memory and could be done either
as a binary tree or even an n-way tree though I'd probably begin with a
binary tree. The leaves of this tree would be "text sections"

4. A text section would consist of a continuous text string (including
0 or more line terminations) that represents continuously typed in text
data which could exist either in-memory or in temporary files
(depending on length).

5. All edits would be accomplished by the appropriate splitting of text
sections and logical insertions of new ones. Discarded text sections
would not be immediately deleted however.

6. A transaction log data structure would be maintained that would
consist of a header and an extendable array (realloc'd in memory as
necessary) that would sequentially identify edit operations and
maintain pointers to text sections that may no longer be part of the
buffer's current text image. This transaction log would permit the
"undo" operation to be implemented. The transaction log header should
be part of the buffer header it is associated with.

7. The buffer image would be derived by walking the tree and emitting
the text sections in order. This is how a file save would be
implemented.

8. A flatten operation should be implemented that clears out the
transaction log, frees associated memory, and reduces the buffer image
to a single text section.

I would begin by designing the structs needed to implement these
concepts and define the prototypes of the major internal functions.
 
E

ed_davis2

SD said:
I am thinking about writing a text editor in C for unix
sometime soon. I am just doing this to learn more about C. I
want to write something like ed.c, a simple line editor. What
types of data structures would be appropriate?

Buffer Gap and paged Buffer Gap seem to be popular.

Try posting this on comp.editors - they go into great detail
about such things from time to time.

In the meantime, you might like playing with Ant's Editor
vIOCCC91 - which uses Buffer Gap, by the way. To compile
on my system, I type: gcc ant.c -o ant -lcurses. Yours
may be different.

#include <stdio.h>
#include <ctype.h>
#include <curses.h>
#define BUF 0x40000
#define T isspace(*(t=Z(p)))&&
#define V return
#define _ while
#define e int
#define o char
e d,i,j,m,n,p,q,x,y;
o*c,b[BUF],*f,*g=b,*h,*t;

o*Z(e a){if(a<0)V b;V b+a+(b+a<g?0:h-g);}

P(o*a){V a-b-(a<h?0:h-g);}

S(){p=0;}

bf(){n=p=P(c);}

Q(){q=1;}

G(){t=Z(p);_(t<g)*--h=*--g;_(h<t)*g++=*h++;p=P(h);}

B(){_(!T b<t)--p;_(T b<t)--p;}

M(e a){_(b<(t=Z(--a))&&*t-'\n');V b<t?++a:0;}

N(e a){_((t=Z(a++))<c&&*t-'\n');V t<c?a:p(c);}

A(e a,e j){i=0;_((t=Z(a))<c&&*t-'\n'&&i<j){
i+=*t-'\t'?1:8-(i&7);++a;}V a;}

L(){0<p&&--p;}

R(){p<P(c)&&++p;}

U(){p=A(M(M(p)-1),x);}

D(){p=A(N(p),x);}

H(){p=M(p);}

E(){p=N(p);L();}

J(){m=p=M(n-1);_(0<y--)D();n=P(c);}

K(){j=d;_(0<--j)m=M(m-1),U();}

X(){G();p=h<c?P(++h):p;}

F(){FILE*fp=fopen(f,"w");j=p;p=0;G();
fwrite(h,1,(e)(c-h),fp);fclose(fp);p=j;}

W(){_(!T t<c)++p;_(T t<c)++p;}

Y(){m=p<m?M(p):m;if(n<=p){m=N(p);i=m-P(c)?d:d-2;
_(0<i--)m=M(m-1);}move(0,0);i=j=0;n=m;for(;;){p-n||(y=i,x=j);
t=Z(n);if(d<=i||c<=t)break;
if(*t-'\r')addch(*t),j+=*t-'\t'?1:8-(j&7);
if(*t=='\n'||COLS<=j)++i,j=0;++n;}clrtobot();
if(++i<d)mvaddstr(i,0,"<<EOF>>");move(y,x);refresh();}

I(){G();_((j=getch())!='\f'){
if(j-'\b')g-h&&(*g++=(o)(j-'\r'?j:'\n'));else b<g&&--g;
p=P(h);Y();}}

C(){clear();Y();}

e(*z[])()={L,D,U,R,B,J,K,W,H,E,S,bf,I,X,F,C,Q,G};
o k[]="hjklHJKL[]tbixWRQ";

e main(e u,o**v){FILE*fp;h=c=b+BUF;if(u<2)V 2;initscr();
d=LINES;raw();noecho();idlok(stdscr,1);
if(0!=(fp=fopen(f=*++v,"r"))){g+=fread(b,1,BUF,fp);g=g<b?b:g;
fclose(fp);}S();_(!q){Y();j=getch();
for(i=0;k&&j-k;++i);(*z)();}endwin();V 0;}
 
E

ed_davis2

SD said:
I am thinking about writing a text editor in C for unix
sometime soon. I am just doing this to learn more about C. I
want to write something like ed.c, a simple line editor. What
types of data structures would be appropriate?

Buffer Gap and paged Buffer Gap seem to be popular.

Try posting this on comp.editors - they go into great detail
about such things from time to time.

In the meantime, you might like playing with Ant's Editor
vIOCCC91 - which uses Buffer Gap, by the way. To compile
on my system, I type: gcc ant.c -o ant -lcurses. Yours
may be different.

#include <stdio.h>
#include <ctype.h>
#include <curses.h>
#define BUF 0x40000
#define T isspace(*(t=Z(p)))&&
#define V return
#define _ while
#define e int
#define o char
e d,i,j,m,n,p,q,x,y;
o*c,b[BUF],*f,*g=b,*h,*t;

o*Z(e a){if(a<0)V b;V b+a+(b+a<g?0:h-g);}

P(o*a){V a-b-(a<h?0:h-g);}

S(){p=0;}

bf(){n=p=P(c);}

Q(){q=1;}

G(){t=Z(p);_(t<g)*--h=*--g;_(h<t)*g++=*h++;p=P(h);}

B(){_(!T b<t)--p;_(T b<t)--p;}

M(e a){_(b<(t=Z(--a))&&*t-'\n');V b<t?++a:0;}

N(e a){_((t=Z(a++))<c&&*t-'\n');V t<c?a:p(c);}

A(e a,e j){i=0;_((t=Z(a))<c&&*t-'\n'&&i<j){
i+=*t-'\t'?1:8-(i&7);++a;}V a;}

L(){0<p&&--p;}

R(){p<P(c)&&++p;}

U(){p=A(M(M(p)-1),x);}

D(){p=A(N(p),x);}

H(){p=M(p);}

E(){p=N(p);L();}

J(){m=p=M(n-1);_(0<y--)D();n=P(c);}

K(){j=d;_(0<--j)m=M(m-1),U();}

X(){G();p=h<c?P(++h):p;}

F(){FILE*fp=fopen(f,"w");j=p;p=0;G();
fwrite(h,1,(e)(c-h),fp);fclose(fp);p=j;}

W(){_(!T t<c)++p;_(T t<c)++p;}

Y(){m=p<m?M(p):m;if(n<=p){m=N(p);i=m-P(c)?d:d-2;
_(0<i--)m=M(m-1);}move(0,0);i=j=0;n=m;for(;;){p-n||(y=i,x=j);
t=Z(n);if(d<=i||c<=t)break;
if(*t-'\r')addch(*t),j+=*t-'\t'?1:8-(j&7);
if(*t=='\n'||COLS<=j)++i,j=0;++n;}clrtobot();
if(++i<d)mvaddstr(i,0,"<<EOF>>");move(y,x);refresh();}

I(){G();_((j=getch())!='\f'){
if(j-'\b')g-h&&(*g++=(o)(j-'\r'?j:'\n'));else b<g&&--g;
p=P(h);Y();}}

C(){clear();Y();}

e(*z[])()={L,D,U,R,B,J,K,W,H,E,S,bf,I,X,F,C,Q,G};
o k[]="hjklHJKL[]tbixWRQ";

e main(e u,o**v){FILE*fp;h=c=b+BUF;if(u<2)V 2;initscr();
d=LINES;raw();noecho();idlok(stdscr,1);
if(0!=(fp=fopen(f=*++v,"r"))){g+=fread(b,1,BUF,fp);g=g<b?b:g;
fclose(fp);}S();_(!q){Y();j=getch();
for(i=0;k&&j-k;++i);(*z)();}endwin();V 0;}
 
C

CBFalconer

ed_davis2 said:
.... snip ...

#include <stdio.h>
#include <ctype.h>
#include <curses.h>
#define BUF 0x40000
#define T isspace(*(t=Z(p)))&&
#define V return
#define _ while
#define e int
#define o char

The next time this nonsense appears so does a PLONK.
 
D

Derrick Coetzee

SD said:
I am thinking about writing a text editor in C for unix sometime soon.
I am just doing this to learn more about C. I want to write something
like ed.c, a simple line editor. What types of data structures would be
appropriate? I am thinking about using a linked list, but I am also
wondering whether a tree would be useful. Please give me your
ideas...thanks, tilak

Assuming you mean the data structure for the text buffer's contents, one
fancy but very cool and appropriate solution is the rope. Here's a page
discussing SGI's C++ implementation of ropes in a good bit of detail:

http://www.sgi.com/tech/stl/Rope.html

To my knowledge there's no good C implementation of ropes, and this
could be a novel project - but perhaps overkill for someone just
starting out. My two cents.
 
D

Derrick Coetzee

ed_davis2 said:
In the meantime, you might like playing with Ant's Editor [ . . . ]

e main(e u,o**v){FILE*fp;h=c=b+BUF;if(u<2)V 2;initscr();
d=LINES;raw();noecho();idlok(stdscr,1);
if(0!=(fp=fopen(f=*++v,"r"))){g+=fread(b,1,BUF,fp);g=g<b?b:g;
fclose(fp);}S();_(!q){Y();j=getch();
for(i=0;k&&j-k;++i);(*z)();}endwin();V 0;}


Sweet, sweet obfuscation.
 
M

Michael Wojcik

The next time this nonsense appears so does a PLONK.

Why? Citing or quoting IOCCC entries, particularly in response to
overly-broad, OT, or otherwise suspect questions, is a comp.lang.c
tradition of long standing. (About as long as the IOCCC's been
around, I'd guess.)

--
Michael Wojcik (e-mail address removed)

He smiled and let his gaze fall to hers, so that her cheek began to
glow. Ecstatically she waited until his mouth slowly neared her own.
She knew only one thing: rdoeniadtrgove niardgoverdgovnrdgog.
 
C

CBFalconer

Michael said:
Why? Citing or quoting IOCCC entries, particularly in response to
overly-broad, OT, or otherwise suspect questions, is a comp.lang.c
tradition of long standing. (About as long as the IOCCC's been
around, I'd guess.)

I over-reacted. He said nothing about citing an IOCCC entry, and I
took it as another of those idiot postings using a herd of
one-letter defines only to annoy. We just stamped out this a few
months ago.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,159
Messages
2,570,883
Members
47,415
Latest member
SharonCran

Latest Threads

Top