Errors in line numbers reported?

Y

Yukihiro Matsumoto

Hi,

In message "Re: Errors in line numbers reported?"
on Tue, 26 Oct 2004 23:04:05 +0900, (e-mail address removed) writes:

|how about storing the line in some word aligned bytes __behind__ nd_file?
|
| nd_file = "a.rb\0xxxx";
|
|the nd_line(n) would be something like
|
| (unsigned long)*((unsinged long *)(nd_file + ((strlen(nd_file) + 4) % 4) - 1));
|
|you get the idea...

Interesting idea, but it uses a lot more memory since we share single
string for all nd_file from the file now.

matz.
 
M

Markus

M> I think the 12.5% increase only applies to the generated code size,
M> which I suspect is only a fraction of the total memory footprint; I
M> think it comes to 4 bytes per node, which (estimating ~5 nodes per line,
M> average) comes to around 160K for a 8k line program.

rb_newobj() in gc.c


I'm sorry, I'm missing the connection. That seems to be dealing
with RVALUEs, not RNODEs; the line numbers in RVALUEs are already
integers (not just an 13 bit field). Are RVALUEs and RNODes related
somehow? Or is my lack of C showing?

-- Markus
 
N

nobu.nokada

Hi,

At Tue, 26 Oct 2004 11:27:59 +0900,
Hal Fulton wrote in [ruby-talk:117698]:
OK, I never knew that.

The file is 11,756 lines (down from 22,000).

Well, 2,097,151 lines are enough?


Index: gc.c
===================================================================
RCS file: /cvs/ruby/src/ruby/gc.c,v
retrieving revision 1.188
diff -U2 -p -d -r1.188 gc.c
--- gc.c 6 Oct 2004 07:40:04 -0000 1.188
+++ gc.c 26 Oct 2004 15:16:41 -0000
@@ -509,7 +509,69 @@ init_mark_stack()

#define MARK_STACK_EMPTY (mark_stack_ptr == mark_stack)
-
+
static st_table *source_filenames;

+#ifdef NODE_SEGMENTED_LINENO
+
+static int
+srcfile_cmp(s1, s2)
+ const char *s1, *s2;
+{
+ int ret = strcmp(s1, s2);
+ if (ret == 0) return 0;
+ return s1[srcfile_lineno - srcfile_offset]
+ - s2[srcfile_lineno - srcfile_offset];
+}
+
+int strhash _((const char *));
+
+static int
+srcfile_hash(s)
+ const char *s;
+{
+ return strhash(s) ^ s[srcfile_lineno - srcfile_offset];
+}
+
+static struct st_hash_type type_srchash = {
+ srcfile_cmp,
+ srcfile_hash,
+};
+
+#define init_srcfile_table() st_init_table(&type_srchash)
+
+char *
+rb_source_filenameline(f, n)
+ const char *f;
+ unsigned int n;
+{
+ char *name, *ptr;
+ long len = strlen(f) + 1;
+
+ ptr = ALLOC_N(char, len + srcfile_offset);
+ MEMCPY(ptr + srcfile_offset, f, char, len);
+ f = ptr + srcfile_offset;
+ ptr[srcfile_lineno] = (char)(n >> (sizeof(NODE*)*CHAR_BIT-NODE_LSHIFT));
+
+ if (!st_lookup(source_filenames, (st_data_t)f, (st_data_t *)&name)) {
+ name = ptr;
+ ptr[srcfile_gcmark] = 0;
+ ptr += srcfile_offset;
+ st_add_direct(source_filenames, (st_data_t)ptr, (st_data_t)name);
+ return ptr;
+ }
+ xfree(ptr);
+ return name + srcfile_offset;
+}
+
+char *
+rb_source_filename(f)
+ const char *f;
+{
+ return rb_source_filenameline(f, 0);
+}
+
+#else
+#define init_srcfile_table() st_init_strtable()
+
char *
rb_source_filename(f)
@@ -520,12 +582,15 @@ rb_source_filename(f)
if (!st_lookup(source_filenames, (st_data_t)f, (st_data_t *)&name)) {
long len = strlen(f) + 1;
- char *ptr = name = ALLOC_N(char, len + 1);
- *ptr++ = 0;
+ char *ptr = name = ALLOC_N(char, len + srcfile_offset);
+ ptr[srcfile_gcmark] = 0;
+ ptr += srcfile_offset;
MEMCPY(ptr, f, char, len);
st_add_direct(source_filenames, (st_data_t)ptr, (st_data_t)name);
return ptr;
}
- return name + 1;
+ return name + srcfile_offset;
}
+#endif
+

static void
@@ -534,5 +599,5 @@ mark_source_filename(f)
{
if (f) {
- f[-1] = 1;
+ f[srcfile_gcmark - srcfile_offset] = 1;
}
}
@@ -542,10 +607,10 @@ sweep_source_filename(key, value)
char *key, *value;
{
- if (*value) {
- *value = 0;
+ if (value[srcfile_gcmark]) {
+ value[srcfile_gcmark] = 0;
return ST_CONTINUE;
}
else {
- free(value);
+ xfree(value);
return ST_DELETE;
}
@@ -1922,5 +1987,5 @@ Init_GC()
finalizers = rb_ary_new();

- source_filenames = st_init_strtable();
+ source_filenames = init_srcfile_table();

nomem_error = rb_exc_new2(rb_eNoMemError, "failed to allocate memory");
Index: intern.h
===================================================================
RCS file: /cvs/ruby/src/ruby/intern.h,v
retrieving revision 1.156
diff -U2 -p -d -r1.156 intern.h
--- intern.h 6 Oct 2004 07:40:04 -0000 1.156
+++ intern.h 26 Oct 2004 15:21:45 -0000
@@ -239,4 +239,5 @@ NORETURN(void rb_memerror __((void)));
int ruby_stack_check _((void));
int ruby_stack_length _((VALUE**));
+char *rb_source_filenameline _((const char*, unsigned int));
char *rb_source_filename _((const char*));
void rb_gc_mark_locations _((VALUE*, VALUE*));
Index: node.h
===================================================================
RCS file: /cvs/ruby/src/ruby/node.h,v
retrieving revision 1.57
diff -U2 -p -d -r1.57 node.h
--- node.h 2 Oct 2004 11:34:13 -0000 1.57
+++ node.h 26 Oct 2004 15:17:03 -0000
@@ -164,8 +164,26 @@ typedef struct RNode {

#define NODE_LSHIFT (FL_USHIFT+8)
-#define NODE_LMASK (((long)1<<(sizeof(NODE*)*CHAR_BIT-NODE_LSHIFT))-1)
-#define nd_line(n) ((unsigned int)(((RNODE(n))->flags>>NODE_LSHIFT)&NODE_LMASK))
-#define nd_set_line(n,l) \
+#define NODE_LBITS (SIZEOF_VOIDP*CHAR_BIT-NODE_LSHIFT)
+#define NODE_LMASK (((long)1<<NODE_LBITS)-1)
+#define NODE_SEGMENTED_LINENO (NODE_LBITS > 20)
+
+enum {
+ srcfile_gcmark,
+#ifdef NODE_SEGMENTED_LINENO
+ srcfile_lineno,
+#endif
+ srcfile_offset
+};
+
+#define nd_line_low(n) ((unsigned int)(((RNODE(n))->flags>>NODE_LSHIFT)&NODE_LMASK))
+#define nd_set_line_low(n,l) \
RNODE(n)->flags=((RNODE(n)->flags&~(-1<<NODE_LSHIFT))|(((l)&NODE_LMASK)<<NODE_LSHIFT))
+#ifdef NODE_SEGMENTED_LINENO
+#define nd_line(n) rb_node_line(n)
+#define nd_set_line(n,l) rb_node_set_line(n,l)
+#else
+#define nd_line(n) nd_line_low(n)
+#define nd_set_line(n,l) nd_set_line_low(n,l)
+#endif

#define nd_head u1.node
@@ -355,4 +373,6 @@ NODE *rb_compile_file _((const char*, VA
void rb_add_method _((VALUE, ID, NODE *, int));
NODE *rb_node_newnode _((enum node_type,VALUE,VALUE,VALUE));
+unsigned int rb_node_line _((NODE *));
+void rb_node_set_line _((NODE *, unsigned int));

NODE* rb_method_node _((VALUE klass, ID id));
Index: parse.y
===================================================================
RCS file: /cvs/ruby/src/ruby/parse.y,v
retrieving revision 1.353
diff -U2 -p -d -r1.353 parse.y
--- parse.y 20 Oct 2004 15:44:05 -0000 1.353
+++ parse.y 26 Oct 2004 15:32:09 -0000
@@ -6456,4 +6456,30 @@ yylex(p)

#ifndef RIPPER
+#ifdef NODE_SEGMENTED_LINENO
+unsigned int
+rb_node_line(node)
+ NODE *node;
+{
+ unsigned int l = nd_line_low(node);
+ const char *file = node->nd_file;
+ if (file) {
+ l |= (unsigned char)file[srcfile_lineno-srcfile_offset] << NODE_LBITS;
+ }
+ return l;
+}
+
+void
+rb_node_set_line(node, line)
+ NODE *node;
+ unsigned int line;
+{
+ const char *file = node->nd_file;
+ if (file) {
+ node->nd_file = rb_source_filenameline(file, line);
+ }
+ nd_set_line_low(node, line);
+}
+#endif
+
NODE*
rb_node_newnode(type, a0, a1, a2)
@@ -6465,6 +6491,6 @@ rb_node_newnode(type, a0, a1, a2)
n->flags |= T_NODE;
nd_set_type(n, type);
- nd_set_line(n, ruby_sourceline);
n->nd_file = ruby_sourcefile;
+ nd_set_line(n, ruby_sourceline);

n->u1.value = a0;
Index: st.c
===================================================================
RCS file: /cvs/ruby/src/ruby/st.c,v
retrieving revision 1.30
diff -U2 -p -d -r1.30 st.c
--- st.c 23 Sep 2004 00:51:31 -0000 1.30
+++ st.c 26 Oct 2004 15:31:15 -0000
@@ -42,5 +42,5 @@ static struct st_hash_type type_numhash

/* extern int strcmp(const char *, const char *); */
-static int strhash(const char *);
+int strhash(const char *);
static struct st_hash_type type_strhash = {
strcmp,
@@ -530,5 +530,5 @@ st_foreach(table, func, arg)
}

-static int
+int
strhash(string)
register const char *string;
 
M

Markus

This seems to be along the lines I was thinking, but there seems
to be more to it than I was imagining, and I do not understand all of
it. Why is the interaction with the GC needed? Wouldn't it be enough
to allocate them and keep them around forever?

I was thinking it could be done by patching node.h (to add and
adjust macros), eval.c (to use them), and parse.y, to make a small
struct with int line_number_offset and char *file_name to store in the
RNODEs in place of nd_file. The main complexity would be making a new
current struct any time a RNODE was created and

ruby_sourceline - current->line_number_offset >= 8*1024

Is there a reason that I am not seeing why this would not work?

-- Markus



Hi,

At Tue, 26 Oct 2004 11:27:59 +0900,
Hal Fulton wrote in [ruby-talk:117698]:
OK, I never knew that.

The file is 11,756 lines (down from 22,000).

Well, 2,097,151 lines are enough?


Index: gc.c
===================================================================
RCS file: /cvs/ruby/src/ruby/gc.c,v
retrieving revision 1.188
diff -U2 -p -d -r1.188 gc.c
--- gc.c 6 Oct 2004 07:40:04 -0000 1.188
+++ gc.c 26 Oct 2004 15:16:41 -0000
@@ -509,7 +509,69 @@ init_mark_stack()

#define MARK_STACK_EMPTY (mark_stack_ptr == mark_stack)
-
+
static st_table *source_filenames;

+#ifdef NODE_SEGMENTED_LINENO
+
+static int
+srcfile_cmp(s1, s2)
+ const char *s1, *s2;
+{
+ int ret = strcmp(s1, s2);
+ if (ret == 0) return 0;
+ return s1[srcfile_lineno - srcfile_offset]
+ - s2[srcfile_lineno - srcfile_offset];
+}
+
+int strhash _((const char *));
+
+static int
+srcfile_hash(s)
+ const char *s;
+{
+ return strhash(s) ^ s[srcfile_lineno - srcfile_offset];
+}
+
+static struct st_hash_type type_srchash = {
+ srcfile_cmp,
+ srcfile_hash,
+};
+
+#define init_srcfile_table() st_init_table(&type_srchash)
+
+char *
+rb_source_filenameline(f, n)
+ const char *f;
+ unsigned int n;
+{
+ char *name, *ptr;
+ long len = strlen(f) + 1;
+
+ ptr = ALLOC_N(char, len + srcfile_offset);
+ MEMCPY(ptr + srcfile_offset, f, char, len);
+ f = ptr + srcfile_offset;
+ ptr[srcfile_lineno] = (char)(n >> (sizeof(NODE*)*CHAR_BIT-NODE_LSHIFT));
+
+ if (!st_lookup(source_filenames, (st_data_t)f, (st_data_t *)&name)) {
+ name = ptr;
+ ptr[srcfile_gcmark] = 0;
+ ptr += srcfile_offset;
+ st_add_direct(source_filenames, (st_data_t)ptr, (st_data_t)name);
+ return ptr;
+ }
+ xfree(ptr);
+ return name + srcfile_offset;
+}
+
+char *
+rb_source_filename(f)
+ const char *f;
+{
+ return rb_source_filenameline(f, 0);
+}
+
+#else
+#define init_srcfile_table() st_init_strtable()
+
char *
rb_source_filename(f)
@@ -520,12 +582,15 @@ rb_source_filename(f)
if (!st_lookup(source_filenames, (st_data_t)f, (st_data_t *)&name)) {
long len = strlen(f) + 1;
- char *ptr = name = ALLOC_N(char, len + 1);
- *ptr++ = 0;
+ char *ptr = name = ALLOC_N(char, len + srcfile_offset);
+ ptr[srcfile_gcmark] = 0;
+ ptr += srcfile_offset;
MEMCPY(ptr, f, char, len);
st_add_direct(source_filenames, (st_data_t)ptr, (st_data_t)name);
return ptr;
}
- return name + 1;
+ return name + srcfile_offset;
}
+#endif
+

static void
@@ -534,5 +599,5 @@ mark_source_filename(f)
{
if (f) {
- f[-1] = 1;
+ f[srcfile_gcmark - srcfile_offset] = 1;
}
}
@@ -542,10 +607,10 @@ sweep_source_filename(key, value)
char *key, *value;
{
- if (*value) {
- *value = 0;
+ if (value[srcfile_gcmark]) {
+ value[srcfile_gcmark] = 0;
return ST_CONTINUE;
}
else {
- free(value);
+ xfree(value);
return ST_DELETE;
}
@@ -1922,5 +1987,5 @@ Init_GC()
finalizers = rb_ary_new();

- source_filenames = st_init_strtable();
+ source_filenames = init_srcfile_table();

nomem_error = rb_exc_new2(rb_eNoMemError, "failed to allocate memory");
Index: intern.h
===================================================================
RCS file: /cvs/ruby/src/ruby/intern.h,v
retrieving revision 1.156
diff -U2 -p -d -r1.156 intern.h
--- intern.h 6 Oct 2004 07:40:04 -0000 1.156
+++ intern.h 26 Oct 2004 15:21:45 -0000
@@ -239,4 +239,5 @@ NORETURN(void rb_memerror __((void)));
int ruby_stack_check _((void));
int ruby_stack_length _((VALUE**));
+char *rb_source_filenameline _((const char*, unsigned int));
char *rb_source_filename _((const char*));
void rb_gc_mark_locations _((VALUE*, VALUE*));
Index: node.h
===================================================================
RCS file: /cvs/ruby/src/ruby/node.h,v
retrieving revision 1.57
diff -U2 -p -d -r1.57 node.h
--- node.h 2 Oct 2004 11:34:13 -0000 1.57
+++ node.h 26 Oct 2004 15:17:03 -0000
@@ -164,8 +164,26 @@ typedef struct RNode {

#define NODE_LSHIFT (FL_USHIFT+8)
-#define NODE_LMASK (((long)1<<(sizeof(NODE*)*CHAR_BIT-NODE_LSHIFT))-1)
-#define nd_line(n) ((unsigned int)(((RNODE(n))->flags>>NODE_LSHIFT)&NODE_LMASK))
-#define nd_set_line(n,l) \
+#define NODE_LBITS (SIZEOF_VOIDP*CHAR_BIT-NODE_LSHIFT)
+#define NODE_LMASK (((long)1<<NODE_LBITS)-1)
+#define NODE_SEGMENTED_LINENO (NODE_LBITS > 20)
+
+enum {
+ srcfile_gcmark,
+#ifdef NODE_SEGMENTED_LINENO
+ srcfile_lineno,
+#endif
+ srcfile_offset
+};
+
+#define nd_line_low(n) ((unsigned int)(((RNODE(n))->flags>>NODE_LSHIFT)&NODE_LMASK))
+#define nd_set_line_low(n,l) \
RNODE(n)->flags=((RNODE(n)->flags&~(-1<<NODE_LSHIFT))|(((l)&NODE_LMASK)<<NODE_LSHIFT))
+#ifdef NODE_SEGMENTED_LINENO
+#define nd_line(n) rb_node_line(n)
+#define nd_set_line(n,l) rb_node_set_line(n,l)
+#else
+#define nd_line(n) nd_line_low(n)
+#define nd_set_line(n,l) nd_set_line_low(n,l)
+#endif

#define nd_head u1.node
@@ -355,4 +373,6 @@ NODE *rb_compile_file _((const char*, VA
void rb_add_method _((VALUE, ID, NODE *, int));
NODE *rb_node_newnode _((enum node_type,VALUE,VALUE,VALUE));
+unsigned int rb_node_line _((NODE *));
+void rb_node_set_line _((NODE *, unsigned int));

NODE* rb_method_node _((VALUE klass, ID id));
Index: parse.y
===================================================================
RCS file: /cvs/ruby/src/ruby/parse.y,v
retrieving revision 1.353
diff -U2 -p -d -r1.353 parse.y
--- parse.y 20 Oct 2004 15:44:05 -0000 1.353
+++ parse.y 26 Oct 2004 15:32:09 -0000
@@ -6456,4 +6456,30 @@ yylex(p)

#ifndef RIPPER
+#ifdef NODE_SEGMENTED_LINENO
+unsigned int
+rb_node_line(node)
+ NODE *node;
+{
+ unsigned int l = nd_line_low(node);
+ const char *file = node->nd_file;
+ if (file) {
+ l |= (unsigned char)file[srcfile_lineno-srcfile_offset] << NODE_LBITS;
+ }
+ return l;
+}
+
+void
+rb_node_set_line(node, line)
+ NODE *node;
+ unsigned int line;
+{
+ const char *file = node->nd_file;
+ if (file) {
+ node->nd_file = rb_source_filenameline(file, line);
+ }
+ nd_set_line_low(node, line);
+}
+#endif
+
NODE*
rb_node_newnode(type, a0, a1, a2)
@@ -6465,6 +6491,6 @@ rb_node_newnode(type, a0, a1, a2)
n->flags |= T_NODE;
nd_set_type(n, type);
- nd_set_line(n, ruby_sourceline);
n->nd_file = ruby_sourcefile;
+ nd_set_line(n, ruby_sourceline);

n->u1.value = a0;
Index: st.c
===================================================================
RCS file: /cvs/ruby/src/ruby/st.c,v
retrieving revision 1.30
diff -U2 -p -d -r1.30 st.c
--- st.c 23 Sep 2004 00:51:31 -0000 1.30
+++ st.c 26 Oct 2004 15:31:15 -0000
@@ -42,5 +42,5 @@ static struct st_hash_type type_numhash

/* extern int strcmp(const char *, const char *); */
-static int strhash(const char *);
+int strhash(const char *);
static struct st_hash_type type_strhash = {
strcmp,
@@ -530,5 +530,5 @@ st_foreach(table, func, arg)
}

-static int
+int
strhash(string)
register const char *string;
 
H

Hal Fulton

Well, 2,097,151 lines are enough?

:) Quite enough for now.

[snip patch]

Forgive me, I am largely ignorant of the interpreter
internals.

What are the implications of this patch? What happens
to per-object memory and overall speed?


Hal
 
M

markus

Well, 2,097,151 lines are enough?

:) Quite enough for now.

[snip patch]

Forgive me, I am largely ignorant of the interpreter
internals.

What are the implications of this patch? What happens
to per-object memory and overall speed?

I don't understand all the details myself (see my previous post on
this thread) but it looks like it adds a few hundred bytes of code and
about 1/50 of a bit (on average, of course) per line of ruby source. So,
almost no impact. I think it may be possible to do even better--maybe
half as much code and a tenth as much space per line (again, see my
previous post), but I am not sure there isn't some problem that this patch
addresses that the other idea wouldn't, and I'm not sure the difference is
worth worrying about in any case.

-- Markus
 
N

nobu.nokada

Hi,

At Wed, 27 Oct 2004 01:58:51 +0900,
Markus wrote in [ruby-talk:117790]:
This seems to be along the lines I was thinking, but there seems
to be more to it than I was imagining, and I do not understand all of
it. Why is the interaction with the GC needed? Wouldn't it be enough
to allocate them and keep them around forever?

Do you mean:

typedef struct {
#ifdef NODE_SEGMENTED_LINENO
unsigned char lineno_upper;
#endif
char gcmark, name[1];
} srcfilename;

and

ptr = ALLOC_N(char, len + offsetof(srcfile, name));
...
srcfilename *ptr = (srcfilename *)(f - offsetof(srcfile, name));
?

Just because I didn't prefer reverting a member to the struct
containing it, there is no definite reason.
I was thinking it could be done by patching node.h (to add and
adjust macros), eval.c (to use them), and parse.y, to make a small
struct with int line_number_offset and char *file_name to store in the
RNODEs in place of nd_file. The main complexity would be making a new
current struct any time a RNODE was created and

ruby_sourceline - current->line_number_offset >= 8*1024

Is there a reason that I am not seeing why this would not work?

To save memory space.
 
N

nobu.nokada

Hi,

At Wed, 27 Oct 2004 09:33:54 +0900,
Yukihiro Matsumoto wrote in [ruby-talk:117841]:
Interesting solution. I'm thinking of another fix; the following
flags have no use for nodes:

FL_TAINT
FL_EXTIVAR
FL_FREEZE

and possibly FL_FINALIZE. So we can use 3 (or 4) more bits to
represent line numbers, thus allowing 65535 (or 131071) lines maximum.

Is it enough?


Index: node.h
===================================================================
RCS file: /cvs/ruby/src/ruby/node.h,v
retrieving revision 1.57
diff -u -2 -p -r1.57 node.h
--- node.h 2 Oct 2004 11:34:13 -0000 1.57
+++ node.h 27 Oct 2004 03:42:09 -0000
@@ -164,8 +164,26 @@ typedef struct RNode {

#define NODE_LSHIFT (FL_USHIFT+8)
-#define NODE_LMASK (((long)1<<(sizeof(NODE*)*CHAR_BIT-NODE_LSHIFT))-1)
-#define nd_line(n) ((unsigned int)(((RNODE(n))->flags>>NODE_LSHIFT)&NODE_LMASK))
+#define NODE_LBITS (sizeof(NODE*)*CHAR_BIT-NODE_LSHIFT)
+#define NODE_LMASK (~(~0L<<NODE_LBITS))
+#if SIZEOF_LONG > 4
+#define NODE_LUSHIFT 0
+#define NODE_LUBITS 0
+#else
+#define NODE_LUSHIFT 7 /* FL_FINALIZE */
+#define NODE_LUBITS 4 /* thru FL_FREEZE */
+#endif
+#define NODE_LUMASK (~(~0L<<NODE_LUBITS)<<NODE_LUSHIFT)
+#if NODE_LUBITS
+#define nd_line_high(n) ((RNODE(n)->flags&NODE_LUMASK)<<(NODE_LBITS-NODE_LUSHIFT))
+#define nd_line_upper(l) (((l)>>(NODE_LBITS-NODE_LUSHIFT))&NODE_LUMASK)
+#else
+#define nd_line_high(n) 0
+#define nd_line_upper(l) 0
+#endif
+#define nd_line_low(n) ((unsigned int)(((RNODE(n))->flags>>NODE_LSHIFT)&NODE_LMASK))
+#define nd_line(n) (nd_line_low(n) | nd_line_high(n))
#define nd_set_line(n,l) \
- RNODE(n)->flags=((RNODE(n)->flags&~(-1<<NODE_LSHIFT))|(((l)&NODE_LMASK)<<NODE_LSHIFT))
+ RNODE(n)->flags=((RNODE(n)->flags&~((NODE_LMASK<<NODE_LSHIFT)|NODE_LUMASK))|\
+ (((l)&NODE_LMASK)<<NODE_LSHIFT)|nd_line_upper(l))

#define nd_head u1.node
 
M

Markus

Hi,

At Wed, 27 Oct 2004 01:58:51 +0900,
Markus wrote in [ruby-talk:117790]:
This seems to be along the lines I was thinking, but there seems
to be more to it than I was imagining, and I do not understand all of
it. Why is the interaction with the GC needed? Wouldn't it be enough
to allocate them and keep them around forever?

Do you mean:

typedef struct {
#ifdef NODE_SEGMENTED_LINENO
unsigned char lineno_upper;
#endif
char gcmark, name[1];
} srcfilename;

and

ptr = ALLOC_N(char, len + offsetof(srcfile, name));
...
srcfilename *ptr = (srcfilename *)(f - offsetof(srcfile, name));
?

No, I meant something like:

typedef struct {
unsigned long lineno_offset;
char *nd_file;
} srcfile_segment;

I was intending to share the (unaltered) file name among all the segments.
To save memory space.

If B is the number of 8K-line blocks (rounded up) in a file, and N
is the number of bytes in the file name (including the path), we have an
overhead of:

8*B+N

or

(2+N)*B

So the savings is six bytes per source file, not counting the additional
code complexity (and size), and assuming that no file has more than 8k
lines (in which case we give back N, ~= 40).

Not a significant difference in any case.

I'm playing with a patch which should handle up to 4 billion lines
with no memory penalty at all (or perhaps a very slight gain), and
potentially much simpler code (so probably a net speed increase) but am
running into problems with how the members of the union are referenced.
I'm finding cases where (for example) u1.value is used to reference a
member of u2 or u3 that happens to occupy the same memory location. Is
there and documentation of where/when this is considered appropriate?

-- Markus
 
T

ts

M> I'm playing with a patch which should handle up to 4 billion lines
M> with no memory penalty at all (or perhaps a very slight gain),

You can perhaps handle 4 billion lines, this will not change the fact that
the number of nodes in a block is limited.


Guy Decoux
 
M

Markus

M> I'm playing with a patch which should handle up to 4 billion lines
M> with no memory penalty at all (or perhaps a very slight gain),

You can perhaps handle 4 billion lines, this will not change the fact that
the number of nodes in a block is limited.

*laugh* No, and I am not intending to tackle that soon. (Though I
suppose it could be possible to have a billion line program with very
few nodes--if, for example, there were large here-doc strings being
managed by a small amount of code). But the real reason for the 4
billion is of course that I'm using a long so as to maintain alignment
within the struct.

Technically I should say "I'm trying to..." since it isn't quite
working. The basic idea is this:

* RNODEs have a header (constant length part) and one of three
variant parts, called u2, u2. and u3.

* I came up with a patch (see earlier in this thread) that neatly
handled line numbers greater than 8k; I added a long to the
header. (Previously, they had been packed into bits in the
flags field). This maintained double-word-alignment, but grew
the structure by four bytes.

* u3 is the longest variant part. If it were four bytes shorter,
the whole structure would be the same length as it had been
before I my patch.

* u3 contains a long status field that is only ever checked for
0,1,2, or none of the above. If it were moved into the flag
bits previously used by the line numbers (it would only take two
of the thirteen) balance and harmony would be restored to the
chaos.

But the devil is in the details. It seems (I have not gotten to the
bottom of it) that sometimes fields in the various variants are
referenced to modify or access the fields in the others (making use of
the overlap). Not really knowing C, I'm having a hard time tracking
them down--using grep, detective work, and a blunt stick where I suspect
others would use gdb and a surgical knife, or some other tool I'm not
even aware of.

It is of course also possible that I have misdiagnosed the problem.

-- Markus
 
T

ts

M> * I came up with a patch (see earlier in this thread) that neatly
M> handled line numbers greater than 8k; I added a long to the
M> header. (Previously, they had been packed into bits in the
M> flags field). This maintained double-word-alignment, but grew
M> the structure by four bytes.

Well, if you don't want to look at rb_newobj() then look at
rb_node_newnode() in parse.y


Guy Decoux
 
M

Markus

M> * I came up with a patch (see earlier in this thread) that neatly
M> handled line numbers greater than 8k; I added a long to the
M> header. (Previously, they had been packed into bits in the
M> flags field). This maintained double-word-alignment, but grew
M> the structure by four bytes.

Well, if you don't want to look at rb_newobj() then look at
rb_node_newnode() in parse.y

*laugh* I've looked at both, but looking may not be seeing. I
added the following to rb_node_newnode():

n->u3_state = a0;

which I subsequently changed to:

nd_set_state(n,a0);

and then:

if (a0) nd_set_state(n,3); else nd_set_state(n,0); /* kludge */

as I nailed down the values needed for status. That part seems to
work. But I am still having trouble running "make test" and it appears
that something else is coulouring outside the lines.

-- Markus

P.S. If you are interested, I could make up a simple buggy-patch to show
you how far I've gotten.
 
T

ts

M> *laugh* I've looked at both, but looking may not be seeing.

Well, I don't know how to say this

M> * I came up with a patch (see earlier in this thread) that neatly
M> handled line numbers greater than 8k; I added a long to the
M> header. (Previously, they had been packed into bits in the
M> flags field). This maintained double-word-alignment, but grew
M> the structure by four bytes.

if you add a long to the header, you change the size of *all* R struct

M> * u3 is the longest variant part. If it were four bytes shorter,
M> the whole structure would be the same length as it had been
M> before I my patch.

u3 is not the longest variant part, u3 has the same size than u1 and u2

it exist nodes which uses u1, u2 and u3



Guy Decoux
 
T

trans. (T. Onoma)

I'm wondering, given some of the talk about a bug reporting and mailing lists
on ruby-core. Might this "Errors" thread itself be more appropriate on core?

Guess what I'm trying to say in general is that perhaps we should start making
a concerted effort to talk about "internals" on core, so that Nuby's and
persons only interest4ed in Ruby per ruby, do not need to be _concerned_ by
it.

Core get so little use, too.

T.
 
M

Markus

M> *laugh* I've looked at both, but looking may not be seeing.

Well, I don't know how to say this

M> * I came up with a patch (see earlier in this thread) that neatly
M> handled line numbers greater than 8k; I added a long to the
M> header. (Previously, they had been packed into bits in the
M> flags field). This maintained double-word-alignment, but grew
M> the structure by four bytes.

if you add a long to the header, you change the size of *all* R struct

M> * u3 is the longest variant part. If it were four bytes shorter,
M> the whole structure would be the same length as it had been
M> before I my patch.

u3 is not the longest variant part, u3 has the same size than u1 and u2

it exist nodes which uses u1, u2 and u3

I suspect we have a failure to communicate. Either I am not
understanding you, or...oh wait, (rushes for introductory C book...)

*hang head in embarrassment*

Thank you. I must have been confusing it with some other language
(?Ada?) where the variants are declared "inside out."

-- Markus
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,160
Messages
2,570,889
Members
47,421
Latest member
StacyTaver

Latest Threads

Top