Dynamic function generation

Marco Loskamp

Dear list,

I'm trying to dynamically generate functions; it seems that what I
really want is beyond C itself, but I'd like to be confirmed here.

In the minimal example below, I'd like to create content to put at
the address pointed to by f. In particular, I'd like to avoid/replace
the memcpy line.

Possible application (inspired by Paul Graham, "ANSI Common Lisp",
page 2): write a function that takes a number n, and returns a function
that adds n to its argument.

Any comments would be much appreciated.

Thanks,
Marco


// BEGIN CODE
#include <stdio.h>

int main(void) {
  int (*f)(const char *f, ...);
  int n;
  f = printf;
  n = sizeof(int(*)(const char *, ...));
  f("Hi! My size is %d.\n", n);
  f = (int(*)(const char *f, ...))malloc(n+1);
  memcpy(f, printf, n+1);
  if (f == (int(*)(const char*, ...))NULL)
    printf("Mem full!\n");
  f("Hello, World!");
  free(f);
  return 0;
}
// END CODE
 
Tor Rustad

Marco Loskamp said:
> #include <stdio.h>

#include <stdlib.h> /* malloc */

> int main(void) {
>   int (*f)(const char *f, ...);
>   int n;
>   f = printf;
>   n = sizeof(int(*)(const char *, ...));
>   f("Hi! My size is %d.\n", n);
>   f = (int(*)(const char *f, ...))malloc(n+1);
>   memcpy(f, printf, n+1);
>   if (f == (int(*)(const char*, ...))NULL)
>     printf("Mem full!\n");
>   f("Hello, World!");

Hmm.. you cannot expect this to work, because the heap may
be non-executable. No splint errors though (see below).

>   free(f);
>   return 0;
> }

C:\Temp>splint test.c
Splint 3.0.1.6 --- 11 Feb 2002

test.c: (in function main)
test.c(9,3): Assignment of size_t to int:
    n = sizeof([function (char *, ...) returns int] *)
  To allow arbitrary integral types to match any integral type, use
  +matchanyintegral.
test.c(10,3): Return value (type int) ignored: f("Hi! My size i...
  Result returned by function call is not used. If this is intended, can cast
  result to (void) to eliminate message. (Use -retvalint to inhibit warning)
test.c(11,42): Function malloc expects arg 1 to be size_t gets int: n + 1
test.c(11,35): Cast from function pointer type ([function (char *, ...) returns
    int] *) to non-function pointer (void *):
    ([function (char *, ...) returns int] *)malloc(n + 1)
  A pointer to a function is cast to (or used as) a pointer to void (or vice
  versa). (Use -castfcnptr to inhibit warning)
test.c(12,10): Function memcpy expects arg 1 to be void * gets [function (char
    *, ...) returns int] *: f
  Types are incompatible. (Use -type to inhibit warning)
test.c(12,13): Function memcpy expects arg 2 to be void * gets [function (char
    *, ...) returns int]: printf
test.c(12,21): Function memcpy expects arg 3 to be size_t gets int: n + 1
test.c(12,10): Possibly null storage f passed as non-null param:
    memcpy (f, ...)
  A possibly null pointer is passed as a parameter corresponding to a formal
  parameter with no /*@null@*/ annotation. If NULL may be used for this
  parameter, add a /*@null@*/ annotation to the function parameter declaration.
  (Use -nullpass to inhibit warning)
  test.c(11,3): Storage f may become null
test.c(15,3): Return value (type int) ignored: f("Hello, World!")
test.c(16,8): Function free expects arg 1 to be void * gets [function (char *,
    ...) returns int] *: f

Finished checking --- 10 code warnings
 
Barry Schwarz

Marco Loskamp said:
> Dear list,
>
> I'm trying to dynamically generate functions; it seems that what I
> really want is beyond C itself, but I'd like to be confirmed here.
>
> In the minimal example below, I'd like to create content to put at
> the address pointed to by f. In particular, I'd like to avoid/replace
> the memcpy line.
>
> Possible application (inspired by Paul Graham, "ANSI Common Lisp",
> page 2): write a function that takes a number n, and returns a function
> that adds n to its argument.

This exercise does not require anything like what you attempt below.

> Any comments would be much appreciated.
>
> Thanks,
> Marco
>
> // BEGIN CODE
> #include <stdio.h>
>
> int main(void) {
>   int (*f)(const char *f, ...);
>   int n;
>   f = printf;
>   n = sizeof(int(*)(const char *, ...));

n contains the size of the pointer, not the size of the function. I
don't know of any portable way to obtain the size of a function.

>   f("Hi! My size is %d.\n", n);

Plan on seeing four or eight.

>   f = (int(*)(const char *f, ...))malloc(n+1);

I wonder if casting a void* to a function pointer is legal. If so,
you now have space for a pointer plus one char.

>   memcpy(f, printf, n+1);

This will copy the first n+1 bytes of printf to your allocated space,
probably five or nine. I am pretty certain that this is not the
complete code for printf.

>   if (f == (int(*)(const char*, ...))NULL)

This is a bit late since you already tried to access the memory. You
need to place this before the memcpy.

>     printf("Mem full!\n");
>   f("Hello, World!");

Why would you continue to dereference f after you know it is NULL?
You need to bypass both the memcpy and this once your if evaluates to
true.

My system does not differentiate between the address of code and the
address of data. I doubt if this assumption is very portable.

>   free(f);
>   return 0;
> }
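
The ordering described above can be sketched as follows, assuming the goal were only to copy ordinary data: test the result of malloc before memcpy touches it, and skip every later use when the allocation fails. The helper name dup_bytes is hypothetical, and the sketch deliberately copies objects only; standard C offers no equivalent for function bodies.

```c
#include <stdlib.h>  /* malloc */
#include <string.h>  /* memcpy */

/*
 * dup_bytes is a hypothetical helper, not part of the original post.
 * It copies n bytes of an object (never a function) into fresh storage.
 * Returns NULL when the allocation fails; the caller must check for
 * that and must free() a non-null result.
 */
void *dup_bytes(const void *src, size_t n)
{
    void *copy = malloc(n);

    if (copy == NULL)       /* check first ... */
        return NULL;        /* ... and bypass the memcpy entirely */
    memcpy(copy, src, n);   /* copy is known to be non-null here */
    return copy;
}
```

A caller would write something like `char *s = dup_bytes("abc", 4);` and test `s` against NULL before using it.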



 
Walter Roberson

:I'm trying to dynamically generate functions; it seems that what I
:really want is beyond C itself, but I'd like to be confirmed here.

C has no inherent mechanism for generating a new function.
It offers no certainty that any kind of run-time compiler or
assembler or linker will be available, and certainly offers no
reassurance that you will be able to create new executable code --
it is considered acceptable for there to be environments in which the
code is burned into ROM, or environments in which the code is
marked as read+execute only, and that all writable pages are marked
as do-not-execute.

One can start thinking of all kinds of hacks to attach a piece
of data to a function (a 'closure'), but one still runs across
the problem that C does not offer ways of generating new functions
on demand. The closest one would be able to get would be to
offer a "stable" of functions. For example,

#include <stddef.h> /* NULL */

#define gfn(n) gf_##n
#define gf(n) int gfn(n)( int i ) { return i + const_tab[n]; }
#define MaxFunc 3219
int nextfunc = 0;
int const_tab[MaxFunc];
gf(0)
gf(1)
gf(2)
gf(3)
gf(4)
....
gf(3218)
/* table of pointers to the pre-built functions */
int (*fp[MaxFunc])( int ) = {
gfn(0), gfn(1), gfn(2), gfn(3), ... gfn(3218)
};

/* hand out the next unused function, bound to the constant n */
int (*genadd( int n ))( int ) {
  if ( nextfunc >= MaxFunc ) return NULL;
  const_tab[nextfunc] = n;
  return fp[nextfunc++];
}

This will (once debugged) return a pointer to a function as requested,
but has a maximum limit of 3219 such functions returned.
(Why 3219? Why not!? A magic number is a magic number.)


If you need more generalization, but can still live with the
issue of having a limited number of functions, then what could
be passed in to the generator could be a string that described
the operations, and the function returned could parse and take
action on the appropriate string. For more efficiency, one
could create a "work tree" of operations and nodes [saves
on parsing each time], and the appropriate tree could be processed
when the function was invoked.
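
A minimal sketch of the "work tree" idea, under assumptions of my own: the node kinds (OP_ARG, OP_CONST, OP_ADD, OP_MUL) and every name below are hypothetical and not from the post. The "generated function" is a small tree built once; "calling" it means walking the tree with the argument.

```c
#include <stdlib.h>

/* Hypothetical node kinds for a tiny expression tree. */
enum op { OP_ARG, OP_CONST, OP_ADD, OP_MUL };

struct node {
    enum op op;
    int value;                  /* used by OP_CONST only */
    struct node *left, *right;  /* used by OP_ADD and OP_MUL only */
};

struct node *mk_node(enum op op, int value, struct node *l, struct node *r)
{
    struct node *p = malloc(sizeof *p);

    if (p != NULL) {
        p->op = op;
        p->value = value;
        p->left = l;
        p->right = r;
    }
    return p;
}

void free_tree(struct node *t)
{
    if (t != NULL) {
        free_tree(t->left);
        free_tree(t->right);
        free(t);
    }
}

/* "Invoke" the tree: evaluate it for one argument value. */
int eval(const struct node *t, int arg)
{
    switch (t->op) {
    case OP_ARG:   return arg;
    case OP_CONST: return t->value;
    case OP_ADD:   return eval(t->left, arg) + eval(t->right, arg);
    case OP_MUL:   return eval(t->left, arg) * eval(t->right, arg);
    }
    return 0;  /* not reached for well-formed trees */
}

/* Build the tree for "add n to the argument"; NULL if out of memory. */
struct node *gen_add(int n)
{
    struct node *a = mk_node(OP_ARG, 0, NULL, NULL);
    struct node *c = mk_node(OP_CONST, n, NULL, NULL);
    struct node *t = (a && c) ? mk_node(OP_ADD, 0, a, c) : NULL;

    if (t == NULL) {            /* release whatever was allocated */
        free_tree(a);
        free_tree(c);
    }
    return t;
}
```

With this sketch, `eval(gen_add(3), 4)` walks a three-node tree and yields 7. Unlike the fixed stable of functions, the number of trees is limited only by memory, at the cost of an interpretive walk on each call.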
 
Chris Torek

Marco Loskamp said:
> I'm trying to dynamically generate functions [as is done in languages
> like Scheme and ML and all sorts of other "functional" languages] ...
> ... it seems that what I really want is beyond C itself, but I'd
> like to be confirmed here.

It is (beyond C, that is), except insofar as you can write a
Lisp interpreter in C, and then write your code in Lisp. :)

If C's ordinary data objects ("int", "double", and so on) are "first
class" items, and structures are also "first class"[1] citizens,
and arrays are then "second class" citizens[2], that puts C's
functions at a distant third-class. You cannot take the size of
a function, functions may be (and somewhat often, really are) in
"special" memory (ROM or write-protected RAM and/or a separate
instruction-space on the processor), and functions can never be
generated "on the fly", or even loaded dynamically[3].
-----
[1] Really true in C99, only half-true in C89. C99 has a new
feature called a "compound literal" so that you can write things
like (const struct foo){1,2}. These allow the creation of
anonymous aggregates. Thus, given void f1(int), you can do
both f1(i) and f1(3) in C89; but given void f2(struct foo),
you can only do f2((struct foo){3}) in C99.

[2] The entire value of an array can never be accessed all at once.
In particular, f(array) passes &array[0] to f(), instead of a copy
of every element of the array. Similarly:

struct S a, b;
...
a = b;

is OK, but:

int a[5], b[5];
...
a = b;

is not.

[3] Lots of systems have dynamically-loaded functions (whether
called "DLLs" or "dynamic shared libraries" or "DSOs" or some other
name(s)). This is done by those implementations going beyond the
minimum requirements for Standard C. Such systems can, in general,
implement dynamic code generation as well -- but in some cases it
takes quite a lot of fancy footwork, and in all cases it is not
portable.
-----
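
Footnotes [1] and [2] can be illustrated with a short sketch. The struct layout, the wrapper type int5, and the function names are assumptions made for the example; the compound literal needs a C99 or later compiler.

```c
struct foo { int a, b; };

/* Hypothetical wrapper: an array inside a struct can be copied whole. */
struct int5 { int v[5]; };

int sum_foo(struct foo f)
{
    return f.a + f.b;
}

/* Footnote [1]: pass an anonymous struct value via a compound literal. */
int demo_compound_literal(void)
{
    return sum_foo((struct foo){1, 2});
}

/* Footnote [2]: "a = b" is rejected for bare arrays, but accepted when
   the array is a struct member, because structs are assignable. */
int demo_struct_copy(void)
{
    struct int5 a = {{0}};
    struct int5 b = {{1, 2, 3, 4, 5}};

    a = b;              /* copies all five elements */
    return a.v[4];
}
```

Here demo_compound_literal returns 3 and demo_struct_copy returns 5.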
There is a sort of compromise position in C, halfway between "writing
a full-blown interpreter" and "doing full-blown runtime code
generation". To build a Lisp-like "closure", so that you can
perform partial application of some function f() to generate new
functions f1() through fN(), write function f() so that it takes
an extra parameter, e.g.:

struct adder_context { int addend; };

int adder(int param, struct adder_context *p) {
return param + p->addend;
}

Now you can generate a new adder simply by allocating an
"adder_context" and filling in the addend:

/* remember to #include <stdlib.h> */

struct adder_context *new_adder(int addend) {
struct adder_context *p = malloc(sizeof *p);

if (p == NULL)
panic("out of memory");
p->addend = addend;
return p;
}
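
Put together, the pieces above might be used like this. This is a sketch only: panic() is not a standard function, so the stand-in below is an assumption, and add_to_all is a hypothetical helper added to show one "generated" adder being passed around and applied.

```c
#include <stdio.h>
#include <stdlib.h>

struct adder_context { int addend; };

int adder(int param, struct adder_context *p)
{
    return param + p->addend;
}

/* Assumed stand-in for the panic() used in the post. */
static void panic(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(EXIT_FAILURE);
}

struct adder_context *new_adder(int addend)
{
    struct adder_context *p = malloc(sizeof *p);

    if (p == NULL)
        panic("out of memory");
    p->addend = addend;
    return p;
}

/* Hypothetical helper: apply one adder to every element of v. */
void add_to_all(int *v, size_t n, struct adder_context *p)
{
    size_t i;

    for (i = 0; i < n; i++)
        v[i] = adder(v[i], p);
}
```

After `struct adder_context *add3 = new_adder(3);`, the call `adder(4, add3)` yields 7, and `add_to_all` turns {1, 2, 3} into {4, 5, 6}. The context is released with free() when no longer needed.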

Of course, this is not very general. You probably want to be
able to construct not only an adder but also a multiplier, and/or
various other functions. So now we fancy up the context, and
perhaps also use "void *":

struct generic_context {
int (*func)(void *ctx, int arg);
};

struct adder_context {
struct generic_context common;
int addend;
};

struct multiplier_context {
struct generic_context common;
int mult;
};

struct mult_and_add_context {
struct generic_context common;
int mult;
int addend;
};

We now have three specific kinds of contexts, and can write three
functions (which can be "static" so that their names are invisible
outside the implementation routine) and their three exported
function-builders:

static int do_add(void *ctx, int arg) {
struct adder_context *p = ctx;

return arg + p->addend;
}

/* NB: emalloc is malloc + panic-if-out-of-memory */

struct generic_context *new_adder(int addend) {
struct adder_context *p = emalloc(sizeof *p);

p->common.func = do_add;
p->addend = addend;
return &p->common;
}

static int do_mult(void *ctx, int arg) {
struct multiplier_context *p = ctx;

return arg * p->mult;
}

struct generic_context *new_mult(int mult) {
struct multiplier_context *p = emalloc(sizeof *p);

p->common.func = do_mult;
p->mult = mult;
return &p->common;
}

static int do_mult_and_add(void *ctx, int arg) {
struct mult_and_add_context *p = ctx;

return (arg * p->mult) + p->addend;
}

struct generic_context *new_mult_and_add(int mult, int addend) {
struct mult_and_add_context *p = emalloc(sizeof *p);

p->common.func = do_mult_and_add;
p->mult = mult;
p->addend = addend;
return &p->common;
}

Whatever code calls these need only use the "generic" context that
is common to all these functions:

struct generic_context *p;

if (...)
p = new_adder(3); /* so that p->func computes x + 3 */
else if (...)
p = new_mult(5); /* here p->func computes 5x */
else
p = new_mult_and_add(7, 2); /* p->func computes 7x + 2 */
...
printf("func(4) = %d\n", p->func(p, 4));
/* prints 7, 20, or 30, depending on p->func */

The use of "void *" provides a kind of "type system sleight-of-hand"
that allows us to write cast-free (i.e., "far less ugly") C code.
The only constraint is that the generic context must be the first
member of the specific contexts.
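
A compact, compilable sketch of the generic-context scheme follows. The emalloc definition is an assumption (the post only describes it as malloc plus panic-if-out-of-memory), and apply_chain is a hypothetical helper showing that calling code needs nothing but the generic context.

```c
#include <stdio.h>
#include <stdlib.h>

struct generic_context {
    int (*func)(void *ctx, int arg);
};

struct adder_context      { struct generic_context common; int addend; };
struct multiplier_context { struct generic_context common; int mult; };

/* Assumed definition: malloc that exits instead of returning NULL. */
static void *emalloc(size_t n)
{
    void *p = malloc(n);

    if (p == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

static int do_add(void *ctx, int arg)
{
    struct adder_context *p = ctx;  /* valid: common is the first member */

    return arg + p->addend;
}

static int do_mult(void *ctx, int arg)
{
    struct multiplier_context *p = ctx;

    return arg * p->mult;
}

struct generic_context *new_adder(int addend)
{
    struct adder_context *p = emalloc(sizeof *p);

    p->common.func = do_add;
    p->addend = addend;
    return &p->common;
}

struct generic_context *new_mult(int mult)
{
    struct multiplier_context *p = emalloc(sizeof *p);

    p->common.func = do_mult;
    p->mult = mult;
    return &p->common;
}

/* Hypothetical helper: feed x through each function in turn. */
int apply_chain(struct generic_context **fns, size_t n, int x)
{
    size_t i;

    for (i = 0; i < n; i++)
        x = fns[i]->func(fns[i], x);
    return x;
}
```

With fns holding new_adder(3) followed by new_mult(5), apply_chain(fns, 2, 4) computes (4 + 3) * 5 = 35 without knowing which specific context sits behind each pointer.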

For those familiar with C++, note that this is really just a "hand
expansion" of a C++ "base class" with a single "virtual function".
If we had multiple virtual functions, it would often be good to
use a second level of indirection, where the generic context has
a pointer to a table of function pointers, so that instead of:

result = p->func(p, other_args);

we would write:

result1 = p->ops->func1(p, other_args);
result2 = p->ops->func2(p, other_args);
result3 = p->ops->func3(p, other_args);

and so on.
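
A minimal sketch of that second level of indirection, with assumed names: generic_ops, apply, and undo stand in for the table and for func1/func2, and are not from the post. Each kind of context shares one constant table, so adding an operation does not enlarge every context.

```c
#include <stdlib.h>

/* Hypothetical operations table shared by all contexts of one kind. */
struct generic_ops {
    int (*apply)(void *ctx, int arg);
    int (*undo)(void *ctx, int arg);
};

struct generic_context {
    const struct generic_ops *ops;
};

struct adder_context {
    struct generic_context common;  /* must stay the first member */
    int addend;
};

static int add_apply(void *ctx, int arg)
{
    struct adder_context *p = ctx;

    return arg + p->addend;
}

static int add_undo(void *ctx, int arg)
{
    struct adder_context *p = ctx;

    return arg - p->addend;
}

/* One table for every adder, in the order apply, undo. */
static const struct generic_ops adder_ops = { add_apply, add_undo };

/* Returns NULL if the allocation fails. */
struct generic_context *new_adder(int addend)
{
    struct adder_context *p = malloc(sizeof *p);

    if (p == NULL)
        return NULL;
    p->common.ops = &adder_ops;
    p->addend = addend;
    return &p->common;
}
```

For `p = new_adder(3)`, `p->ops->apply(p, 4)` yields 7 and `p->ops->undo(p, 7)` yields 4.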

Note that, if this were to be used in a serious program, the common
"generic" context would go in some header file, along with the
declarations of the various builder functions. The actual
implementations (and the specific contexts that contain the generic
context as their first element) can then be in a separate translation
unit. The actual contents of a specific context are thus
well-contained, with the interface being determined entirely by
the generic context. That generic context may be as simple or as
complicated as you like -- the only real constraint is that it is
fixed at compile-time, and all the functions have the same "type
signature" (return value, and number-and-types of parameters).
 
Tor Rustad

Barry Schwarz said:
On Sun, 20 Mar 2005 23:59:43 +0000 (UTC), Marco Loskamp



I wonder if casting a void* to a function pointer is legal.

This cast is _not_ "legal".

void * is a generic data pointer, and e.g. void (*f) (void) is a generic
function pointer, but there is no generic pointer for both data _and_
functions provided in standard C.

For example on old 16-bit x86, we had memory models with near and
far pointers, and there was an example where data used near pointer
(16-bit), while functions used far pointers (32-bit). Of course, if
function pointers are wider than data pointers, the cast doesn't
work.

To make a generic data and function pointer, we need something ala

typedef struct
{
int ptr_type;

union
{
void *data;
void (*func)(void);
} ptr;
} pointer_t;
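
One way such a tagged pointer might be used is sketched below. The tag values PTR_DATA and PTR_FUNC and all helper names are assumptions. A function pointer may be converted to another function pointer type and back, so it is stored as void (*)(void) and cast to its real type before the call; the caller has to know that real type.

```c
enum { PTR_DATA, PTR_FUNC };    /* hypothetical values for ptr_type */

typedef struct
{
    int ptr_type;

    union
    {
        void *data;
        void (*func)(void);
    } ptr;
} pointer_t;

pointer_t make_data_ptr(void *data)
{
    pointer_t p;

    p.ptr_type = PTR_DATA;
    p.ptr.data = data;
    return p;
}

pointer_t make_func_ptr(void (*func)(void))
{
    pointer_t p;

    p.ptr_type = PTR_FUNC;
    p.ptr.func = func;
    return p;
}

/* Example target function for the sketch. */
int add_one(int x)
{
    return x + 1;
}

/* Calls the stored function as int(int); returns fallback when p holds
   a data pointer. The stored function must really have type int(int). */
int call_int_fn(pointer_t p, int arg, int fallback)
{
    if (p.ptr_type != PTR_FUNC)
        return fallback;
    return ((int (*)(int))p.ptr.func)(arg);
}
```

For example, `call_int_fn(make_func_ptr((void (*)(void))add_one), 4, -1)` yields 5, while the same call on a data pointer yields the fallback.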
 
Tor Rustad

Tor Rustad said:
#include <stdlib.h> /*malloc */

This example shows how storage for a function pointer can be
malloc'ed at runtime:


#include <stdio.h>
#include <stdlib.h>


typedef int (*func_t)(const char *, ...);

int main(void)
{
func_t f, *pf;
struct hack


func_t f;
} *phack;

f = printf;
printf("%-10s has address: %p\n", "printf", (void*)printf);
f("%-10s has address: %p\n", "f", (void*)f);

/* Non portable conv between void* and function pointer */
pf = malloc(sizeof *pf);
if (pf == NULL)
exit(EXIT_FAILURE);

*pf = printf;
(*pf)("%-10s has address: %p\n", "*pf", (void*)(*pf));


/* Try to hack a malloc function pointer */
phack = malloc (sizeof *phack);
if (phack == NULL)
exit(EXIT_FAILURE);

phack->f = printf;
phack->f("%-10s has address: %p\n", "phack->f", (void*)(phack->f));


free((void*)pf);
free(phack);

return 0;
}
 
Tor Rustad

Tor Rustad said:
struct hack


func_t f;
} *phack;

An odd copy-and-paste problem here; anyway, the original
source says:

struct hack
{
func_t f;
} *phack;

..
 
Old Wolf

Marco Loskamp wrote:

Your code is very verbose. Ignoring for the moment that the whole
idea won't work, here are some tips:

> // BEGIN CODE
> #include <stdio.h>

#include <stdlib.h> /* malloc */

> int main(void) {
>   int (*f)(const char *f, ...);
>   int n;
>   f = printf;

This may not work in C99 (printf has a different prototype),
I'm not sure.

>   n = sizeof(int(*)(const char *, ...));

n = sizeof f;

>   f("Hi! My size is %d.\n", n);
>   f = (int(*)(const char *f, ...))malloc(n+1);

It would have been better to typedef this earlier, eg:
typedef int (F)(const char *, ...);
F *f;
........
f = (F *)malloc(n+1);

This, of course, is your problem. The size of a function
pointer is likely to be 4 bytes or something similar, but
the function body of printf is probably much larger. There's
no way of telling how big the function body actually is.

>   memcpy(f, printf, n+1);
>   if (f == (int(*)(const char*, ...))NULL)

if (f == NULL) /* or: if (!f) */
 
Peter Nilsson

Old Wolf said:
> This may not work in C99 (printf has a different prototype),

It has a restrict-qualified first parameter, but that doesn't
affect its compatibility in this case.
 
Flash Gordon

Old Wolf said:
> Marco Loskamp wrote:
>
> Your code is very verbose. Ignoring for the moment that the whole
> idea won't work, here are some tips:
>
> #include <stdlib.h> /* malloc */
>
> This may not work in C99 (printf has a different prototype),
> I'm not sure.

To be awkward?

> n = sizeof f;
>
> It would have been better to typedef this earlier, eg:
> typedef int (F)(const char *, ...);
> F *f;
> ........
> f = (F *)malloc(n+1);
>
> This, of course, is your problem. The size of a function
> pointer is likely to be 4 bytes or something similar, but
> the function body of printf is probably much larger. There's
> no way of telling how big the function body actually is.

That is only the first problem.

memcpy expects void* pointers, which are pointers to data. printf decays
to a function pointer, which is a completely different beast. The
function pointer might be a different size from a void* pointer, and it
might point (and on some DSP processors does point) to a completely
separate memory space. So memcpy might not read *any* of the code for
printf.
 
Dave Thompson

Marco Loskamp said:
> Dear list,
>
> I'm trying to dynamically generate functions; it seems that what I
> really want is beyond C itself, but I'd like to be confirmed here.

What you asked for is; what you really want (or need) I can't say.

> In the minimal example below, I'd like to create content to put at
> the address pointed to by f. In particular, I'd like to avoid/replace
> the memcpy line.

Highly unsafe example of trying to merely duplicate a (library)
function, not actually create anything, snipped.

> Possible application (inspired by Paul Graham, "ANSI Common Lisp",
> page 2): write a function that takes a number n, and returns a function
> that adds n to its argument.

If you want to generate arbitrary code, you really need a language
that can manipulate code, either cleanly like the LISP family, or as
strings like many scripting or interpretive languages with some form
of EVAL (perl, many shells), or (at least some) APL, or FORTH.

Or the approach (kludge?) of invoking the full toolchain at runtime.
Standard C does have a library function system() which can pass a
string to an implementation dependent command processor. On normal
general purpose operating systems where the command processor(s) can
run any installed program (subject to privilege) and if you have a C
compiler/toolchain on your runtime system(s) (not always or required
to be true) then you can write source code to a file, use system() to
try to compile it, and if successful use system() to run it passing
any data through files and command-line arguments and possibly (not
fully portable) the exit status value; or less portably through pipes,
shared memory, or various IPC facilities.
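
A rough sketch of that toolchain approach, under several loud assumptions: a compiler reachable as "cc", a command processor that understands ">" redirection, a writable current directory, and the hypothetical file names below. None of this is guaranteed by standard C, and the meaning of system()'s return value is implementation-defined; treating 0 as success is a further assumption.

```c
#include <stdio.h>
#include <stdlib.h>

/* All three names are hypothetical, chosen for this sketch. */
#define GEN_SRC "gen_adder.c"
#define GEN_EXE "./gen_adder"
#define GEN_OUT "gen_adder.out"

/* Generates, compiles, and runs a program that adds n to x.
   Returns 0 and stores the sum in *result on success, -1 on failure. */
int generated_add(int n, int x, int *result)
{
    char cmd[256];
    FILE *fp;
    int ok;

    if (system(NULL) == 0)      /* no command processor available */
        return -1;

    /* 1. Write the source of a tiny program with n baked in. */
    fp = fopen(GEN_SRC, "w");
    if (fp == NULL)
        return -1;
    fprintf(fp,
            "#include <stdio.h>\n"
            "#include <stdlib.h>\n"
            "int main(int argc, char **argv) {\n"
            "    if (argc < 2) return 1;\n"
            "    printf(\"%%d\\n\", atoi(argv[1]) + (%d));\n"
            "    return 0;\n"
            "}\n", n);
    if (fclose(fp) != 0)
        return -1;

    /* 2. Try to compile it (assumes a compiler named cc). */
    if (system("cc -o gen_adder " GEN_SRC) != 0)
        return -1;

    /* 3. Run it: x goes in on the command line, the sum comes back
          through a file. */
    sprintf(cmd, "%s %d > %s", GEN_EXE, x, GEN_OUT);
    if (system(cmd) != 0)
        return -1;

    fp = fopen(GEN_OUT, "r");
    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%d", result) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```

On a system where those assumptions hold, generated_add(3, 4, &r) leaves 7 in r; anywhere else it simply reports failure. The sketch leaves its temporary files behind; a real program would remove() them.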

If you want to generate only certain limited constructs, like the
simple Currying here, you can build some representation, canonically a
tree, out of primitives which you either walk explicitly or whose
nodes walk themselves, passing any needed data/state around
explicitly. In C++ which is <OFFTOPIC> here go to comp.lang.c++ you
can do this conveniently with a function object (class), which is
basically a hand-built closure with sugared invocation. IINM Boost,
which is a widespread but not yet de-jure standard library for C++,
provides some function-object stuff that may help -- or not. </>

- David.Thompson1 at worldnet.att.net
 
