Nick Keighley said:
#the nonsense is almost all [but 2 people not] that write here
#have fear in think or managiament memory they programs use
incomprehensible.
if I guess right, you're suggesting I'm scared to write my own version
of malloc.
I simply don't see the point.
One of the benchmarks[1] I was using on my compiler project was,
miraculously, faster than the gcc-compiled version, no matter what
optimisation level the latter used.
that has also happened to me before.
in the past I had cases where JIT-compiled BGBScript code was
outperforming GCC-compiled C code (usually for trivial tests, like
recursive Fibonacci or doing arithmetic in a loop).
never mind that this is hardly the usual case: being a VM-based
scripting language, currently running as threaded code (almost purely
function calls back into the C-based interpreter logic), it is
generally considerably slower than natively compiled C.
(the issue is that a full JIT is terrible to try to maintain in the face
of a language/VM design that is still a moving target).
I.e. always 4.2 seconds, compared with gcc's 5.6 seconds. Then I
discovered that the timing of this benchmark was dominated by calls to
malloc() and free(). Two other C compilers were both around 2.4 seconds.
I was using lcc-win's library.
I plugged in my own allocator, built on top of malloc(), which was
better at lots of small allocations, and my timing was then 1.2 seconds!
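Roughly, it worked by grabbing large blocks from malloc() and carving
small requests out of them. Something along these lines (a simplified
sketch with made-up names, not my actual code; it also ignores freeing
small objects, which the real thing had to handle):

/* simplified sketch: grab big blocks from malloc() and bump-allocate
   small requests out of them; freeing of small objects is omitted */
#include <stdlib.h>

#define POOL_BLOCK  (1 << 20)   /* carve 1MB at a time from malloc() */
#define SMALL_LIMIT 256         /* bigger requests go straight to malloc() */

static char  *pool_cur;         /* next free byte in the current block */
static size_t pool_left;        /* bytes left in the current block */

void *small_alloc(size_t n)
{
    void *p;
    n = (n + 15) & ~(size_t)15; /* keep 16-byte alignment */
    if (n > SMALL_LIMIT)
        return malloc(n);       /* large requests: not worth pooling */
    if (n > pool_left) {        /* block used up: fetch another */
        pool_cur = malloc(POOL_BLOCK);
        if (!pool_cur) return NULL;
        pool_left = POOL_BLOCK;
    }
    p = pool_cur;
    pool_cur  += n;
    pool_left -= n;
    return p;
}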
yeah.
pretty much how I got into writing memory managers originally was long
ago, off in the land of Linux (although my memory is fuzzy; it may have
been on DJGPP+DOS, I forget, as this was around the late 90s). I was
writing some apps that allocated lots of small strings and similar;
"malloc()" would chew through memory, and then often cause the app to
crash as well.
I later devised a solution:
I allocated a 1MB block of memory and wrote a small linked-list
allocator (the block held a link to the start of the free-list),
using best-fit. I was able to considerably reduce memory-bloat and
crashes by doing so.
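the basic shape of it was something like the following (a from-memory
sketch with invented names, not the original code; splitting and
coalescing of free spans are left out, as is free() itself):

/* rough sketch of a best-fit free-list allocator inside one block */
#include <stddef.h>

typedef struct FreeNode {
    size_t size;              /* usable bytes in this free span */
    struct FreeNode *next;    /* next free span in the list */
} FreeNode;

static FreeNode *free_list;   /* head of the free-list in the 1MB block */

void pool_init(void *block, size_t size)
{
    free_list = block;        /* initially one big free span */
    free_list->size = size - sizeof(FreeNode);
    free_list->next = NULL;
}

/* best-fit: walk the whole list, take the smallest span that fits */
void *pool_alloc(size_t n)
{
    FreeNode **best = NULL, **pp, *node;
    for (pp = &free_list; *pp; pp = &(*pp)->next)
        if ((*pp)->size >= n && (!best || (*pp)->size < (*best)->size))
            best = pp;
    if (!best) return NULL;   /* nothing big enough */
    node = *best;
    *best = node->next;       /* unlink (splitting the remainder omitted) */
    return node + 1;          /* payload starts just past the header */
}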
GC started being used on and off early on as well (it has been a long,
hard road getting it "just right", though).
I later improved density by switching from a free-list to fixed-size
memory cells (originally 8 bytes) with a bitmap for allocation,
improved performance by switching over to first-fit, and improved
capacity by allocating more chunks as needed.
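in rough outline, the bitmap scan looks something like this (an
illustrative sketch with made-up names; the real thing also grows new
chunks on demand rather than just giving up):

/* sketch: fixed-size cells, one bit per cell, first-fit bitmap scan */
#include <stddef.h>
#include <stdint.h>

#define CELL_SIZE  8                        /* originally 8-byte cells */
#define NUM_CELLS  ((1 << 20) / CELL_SIZE)  /* one 1MB chunk */

static uint8_t cell_bitmap[NUM_CELLS / 8];  /* 1 bit per cell; 1 = in use */
static uint8_t chunk[NUM_CELLS * CELL_SIZE];

static int  get_bit(int i) { return (cell_bitmap[i >> 3] >> (i & 7)) & 1; }
static void set_bit(int i) { cell_bitmap[i >> 3] |= (uint8_t)(1 << (i & 7)); }

/* first-fit: take the first run of 'ncells' consecutive free cells */
void *cell_alloc(int ncells)
{
    int i, run = 0;
    for (i = 0; i < NUM_CELLS; i++) {
        run = get_bit(i) ? 0 : run + 1;
        if (run == ncells) {
            int base = i - ncells + 1, j;
            for (j = base; j <= i; j++)
                set_bit(j);                 /* mark the run as allocated */
            return chunk + (size_t)base * CELL_SIZE;
        }
    }
    return NULL;   /* chunk full; the real allocator grabs another chunk */
}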
later on, I switched to 16-byte cells and 4MB chunks, and tended to
fall back on a different allocation strategy (currently pulling the
actual memory from malloc) for larger objects (several kB and up),
mostly as this tends to be faster (the performance of scanning a
bitmap for spans of free cells drops off rapidly as the object size
gets larger).
free-lists were also devised for the small case, mostly by
daisy-chaining free objects of a particular number of cells (checked
prior to a bitmap-based scan); this handled the case of up to 256
cells (4kB).
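roughly, that front-end works like so (again a sketch, not the real
code; the dead object's first pointer-sized word holds the chain link):

/* sketch: per-size free-lists in front of the bitmap scan */
#define MAX_FREE_CELLS 256             /* lists cover 1..256 cells (4kB) */

void *cell_alloc(int ncells);          /* bitmap-scan fallback, as above */

static void *freelists[MAX_FREE_CELLS + 1];  /* one head per cell count */

void cell_free(void *p, int ncells)
{
    if (ncells >= 1 && ncells <= MAX_FREE_CELLS) {
        *(void **)p = freelists[ncells];  /* chain through the dead object */
        freelists[ncells] = p;
    }
    /* (the bitmap would also be updated here) */
}

void *cell_alloc_fast(int ncells)
{
    if (ncells >= 1 && ncells <= MAX_FREE_CELLS && freelists[ncells]) {
        void *p = freelists[ncells];      /* pop an exact-fit free object */
        freelists[ncells] = *(void **)p;
        return p;
    }
    return cell_alloc(ncells);            /* otherwise scan the bitmap */
}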
....
actually, I remember much more recently (on WinXP, likely using either
MinGW or MSVC, I forget) writing a small app which used "malloc()" for
small strings, and it also chewed through absurd amounts of memory (it
processed text files, allocating strings for individual tokens, ..., and
could easily run up against the 3GB process-size limit). so, a custom MM
still makes some sense here as well.
So, you were saying about there not being any point? (Of course this
was not rewriting malloc(), but the large memory blocks I worked with
could just as easily have come from the OS, and would have given the
same results.)
if the object becomes significantly larger than the OS page size (not
necessarily 1:1 with HW pages; for example, Windows uses 4kB HW pages
but a 64kB allocation granularity), then the relative overhead of just
allocating a raw chunk of memory gets much smaller (the waste from one
partially-used page is small vs the size of the object itself).
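(to put rough numbers on it, these being my own figures: a ~1MB object
rounded up to 64kB units wastes at most one partial unit, around 6% in
the worst case; a 40-byte string given a whole 64kB region to itself
wastes over 99.9% of it.)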
[ actually, I have before wondered about bi-level allocation in a
filesystem, say, where 64 or 256 byte cells are used for smaller files
(probably by subdividing blocks), and the usual 4kB / 8kB / 16kB blocks
are used for larger ones, but I have little say over what OS developers
do, and app-local pseudo-filesystems are of little relevance to OS
people. (note that the Linux EXT filesystems instead work by typically
storing small files directly within the "inode" structure, but this
creates a jump between what will fit within the inode, and what will
require full disk blocks). ]
but, if smaller, using a full page to allocate a small string is almost
pure waste.
(Sadly I had to retire this benchmark, as it was not actually testing
generated code.)
[1]
http://shootout.alioth.debian.org/u32/program.php?test=binarytrees&lang=gcc&id=1