global/static variables & loops

Malcolm McLean

 Can you give an example of a non-trivial program, that has no global
state in any way?
Ultimately you need some way of doing IO, which means communicating
with some physical device, which is likely to mean memory mapping at a
low level and thus global control variables at a high level.

But if we can assume that fopen() and printf() work by magic, then
it's possible to write the program without any global state. Trivially,
you could declare a struct myglobals and pass it to every function in
the call tree. But often you don't need to do this; it's just natural
to declare the few persistent state variables as local to main and
pass them as parameters.

The program I'm currently working on has no globals or statics, for
example. It takes quite a rich set of command line arguments from the
user, and so it calls a general purpose "options parser" which stores
the command line in a temporary structure, and allows queries. Then it
destroys that structure after extracting about a dozen parameter
variables. It then loads a csv file into a CSV structure, extracts two
columns of data as specified by the user, and destroys the CSV
structure. The columns are converted to a data density map, using a
palette specified by the user, and then written out as a JPEG.

So no globals required. It does have state in main which persists for
the life of the program, however.
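
A minimal sketch of the shape described above, assuming a much smaller option set than the real program has. The names `struct options`, `parse_options`, `describe_job` and `run` are hypothetical, and `run` stands in for main() so that the state it owns is visibly a local rather than a global.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical option set; the names are illustrative only. */
struct options {
    const char *csv_path;
    long x_column;
    long y_column;
};

/* Fill *opts from the command line. Returns 0 on success, -1 on bad usage. */
int parse_options(int argc, char **argv, struct options *opts)
{
    assert(argv != NULL && opts != NULL);
    if (argc != 4)
        return -1;
    opts->csv_path = argv[1];
    opts->x_column = strtol(argv[2], NULL, 10);
    opts->y_column = strtol(argv[3], NULL, 10);
    return 0;
}

/* Reads the options but cannot modify them: the pointer is const. */
int describe_job(const struct options *opts, char *buf, size_t size)
{
    assert(opts != NULL && buf != NULL);
    return snprintf(buf, size, "%s: columns %ld and %ld",
                    opts->csv_path, opts->x_column, opts->y_column);
}

/* Stand-in for main(): all persistent state is a local here. */
int run(int argc, char **argv)
{
    struct options opts;   /* lives for the whole run, yet is not a global */
    char line[256];

    if (parse_options(argc, argv, &opts) != 0)
        return EXIT_FAILURE;
    describe_job(&opts, line, sizeof line);
    puts(line);
    return EXIT_SUCCESS;
}
```

A real main() would simply return run(argc, argv).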
 
Anders Wegge Keller

Malcolm McLean said:
But if we can assume that fopen() and printf() work by magic, then
it's possible to write the program without any global state. Trivially,
you could declare a struct myglobals and pass it to every function in
the call tree.

That is global state in my book.
But often you don't need to do this; it's just natural
to declare the few persistent state variables as local to main and
pass them as parameters.

As is this, if you need to propagate the same value unchanged down
the call tree from main().

My point is that no matter how you try to hide it, whether it's to
get rid of global variables, just because they are baaad, or if you
actually need to do something in a smarter way, there is hardly a
program that hasn't got a global state. And in that case, I fail to
see why the program would be any worse, by actually admitting the
fact.
 
James Kuyper

That is global state in my book.

If that counts as global state, then storing global state in global
variables is clearly not necessary.

....
My point is that no matter how you try to hide it, whether it's to
get rid of global variables, just because they are baaad, ...

Your use of "just" and "baaad" implies that you think that the idea that
globals are bad is an unjustified prejudice, rather than the
well-justified judgment that it actually is. Are you actually unaware of
the problems that they can cause?
... or if you
actually need to do something in a smarter way, there is hardly a
program that hasn't got a global state. And in that case, I fail to
see why the program would be any worse, by actually admitting the
fact.
The key problem with global variables is visibility; by being visible
everywhere, they can be affected by anything, which can make it
difficult to determine what part of the program caused those variables
to have their current value, or what part had its behavior affected by
the value. Variables which are local to main() and are passed by a
pointer to subroutines need not be passed to all subroutines, but only
the ones that actually need to have access to those variables, and that
could be a different set of subroutines for different variables. Making
information available to a subroutine only on a "need to know" basis is
good software design. It can be overdone, but that's seldom a real-world
problem.
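
A rough sketch of the "need to know" idea, under assumed names (`struct totals`, `struct limits`, `accumulate`, `mean`, `run_example` are all hypothetical). Each function receives only the state it uses, and `const` marks what it may read but not change. `run_example` stands in for main().

```c
#include <assert.h>
#include <stddef.h>

/* Two unrelated pieces of program state (hypothetical names). */
struct totals { long count; double sum; };
struct limits { double min; double max; };

/* Updates the totals and reads the limits; it may not change the limits. */
int accumulate(struct totals *t, const struct limits *lim, double value)
{
    assert(t != NULL && lim != NULL);
    if (value < lim->min || value > lim->max)
        return 0;               /* rejected: out of range */
    t->count++;
    t->sum += value;
    return 1;
}

/* Only reads the totals; it never sees the limits at all. */
double mean(const struct totals *t)
{
    assert(t != NULL);
    return t->count > 0 ? t->sum / (double)t->count : 0.0;
}

/* Stand-in for main(): the state is local here and handed out selectively. */
double run_example(void)
{
    struct totals t = { 0, 0.0 };
    const struct limits lim = { 0.0, 100.0 };

    accumulate(&t, &lim, 10.0);
    accumulate(&t, &lim, 30.0);
    accumulate(&t, &lim, 500.0);  /* ignored: above lim.max */
    return mean(&t);
}
```

If `mean` ever returns a surprising value, the only code that could have changed the totals is code that was handed a non-const pointer to them.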
 
Malcolm McLean

If that counts as global state, then storing global state in global
variables is clearly not necessary.
If you take a program, list every global variable, pack them into a
structure, and then pass a pointer to the structure to every function
in the call tree, you aren't really achieving much. However, you are
making it easier for the program to accept two states (think of a GUI
program going from a single document model to a tabbed interface
allowing multiple documents).
The key problem with global variables is visibility; by being visible
everywhere,
The other problem is portability and testing.

Let's say we do this

EMPLOYEE myemployees[100];
int Nemployees;
double averagesalary()
{
/* step through the employee array calculating average salary */
}

Now it's hard to test. We have to set up the employees structure to
check it.

Let's say we do this instead

double average(void *data, int offset, int stride, int N)

It takes a bit more writing. But once we've done it, we can copy and
paste and use the code in many different applications, which might not
have anything else in common with each other. And we're much more
likely to document what happens when N == 0, which is where the bug is
likely to creep in, or what happens when we get Mr Fred the Shred and
salary goes to DBL_MAX (apologies to American readers who won't
understand this reference).
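
One possible body for that prototype, as a sketch only. It keeps the int parameters from the post but adds `const` to the data pointer, and it assumes each record holds a `double` at byte position `offset`. The `EMPLOYEE` layout and the `average_salary` wrapper are hypothetical, and returning 0.0 for N <= 0 is just one way to document that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Mean of N doubles, each located `offset` bytes into a record,
 * with consecutive records `stride` bytes apart.
 * Documented edge case: returns 0.0 when N <= 0 instead of dividing by zero.
 */
double average(const void *data, int offset, int stride, int N)
{
    const unsigned char *base = data;
    double sum = 0.0;
    int i;

    if (N <= 0)
        return 0.0;
    assert(data != NULL && offset >= 0 && stride >= 0);
    for (i = 0; i < N; i++) {
        double value;
        /* memcpy avoids any alignment assumption about the record layout */
        memcpy(&value, base + (size_t)i * (size_t)stride + (size_t)offset,
               sizeof value);
        sum += value;
    }
    return sum / N;
}

/* Hypothetical record layout standing in for EMPLOYEE. */
typedef struct {
    char name[32];
    double salary;
} EMPLOYEE;

double average_salary(const EMPLOYEE *staff, int N)
{
    return average(staff, (int)offsetof(EMPLOYEE, salary),
                   (int)sizeof *staff, N);
}
```

The function can now be tested with a three-element local array and no global setup. The sum can still overflow to infinity for values near DBL_MAX; this sketch does not address that.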
 
Anders Wegge Keller

James Kuyper said:
On 02/15/2012 12:25 PM, Anders Wegge Keller wrote:
If that counts as global state, then storing global state in global
variables is clearly not necessary.

No, but when the effect is the same anyway, why add another level of
complexity?
Your use of "just" and "baaad" implies that you think that the idea
that globals are bad is an unjustified prejudice,

It very often is an unreflected, almost religious observance. Just
like "gotos are bad". The next thing you see, is people mechanically
replacing a global variable with a static, and adding getters and
setters because the same state is still needed globally.
rather than the well-justified judgment that it actually is.

If it quacks like a duck, walks like a duck and swims like a duck, it
probably is a duck, no matter how you dress it up.
Are you actually unaware of the problems that they can cause?

If people write bad code, they will get into trouble, no matter what
paradigm they adhere to. I can think of lots of ways a program can
become difficult to manage by doing something stupid with a
global. But that is not exclusive to that style.
The key problem with global variables is visibility; by being
visible everywhere, they can be affected by anything, which can make
it difficult to determine what part of the program caused those
variables to have their current value, or what part had its
behavior affected by the value.

Scratch the getter/setter pattern, and the global struct pointer
that's passed down the call tree then. *OR* prototype the darn things
as const outside the translation unit that actually has to modify
them, and see where the compiler complains. (And remember to remove
the const prototype afterwards, unless you want to see some
spectacular failures :)
Variables which are local to main() and are passed by a
pointer to subroutines need not be passed to all subroutines, but only
the ones that actually need to have access to those variables, and that
could be a different set of subroutines for different variables.

Unless you have a very limited state space, that will lead to some
very convoluted and error-prone code.
Making information available to a subroutine only on a "need to
know" basis is good software design. It can be overdone, but that's
seldom a real-world problem.

You should try to do real-time machine control with 10-15
concurrent tasks running on an 80186 platform, where each individual
thread has a stack in the range 128 to 256 bytes, and avoid overdoing
need-to-know :) I'm in the 0.1% where overdoing things is a
real-world problem.
 
James Kuyper

On 02/15/2012 01:51 PM, Malcolm McLean wrote:
....
If you take a program, list every global variable, pack them into a
structure, and then pass a pointer to the structure to every function
in the call tree, you aren't really achieving much.

Yes, that's right. I wouldn't pass the whole set of variables in main()
to each subroutine of main(), but only the parts needed by that subroutine.
 
Anders Wegge Keller

Malcolm McLean said:
Let's say we do this

EMPLOYEE myemployees[100];
int Nemployees;
double averagesalary()
{
/* step through the employee array calculating average salary */
}

I don't know if I've made myself clear previously, but I'm in this
thread, asking about programs having *NO GLOBAL STATE AT ALL*.

No debug levels.
No log destinations.
No file handles.
No nothing.

I have no trouble seeing why *DATA* should not be globally visible,
unless there are some very good reasons for them being so, like my
co-existing embedded threads, mentioned elsewhere. But that is beyond
the express negative I questioned earlier.
 
Anders Wegge Keller

James Kuyper said:
On 02/15/2012 02:30 PM, Anders Wegge Keller wrote:

Well, that works both ways - labeling it a prejudice doesn't make it
cease to be a well-justified judgment.

You have missed the point. I'm talking about cases like

int debug_level;

versus:

static int debug_level;
int get_debug_level(void) { return debug_level; }
void set_debug_level(int lvl) { debug_level = lvl; }

I'm not trying to justify "All data should be global", as you seem to
imply. I'm questioning "No data should be global". There is a world
between those two statements.
I'm not sure what any of that word salad has to do with what I was
talking about. Can you re-write it to make the relevance clearer?

If my explanation above doesn't make it clearer, please ask
again. But I think we have two very different opinions about what
"state" actually means at present.
Not in my experience.

Try implementing GNU ls without globals. I know that the whole lot
are statics, since the code is in just one TU, but imagine something
of a similar magnitude then. The first few levels of functions will
have prototypes of several pages.
Limited stack space is something I've little recent experience with,
but I agree that it's an excellent reason to avoid using the stack;
in extreme cases, even passing pointers on the stack becomes
problematic - but you should rarely need to go all the way to global
visibility to avoid that problem. You have a greater need to use
objects with static storage duration than I do, but you don't have
to make every single one of them globally visible. I've a number of
such objects in my own code. Most of them have block scope, a few
have file scope - but none has external linkage.

When data is shared between several threads, and each thread lives in
its own translation unit, that cannot be helped. The central part is
an array-esque (80186 memory model limitations) representation of
system state. Most manipulation happens through a set of state-event
machines, each doing their part.

And mind, this is hardware that was state of the art in the late
1980s, and we have been stuck on it ever since, because it works
too well to warrant a major architecture rewrite.
 
James Kuyper

You have missed the point. I'm talking about cases like

int debug_level;

versus:

static int debug_level;
int get_debug_level(void) { return debug_level; }
void set_debug_level(int lvl) { debug_level = lvl; }

That's just a global dressed up in fancy clothes. It has all of the
disadvantages of a global, while being unnecessarily more complicated
than defining a global. If the setter function at least implemented some
kind of validity test on the new level, the example would make a little
more sense, but I had not considered that you were arguing against this
kind of idea.
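
For illustration, a sketch of a setter that does perform a validity test. The range 0 to 5 and the return convention are assumptions, not anything stated in the thread.

```c
#include <assert.h>

enum { DEBUG_LEVEL_MIN = 0, DEBUG_LEVEL_MAX = 5 };  /* assumed range */

static int debug_level;   /* internal linkage: only this file touches it directly */

int get_debug_level(void)
{
    /* The setter is the only writer, so the value should always be in range. */
    assert(debug_level >= DEBUG_LEVEL_MIN && debug_level <= DEBUG_LEVEL_MAX);
    return debug_level;
}

/* Returns 0 and stores lvl if it is in range; returns -1 and keeps the old value otherwise. */
int set_debug_level(int lvl)
{
    if (lvl < DEBUG_LEVEL_MIN || lvl > DEBUG_LEVEL_MAX)
        return -1;
    debug_level = lvl;
    return 0;
}
```

The state is still reachable from everywhere, but an out-of-range value can no longer be stored, which a plain `int debug_level;` cannot guarantee.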
I'm not trying to justify "All data should be global", as you seem to
imply. I'm questioning "No data should be global". There is a world
between those two statements.

I'd argue for "little, if any, data should be global". The only objects
that should have external linkage are those which will be used almost
everywhere in a program, and in my experience there aren't many of
those. If there are, poor design is usually to blame. To take your
example, I could quite easily imagine wanting to have different debug
levels in different parts of the program, which would make a single
global debug level inappropriate.

...
Try implementing GNU ls without globals. I know that the whole lot
are statics, since the code is in just one TU, but imagine something
of a similar magnitude then. The first few levels of functions will
have prototypes of several pages.

I don't have time to implement the entirety of ls; if the sheer
complexity of ls is your point, then I won't be able to address it
directly. If there's some particular feature of ls that you think
requires use of globals, I could possibly spare time to implement a
simplified version providing only that feature.
 
Anders Wegge Keller

James Kuyper said:
On 02/15/2012 03:30 PM, Anders Wegge Keller wrote:
That's just a global dressed up in fancy clothes. It has all of the
disadvantages of a global, while being unnecessarily more complicated
than defining a global.

I'm happy to see that I managed to make myself clear. But sadly, I've
seen textbooks and style guides that advocate exactly this pattern.

...
I'd argue for "little, if any, data should be global". The only
objects that should have external linkage are those which will be
used almost everywhere in a program, and in my experience there
aren't many of those.

Apart from our embedded code, most of what I write for a living is
event-driven processes that do a lot of IPC with each other. That
gives quite a lot of state that either has to be global, or be
dressed up as something else with the same properties, like the naive
example from my last post.
If there are, poor design is usually to blame. To take your example,
I could quite easily imagine wanting to have different debug levels
in different parts of the program, which would make a single global
debug level inappropriate.

Actually you have CHAR_BIT * (sizeof(int)/sizeof(char)) debug
masks. If that isn't enough, the process should probably be split into
two or more parts that are a bit more focused.
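
A sketch of what such debug masks could look like, one bit per category in a single word. The category names are hypothetical, and `unsigned int` is used here in place of `int` to keep the bit operations free of sign issues.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical categories, one bit each. */
enum {
    DBG_PARSER = 1u << 0,
    DBG_IPC    = 1u << 1,
    DBG_TIMERS = 1u << 2
};

/* Upper bound on independent flags in one word (assumes no padding bits). */
enum { DBG_MAX_FLAGS = (int)(CHAR_BIT * sizeof(unsigned int)) };

unsigned int debug_mask;  /* the single global the post argues for */

void debug_enable(unsigned int flags)  { debug_mask |= flags; }
void debug_disable(unsigned int flags) { debug_mask &= ~flags; }

int debug_enabled(unsigned int flag)
{
    assert(flag != 0);
    return (debug_mask & flag) != 0;
}
```

Different parts of the program test different bits, so one global word can still give each part its own switch.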
I don't have time to implement the entirety of ls; if the sheer
complexity of ls is your point, then I won't be able to address it
directly.

I stated ls as an antithesis to

"Variables which are local to main() and are passed by a pointer
to subroutines need not be passed to all subroutines, but only
the ones that actually need to have access to those variables, and
that could be a different set of subroutines for different
variables."

Dividing the whole set of options into structs that are passed on a
strict "need to access"-basis for each individual function called from
main() will in effect lead to passing almost any command line option
as a separate argument down the call graph from main().
 
Stefan Ram

Anders Wegge Keller said:
Can you give an example of a non-trivial program, that has no global
state in any way?

Global state is something else than a global variable.

Most programs don't have a global state. For example, when I
call a library function, such as »rand()«, whatever »global
variables« I declare in my application won't be visible by
»rand()«. So they are not global in the sense that they are
visible from /every/ part of the program: »rand()« is a part
of my program, yet does not see them.

And Haskell programs don't have any state at all, but can
become non-trivial, like Pugs, which is an implementation of
Perl 6 written in Haskell.

»Global«, of course, does not exist in C, but we have »file
scope« & »external visibility«. But still, a compilation
unit can choose not to declare such a variable (which is
defined in another compilation unit), and then it won't see it.
And »libraries« usually do so, yet are part of the program.
 
Anders Wegge Keller

Global state is something else than a global variable.

No matter how elaborate you wrap it up, there will be a variable or
other storage object underneath in the end. And if direct access to a
global variable is a performance penalty, I fail to see how wrapping
them in another layer is going to help matters.
 
Malcolm McLean

 Dividing the whole set of options into structs that are passed on a
strict "need to access"-basis for each individual function called from
main() will in effect lead to passing almost any command line option
as a separate argument down the call graph from main().
Depends on the program. If you allow a "language" parameter for a text-
intensive interactive program, then yes, pretty much everything will
want to access the language the user has chosen. On the other hand if
you allow the user to specify the name of the output file, that only
needs to be passed to the routine that actually calls fopen(), which
might well be main() itself.
 
Kaz Kylheku

Only if it never makes sense to have more than one instance of the model
manipulated by the same program.

Suppose it makes sense to have three instances of the model.

Where will you keep those three instances?
 
Kaz Kylheku

Global state is something else than a global variable.

Most programs don't have a global state. For example, when I
call a library function, such as »rand()«, whatever »global
variables« I declare in my application won't be visible by
»rand()«. So they are not global in the sense that they are
visible from /every/ part of the program: »rand()« is a part
of my program, yet does not see them.

You're bending the definition of global variable beyond what is reasonable. Any
function that simply does not mention a variable X does not see that variable.

X does not have to be mentioned in every scope in order to be considered
global.

Of course a standard library function like rand isn't going to contain
unresolved references to globals in your program.
 
Anders Wegge Keller

Kaz Kylheku said:
You're bending the definition of global variable beyond what is
reasonable. Any function that simply does not mention a variable X
does not see that variable.

If directly accessing a global variable inside a loop is bad for
performance, then so is accessing any other global object hidden even
further.
 
James Kuyper

I stated ls as an antithesis to

"Variables which are local to main() and are passed by a pointer
to subroutines need not be passed to all subroutines, but only
the ones that actually need to have access to those variables, and
that could be a different set of subroutines for different
variables."

Dividing the whole set of options into structs that are passed on a
strict "need to access"-basis for each individual function called from
main() will in effect lead to passing almost any command line option
as a separate argument down the call graph from main().

I had intended some judgment to be used; need-to-access must always be
traded off against other issues. Since I assumed that your challenge was
based upon the complexity of ls, I haven't bothered to give the design
any thought; but I would expect from previous experience that many of
the options would have corresponding data that must be passed with them,
and would therefore be grouped together in the same structure. In many
cases it will be convenient to tell a subroutine that a given option is
not turned on, by passing a null pointer as the argument which would
otherwise point to the data associated with that option.
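
A small sketch of the null-pointer convention, using a made-up option rather than anything taken from ls. `struct width_option` and `format_entry` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical data attached to a "column width" option. */
struct width_option {
    int columns;
};

/*
 * Formats one entry into buf and returns the length snprintf reports.
 * `width` may be NULL, meaning the option was not given.
 */
int format_entry(char *buf, size_t size, const char *name,
                 const struct width_option *width)
{
    assert(buf != NULL && name != NULL);
    if (width == NULL)
        return snprintf(buf, size, "%s", name);              /* option off */
    return snprintf(buf, size, "%-*s", width->columns, name); /* padded */
}
```

The caller passes `NULL` when the option is off and the address of the option's data when it is on, so no separate flag argument is needed.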
 
Eric Sosman

Suppose it makes sense to have three instances of the model.

Where will you keep those three instances?

C'mon, Kaz, you know better. Usually, an instance inhabits a
`static' somewhere (so its name is not available outside the scope
of its keeper), or it inhabits dynamically allocated memory (which
has no name at all).
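
As a sketch of the dynamically allocated case: each instance is reached only through a pointer, so having three of them needs no extra machinery. The `struct document` fields and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical document model; the fields are illustrative. */
struct document {
    int left_margin;
    int right_margin;
};

/* Returns a new instance, or NULL if allocation fails. */
struct document *document_create(int left, int right)
{
    struct document *doc = malloc(sizeof *doc);

    if (doc == NULL)
        return NULL;
    doc->left_margin = left;
    doc->right_margin = right;
    return doc;
}

void document_destroy(struct document *doc)
{
    free(doc);   /* free(NULL) is a no-op, so NULL is accepted */
}

/* Works on whichever instance it is handed; there is no "THE document". */
int document_text_width(const struct document *doc, int page_width)
{
    assert(doc != NULL);
    return page_width - doc->left_margin - doc->right_margin;
}
```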

True story: A former employer's flagship product did things with
documents, and the earliest versions could handle only one document
at a time. THE document was, fairly naturally, described by a host
of global variables: What are THE page margins, what is THE set of
paragraph styles, what is THE associated file name, and so on. At
some point (before I joined), the product was extended to handle
multiple documents simultaneously -- but by then, all those globals
had infiltrated themselves into too many places to extricate; there
just Was Not Going To Be an attempt to track down every reference to
every global and route it through a pointer instead, nor to inflate
all the functions and function calls to pass the pointer around.

Solution? The overarching framework noticed when the user moved
from Document A to Document B, and swapped things around to make it
work. It saved all the globals for A into one big struct, and then
restored all their values from B's struct (build-time tools created
the struct definition and the save/restore code, with a little help
from source-code markers to identify the globals). When the user then
moved to Document C, B's globals were squirrelled away and C's values
overwrote them.

By the time I left that employer, the size of the automatically-
generated struct and of the save/restore functions had grown to the
point where some platforms' compilers could no longer handle them,
and the tools had been modified to break the struct into three, with
three sets of functions. Also, just moving your mouse from Window A
to Window B could drive your paging disk berserk as it tried to handle
all these references to globals scattered hither and yon all over the
address space; you could force your workstation to its knees just by
flicking your mouse back and forth on the screen.

That's the power of globals.

I take two lessons from this experience: First, sufficient ingenuity
and a willingness to hack can come up with a short-term solution to most
any problem. Second, short-term solutions have little staying power.
 
Eric Sosman

Then show me the »definition of global variable«.

The C language has no definition for the term (as I'm fairly
sure you know).

Often, what people mean by "global variable" is "a variable with
static duration whose identifier has external linkage." Sometimes a
phrase like "file global" or "local global" (hey, I didn't invent it!)
means "a variable with static duration whose identifier has internal
linkage." I have never seen the word "global" attached to a variable
whose identifier has no linkage.
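
The three cases can be put side by side in one translation unit. This is only an illustration; the variable and function names are made up.

```c
#include <assert.h>
#include <limits.h>

int request_count;       /* static duration, external linkage: the usual "global" */
static int cache_hits;   /* static duration, internal linkage: a "file global" */

int next_id(void)
{
    static int last_id;  /* static duration, no linkage: known only inside next_id */
    int step = 1;        /* automatic duration, no linkage */

    assert(last_id < INT_MAX);   /* guard against signed overflow */
    last_id += step;
    return last_id;
}

void record_hit(void)
{
    cache_hits++;        /* reachable only from this file */
    request_count++;     /* reachable from any file that declares it extern */
}

int hits(void)
{
    return cache_hits;
}
```

Another translation unit can reach `request_count` with `extern int request_count;`, but it has no way to name `cache_hits` or `last_id`.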

And then there are people who'll use "global" to refer to things
that aren't "variables" in C-speak, but which encapsulate or provide
access to some piece of program-wide state. Elsethread I used `stdout'
in this way and caught pedantflak for doing so, but since "global"
itself is not defined by C I'm not too sure what ground the pedants
stand upon. At an uncommon stretch, `getenv("SHELL")' could be said
to be "a global," though I personally wouldn't use the term that way.

So, what are the concerns raised about "globals" in this thread?
I see two:

- First, access to a "global" variable may be slower than access
to some other variables, or the use of a "global" may inhibit
optimizations the compiler might otherwise be able to perform.
My take: Probably not worth worrying about. Certainly not worth
worrying about "Yet," in the sense of Jackson's Laws.

- Second, the universal availability of a "global" variable can
produce unanticipated interactions between (supposedly) unrelated
pieces of the program. To my mind, this is by far the worse
drawback.

Globals are not Evil, just Perilous. They are not to be rejected
altogether, but neither should they be used liberally. If you can make
a good case for `X' being global, go ahead and make it so -- but first,
be sure you can make the case.
 
