Non-constant constant strings


James Kuyper

....
Also, the OS/360 examples from "Mythical Man-Month" included cases
where people were given a byte limit on their code. OS/360 also
includes an overlay linker, so when someone approached the limit
they could start using overlays. I might even imagine someone meeting
the test cases (speedier debugging) without overlays, but in actual
use it would run too slow.

If a byte limit was provided, and a speed limit wasn't, is there
anything wrong with those results? It doesn't sound like cheating, it
sounds like a failure at the requirements specification level.

For six years, my boss repeatedly asked our client how fast a key
program had to run. Our client repeatedly said "It doesn't matter";
getting a long laundry list of desired features implemented was far more
important to him than any issue of run times. "Not even if it takes
several hours to run?" - "Don't worry about it - just get these features
implemented before launch".

However, when implementation of one of those features increased the run
time from 7.5 minutes to 40 minutes, suddenly our client decided it was
too slow. We never did convince him to provide a specific requirement,
but after a complete reorganization of the code that took several
months, and convincing a third-party vendor to re-write a very
inefficient library routine, I managed to reduce the run time below 5
minutes, and our client was satisfied. I'd known for years that such a
reorganization would speed up the code (though I wasn't sure how much of
a speed-up to expect), and it would have saved me a lot of work to
implement it earlier. However, my boss could never have justified having
me spend that much time on it, time that could have been spent
implementing new desired features, until our client changed his mind
about his priorities.

If you don't specify your actual requirements, you can't reasonably
expect them to be met, and the earlier that you specify them correctly,
the less expensive it will be to meet them.
 

glen herrmannsfeldt

Ike Naar said:
(snip)
(snip)
Let's say the requirement was that the algorithm should work for any
positive integer, and let's suppose that integers are 32 bits wide.
Then a test setup or fixture that would not be "wrong" would require
approximately 2 to the power 31 test cases. That's a huge number of
test cases.

A specific case of statistical independence.

If the test values are statistically independent of the program,
then you can, on the average, compute the probability that the
test will fail. You compute enough cases to satisfy the
statistical requirement.

If they are not statistically independent, then that doesn't work.

Among others, the financial problems in 2008 were the result of
assuming statistical independence when it wasn't true.

In taking measurements on physical systems, such as the length
of a rod, if you take N statistically independent (Gaussian random
error) measurements, and average them, you can reduce the random
error by a factor of sqrt(N).
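
A quick simulation shows the sqrt(N) effect (illustrative only; the
gauss() here is a crude normal generator, good enough for a demo):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Crude standard-normal deviate: sum of 12 uniform variates minus 6
   (central limit theorem). */
static double gauss(void)
{
    double s = 0.0;
    for (int i = 0; i < 12; i++)
        s += rand() / (double)RAND_MAX;
    return s - 6.0;
}

int main(void)
{
    const double true_len = 100.0;   /* the "true" rod length */
    for (int n = 1; n <= 10000; n *= 10) {
        double sq = 0.0;
        for (int trial = 0; trial < 1000; trial++) {
            double sum = 0.0;
            for (int i = 0; i < n; i++)
                sum += true_len + gauss();   /* unit random error */
            double err = sum / n - true_len;
            sq += err * err;     /* accumulate squared error of the mean */
        }
        printf("n=%5d  rms error %.4f  vs 1/sqrt(n) = %.4f\n",
               n, sqrt(sq / 1000), 1.0 / sqrt((double)n));
    }
    return 0;
}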

If they are not statistically independent, say your ruler was
not calibrated right, the error doesn't average away.

-- glen
 

glen herrmannsfeldt

(snip, I wrote)
If a byte limit was provided, and a speed limit wasn't, is there
anything wrong with those results? It doesn't sound like cheating, it
sounds like a failure at the requirements specification level.

Reminds me how many companies have ways to keep track of the work
of individuals, but no system to keep track of the company's goals
overall.

So, yes, it wasn't cheating but, put together, the system didn't
do useful work.

For six years, my boss repeatedly asked our client how fast a key
program had to run. Our client repeatedly said "It doesn't matter";
getting a long laundry list of desired features implemented was far more
important to him than any issue of run times. "Not even if it takes
several hours to run?" - "Don't worry about it - just get these features
implemented before launch".
However, when implementation of one of those features increased the run
time from 7.5 minutes to 40 minutes, suddenly our client decided it was
too slow.

To get back to C, some years ago in working on a project, one person
had written a routine (combination of C and C++ just to make it
confusing) that pretty much read a file a line at a time, concatenating
them with strcat(). (I believe properly considering the size, such
that it could avoid running off the end.)

Now, maybe not so noticeable, and not in the simple test cases,
but the resulting loop runs in O(n**2) time. In the end, n got to be in
the millions (reading lines of about 60 characters each) and
the time got into minutes. I had to go through and figure out
why it was so slow, and to fix it.
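
The shape of the problem and the usual fix, as a sketch (made-up
names, not the actual project code):

#include <stdio.h>
#include <string.h>

/* O(n**2): strcat() rescans the whole destination on every call. */
void slurp_slow(FILE *fp, char *buf, size_t cap)
{
    char line[128];
    buf[0] = '\0';
    while (fgets(line, sizeof line, fp) != NULL)
        strcat(buf, line);    /* assume cap is checked elsewhere */
}

/* O(n): remember where the end is and append there. */
void slurp_fast(FILE *fp, char *buf, size_t cap)
{
    size_t used = 0;
    char line[128];
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strlen(line);
        if (used + len + 1 > cap)
            break;                          /* out of room */
        memcpy(buf + used, line, len + 1);  /* copies the '\0' too */
        used += len;
    }
}

With a million 60-character lines, the first version does on the
order of n**2/2 character operations; the second touches each byte
once.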

-- glen
 

Keith Thompson

Ike Naar said:
Let's say the requirement was that the algorithm should work for any
positive integer, and let's suppose that integers are 32 bits wide.
Then a test setup or fixture that would not be "wrong" would require
approximately 2 to the power 31 test cases. That's a huge number of
test cases.

Not really. If the computation isn't horribly expensive, you can run
through 2**32 examples in a few minutes.

2**64 would likely be intractable, though.
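
For instance, a cheap property can be checked against a trivially
correct reference over every 32-bit value (my stand-in functions,
not anything from this thread):

#include <stdint.h>
#include <stdio.h>

/* Bit trick under test: nonzero x is a power of two
   iff (x & (x - 1)) == 0. */
static int pow2_fast(uint32_t x) { return x != 0 && (x & (x - 1)) == 0; }

/* Obviously-correct reference. */
static int pow2_slow(uint32_t x)
{
    for (int i = 0; i < 32; i++)
        if (x == (uint32_t)1 << i)
            return 1;
    return 0;
}

int main(void)
{
    uint32_t x = 0;
    do {
        if (pow2_fast(x) != pow2_slow(x)) {
            printf("mismatch at %u\n", x);
            return 1;
        }
    } while (++x != 0);   /* wraps to 0 after 0xFFFFFFFF: all 2**32 done */
    puts("all 2**32 inputs agree");
    return 0;
}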
 

Ike Naar

Not really. If the computation isn't horribly expensive, you can run
through 2**32 examples in a few minutes.

One would have to create the test cases before one could run through
them.
 

Keith Thompson

Ike Naar said:
One would have to create the test cases before one could run through
them.

It depends on the algorithm. If you have to write 2**32 test cases
manually, then yes, that would be bad. But there are plenty of
algorithms for which you can detect fairly cheaply whether a given
result is correct.
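
Integer square root is a classic example: computing it may be
expensive, but checking a claimed result takes two multiplies (a
sketch, with a deliberately naive isqrt standing in for the code
under test):

#include <stdint.h>
#include <assert.h>

/* Deliberately naive routine under test. */
static uint32_t isqrt(uint32_t n)
{
    uint32_t r = 0;
    while ((uint64_t)(r + 1) * (r + 1) <= n)
        r++;
    return r;
}

/* r is the integer square root of n iff r*r <= n < (r+1)*(r+1).
   Widen to 64 bits so the squares cannot overflow. */
static void check_one(uint32_t n)
{
    uint64_t r = isqrt(n);
    assert(r * r <= n && (uint64_t)n < (r + 1) * (r + 1));
}

int main(void)
{
    for (uint32_t n = 0; n < 1000000; n++)
        check_one(n);
    return 0;
}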
 

Robbie Brown

snip


Let's say the requirement was that the algorithm should work for any
positive integer, and let's suppose that integers are 32 bits wide.
Then a test setup or fixture that would not be "wrong" would require
approximately 2 to the power 31 test cases. That's a huge number of
test cases.

Oh go on ... you know what I mean. You don't write a test for every case
for goodness sake. Do you know what a boundary case is?
More often than not, IME, code fails at the boundaries of its
environment, so a simple method/function that returns the next positive
int starting from 0 would have at the least two test cases.

1. What comes after 0
2. What comes after (MAX_POSITIVE_INT_SIZE_ON_THIS_MACHINE - 1)

So test 1 would expect a result of 1
test2 would expect a result of MAX_POSITIVE_INT_SIZE_ON_THIS_MACHINE

You would also have some other tests for some random selection of
non-boundary cases, say what comes after 999, 3, 2346 etc etc

If you find yourself struggling to analyse the requirements sufficiently
well to write a meaningful test then I suggest your decomposition of the
problem space into 'brain sized chunks' needs some more work.

.... and before anyone points out the obvious, testing what happens when
an argument outside the required params is passed is a *separate test
case* (one or more error cases)
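
As a sketch, the two boundary tests plus spot checks might look like
this (next_int is a stand-in name for the function under test, and
INT_MAX stands in for MAX_POSITIVE_INT_SIZE_ON_THIS_MACHINE):

#include <assert.h>
#include <limits.h>

/* Hypothetical function under test: returns the next positive int. */
static int next_int(int n) { return n + 1; }

int main(void)
{
    assert(next_int(0) == 1);                 /* lower boundary */
    assert(next_int(INT_MAX - 1) == INT_MAX); /* upper boundary */
    /* a few non-boundary spot checks */
    assert(next_int(3) == 4);
    assert(next_int(999) == 1000);
    assert(next_int(2346) == 2347);
    return 0;
}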
 

Paul N

The way I look at it, a programming language is a mechanism for specifying
*behavior*. How the implementation translates my C code into machine
language is of little concern to me as long as the resulting program
behaves the way I want it to. I don't even necessarily know or care
which flavor of CPU my code is running on.

I don't think this is a universal view of C. C is derived (as I may
have said before :) ) from BCPL, and one of the stated aims of BCPL
was to eliminate hidden overheads. I believe C was also intended to
follow this philosophy. So users can expect their code to do no more
than what they have written, and this may well be one of the reasons
why they are using C in the first place.

I think this is one of the main conceptual differences between C and
C++, where in C++ the emphasis is indeed on what gets done as opposed
to how it gets done. For instance, a simple a = b + c; can involve
calling two functions, either or both of which might be highly
complex. C++ is not just C with a few extra features.
 

Ben Bacarisse

<snip>

A curiosity:
int r = (*lookupfuncptr)(a,b);

You could (and I would) just write

int t = lookupfuncptr(a,b);

Function calls in C are now defined as being done via a pointer, so when
you write functionname(args) the function designator, functionname, is,
conceptually, converted to a pointer, just like an array expression
would be. As a result, the * on a function pointer, which used to be
required, is now a little odd. It does no harm, but the de-reference
produces an expression of function type that gets converted back to a
pointer. You can write (****f)(args) and get the same result.
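
A minimal demonstration (my own example):

#include <stdio.h>

static int add(int a, int b) { return a + b; }

int main(void)
{
    int (*f)(int, int) = add;      /* &add would also work */
    printf("%d\n", f(2, 3));       /* implicit dereference */
    printf("%d\n", (*f)(2, 3));    /* the classic explicit form */
    printf("%d\n", (****f)(2, 3)); /* each * converts back to a pointer */
    return 0;
}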

Of course, no compiler will generate any code for any of this. The
change simply makes the syntax of calling functions directly or through
a pointer the same, whilst simplifying the latter.

You may have written the (*f) syntax deliberately to make your point
(and some people insist on using it regardless) but it is a sufficiently
curious corner of C that I thought it worth mentioning.
 

Rick C. Hodgin

What about the runtime state of the program? If you make a
catastrophic change (say, replacing an editor with a casino app or
vice versa), I don't see how you can possibly "continue" or what
that would even *mean*. You can "edit-compile-and-start-the-run-over",
which pretty much any debugger can do. But that's not "continue".

I'm not sure what you mean by "replacing an editor with a casino app or
vice versa", but edit-and-continue will make your changes for you and
will adjust the instruction pointer intelligently when it can, and it
will position it incorrectly at other times. In those cases you have to
be cognizant enough to recognize what just happened and correct it, or
simply restart your debugger session and begin again.

Most changes edit-and-continue is used for are of this kind:

if (foo) {
// Do something
}

Which should've been:
if (!foo) {

Or:
if (foo and faa) {

Etc. In such cases, you simply make the change, apply changes, reset the
instruction pointer to the test (if need be), and continue on.

Edit and continue also lets you write a lot of code. You can modify the
body of a function, add new functions, do a lot. Microsoft's implementation
is not invincible though. In fact, it's not even bullet proof. :) It
is resilient, but it does have limits. The compiler will tell you that it
was unable to apply the change if it can't do so mechanically. But even
in those cases where it can mechanically complete the change, the developer
must be aware enough of the nature of the change to realize that the prior
data being tested is no longer valid, etc., requiring a restart.

If you delete the function where the program is currently running,
*WHERE* does it "continue"?

The instruction pointer is highlighted in Microsoft's Debugger, so you
can see where it went. There's also a call stack window which will show
you where it is. If you delete enough code that it cannot automatically
position the instruction pointer, you're often in a state which requires
a restart. In the alternative, just right-click on the line you want
and choose the "Set next statement" option from the menu.

Does "continue" mean "start over and go full speed to the point
where you stopped the program"?

That's Restart.

If so, does it replay user input
(just about any GUI must have one mouse click per loop) with all
of the timing kept intact?
No.

Mouse input is one of those things it's
almost impossible to re-do exactly, especially if the code saves
time-stamps of every click or mouse movement. And what happens if
"the point where you stopped the program" no longer exists in the
new code?

When you do a restart you've stopped the prior instance, unloaded the
program from memory, and restarted by loading the new program. All of
the new features will be available as you set up breakpoints, etc.

This is completely unresponsive as to what happens to the
*SAVED PROGRAM STATE* that it continues *FROM*. It's easy to
recompile and make the source and object match.

You can do that with edit-and-continue as well. You don't have to use
it. In fact, it's a command line option that the Visual Studio IDE
enables by default when you create a new project (presumably because it
is so useful and popular). However, you don't have to use it. The
option is called "Program Database for Edit-and-continue" and it is one
of the several options. The next option is just "Program Database"
which will reveal source code information but will not allow edit-
and-continue changes, giving just runtime information about the
running program (as with traditional debuggers).

That doesn't make it easy to fix huge data structures in memory
that have all been mangled because of an initial mistake in generating
those data structures. Here you may WANT "edit-and-start-over".

As is true today, if the change was major enough you can always Restart.
When you restart it physically terminates the prior running program, does
any post-termination linking to permanently apply any code changes made
under edit-and-continue to the static object file, and then restarts a
new running instance.

I can see where it is easy to stop the program, change some values
of variables, (for which there may be no corresponding source change
since the value you are changing is not an initial value), and
continue. I do not see how it would be possible to stop on buffer
overflow, redeclare a char array to be of size 1024 rather than 80,
and then continue, especially a C program written in "pointers
galore" style where trying to relocate an already overflown buffer
will be a problem because of all the pointer variables pointing at
it.

Suppose your source line is:

char* foo = malloc(80);

And you realize that this buffer is too small. You can modify that source
code to be:

char* foo = malloc(1024);

You set the instruction pointer back up to that line, execute the one line,
which now gives you a new pointer to 1024 bytes of memory, then set the
instruction pointer back to where it was and continue on with the larger
buffer.

You can also write little fixups in your code. Suppose you want to copy the
first 80 bytes. Instead of making the change above, make this change:

char* foonew = malloc(1024);
memcpy(foonew, foo, 80);
foo = foonew;

After you execute those three lines, delete the extraneous ones, and you've
never had to leave your runtime environment, and you can continue debugging.

Edit-and-continue helps in a lot of areas, but it's not perfect. It depends
largely on what type of program you're writing. If you have one that uses
a lot of logic tests, then it may be of great benefit because you can fixup
those easily and continue running. If it's a lot of computed data, then
it won't be of much use most likely (because the data has to be regenerated
using the new algorithm, and that is easier with a restart).

I plan for my edit-and-continue abilities to allow every change mechanically.
I do not plan to put any limits on the code changing abilities. Things that
go out of scope will simply remain as now stale variables, and so on.
However, there will still be limits in my implementation as per the nature
of the data being processed. In some cases you could write little fixup
code snippets, run those, and then delete them (as the sample above with
foonew). But, even in those cases it may be faster or easier to Restart.
It will depend on many factors, and that's what the developer needs to be
aware of (to choose for himself).

Best regards,
Rick C. Hodgin
 

BartC

Ben Bacarisse said:
<snip>

A curiosity:


You could (and I would) just write

int t = lookupfuncptr(a,b);
You may have written the (*f) syntax deliberately to make your point
(and some people insist on using it regardless)

It might stop some people wasting time looking for a function called
'lookupfuncptr'.
 

Ben Bacarisse

BartC said:
It might stop some people wasting time looking for a function called
'lookupfuncptr'.

Yes, that's one reason some people like the * to stay, but in my limited
experience such calls are usually very near the code to find the
pointer. In fact the expression is rarely simply a named pointer. In
cases like

obj_class.constructor(args);
op_code[43](args);

there is no confusion and the removal of (*) helps, I think.
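
For instance (a sketch of my own; the names echo the examples above):

#include <stdio.h>

struct obj_class_type {
    void (*constructor)(int);
};

static void ctor(int n) { printf("constructed with %d\n", n); }

int main(void)
{
    struct obj_class_type obj_class = { ctor };
    obj_class.constructor(42);    /* obviously a call through a pointer */
    (*obj_class.constructor)(42); /* old style: same effect, more noise */
    return 0;
}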
 

David Brown

I don't think this is a universal view of C. C is derived (as I may
have said before :) ) from BCPL, and one of the stated aims of BCPL
was to eliminate hidden overheads. I believe C was also intended to
follow this philosophy. So users can expect their code to do no more
than what they have written, and this may well be one of the reasons
why they are using C in the first place.

I think this is one of the main conceptual differences between C and
C++, where in C++ the emphasis is indeed on what gets done as opposed
to how it gets done. For instance, a simple a = b + c; can involve
calling two functions, either or both of which might be highly
complex. C++ is not just C with a few extra features.

A simple "a = b + c" can involve many function calls in C too. For
example, if the variables are floating point, and you are compiling on a
target that does not have hardware floating point, then you will get
calls to a support library (or perhaps processor "floating point assist"
exceptions). On small enough processors, even a simple multiply can
mean library calls.
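
For example (a sketch; the exact routine name depends on the target
and toolchain):

double add(double b, double c)
{
    return b + c;   /* one C operator... */
}

/* Compiled with gcc for a Cortex-M0 (no FPU), this becomes a call
   into the soft-float support library, roughly:

       push {r3, lr}
       bl   __aeabi_dadd    @ the library routine does the addition
       pop  {r3, pc}

   (__aeabi_dadd on ARM EABI; __adddf3 is the generic libgcc name.) */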
 

Keith Thompson

Paul N said:
I don't think this is a universal view of C. C is derived (as I may
have said before :) ) from BCPL, and one of the stated aims of BCPL
was to eliminate hidden overheads. I believe C was also intended to
follow this philosophy. So users can expect their code to do no more
than what they have written, and this may well be one of the reasons
why they are using C in the first place.

No, it's not a universal view; a lot of people think of C as some kind
of high-level assembly language. IMHO, this is unfortunate.
I think this is one of the main conceptual differences between C and
C++, where in C++ the emphasis is indeed on what gets done as opposed
to how it gets done. For instance, a simple a = b + c; can involve
calling two functions, either or both of which might be highly
complex. C++ is not just C with a few extra features.

If a, b, and c are of some built-in type, then `a = b + c;` cannot
involve any function calls in either C or C++.

Even in C, there's a lot of stuff going on implicitly. There could be
up to 2 implicit conversions, and the addition could be integer,
floating-point, or even complex. For that matter, the addition or one
of the conversions could, on some systems, require an implicit function
call. Storing the result in a could write to a memory location or to a
CPU register -- or it could be eliminated entirely if the compiler can
prove that the result is never used.
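
For instance (my own illustration):

#include <stdio.h>

int main(void)
{
    int    b = 1;
    double c = 0.5;
    float  a;

    /* Two implicit conversions hide in this one statement: b is
       converted to double by the usual arithmetic conversions, the
       addition is done in double, and the result is converted to
       float by the assignment.  On a target without an FPU, each
       step may be a library call. */
    a = b + c;
    printf("%f\n", a);
    return 0;
}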

Yes, C++ code results in a lot more implicit actions than C code
typically does -- but there's a much wider semantic gap between assembly
language and C than between C and C++. And the gap between assembly and
C is primarily the difference between specifying CPU instructions and
specifying behavior.

Eliminating overheads doesn't mean you're specifying how the operations
are performed; that's still up to the compiler (and 99% of the time I,
as a C programmer, don't care how the compiler does it, as long as it
gets it right).
 

Rick C. Hodgin

No, it's not a universal view; a lot of people think of C as some kind
of high-level assembly language. IMHO, this is unfortunate.

From: http://www.gotw.ca/publications/c_family_interview.htm

-----
Question: "Why has the C family of languages become [so successful]?"

Dennis Ritchie:

"...There were also technical and semi-technical aspects: the
language turned out to be well-placed both for describing things
at a high enough level so that portability across hardware was
feasible, but simple enough in its requirements to make it cheap
to implement."

-----
Question: "What were your major original design goals for [C]?"

Ritchie:

"The point of C (as distinct from its immediate predecessor B) was
to take a language that was designed with word-oriented machines
in mind and adapt it to the newer hardware that became available,
specifically the PDP-11..."


-----
It seems very clear his goals were to remove the mechanics of assembly,
while providing a way to maintain an exceedingly easy port down to the
machine level, while expressing ideas at a high enough level to be easily
communicable and, therefore, wielded by people.

Thinking of C as a high level assembly language does not seem to be an
unfortunate consideration by some, but rather at least a substantial
component of the reality of the invention.

Best regards,
Rick C. Hodgin
 

Keith Thompson

Rick C. Hodgin said:
No, it's not a universal view; a lot of people think of C as some kind
of high-level assembly language. IMHO, this is unfortunate.

From: http://www.gotw.ca/publications/c_family_interview.htm

-----
Question: "Why has the C family of languages become [so successful]?"

Dennis Ritchie:

"...There were also technical and semi-technical aspects: the
language turned out to be well-placed both for describing things
at a high enough level so that portability across hardware was
feasible, but simple enough in its requirements to make it cheap
to implement."

-----
Question: "What were your major original design goals for [C]?"

Ritchie:

"The point of C (as distinct from its immediate predecessor B) was
to take a language that was designed with word-oriented machines
in mind and adapt it to the newer hardware that became available,
specifically the PDP-11..."


-----
It seems very clear his goals were to remove the mechanics of assembly,
while providing a way to maintain an exceedingly easy port down to the
machine level, while expressing ideas at a high enough level to be easily
communicable and, therefore, wielded by people.

Thinking of C as a high level assembly language does not seem to be an
unfortunate consideration by some, but rather at least a substantial
component of the reality of the invention.

Neither of those Ritchie quotations mentions assembly language.

C is not any kind of assembly language; neither were BCPL and B. It's a
tool that can be used to accomplish many of the things that assembly
language can be used for -- and it accomplishes those things in a quite
different way.
 

Rick C. Hodgin

Neither of those Ritchie quotations mentions assembly language.

C is not any kind of assembly language; neither were BCPL and B. It's a
tool that can be used to accomplish many of the things that assembly
language can be used for -- and it accomplishes those things in a quite
different way.

Ritchie mentions hardware specifically.

Hardware, in the context of software and programming, is assembly language.
It's machine code to be explicit, but the way we typically access machine
code is through assembly language mnemonics, tokens, and related
peculiarities.

Best regards,
Rick C. Hodgin
 

Keith Thompson

Rick C. Hodgin said:
Ritchie mentions hardware specifically.
Yes.

Hardware, in the context of software and programming, is assembly language.

Not really.
It's machine code to be explicit, but the way we typically access machine
code is through assembly language mnemonics, tokens, and related
peculiarities.

For a C programmer, assembly language (or machine language, if the
compiler doesn't generate assembly as an intermediate step) is a means
to an end. It's one of several steps between writing C source code and
obtaining an executable program that does what I want. 99% of the time,
I have no more need to be aware of which CPU instructions are being used
than of the voltage levels inside the CPU chip.

Note particularly that some compilers generate machine code directly,
without going through assembly language as an intermediate step. In
that case, there are no "assembly language mnemonics, tokens, and
related peculiarities" at all.
 

Rick C. Hodgin

Not really.

Yes really. :)
For a C programmer, assembly language (or machine language, if the
compiler doesn't generate assembly as an intermediate step) is a means
to an end. It's one of several steps between writing C source code and
obtaining an executable program that does what I want. 99% of the time,
I have no more need to be aware of which CPU instructions are being used
than of the voltage levels inside the CPU chip.

You don't need to know the mechanics, but regardless that *IS* what's
taking place under the hood.

I know that when I place a key in the ignition that the car starts. I
don't need to know about fuel injectors and flywheels ... but I know
that such things are in there. I'm not flying a glass bubble. I'm
driving a car, and there are limitations.
Note particularly that some compilers generate machine code directly,
without going through assembly language as an intermediate step. In
that case, there are no "assembly language mnemonics, tokens, and
related peculiarities" at all.

They still are ... I guarantee you that the developers of that compiler
which translates source code directly to machine code without the
intermediate step of writing out text-based assembler code (which to
my knowledge, by the way, is most of them), wrote those features out
and tested them before implementing them, which means they tie back
to assembly code. In fact, they tested them in diverse ways for the
various compiler switches: speed, size, etc.

I have used an intermediate language in the past, which conveys certain
abilities I've defined. These do not require assembly text output for
compilation, but rather use templates I've previously coded with offsets
for things like immediate values, or constant offsets, etc.

It's not hard to do ... but in my experience it requires a knowledge of
the assembly language, and not just the machine code, because people
deal better with "mov eax,5" than they do with the machine code bytes
underlying its mechanical presentation to the CPU.
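
A sketch of the idea (illustrative only, not the actual templates):
the x86 encoding of "mov eax, imm32" is opcode 0xB8 followed by a
4-byte little-endian immediate, so a template just needs its
immediate slot patched:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int main(void)
{
    /* template for "mov eax, imm32"; the immediate starts at offset 1 */
    unsigned char code[5] = { 0xB8, 0, 0, 0, 0 };
    uint32_t imm = 5;                    /* "mov eax,5" */
    memcpy(&code[1], &imm, sizeof imm);  /* assumes a little-endian host */

    for (size_t i = 0; i < sizeof code; i++)
        printf("%02X ", code[i]);        /* prints: B8 05 00 00 00 */
    printf("\n");
    return 0;
}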

Best regards,
Rick C. Hodgin
 

Keith Thompson

Rick C. Hodgin said:
Yes really. :)


You don't need to know the mechanics, but regardless that *IS* what's
taking place under the hood.

There are multiple levels of abstraction between C and assembly
language, and between assembly language and quantum physics.

So why, of the multitude of intermediate steps, do you pick assembly
language as the important one?

Certainly having some understanding of those steps (and how they can
vary from one implementation to another) is a good thing, and can even
be important in some circumstances.

[...]
They still are ... I guarantee you that the developers of that compiler
which translates source code directly to machine code without the
intermediate step of writing out text-based assembler code (which to
my knowledge, by the way, is most of them), wrote those features out
and tested them before implementing them, which means they tie back
to assembly code. In fact, they tested them in diverse ways for the
various compiler switches: speed, size, etc.

Certainly. Compiler writers worry about assembly and machine code *so I
don't have to*. (Actually that's not entirely true either; I've done
compiler development in the past, but on a mostly target-independent
level that did not directly involve assembly or machine code.)

[...]

My point is that C cannot reasonably be described as any kind of
assembly language. Do you actually disagree with that statement?
 
