Store value or calculate

jsguru72

I have a general question about best practices.

I think it would be best to use an example. Basic class to represent
a square.

public class Square {
    int length;
    int width;

    //getters and setters
}

Now, let's say I wanted to add a method to obtain the area of the
square.

I have 2 options.

1) I could calculate it on the fly each time the area is requested.

int getArea() {
    return (length * width);
}


2) I could calculate it whenever the length or width change, store the
value, and just return the stored value when requested.

int area;

void setLength(int length) {
    this.length = length;
    this.area = (this.length * width);
}



Method 1 is more streamlined coding, but requires a calculation be
performed each time the value is requested. Method 2 is a bit more
bulky, but saves having to do the calculation each time I need the
area.

Ignoring factors such as how often the input values would change or
how complicated the calculation is, is there a best practice
baseline where one method is preferred over the other?


Thanks.
 
Lew

jsguru72 said:
Ignoring factors such as how often the input values would change or
how complicated the calculation is, is there a best practice
baseline where one method is preferred over the other?

No. The best practice is not to ignore such factors, but to use them to
decide the question. The factors that you ask us to ignore are the very
factors that influence the decision! Why in the world would you ignore them?

If the class is Serializable, then one might make such cached values 'transient'.
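
To make the 'transient' remark concrete, here is a minimal sketch. The class name SerializableRect and the roundTrip helper are hypothetical, made up for illustration. A transient field is left out of the serialized form, so after deserialization it comes back as 0 unless something rebuilds it; this sketch assumes rebuilding it in readObject is acceptable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class for illustration; the names are not from the thread.
class SerializableRect implements Serializable {
    private static final long serialVersionUID = 1L;

    private int length;
    private int width;
    // Derived value: excluded from the serialized form.
    private transient int area;

    SerializableRect(int length, int width) {
        this.length = length;
        this.width = width;
        this.area = length * width;
    }

    int getArea() { return area; }

    // Called by Java serialization; rebuilds the cached value, which
    // would otherwise be left at its default of 0.
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        area = length * width;
    }

    // Serializes and deserializes a copy in memory (illustrative helper).
    static SerializableRect roundTrip(SerializableRect r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(r);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableRect) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Setters are omitted to keep the sketch short; any setter would also have to refresh the cached area.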

The best practice is not to engage in premature optimization, but to get the
behavior of the class correct first and foremost. After that, and not until
then, one can measure performance to determine if there is a real need for
optimization, then measure the so-called "optimizations" to make sure they
actually help. In the target environment. (Different environments evince
different efficiencies and inefficiencies.)

By that reasoning, it likely would make more sense in your example not to
cache area if the length and width values are subject to change, but to cache
it if they are declared as 'final' instance variables.
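
As a rough sketch of the 'final' case (FixedRect is a hypothetical name, not something from the original post): when the inputs can never change, the area can be computed once in the constructor and there is nothing to invalidate.

```java
// Hypothetical immutable rectangle; names are illustrative.
final class FixedRect {
    private final int length;
    private final int width;
    // Safe to cache: the inputs can never change after construction.
    private final int area;

    FixedRect(int length, int width) {
        this.length = length;
        this.width = width;
        this.area = length * width;
    }

    int getLength() { return length; }
    int getWidth() { return width; }
    int getArea() { return area; }
}
```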

You can't ignore the very factors that inform the judgment.
 
Joshua Cranmer

jsguru72 said:
I have a general question about best practices.

public class Square {
int length;
int width;
}

Typically, squares have equal lengths and widths, so at best you've got
redundant information. At worst you've got the potential to store
incorrect, inconsistent information.

Also, don't use public instance fields unless you've got a very, very
good reason to use them. If they're public, you have absolutely no
control over who sets them and therefore no control over their values.

jsguru72 said:
I have 2 options.

No, you have 3.

jsguru72 said:
1) I could calculate it on the fly each time the area is requested.
2) I could calculate it whenever the length or width change, store the
value, and just return the stored value when requested.

3) You could cache the value the first time you request the value and
invalidate on mutation.

jsguru72 said:
Ignoring factors such as how often the input values would change or
how complicated the calculation is, is there a best practice
baseline where one method is preferred over the other?

I presume you're just using area calculation as a contrived example,
since the time to do integer multiplication is only a few machine cycles.

There are several factors that impact the decision. Options 2 or 3 will
take up more memory. If you have large numbers of objects but use the
value extremely infrequently, the extra space may become painful. If the
calculation is extremely expensive and heavily used, options 2 or 3 are
probably ideal. The ratio of mutations to calculations is also a factor in 2
versus 3, although I'd be hard-pressed to find a reason to favor option
2 over option 3.
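
A minimal sketch of options 2 and 3 side by side may help here. The classes EagerRect, LazyRect and CacheDemo and the computations counter are all hypothetical names added for illustration; the counter only exists to show how often each variant recalculates.

```java
// Hypothetical sketch comparing option 2 (eager) and option 3 (lazy).
// All names are illustrative; "computations" counts recalculations.
class EagerRect {
    private int length, width, area;
    int computations;

    void setLength(int length) { this.length = length; recompute(); }
    void setWidth(int width) { this.width = width; recompute(); }
    int getArea() { return area; }

    // Option 2: recalculate on every mutation.
    private void recompute() { area = length * width; computations++; }
}

class LazyRect {
    private int length, width, area;
    private boolean areaValid; // false means the cached area is stale
    int computations;

    // Option 3: a mutation only marks the cached value as stale.
    void setLength(int length) { this.length = length; areaValid = false; }
    void setWidth(int width) { this.width = width; areaValid = false; }

    int getArea() {
        if (!areaValid) {
            area = length * width;
            computations++;
            areaValid = true;
        }
        return area;
    }
}

class CacheDemo {
    public static void main(String[] args) {
        EagerRect eager = new EagerRect();
        LazyRect lazy = new LazyRect();
        for (int i = 1; i <= 10; i++) {
            eager.setLength(i); eager.setWidth(i);
            lazy.setLength(i); lazy.setWidth(i);
        }
        // One read after twenty mutations.
        System.out.println(eager.getArea() + " after " + eager.computations + " computations");
        System.out.println(lazy.getArea() + " after " + lazy.computations + " computations");
    }
}
```

With twenty mutations and a single read, the eager version recalculates 20 times and the lazy version once. Both pay for the extra field, which is the memory cost mentioned above.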

For your contrived example, I'd just use option 1. There's no reason to
cache something you'll never or rarely use. But as Lew said, there is no
silver bullet: profile your code and see what is more performant.
 
jsguru72

I know that considering those factors will be necessary to determine
the optimized configuration, but the reason I wished to ignore them is
purely to decide on a starting point when I first write the class.

As you said, and I agree, it is best to focus on the behavior of the
class and then optimize later. But, in order to do this, you have to
ignore the optimization factors.

Obviously in a trivial example like this there is not much to deciding
up front how the class will be used and which approach would be
better. But if I am writing a more complicated class that will
require analysis to make optimization decisions, I do not want to get
bogged down with such things up front. My first goal is to get a
working class.

When I first write the class and I am focused purely on the behavior,
I have to make a decision on which way to code it. I am just curious
if a best practice exists for such things.


Thanks.
 
Mike Schilling

jsguru72 said:
I know that considering those factors will be necessary to determine
the optimized configuration, but the reason I wished to ignore them is
purely to decide on a starting point when I first write the class.

As you said, and I agree, it is best to focus on the behavior of the
class and then optimize later. But, in order to do this, you have to
ignore the optimization factors.

Obviously in a trivial example like this there is not much to deciding
up front how the class will be used and which approach would be
better. But if I am writing a more complicated class that will
require analysis to make optimization decisions, I do not want to get
bogged down with such things up front. My first goal is to get a
working class.

Then do it the simplest way (i.e. do the calculation each time);
caching the value is an optimization you can make if it becomes
necessary.

If you do decide to optimize, the third way (invalidating the
calculated value and recalculating it only on request) is in almost
all circumstances going to out-perform recalculating whenever one of
the inputs changes.
 
Mark Space

jsguru72 said:
1) I could calculate it on the fly each time the area is requested.

int getArea() {
return (length * width);
}

So, reading the replies here, I think there is a best practice. The
best practice is to put caching under the domain of optimizations, and
follow the best practices for optimizing.

1. Don't do it.
2. Don't do it yet.
3. Experts only -- model the working application first to determine if
it's really needed.

So when you first write the class, your method 1. is definitely the way
to go.



“More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason - including
blind stupidity.” - W.A. Wulf

“We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. Yet we should not pass
up our opportunities in that critical 3%.” - Donald Knuth

“Bottlenecks occur in surprising places, so don't try to second guess
and put in a speed hack until you have proven that's where the
bottleneck is.” - Rob Pike

http://en.wikipedia.org/wiki/Optimization_(computer_science)
 
Tom Anderson

jsguru72 said:
I know that considering those factors will be necessary to determine
the optimized configuration, but the reason I wished to ignore them is
purely to decide on a starting point when I first write the class.

As you said, and I agree, it is best to focus on the behavior of the
class and then optimize later. But, in order to do this, you have to
ignore the optimization factors.

Obviously in a trivial example like this there is not much to deciding
up front how the class will be used and which approach would be
better. But if I am writing a more complicated class that will
require analysis to make optimization decisions, I do not want to get
bogged down with such things up front. My first goal is to get a
working class.

When I first write the class and I am focused purely on the behavior,
I have to make a decision on which way to code it. I am just curious
if a best practice exists for such things.

In the vast majority of cases, you shouldn't optimise upfront. Start with
what's simple, and optimise only when you find there's a performance
problem. Yes, rewriting the code may take longer than writing it 'right'
the first time round. But the time you save by not unnecessarily
optimising code all over the place will more than make up for it.

Usually.

There *are* problems where getting the design wrong at the start leads to
huge costs to fix it. A classic example is in distributed computing: you
decide that the simplest way to make your app distributed is to make each
class a distributed object (remote interfaces and all that), and couple
everything via RMI, so you have complete flexibility as to where each
object lives on the network, effectively making the distribution
completely transparent (well, sort of). Sounds good. In practice, this
will result in a system which runs like cold treacle, making a network
roundtrip for every method call. To make it fast, you have to go back and
strip out all of the RMI code, and rearrange things into a more granular
structure, so you can distribute coarser components, which is a lot of
work. This example is a bit extreme, but making the wrong high-level
decisions about granularity of distribution, locks, threading, data
stores, etc, can be problems of this kind. You want to pick an approach
that you're pretty sure is going to work okay right at the start.
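
To illustrate the granularity point without pulling in real RMI, here is a hedged sketch. FakeRemoteCustomer and its roundTrips counter are invented for illustration; the counter stands in for the network cost each remote call would incur.

```java
// Hypothetical sketch: no real RMI here. "roundTrips" stands in for the
// network cost that every remote method call would incur.
class FakeRemoteCustomer {
    int roundTrips;
    private final String name = "Ada";
    private final String city = "London";
    private final String phone = "555-0100";

    // Fine-grained interface: one round trip per field.
    String getName() { roundTrips++; return name; }
    String getCity() { roundTrips++; return city; }
    String getPhone() { roundTrips++; return phone; }

    // Coarse-grained interface: one round trip returns everything.
    String[] getDetails() {
        roundTrips++;
        return new String[] { name, city, phone };
    }
}
```

Reading all three fields through the fine-grained getters costs three simulated round trips; getDetails costs one. Retrofitting the coarse call onto a system built entirely around the fine-grained style is the expensive rework described above.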

Knowing which problems fall into the 'optimise later' set, and which into
the 'get this right now' set is hard. It's very much an open question in
software engineering, and one that people haven't even done a lot of
thinking about, as far as I know. You just have to use common sense, or
engineer's intuition, or gut feeling, or whatever you call it, and that's
something which takes a long time to develop, and can still be wrong!

tom
 
Arne Vajhøj

jsguru72 said:
I think it would be best to use an example. Basic class to represent
a square.

Rectangle, not square.

public class Square {
int length;
int width;

//getters and setters
}

Now, let's say I wanted to add a method to obtain the area of the
square.

I have 2 options.

1) I could calculate it on the fly each time the area is requested.

int getArea() {
return (length * width);
}


2) I could calculate it whenever the length or width change, store the
value, and just return the stored value when requested.

int area;

void setLength(int length) {
this.length = length;
this.area = (this.length * width);
}

#1 because it reflects the reality - area is calculated
as length*width. Length and width are primary characteristics
of the rectangle - area is something that can be calculated.

It is extremely unlikely that there will be any difference
in performance of the complete solution between the two options,
so ignore that aspect.

Arne
 
Lew

Tom said:
Knowing which problems fall into the 'optimise later' set, and which
into the 'get this right now' set is hard. It's very much an open
question in software engineering, and one that people haven't even done
a lot of thinking about, as far as I know. You just have to use common
sense, or engineer's intuition, or gut feeling, or whatever you call it,
and that's something which takes a long time to develop, and can still be
wrong!

Indeed this is part of the Dark Arts of software development. However, there
is evidence that many high-level folks have done an awful lot of thinking
about it. Numerical analysis abounds with examples, and all sorts of
publications have come out about algorithms and data structures, some by the
very title "Algorithms and Data Structures". The Sun documents for various
collections implementations describe the big-O for common operations like add
and iterate, all to help the up-front decision as to what implementation will
suit. It even comes out in common programming tutorials and introductory
texts, wherein the advice is to think carefully about the algorithm one will use.

Even the oft-quoted "Premature optimization is the root of all evil!"
intentionally raises the question of what constitutes "premature". Too many
people seem to interpret the saying to mean, "Early optimization is the root
of all evil," plainly a distortion. It's only evil when it's too early, and
certain optimizations do happen right at the start, so that is not too early.

The key is that up-front optimizations are about algorithms far more than
about specific method calls or concrete classes. If it is known that one way
to calculate requires O(n) operations on average, and another requires O(log
n), well, it would be foolish to choose the former, wouldn't it?
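
As a small, hedged illustration of that O(n) versus O(log n) choice, the sketch below counts steps for a linear scan and a binary search over a sorted int array. SearchDemo and its methods are hypothetical names; it counts iterations rather than measuring time.

```java
// Hypothetical sketch; counts loop iterations rather than measuring time.
class SearchDemo {
    // O(n): checks elements one by one until the key is found.
    static int linearSteps(int[] sorted, int key) {
        int steps = 0;
        for (int value : sorted) {
            steps++;
            if (value == key) break;
        }
        return steps;
    }

    // O(log n): halves the remaining range on each step.
    static int binarySteps(int[] sorted, int key) {
        int steps = 0;
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1; // avoids overflow of lo + hi
            if (sorted[mid] == key) break;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid - 1;
        }
        return steps;
    }
}
```

For a sorted array of 1,000,000 elements, finding the last element takes 1,000,000 steps with the linear scan and at most 20 with the binary search. That kind of gap is known before any profiling, which is why the choice of algorithm belongs at the start.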

There are also many studies that show how different programmers (and teams
thereof) will come up with radically different solutions to the same
specifications, not always easily classifiable as better or worse than the
others. Software development is classified as a "wicked" problem in engineering.

(/Wicked Problems, Righteous Solutions/, DeGrace and Stahl, Yourdon Press,
1991, ISBN 0-13-590126-X,
<http://www.amazon.com/Wicked-Problems-Righteous-Solutions-Engineering/dp/013590126X>
expounds on software as a wicked problem and other software project
methodology matters. It changed my life.)

A lot of thought and much of it published has gone into the question of what
is too early or too late for different categories of optimization. All of it
I've seen indicates that getting this right is of the Dark Arts, or as Tom
called it, an open question in software engineering.
 
Wojtek

Lew wrote :
Even the oft-quoted "Premature optimization is the root of all evil!"
intentionally raises the question of what constitutes "premature". Too many
people seem to interpret the saying to mean, "Early optimization is the root
of all evil," plainly a distortion. It's only evil when it's too early, and
certain optimizations do happen right at the start, so that is not too early.

And the more programming you do, the more able you are to distinguish
the difference.

My decision to calculate on setter change or to calculate on getter
call is based on my experience, assumption about the cost of the
calculation, the expected frequency of the getter call, combined with
the assumed lifetime of the object (storage of the calculated amount).

I make the determination in about 1 second as I am crafting the class.
I re-visit the solution about 0.0000001% of the time.

And with today's hardware, unless it is a REALLY expensive calculation,
it is irrelevant.

I spent an idle day a few months ago changing a HashMap lookup to a
direct array lookup. It took me the better part of five hours to make
the required changes. After calculating the difference I found that
about every 1,000,000 calls I would have saved 1 full second (or
something like that) on my development laptop.

Do what is more readable so that the maintenance programmer will have
an easier time. Comment weirdness. Have fun.
 
Mark Space

Wojtek said:
Do what is more readable so that the maintenance programmer will have an
easier time. Comment weirdness. Have fun.

Yup. 80% of software engineering boils down to this.
 
Mike Schilling

Tom said:
Sadly, of that 80%, 30% is the first of those, 65% the second, and
only 5% the third!


The sad part is how many developers find that doing stuff that makes the
maintenance programmer's life hell is the fun part.
 
Lew

Mike said:
The sad part is how many developers find that doing stuff that makes the
maintenance programmer's life hell is the fun part.

I am very gratified to see this emphasis on kindness to the maintainer.

This could be the key engineering principle in software development, or at
least #2, right behind, "Don't make the user remember that it's a computer,
not magic."
 
Wojtek

Lew wrote :
I am very gratified to see this emphasis on kindness to the maintainer.

Do enough programming and eventually you will need to modify something
that you had written a long time ago.

Then you look at your code and wonder "what was I thinking?"

About that time you stop being clever and start using useful
comments...

I brought this up to the instructor for Perl after seeing some piece of
clever syntax he was "teaching". He was offended that his code could be
considered unreadable: "But Perl allows it!"
 
Mike Schilling

Wojtek said:
I brought this up to the instructor for Perl after seeing some piece
of clever syntax he was "teaching". He was offended that his code
could be considered unreadable: "But Perl allows it!"

I thought "Perl allows it" was the *definition* of unreadable.
 
Mark Space

Wojtek said:
Do enough programming and eventually you will need to modify something
that you had written a long time ago.

Then you look at your code and wonder "what was I thinking?"

Pshaw. I do this to code I wrote 4 hours ago.
 
Roedy Green

Method 1 is more streamlined coding, but requires a calculation be
performed each time the value is requested. Method 2 is a bit more
bulky, but saves having to do the calculation each time I need the
area.

This is the classic space-time tradeoff. It has no straightforward
answer.

1. Which resource is in shorter supply, CPU or RAM?

2. An experiment will settle the matter far more accurately than any
amount of beard stroking.

It will depend on how often you need the area, how often you change
the dimensions without needing the area, how you use the area. A
smart compiler will cache it for you even more efficiently than you
can if you reuse it in a small chunk of code.
 
Daniel Pitts

jsguru72 said:
I have a general question about best practices.

I think it would be best to use an example. Basic class to represent
a square.

public class Square {
int length;
int width;

//getters and setters
}

Now, let's say I wanted to add a method to obtain the area of the
square.

I have 2 options.

1) I could calculate it on the fly each time the area is requested.

int getArea() {
return (length * width);
}


2) I could calculate it whenever the length or width change, store the
value, and just return the stored value when requested.

int area;

void setLength(int length) {
this.length = length;
this.area = (this.length * width);
}



Method 1 is more streamlined coding, but requires a calculation be
performed each time the value is requested. Method 2 is a bit more
bulky, but saves having to do the calculation each time I need the
area.

Ignoring factors such as how often the input values would change or
how complicated the calculation is, is there a best practice
baseline where one method is preferred over the other?


Thanks.
The simpler one is better, not least because with the second approach you
might forget to update the value in setWidth.

The usual approach for this kind of thing, where the calculation can be
expensive, is called lazy evaluation.

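// Cached result and the inputs it was computed from.
private int area;
private int oldWidth;
private int oldLength;

int getArea() {
    // Recompute only if an input changed since the last call.
    if (oldWidth != width || oldLength != length) {
        area = width * length;
        oldWidth = width;
        oldLength = length;
    }
    return area;
}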

This approach has a minimal overhead when the value is retrieved, and
has the benefit of never calculating the expensive value if it's not needed.

I still say go with your first example, and only if it is proven to be a
bottleneck (use a profiler!), go to lazy evaluation. This is true for
other types of resources too, not just computation: memory, disk access,
network access, other I/O, etc.
 
Daniel Pitts

Tom said:
Sadly, of that 80%, 30% is the first of those, 65% the second, and only
5% the third!

tom
That is sad if you feel that way.
Perhaps you should seek a new employer :)
I find it's about 90% catering to maintenance programmers, 10% commenting
weirdness (I avoid/fix weirdness where possible), and 80% fun.
 
