alias for Integer

jrobinss

Hi all,

this is a simple question, so it may be silly...

I am processing structures that contain integers, structures such as matrixes used in statistical analysis. As I implement these as Maps of indexes, I use the class Integer, as in

// Map<matrix index, DB index> <- I'd like to remove this comment!
public static Map<Integer, Integer> myMap = ...;

Now what I'd like is to write
public static Map<MatrixIndex, DbIndex> myMap = ...;

The advantage is that the code is auto-documented, but even better that the compiler will check that I never get mixed up in different indexes, by relying on strong typing.

Usually, I would do this by extending the relevant class. But here it's Integer, which is final.

Questions:
1. is this a bad good idea, and I should proceed with Integer?
2. if not, how would you implement this?
3. is there any performance issue in defining my own classes instead of Integer?

My current solution is to define my own classes for replacing Integer, such as:

public final class MatrixIndex {
public final int value;
public MatrixIndex(int i) {value = i;}
}

I won't benefit from autoboxing, then... :-(
(I'm just hoping it doesn't break too much of the code, because even though it's mine to break, I don't have infinite time)
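One point the wrapper above leaves open: to work as a `Map` key, such a class would need `equals` and `hashCode`, otherwise two instances holding the same int count as different keys. A minimal sketch, assuming value-based equality is what is wanted (`DbIndex` and `IndexDemo` are hypothetical names added for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical typed wrapper: holds an int, but is a distinct type for the compiler.
final class MatrixIndex {
    final int value;

    MatrixIndex(int value) { this.value = value; }

    // Value-based equality, so new MatrixIndex(3) finds an entry stored under another MatrixIndex(3).
    @Override public boolean equals(Object o) {
        return o instanceof MatrixIndex && ((MatrixIndex) o).value == value;
    }

    @Override public int hashCode() { return value; }

    @Override public String toString() { return "MatrixIndex(" + value + ")"; }
}

// Same shape, different type: passing a DbIndex where a MatrixIndex is expected will not compile.
final class DbIndex {
    final int value;

    DbIndex(int value) { this.value = value; }

    @Override public boolean equals(Object o) {
        return o instanceof DbIndex && ((DbIndex) o).value == value;
    }

    @Override public int hashCode() { return value; }

    @Override public String toString() { return "DbIndex(" + value + ")"; }
}

final class IndexDemo {
    // Builds a tiny typed map; the key and value types document themselves.
    static Map<MatrixIndex, DbIndex> sample() {
        Map<MatrixIndex, DbIndex> map = new HashMap<>();
        map.put(new MatrixIndex(0), new DbIndex(4711));
        return map;
    }
}
```

Each entry still costs one object per key and per value, much like a boxed `Integer`, so this sketch addresses the type-safety question rather than the memory one.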

Note that (exceptionally for me) performance *is* an issue here. I haven't yet narrowed it down, but the code executes very slowly and eats up much too much memory at the moment. I'm starting to optimize it, that's why I'm starting with strongly typing it to prevent errors.

thanks for any tips!
JRobinss
 
markspace

3. is there any
performance issue in defining my own classes instead of Integer?


To me yes, there is a performance issue, and you should definitely
consider using something besides Integer and Map for this.

Note that (exceptionally for me) performance *is* an issue here. I
haven't yet narrowed it down, but the code executes very slowly and
eats up much too much memory at the moment. I'm starting to optimize
it, that's why I'm starting with strongly typing it to prevent
errors.


Good, you're doing this the right way. You should look into profiling
your code. NetBeans has an excellent profiler built in, and will handle
most "user made" projects as well as its own format. You should look
into it.

The most important thing is to pinpoint where the code is slow and work
on those bits. I suspect that auto-boxing and unboxing is costing you
too much time, but I couldn't prove it. The profiler could.
 
Roedy Green

Now what I'd like is to write
public static Map<MatrixIndex, DbIndex> myMap = ...;

one way to do it would be to write

public static MatrixMap myMap = new MatrixMap ( 2000 );
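The post does not show what `MatrixMap` would contain. One way to read it is a small class that hides the `Map<Integer, Integer>` behind methods whose names say which index is which. In this sketch the constructor argument is assumed to be an initial capacity, and the method names are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain-specific map: callers never see the raw Map<Integer, Integer>.
final class MatrixMap {
    private final Map<Integer, Integer> dbIndexByMatrixIndex;

    // Assumption: the int passed to the constructor is an initial capacity hint.
    MatrixMap(int initialCapacity) {
        dbIndexByMatrixIndex = new HashMap<>(initialCapacity);
    }

    // Parameter names document which index is the key and which is the value.
    void putDbIndex(int matrixIndex, int dbIndex) {
        dbIndexByMatrixIndex.put(matrixIndex, dbIndex);
    }

    // Returns null when no DB index is recorded for this matrix index.
    Integer dbIndexFor(int matrixIndex) {
        return dbIndexByMatrixIndex.get(matrixIndex);
    }

    int size() {
        return dbIndexByMatrixIndex.size();
    }
}
```

This removes the need for the explanatory comment on the field, but unlike distinct index types it does not stop a caller from passing the two ints in the wrong order.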
 
Lew

jrobinss said:
I am processing structures that contain integers, structures such as matrixes used in statistical analysis. As I implement these as Maps of indexes, I use the class Integer, as in

// Map<matrix index, DB index> <- I'd like to remove this comment!
public static Map<Integer, Integer> myMap = ...;

Now what I'd like is to write
public static Map<MatrixIndex, DbIndex> myMap = ...;

The advantage is that the code is auto-documented, but even better that the compiler will check that I never get mixed up in different indexes, by relying on strong typing.

Usually, I would do this by extending the relevant class. But here it's Integer, which is final.

That is kind of an antipattern. Joshua Bloch suggests to "Favor composition over inheritance" in /Effective Java/ (2nd ed.). You could write a wrapper for 'Integer', but then 'Integer' already *is* a wrapper type.
Questions:
1. is this a bad good idea, and I should proceed with Integer?

It's not a terrible idea, necessarily, but it seems unnecessary.
2. if not, how would you implement this?

I would pick a type that actually helps. I don't see what you object to in 'Integer', quite frankly.
3. is there any performance issue in defining my own classes instead of Integer?

How could anyone possibly know the answer to this question?
My current solution is to define my own classes for replacing Integer, such as:

public final class MatrixIndex {
public final int value;
public MatrixIndex(int i) {value = i;}
}

Congratulations, you just re-invented 'Integer', but without any of its features.

Now you have to go through all sorts of gyrations to translate your custom class to and from 'int'. Your code complexity goes up, and you have to maintain your own custom substitute for a fundamental API type. Thus your risk of bugs and code-maintenance costs skyrocket.

How again does this help you?
I won't benefit from autoboxing, then... :-(

No one does.
(I'm just hoping it doesn't break too much of the code, because even though it's mine to break, I don't have infinite time)

Why waste *any* time on this? Just use 'Integer'.
Note that (exceptionally for me) performance *is* an issue here. I haven't yet narrowed it down, but the code executes very slowly and eats up much too much memory at the moment. I'm starting to optimize it, that's why I'm starting with strongly typing it to prevent errors.

"Optimize" and "prevent errors" are, at best, orthogonal, and at worst (and typically in this kind of premature action) the former interferes with the latter.

Yet you say them in the same breath as though they were the same thing.

They aren't.

Your "strong typing" isn't, really. 'Integer' already is a strong type.

As for what you choose to "optimize", what evidence (that is, factual data derived from actual tests whose protocols are publicized) do you have that you are attacking the slow parts?

IOW, what /actual/ tests have you performed, and how did you control the variables like system load, HotSpot warmup and application load?

You have a performance problem. So you decide to obfuscate random parts of your code base with zero foundation for your actions. Now you have two problems.
 
jrobinss

Aaah, it had been some time, but I'm not disappointed. :)

I'm sorry Lew, but I feel that I haven't been quite clear.

This is a large piece of code, not extraordinary but still a reasonable size, and I didn't write it. I end up manipulating indexes all over the place, so that I have for example a table (or map or whatever) that associates database indexes to matrix indexes, or identifiers to indexes, etc. These are all ints or Integers, and take part in rigmaroles of loops, indexes of indexes of arrays of arrays and such joyous constructs. In short, it's a lot of ints and hard to understand.

That these indexes and identifiers are all integers is in fact nearly a coincidence (it *is* very handy to reuse matrix libraries), conceptually they are not the same thing; the ID of an object could be a string, it just happens to be an int. I am certainly not obfuscating my code by replacing a type such as Integer with DbIndex or MyObjId, on the contrary I am stating to everyone, including the compiler, that this should be congruent with indexes, not with matrix sizes or the number of chars in a string or what-have-you. The error I am trying to avoid is to use a DB index in place of a matrix index in the middle of a large set of loops.
(of course a nice side effect is that I could replace some IDs with strings or objects or anything, benefiting from encapsulation, but here it's not the objective)

See this as the same as when you call a method that takes ten ints as entry params: the main risk of error is to get confused in the order of parameters, and nothing will warn you except some strange bug a year later. A way to avoid it is to type strongly parameters, so that the caller may call
new Rect(new Rect.Length(x), new Rect.Width(y))
instead of
new Rect(x, y)
which is kind of verbose overkill in this particular example of course, but it's just an illustration.
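For concreteness, the `Rect` illustration could be sketched as follows. The nested `Length` and `Width` classes follow the call shown above; their fields and the rest of `Rect` are assumptions made for the example.

```java
// Hypothetical Rect whose constructor cannot be called with swapped arguments.
final class Rect {

    // Tiny wrapper types: each one only exists to give its int a distinct compile-time type.
    static final class Length {
        final int value;
        Length(int value) { this.value = value; }
    }

    static final class Width {
        final int value;
        Width(int value) { this.value = value; }
    }

    final int length;
    final int width;

    // new Rect(new Rect.Width(y), new Rect.Length(x)) is a compile error, not a latent bug.
    Rect(Length length, Width width) {
        this.length = length.value;
        this.width = width.value;
    }

    int area() {
        return length * width;
    }
}
```

The wrappers are unwrapped once in the constructor, so the rest of `Rect` can keep working with plain ints.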

Ok, so that's the reason why the typing.

As for the optimisation, I am perfectly aware that one of the most-cited mottos here is "premature optimization is the root of all evil" (with which I wholeheartedly agree, BTW). I am not prematurely optimizing: I am about to optimize some code that I didn't write, and it's certainly not premature because the code *is* slow. For me, this always starts with a bit of local rewriting in order to better understand the code and to guarantee that my changes won't break anything, in particular by letting the compiler help me out. The o-word, which acts as a kind of magnet for your reactions, was not the core subject of my post, which is probably why you feel you don't have any elements about it: I didn't provide any, because that was not my question. I merely stated that performances were potentially an issue, because that is generally one of the parameters to take into account when choosing a particular implementation.

I hope this clarifies.

Many thanks to you, Lew, and also to Roedy and Mark for answering. I'll keep your answers in mind while I progress.

For those who wonder about structures and matrixes etc, I'm starting by replacing types, but I may do another code writing iteration where I replace structures. So that it may go like this...
original code: Blob<Integer>
better: Blob<MyIndex>
even better: MyBlob extends Blob<Integer>, or extends Blob<MyIndex>
gettin' better: MyBlob with methods using ints, bye bye boxing :)

JRobinss
 
Andreas Leitgeb

jrobinss said:
This is a large piece of code, not extraordinary but still a
reasonable size, and I didn't write it. I end up manipulating
indexes all over the place, so that I have for example a table
(or map or whatever) that associates database indexes to matrix
indexes, or identifiers to indexes, etc. These are all ints or
Integers, and take part in ringamaroles of loops, indexes of
indexes of arrays of arrays and such joyous constructs. In short,
it's a lot of ints and hard to understand.

My comment will not be much of help for this part of your problem,
but I'll mention it anyway, for the other part of your problem.

There exist third-party libraries (I think from apache) that offer
variants of the usual JSL collection-classes - but for primitive types.
So, just in case it turns out, that most of the slowness comes from
boxing and unboxing, then having a look at those libraries might
help.

I haven't used them, myself, though, so do not really know, if
they are indeed faster...

just fwiw.
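To illustrate what a primitive-keyed collection does differently, here is a hand-rolled sketch of an int-to-int hash map using open addressing. It is not the API of any real library; `IntIntMap` and its methods are made up for the example, and it supports neither removal nor very large sizes.

```java
// Illustration only: an int -> int map that stores keys and values in plain arrays (no boxing).
final class IntIntMap {
    private int[] keys;
    private int[] values;
    private boolean[] used;
    private int size;

    IntIntMap(int expectedEntries) {
        int capacity = 16;
        // Power-of-two capacity, at least twice the expected entries (no overflow guard in this sketch).
        while (capacity < expectedEntries * 2L) {
            capacity <<= 1;
        }
        keys = new int[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
    }

    // Finds the slot holding this key, or the empty slot where it would be inserted.
    private int slotFor(int key) {
        int mask = keys.length - 1;
        int mixed = key * 0x9E3779B9;   // spread the bits of sequential keys
        mixed ^= mixed >>> 16;
        int i = mixed & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;         // linear probing
        }
        return i;
    }

    void put(int key, int value) {
        if ((size + 1) * 2 > keys.length) {
            grow();                     // keep the table at most half full so probing always terminates
        }
        int i = slotFor(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int getOrDefault(int key, int fallback) {
        int i = slotFor(key);
        return used[i] ? values[i] : fallback;
    }

    int size() {
        return size;
    }

    // Doubles the arrays and re-inserts every stored entry.
    private void grow() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }
}
```

A maintained library would normally be preferable to code like this; the sketch is only meant to show where the boxing disappears.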
 
markspace

See this as the same as when you call a method that takes ten ints as
entry params: the main risk of error is to get confused in the order
of parameters,


This is actually a bit of an anti-pattern too. It's hard for anyone to
remember the order of more than about 4 parameters, according to
Effective Java.

and nothing will warn you except some strange bug a
year later. A way to avoid it is to type strongly parameters, so that
the caller may call new Rect(new Rect.Length(x), new Rect.Width(y))
instead of new Rect(x, y) which is kind of verbose overkill in this
particular example of course, but it's just an illustration.


This is also verbose, but one recommended pattern here is to use the
builder pattern

Rect r = new RectBuilder().length( 10 ).width( 12 ).make();

It gains value as you have more and more parameters to remember, and
also obviates the problem with remembering their order, because the
builder will accept them in any order. I personally would not use it
for Rect here as it only has two parameters. For a method or ctor that
has 10+ parameters, I would consider it.
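A minimal sketch of what such a builder could look like. `RectBuilder`, `length`, `width` and `make` come from the one-liner above; the bodies and the `Rect` fields are assumptions.

```java
// Hypothetical target class with a positional constructor.
final class Rect {
    final int length;
    final int width;

    Rect(int length, int width) {
        this.length = length;
        this.width = width;
    }
}

// Builder: each setter names its parameter and returns this, so calls can be chained in any order.
final class RectBuilder {
    private int length;
    private int width;

    RectBuilder length(int length) {
        this.length = length;
        return this;
    }

    RectBuilder width(int width) {
        this.width = width;
        return this;
    }

    // Only here is the positional constructor called, in one well-tested place.
    Rect make() {
        return new Rect(length, width);
    }
}
```

With this, `new RectBuilder().width( 12 ).length( 10 ).make()` produces the same `Rect` as the call shown above. Unset values silently default to 0 in this sketch; a fuller builder might validate them in `make()`.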

Other standard patterns:

1. Use an IDE. A good IDE will show the names of the parameters when
you type them in, so you don't have to remember their order. This is a
good form of reflection that costs you nothing at runtime, and doesn't
add any lines of code either.

2. Take a cue from the IDE, get a lexical parser for Java, and build
your own custom source code formatter. Break up long lists of
parameters automatically and comment them to include their names. This
again is a big win that has no runtime costs, but will keep your current
and future code base formatted according to a standard. This is a huge
win, imo.

Try to think outside of the "code" box. There's more to developing
software than things that run inside your code. Study the Unix
operating system and try to learn from its "tool building" examples.
 
Jim Janney

jrobinss said:
Hi all,

this is a simple question, so it may be silly...

I am processing structures that contain integers, structures such as matrixes used in statistical analysis. As I implement these as Maps of indexes, I use the class Integer, as in

// Map<matrix index, DB index> <- I'd like to remove this comment!
public static Map<Integer, Integer> myMap = ...;

Now what I'd like is to write
public static Map<MatrixIndex, DbIndex> myMap = ...;

The advantage is that the code is auto-documented, but even better that the compiler will check that I never get mixed up in different indexes, by relying on strong typing.

Usually, I would do this by extending the relevant class. But here it's Integer, which is final.

Questions:
1. is this a bad good idea, and I should proceed with Integer?
2. if not, how would you implement this?
3. is there any performance issue in defining my own classes instead of Integer?

My current solution is to define my own classes for replacing Integer, such as:

public final class MatrixIndex {
public final int value;
public MatrixIndex(int i) {value = i;}
}

I won't benefit from autoboxing, then... :-(
(I'm just hoping it doesn't break too much of the code, because even though it's mine to break, I don't have infinite time)

Note that (exceptionally for me) performance *is* an issue here. I haven't yet narrowed it down, but the code executes very slowly and eats up much too much memory at the moment. I'm starting to optimize it, that's why I'm starting with strongly typing it to prevent errors.

thanks for any tips!
JRobinss

Switch to Ada? If that's not an option, you should probably just stick
with Integer. I think I understand what you're trying to do, but Java
just isn't good at expressing those kinds of constraints.

As far as performance is concerned, I think there are some open-source
projects that implement map-like structures with primitive keys, but I
don't have any experience using them.
 
Roedy Green

I don't know what you are trying to do, but in general you can sometimes
replace a map by sorting two sets then processing them batch-style
sequentially much like a tape merge.

See http://mindprod.com/jgloss/products1.html#SORTED
for classes that sort and process pairs of sets sequentially.

Giving the HashMap more RAM will improve its performance.
see http://mindprod.com/jgloss/hashmap.html
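One way to read "more RAM" is a larger initial capacity, so that the table never has to be resized and rehashed while it fills. A minimal sketch under that assumption (`PresizedMaps` and `withExpectedSize` are hypothetical names):

```java
import java.util.HashMap;
import java.util.Map;

final class PresizedMaps {
    // Chooses an initial capacity large enough that expectedEntries fit
    // under HashMap's default load factor of 0.75 without a resize.
    static <K, V> Map<K, V> withExpectedSize(int expectedEntries) {
        int capacity = (int) (expectedEntries / 0.75f) + 1;
        return new HashMap<>(capacity);
    }
}
```

Whether this helps measurably depends on how much time the program actually spends resizing, which is again a question for the profiler.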

HashMaps are faster than Hashtables.

HashMaps are faster than TreeMaps.

Of course you want to prove that the HashMap lookup truly is the
bottleneck or anything you do may end up just slowing things down.

http://mindprod.com/jgloss/profiler.html
 
jrobinss

Thanks all for your answers. I didn't reply individually, but be sure
I read them with interest.

Back to code (or out of its box...)
JRobinss
 
Robert Klemme

public final class MatrixIndex {
public final int value;
public MatrixIndex(int i) {value = i;}
}

I'd rather use Integer but if you want a specific type you could
implement a class which inherits Number and works exactly same way
Integer does and has a generic type parameter for the container type.
While that type parameter would be otherwise useless it would help in
catching type errors.
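The suggestion might look roughly like this. It is a sketch with several assumptions: `TypedIndex` is a hypothetical name, and the empty marker classes `Matrix` and `Db` stand in for whatever "container type" is meant.

```java
// Hypothetical Number subclass whose type parameter exists only for compile-time checking.
final class TypedIndex<T> extends Number {
    private static final long serialVersionUID = 1L;

    private final int value;

    TypedIndex(int value) { this.value = value; }

    // Number's four abstract conversions.
    @Override public int intValue() { return value; }
    @Override public long longValue() { return value; }
    @Override public float floatValue() { return value; }
    @Override public double doubleValue() { return value; }

    // T is erased at runtime, so equality can only compare the wrapped int.
    @Override public boolean equals(Object o) {
        return o instanceof TypedIndex && ((TypedIndex<?>) o).value == value;
    }

    @Override public int hashCode() { return value; }
}

// Marker types (assumed): never instantiated, used only as type arguments.
final class Matrix { }
final class Db { }
```

A field could then be declared as `Map<TypedIndex<Matrix>, TypedIndex<Db>>`, and passing a `TypedIndex<Db>` where a `TypedIndex<Matrix>` is expected would fail to compile. Because of erasure the check is compile-time only: at runtime a `TypedIndex<Matrix>` and a `TypedIndex<Db>` holding the same int compare as equal.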

Kind regards

robert
 
David Lamb

Now what I'd like is to write
public static Map<MatrixIndex, DbIndex> myMap = ...;

The advantage is that the code is auto-documented, but even better that the compiler will check that I never get mixed up in different indexes,
by relying on strong typing.

Usually, I would do this by extending the relevant class. But here it's Integer, which is final.

Questions:
1. is this a bad good idea, and I should proceed with Integer?
2. if not, how would you implement this?
3. is there any performance issue in defining my own classes instead of Integer?

My current solution is to define my own classes for replacing Integer, such as:

public final class MatrixIndex {
public final int value;
public MatrixIndex(int i) {value = i;}
}

To me the critical question is: in what way do each of these different
kinds of "integer" differ from each other and from the language-defined
"Integer"? Do they differ in upper and lower bounds, for example? is it
important to keep track of where each number came from? is one of them a
mere identifier (such as a lot number in a city plan) rather than
something you could add and subtract from each other? All of those
suggest that you need a new type(s) that "has an int" instead of "is an
Integer".
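As one illustration of a type that "has an int", here is a sketch of an index that carries its own bounds check and deliberately offers no arithmetic. `RowIndex` and its methods are hypothetical, and the bound is assumed to be a known row count.

```java
// Hypothetical index type: wraps an int, validates it once, and exposes no add/subtract.
final class RowIndex {
    private final int value;

    private RowIndex(int value) { this.value = value; }

    // Factory enforces the bounds, so an existing RowIndex was in range when it was created.
    static RowIndex of(int value, int rowCount) {
        if (value < 0 || value >= rowCount) {
            throw new IndexOutOfBoundsException(
                "row " + value + " outside 0.." + (rowCount - 1));
        }
        return new RowIndex(value);
    }

    // The only way back to a raw int, e.g. for indexing into an array.
    int toInt() { return value; }
}
```

An identifier type, by contrast, might expose only `equals` and `hashCode` and no `toInt()` at all.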
 
