StringBuilder Difficulties

B

blmblm

[ snip ]
Oh, I asked about that. One apparently can not pass a function
pointer parameter as in C. The ways that were posted involved lookup
every time AFIACS and I judged that it might swamp what I was
measuring (checking if a character were in a set). So, to my chagrin,
I had to go with cut-and-paste.

Without experimenting to find out, of course ....

It seems to me that virtual method invocations are so common in
Java that they would be well-optimized. But if you want to claim
that the code you eventually hope to produce won't have one, well,
yeah, that's true, so maybe it matters. Then again, wouldn't a
similar argument apply to C with function pointers? Maybe not.
I don't seem to be able to think this through as carefully as
I'd like.

Anyway, I was curious, so I ran your code and my revision [1], and
the results were -- surprising [2]. I noticed, by the way, that
all three of your parse methods make a call to SequentialSearch,
assigning the result to a variable that apparently isn't used.
Thinking that *might* be a mistake, I also tried your code and my
revision with that possibly-extra call removed.

UPDATE: I just re-read the previous threads and realized that
your code actually calls SequentialSearch in the method that's
supposed to be timing BinarySearch. Once I fix that ....
[1] Message-ID: <[email protected]>

[2] I tried this on several different systems, all Fedora Linux
running Java 1.6.0_21 but different hardware and different releases
of Fedora.

[ snip ]
Your code was fairly (but not 100%!) consistent in showing
treeset-based search to be fastest, followed by binary search and
then sequential search, though sometimes the difference between
sequential and binary was small. My code was -- well, this is where
it's surprising. On most of the systems where I tried it, treeset
search was fastest, but sequential search was faster than binary
search; on one system, however, the order was as for your code.
Your code was pretty consistently faster than mine, though usually
not by a lot (less than 1%).

Both programs (yours and mine) now consistently report sequential
search to be faster than binary search. Weird, but not as weird as
the two being different in that regard ....

[ snip ]
 
L

lewbloch

Gene said:
(e-mail address removed) wrote

     IOW, who cares if it is only a bit of inefficiency?  Agreed.


     It is.  My test code appends one character at a time. Switching
from String to StringBuilder cut the execution time by about 40%.

Lines like:

String thing = getResult() + " found by " + foundSource() +".";

get optimized under the hood, so use of something like 'StringBuilder'
on such one-liners is useless. The usefulness kicks in with the sort
of idiom blmblm described - multi-line expressions in loops where the
compiler builds lots of 'String' instances and can't figure out to use
'StringBuilder' by itself.

The efficiency doesn't come from the re-use of the 'StringBuilder'
object from iteration to iteration. The create/destroy mechanism for
short-lived objects favors creating the temporary object inside the
loop, not outside. The efficiency comes from not having to create so
many superfluous 'String' objects.
 
L

lewbloch

[snip]
It is relatively rare to decide, part way through a build, to throw away
the work so far and begin again with entirely different contents.

     I am going to be doing it in a loop.  why create a new object
each iteration?

To avoid promotion of the object to the tenured generation, which
makes GC stuff far less efficient. Quite often it's better to create
the temporary object inside the loop.

You should do what makes sense for the logic of the algorithm and not
try to second-guess and micromanage optimization issues. You totally
have your head in the wrong place here. If the scope of the object,
say the 'StringBuilder', is inside the loop, declare it inside the
loop. Only declare it outside the loop if it's needed outside the
loop. Don't try to be a human HotSpot engine, and stop trying to
defeat the GC optimizations.
 
R

Robert Klemme

The efficiency doesn't come from the re-use of the 'StringBuilder'
object from iteration to iteration. The create/destroy mechanism for
short-lived objects favors creating the temporary object inside the
loop, not outside. The efficiency comes from not having to create so
many superfluous 'String' objects.

Lew, I find that explanation a bit strange: only reuse of StringBuilder
makes it possible to not have to create all those superfluous String
instances. You make it sound as if reuse and not having to create
superfluous String objects are somehow antithetic or independent. If I
cannot use a StringBuilder and I want to construct a String from parts I
have to create superfluous String instances - so both facts are really
related.

Kind regards

robert
 
B

blmblm

On 7/2/2011 6:34 PM, (e-mail address removed) wrote:
...
...

Binary search can have a much higher constant multiplier on its time
than sequential search. If the searches are long enough, the O(log n) vs
O(n) complexity difference should dominate over the constant multiplier,
but for short searches the constant multiplier can dominate.

Of course -- I know better than to think computational complexity
is the whole story, especially in dealing with small problem sizes,
which I think is what we have here. (Why I notice this kind of
"what was I thinking?" only after posting, who knows.)
Binary search is more expensive than it looks because of branch
prediction.

Which makes sense ....

[ snip ]
The only way to know which way round this will come out is indeed to
measure, making sure the searches are as similar as possible to the real
workload.

Which is apparently by no means straightforward in Java! I recently
skimmed through the articles by Brian Goetz referenced in one of
Stefan Ram's posts from a while back, and -- fascinating stuff.
 
L

lewbloch

Lew, I find that explanation a bit strange: only reuse of StringBuilder
makes it possible to not have to create all those superfluous String
instances.  You make it sound as if reuse and not having to create
superfluous String objects are somehow antithetic or independent.  If I
cannot use a StringBuilder and I want to construct a String from parts I
have to create superfluous String instances - so both facts are really
related.

They are independent, and I deny that I made them sound antithetic.
That's on your inference, not my implication.

The reuse of the 'StringBuilder' is not needed in some loops,
depending on whether the building occurs inside the loop entirely or
not. That's independent of the notion that 'StringBuilder' saves on
intermediate 'String' objects. You can tell I didn't imply that the
two are antithetical because I explicitly said to put the
'StringBuilder' inside the loop if that's what the logic requires. If
you're paying attention you can tell, at least.

Consider:

while ( someCondition )
{
StringBuilder sb = new StringBuilder();
sb.append( getInitialStuff() );
Result result = doSomething();
sb.append( ". Result: " ).append( result.toString() );
Sumpn moreShite = doSomethingWith( result );
sb.append( ". Moreover: " ).append( moreShite.toString() );
output( sb );
}

Nothing in the logic of this example requires 'sb' to have scope
outside the loop, so it's foolish to put it outside the loop. This is
independent of, and entirely compatible with, the benefits of using
'StringBuilder' to avoid a lot of intermediate 'String' objects.
 
G

Gene Wirchenko

On 07/02/2011 02:29 AM, Gene Wirchenko wrote:
[snip]
My tools include manyyears of experience programming. I do not
think that Java is such a precious snowflake -- the same is true of
any language -- that I should have to throw all that experience away
in order to use the language.

As far as I can see nobody asked you to do that. If adjusting to a new
language's mindset requires you to throw away everything you've learned
so far then you probably better stick with the previous experience and
tools. That will be much more efficient and beneficial.

I do not name my variables the way some think I should. I do not
use exceptions the way some others think I should. And on and on.

There have been some very good posts which I have appreciated.
There have been others which were nitpicky and antagonistic.
If, on the other hand, you want to use a new language then you typically
get best results (or results at all) if you adjust to the environment
you find. You may have noticed that your issues with StringBuilder seem
to be quite unique - others posting here do not seem to have those
issues. In my experience this is usually an indication that I am doing
something wrong or haven't properly understood the new environment yet.

Well, no, they are not. I had a problem with StringBuilder
comparison with a string. I STWed and found a discussion that was
originally posted here. Roedy was involved in that discussion.

[snip]
The usual solution in Java is to factor out an API into an interface and
have several implementations of that interface. See Callable for
example - this basically encapsulates a "function" with no arguments and
a single return value:

It is horribly verbose. Java: COBOL++?

[snip]

Sincerely,

Gene Wirchenko
 
G

Gene Wirchenko

[snip]
Oh, I asked about that. One apparently can not pass a function
pointer parameter as in C. The ways that were posted involved lookup
every time AFIACS and I judged that it might swamp what I was
measuring (checking if a character were in a set). So, to my chagrin,
I had to go with cut-and-paste.

Without experimenting to find out, of course ....

You are putting the cart before the horse.

THE WHOLE POINT of this exercise was experimentation to test the
speed of the lookup. Until I got the code I was asking about running,
of course, I would not know.

[snip]
That people are not willing to offer further help when many of their
previous suggestions have been rejected should not be a total surprise.
<shrug>

My phrase "not so much help" does not refer to this newsgroup.
You can find texts and Websites on starting with Java. There is also
advanced material. In the middle is a sour spot. I am at the low end
of that sour spot.

Sincerely,

Gene Wirchenko
 
G

Gene Wirchenko

UPDATE: I just re-read the previous threads and realized that
your code actually calls SequentialSearch in the method that's
supposed to be timing BinarySearch. Once I fix that ....

Yeah, I caught that goof later, too. Or was that one pointed
out? There was at least one goof that got pointed out to me. Cut and
paste is a horrible way to have to do it.

[snip]
Both programs (yours and mine) now consistently report sequential
search to be faster than binary search. Weird, but not as weird as
the two being different in that regard ....

That is not the result that I got. Sequential search was
occasionally faster but usually a bit slower the binary search.

Someone posted links about the difficulty of Java benchmarking.
If it really mattered (checking for *small* differences), I would use
them and may well in a future test when it does. The difference
between the binary search and the Treeset search was sufficiently and
consistently different that I went with Treeset.

Sincerely,

Gene Wirchenko
 
R

Robert Klemme

It is horribly verbose. Java: COBOL++?

In part it's the price you pay for static typing (somewhere all those
type names have to appear). Tools are good enough these days to offload
you from most of the hand typing via completion. If you have not yet
used a modern IDE yet you should really try Eclipse or NetBeans both of
which are free of charge.

If you look for a language on the JVM with less overhead maybe Scala is
for you. Some people do find it too compact though.

Kind regards

robert
 
S

supercalifragilisticexpialadiamaticonormalizeringe

In part it's the price you pay for static typing

Wrong. Lots of strongly statically typed languages don't have nearly the
verbosity, including Haskell and the ML family of languages. Generally
what they have is very smart type inference capabilities in the
compilers, so you have to specify types quite rarely, e.g. specifying
the precision a literal number should be considered to have, and
specifying argument types to functions. In-function declarations and
sometimes return types are inferred by data flow analysis. A Java
equivalent would be if you could do

public foobar (int a, String b) {
x = a + b.length();
y = 1.3*x;
return y.toString();
}

and this would work, with the compiler first resolving .length() to
String's method and seeing that this returns int, seeing then that a
b.length() adds two ints so is of type int, and inferring x is an int;
then y is a double times an int so y is a double; then resolving
..toString(); to Double's method (post-autoboxing of y) and inferring
that foobar's return type is String.

pubic wontwork (int a, String ) {
x = a + b
}

blows up, meanwhile, because int + String is not defined anywhere.

In this case, x and y lack *manifest* typing but have compile-time types
more specific than Object.

The thing is, javac already does some type inference, and indeed decides
which .length() method is being called in line 1 and the type of the
expression a + b.length() in exactly the way described above. It just
won't go on to infer x's type if it's not specified. (Partly, also, the
absence of a type before x would make declaring and initializing a local
not look different from assigning to it. A nice rule would be that

mutable x = ...

declares and initializes a mutable local and

x = ...

declares and initializes an immutable local if no mutable x is within
scope, and assigns to the innermost in-scope mutable x otherwise. This
would favor less use of mutation and type inference would take much of
the pain out of any proposed syntax for lambdas in Java, making Java
more supportive of using a functional style in areas where that's
appropriate and avoiding some of the mutation "side-effect spaghetti
code" that's all too common these days.

The semantics above would be that if x isn't explicitly declared
"mutable", it acts like in current Java when it's declared "final".
Definite assignment rules would still apply, more-or-less unchanged,
with "mutable x;" being allowed as "int x;" is now and

if foo
x = baz;
else
quux.frobnicate();
return x + 3;

being a definite assignment violation if no local x existed above "if
foo" in that method body, a violation of finalness if a final x existed
above "if foo" in the same block, and OK if a mutable x existed above
"if foo" in the same scope and was definitely assigned to even if the
else clause executed (say because it was "mutable x = 0;" at the point
of declaration). An immutable x in a block enclosing the block but not
in the same block as the if would create a conundrum: depending on the
if branch taken there seem to be two different xs that the return could
mean. I'd say though that the x = baz; creates a local x in the directly
enclosing block that exists, but is not definitely assigned, at the
return statement. As for

x = foo;
mutable x = bar;

in the same block: illegal. But in nested blocks, inner shadows outer as
always.

Let me know if you find any obvious problem with the above ideas
(besides the high improbability that Oracle will ever implement them).
Tools are good enough these days to offload
you from most of the hand typing via completion.

A proof of concept that javac could do more of the type-inference heavy
lifting itself.
If you look for a language on the JVM with less overhead maybe Scala is
for you. Some people do find it too compact though.

Another functional language with type inference.

Define "too compact", given that it doesn't go to line-noisy extremes
like perl.
 
M

markspace

public foobar (int a, String b) {
x = a + b.length();
y = 1.3*x;


You go through quite a bit of verbiage to justify not typing int and
double in front of those last two lines.

(Incidentally, while we're talking excess verbiage, your nom de usenet
broke my reply button. Do you think you could adopt something that's
less than 40 characters?)

Define "too compact", given that it doesn't go to line-noisy extremes
like perl.


Here at least we agree--Perl is too "compact" to the extreme that it's
almost unreadable. Sure, I wouldn't mind if the Java compiler were a
bit smarter, but at the same time I don't feel your examples were
exactly compelling either.

It's just not that hard to specify the type of a variable explicitly.
No one's fingers will fall off from typing. I understand you don't care
for it, but it's not odious either. Those of you who really like this
sort of thing should check out lambda expressions for Java 8, there's
some additional type inference coming where it's really needed.
 
S

supercalifragilisticexpialadiamaticonormalizeringe

You go through quite a bit of verbiage to justify not typing int and
double in front of those last two lines.

Toy example.

And the JLS goes through a lot more veriage to explain the exact
semantics of every little construct in Java. With good reason.
(Incidentally, while we're talking excess verbiage, your nom de usenet
broke my reply button. Do you think you could adopt something that's
less than 40 characters?)

Do you think you could adopt a newsreader that doesn't have glaring
errors in it such as fixed-size char* buffers instead of
java.lang.Strings or similarly non-woefully-primitive implementation
datatypes? ;)
Here at least we agree--Perl is too "compact" to the extreme that it's
almost unreadable. Sure, I wouldn't mind if the Java compiler were a bit
smarter, but at the same time I don't feel your examples were exactly
compelling either.

It's just not that hard to specify the type of a variable explicitly.

Not when there are only two of them and they're "int" and "double".

It starts to seem a bit more tedious the fifty-thousandth time you type

ConcurrentNavigableMap<QuuxscapeTuttifrutticatorGUIDInstance<String>,
List<FrobnicatorFactoryServiceProviderFactoryImpl<Integer,String>>>
foo

in a three million LOC project. Particularly when you frequently follow
it up with

= new ConcurrentSkipListMap<QuuxScapeTuttifrutticatorGUIDInstance<String>,
List<FrobnicatorFactoryServiceProviderFactoryImpl<Integer,String>>>(new
Comparator<QuuxScapeTuttiFrutticatorGUIDInstance<String>>() {
public int compare (QuuxscapeTuttiFrutticatorGUIDInstance<String>
x, QuuxscapeTuttiFrutticatorGUIDInstance<String> y) {
; use the reverse of the usual order in this one
return compare(y.getToken(),x.getToken());
}
};

and then the damn thing won't compile because you have the wrong
capitalization for QuuxScape (or is it Quuxscape?) here and there.
No one's fingers will fall off from typing.

I beg to differ; I'm sewing two of mine back on right after I hit "send"
and that's just from writing a mere 12 lines of typical Java. Just
imagine how many times I'll have reattached them by the third million
LOC or so. ;)
I understand you don't care for it, but it's not odious either.

*ahem* ConcurrentNavigableMap<QuuxscapeTuttifrutticatorGUIDInstance<String>,
Those of you who really like this sort of thing should check out
lambda expressions for Java 8, there's some additional type
inference coming where it's really needed.

Oh, goody. Progress at last. At this rate maybe we'll even have type
inference in "new" expression initializations by about 2020 or so. Just
in time for my retirement. ;)
 
B

blmblm

[snip]
Oh, I asked about that. One apparently can not pass a function
pointer parameter as in C. The ways that were posted involved lookup
every time AFIACS and I judged that it might swamp what I was
measuring (checking if a character were in a set). So, to my chagrin,
I had to go with cut-and-paste.

Without experimenting to find out, of course ....

You are putting the cart before the horse.

THE WHOLE POINT of this exercise was experimentation to test the
speed of the lookup. Until I got the code I was asking about running,
of course, I would not know.


Maybe I misunderstood what you meant by "lookup" above --
I understood you to be referring to whatever has to be done
to find the method to be called, as opposed to looking up
individual characters in a list/set. As far as I can tell,
your benchmark concerns different approaches to the latter,
but assumes that looking up the method (i.e., using the Java
substitute for function pointers) will somehow skew the results
to the point where it shouldn't/can't be used. (You may actually
be right about that, much as it pains me to say so, because all
that duplicate code is, in my opinion and possibly yours as well,
both aesthetically displeasing and a source of potential trouble.)
[snip]
That people are not willing to offer further help when many of their
previous suggestions have been rejected should not be a total surprise.
<shrug>

My phrase "not so much help" does not refer to this newsgroup.
You can find texts and Websites on starting with Java. There is also
advanced material. In the middle is a sour spot. I am at the low end
of that sour spot.

Ah. Interesting. I don't seem to have a clear memory of how I got
from the beginning level to wherever I am now (intermediate?). I do
remember, though, learning quite a bit from following semi-random
discussions in this newsgroup. But that was, hm, a decade or so ago,
and it's not clear it would still work. <shrug>
 
B

blmblm

On 1 Jul 2011 20:47:47 GMT, (e-mail address removed)

On 30 Jun 2011 20:30:00 GMT, (e-mail address removed)
<[email protected]> wrote:

[ snip ]
UPDATE: I just re-read the previous threads and realized that
your code actually calls SequentialSearch in the method that's
supposed to be timing BinarySearch. Once I fix that ....

Yeah, I caught that goof later, too. Or was that one pointed
out? There was at least one goof that got pointed out to me. Cut and
paste is a horrible way to have to do it.

I think there was one mistake that was pointed out (wrong search) and
one that was not (extra use of SequentialSearch in all methods).
[snip]
Both programs (yours and mine) now consistently report sequential
search to be faster than binary search. Weird, but not as weird as
the two being different in that regard ....

That is not the result that I got. Sequential search was
occasionally faster but usually a bit slower the binary search.

Interesting. So maybe I should try this on a few more systems ....

When I do, I find that on newer systems indeed binary search was
at least a bit faster. On older ones, not so much. Patricia
Shanahan's theory (about why binary search could be slower)
makes sense to me and also provides a plausible explanation of
why results could vary among systems.
Someone posted links about the difficulty of Java benchmarking.
If it really mattered (checking for *small* differences), I would use
them and may well in a future test when it does. The difference
between the binary search and the Treeset search was sufficiently and
consistently different that I went with Treeset.

What I found is that HashSet was noticeably faster on all the
systems where I ran the benchmarks. Unless you need for the set
to be sorted (and it's not apparent from your code that you do),
why not .... ? (I'm curious too about why you chose TreeSet in
the first place. ? )
 
K

KitKat

Ah. Interesting. I don't seem to have a clear memory of how I got
from the beginning level to wherever I am now (intermediate?). I do
remember, though, learning quite a bit from following semi-random
discussions in this newsgroup.

One single 8000-post semi-random discussion in this newsgroup, if the
Giggle Gloops stats for your from address are to be believed.
 
M

markspace

ConcurrentNavigableMap<QuuxscapeTuttifrutticatorGUIDInstance<String>,
List<FrobnicatorFactoryServiceProviderFactoryImpl<Integer,String>>>


I'd like to see some actual examples from actual code, rather than these
obvious straw-man arguments.
 
B

blmblm

One single 8000-post semi-random discussion in this newsgroup, if the
Giggle Gloops stats for your from address are to be believed.

Strictly speaking, those posts (if they're the ones I think you mean)
were made from a different address. But the similarity between that
address and this one is deliberate -- myrealbox.com is no more, but
it seemed like a good idea to try to maintain some continuity.

Now, that discussion (again, if it's the one I think you mean)
was indeed quite educational, but little of it had anything to do
with Java. The learning about Java of which I spoke happened
during the period of several years when I followed the group but
did not post to it.
 
S

Stanimir Stamenkov

Tue, 05 Jul 2011 09:32:16 -0700, /markspace/:
You go through quite a bit of verbiage to justify not typing int and
double in front of those last two lines.

Just wanted to point out, Gosu [1] is statically typed language
(compiled to a JVM bytecode) with type inference, where one could write:

function foobar (a : int, b : String) {
var x = a + b.length();
var y = 1.3 * x;

The difference with the previously given example seems one still
needs variable declaration (using the 'var' keyword).

While 'double' and 'int' are short enough, with Gosu one could just
write:

var a = new MyVeryLongClassName<Foo,BarBaz>();

which appears even better than the Java 7 <> (diamond) operator.
Note Gosu supports explicit variable type declarations, also.

[1] http://en.wikipedia.org/wiki/Gosu_(programming_language)
 
G

Gene Wirchenko

On 5 Jul 2011 19:12:39 GMT, (e-mail address removed)

[snip]
What I found is that HashSet was noticeably faster on all the
systems where I ran the benchmarks. Unless you need for the set
to be sorted (and it's not apparent from your code that you do),
why not .... ? (I'm curious too about why you chose TreeSet in
the first place. ? )

1) I can output the set in order without having to do anything else.
My real program has a lot of debugging info dumping. (Read as "checks
that I have not done something wrong".)

2) When I read "hash", I think "collision", and I get nervous.
Nothing I read reassured me that that could not happen.

3) I had to pick something. If it works, I can change it later. If
it does not, I have not solved my problem yet. The former is safer.

Sincerely,

Gene Wirchenko
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top