Java 1.4.2, I need a set of unique strings

Donkey Hottie

I'm parsing data from disk, and need to keep a collection of Strings in
memory.

Java does not consider

String s1 = "Abba" ;
String s2 = "Abba" ;

the same; their references in memory will probably be different. So will
"Abba" be allocated twice?

I may have tens of thousands of "Abba" read from disk to an "index"
object having a String containing the content.

I don't want tens of thousands of different "Abba" strings in the JVM
memory, but I want that

s1 == s2

My algorithm so far is that


SortedMap map = new TreeMap();
map.put("Abba", "Abba");

..

String parsedString ...
String s = (String)map.get(parsedString);

if (s == null)
{
    map.put(parsedString, parsedString);
    s = parsedString ;
}

myObject.set(s) ;

That way I get only one copy of the string in memory. I may have tens of
thousands of "records" read, but the parsed string is mostly the same.

MyObject is an index object, containing metadata about the info just read
(file name, position in the file, etc), and that string among others.

So my question is..

How do I create a private String table, as Java does not do it? Are there
better solutions than TreeMap?

TreeSet would be cool, but it has no suitable getter.
 
Patricia Shanahan

Donkey said:
I'm parsing data from disk, and need to keep a collection of Strings in
memory.

Java does not consider

String s1 = "Abba" ;
String s2 = "Abba" ;

the same; their references in memory will probably be different. So will
"Abba" be allocated twice?

Since they are equal and both String constant expressions, they will be
represented by the same String object.

I may have tens of thousands of "Abba" read from disk to an "index"
object having a String containing the content.

I don't want tens of thousands of different "Abba" strings in the JVM
memory, but I want that

s1 == s2

My algorithm so far is that
....

Why not use the String intern() method? See
http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#intern()

Patricia
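
A minimal sketch of what intern() does for strings created at run time, which is the case described in the question. The class name InternDemo and the helper parse() are hypothetical and only stand in for whatever code reads the file; the generic-free code should also compile under Java 1.4.2.

```java
// Sketch: canonicalizing run-time strings with String.intern().
// InternDemo and parse() are illustrative names, not part of the thread.
class InternDemo {
    /** Builds a fresh String object at run time, as a file parser would. */
    static String parse(char[] raw) {
        return new String(raw);
    }

    public static void main(String[] args) {
        char[] raw = {'A', 'b', 'b', 'a'};
        String first = parse(raw);
        String second = parse(raw);

        // Two distinct objects with equal contents.
        System.out.println(first == second);                   // false
        System.out.println(first.equals(second));              // true

        // intern() returns one canonical instance per distinct value.
        System.out.println(first.intern() == second.intern()); // true
    }
}
```

Note that the result of intern() has to be kept; calling it does not change the string it is called on.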
 
Patricia Shanahan

Donkey said:
No, they will not.

You can easily try this and see.

Well, alright, but I've seen this work correctly many, many times, read
the relevant sections of the Java Language Specification, and have no
particular reason to expect it to be broken for your choice of literals.

s1 == s2 will result in false.

public class StringEqualityTest {
    public static void main(String[] args) {
        String s1 = "Abba" ;
        String s2 = "Abba" ;
        System.out.println(s1==s2);
    }
}

prints "true".

See the JLS, "15.28 Constant Expression",
http://java.sun.com/docs/books/jls/third_edition/html/expressions.html#15.28

'Compile-time constants of type String are always "interned" so as to
share unique instances, using the method String.intern.'

Patricia
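
To make the JLS rule concrete, here is a small sketch contrasting compile-time constant expressions with a string computed at run time. The class and method names are made up for illustration. The behaviour relies on the quoted rule in 15.28 and on JLS 15.18.1, which says that a concatenation result is newly created unless the expression is a constant expression.

```java
// Sketch: which String expressions share an interned instance.
// ConstantExpressionDemo and its method names are illustrative only.
class ConstantExpressionDemo {
    // A constant variable: final and initialized with a constant expression.
    static final String HALF = "Ab";

    static boolean literalsShare() {
        String s1 = "Abba";
        String s2 = "Abba";
        return s1 == s2;                  // both literals are interned
    }

    static boolean constantConcatShares() {
        return (HALF + "ba") == "Abba";   // folded at compile time, then interned
    }

    static boolean runtimeConcatShares(String half) {
        return (half + "ba") == "Abba";   // computed at run time: a new object
    }

    public static void main(String[] args) {
        System.out.println(literalsShare());            // true
        System.out.println(constantConcatShares());     // true
        System.out.println(runtimeConcatShares("Ab"));  // false
    }
}
```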
 
Donkey Hottie

Donkey said:
No, they will not.

You can easily try this and see.

Well, alright, but I've seen this work correctly many, many times,
read the relevant sections of the Java Language Specification, and
have no particular reason to expect it to be broken for your choice of
literals.

s1 == s2 will result in false.

public class StringEqualityTest {
    public static void main(String[] args) {
        String s1 = "Abba" ;
        String s2 = "Abba" ;
        System.out.println(s1==s2);
    }
}

prints "true".

It just prints: false.

It does. I ran this under Java 5 and NetBeans on Windows XP.
 
Lew

Donkey said:
No, they will not.

Yes, they will.

You can easily try this and see.

s1 == s2 will result in false.

As Patricia showed, that results in true.

Donkey said:
And in my application they will not be constants, they will be read from a
file.

<http://java.sun.com/javase/6/docs/api/java/lang/String.html#intern()>

Donkey said:
intern() is new to me, I'll study. Thanks!

It's in String, one of the most fundamental classes in Java! Second
only to Object, I'd say, or perhaps "array".
 
Joshua Cranmer

Donkey said:
No, they will not.

You can easily try this and see.

s1 == s2 will result in false.

The JLS guarantees that the comparison must be true (more specifically,
that a constant string literal is the same reference as its interned
value); furthermore, an analysis of the bytecode and the structure of the
JVM makes it immediately obvious that it would be difficult for them not
to be the same reference.

And in my application they will not be constants, they will be read from a
file.

That is the purpose of intern(). But be careful about interning, since
interned strings are only garbage collected in Sun's JVM in versions 5 and
above.
 
Wojtek

Donkey Hottie wrote :
I'm parsing data from disk, and need to keep a collection of Strings in
memory.
....

I don't want tens of thousands of different "Abba" strings in the JVM
memory, but I want that

Use a HashMap and use the String as the key with an empty Object as the
value. Then iterate through the keys when you are done reading the
file.
 
Mike Schilling

Lew said:
It's in String, one of the most fundamental classes in Java! Second
only to Object, I'd say, or perhaps "array".

Or, per Murphy's law, "Exception".
 
Steve Wampler

Wojtek said:
Use a HashMap and use the String as the key with an empty Object as the
value. Then iterate through the keys when you are done reading the file.

Set?
 
Tom Anderson

I'm parsing data from disk, and need to keep a collection of Strings in
memory.

Java does not consider

String s1 = "Abba" ;
String s2 = "Abba" ;

the same; their references in memory will probably be different. So will
"Abba" be allocated twice?

I haven't read the rest of this thread yet, but i'm pretty sure i know how
it goes. Just to ram the point home, i'm going to say it too:

http://java.sun.com/javase/6/docs/api/java/lang/String.html#intern()

tom
 
Mike Schilling

Eric said:
Or use a Set ...

If you want the equivalent of String.intern(), you need to map each
String to itself, so you can do this:

String s = methodThatReturnsString();
String canon = map.get(s);
if (canon == null)
{
    map.put(s, s);
    canon = s;
}

Or you could just use String.intern() of course.
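
The map-each-String-to-itself idea above can be wrapped in a small class, sketched here. StringPool, canonical(), size() and clear() are hypothetical names chosen for this example. The sketch uses generics, which need Java 5 or later; on Java 1.4.2 the same logic would use a raw Map and a (String) cast, as in the original question.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a private string table: each distinct value maps to the
// first String instance seen with that value. Names are illustrative.
class StringPool {
    private final Map<String, String> pool = new HashMap<String, String>();

    /** Returns the canonical instance equal to s, storing s if it is new. */
    String canonical(String s) {
        String canon = pool.get(s);
        if (canon == null) {
            pool.put(s, s);
            canon = s;
        }
        return canon;
    }

    /** Number of distinct values currently held. */
    int size() {
        return pool.size();
    }

    /** Drops every entry so the strings can be collected once unreferenced. */
    void clear() {
        pool.clear();
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        String first = pool.canonical(new String("Abba"));
        String second = pool.canonical(new String("Abba"));
        System.out.println(first == second); // true
        System.out.println(pool.size());     // 1
    }
}
```

A plain Set cannot replace the map here, because Set.add() and Set.contains() report only whether an equal element exists and do not hand back the stored instance.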
 
Donkey Hottie

Donkey Hottie, you are making of yourself a one-syllable synonym
of the first half of your name.

You are right. I immediately noticed my error and canceled the post, but
I should have posted a follow-up instead.

Try it yourself. Here, just copy and paste:

public class Donkey {
    public static void main(String[] unused) {
        String s1 = "Abba" ;
        String s2 = "Abba" ;
        System.out.println(s1 == s2);
    }
}

My code was like this
String s1 = new String("Abba") ;
String s2 = new String("Abba") ;
System.out.println(s1 == s2);

It was a sample from someone, probably wanting to show that == is no good
for strings. While I was testing other things, I forgot that the test code
was still in place...

intern() is good.
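
For completeness, a short sketch of why the new String("Abba") version prints false while the literal version prints true. The class and method names are invented for this example.

```java
// Sketch: new String(...) always creates a distinct object, even from a literal.
// NewStringDemo and its method names are illustrative only.
class NewStringDemo {
    static boolean freshCopiesAreIdentical() {
        String s1 = new String("Abba");
        String s2 = new String("Abba");
        return s1 == s2;          // compares references
    }

    static boolean freshCopiesAreEqual() {
        String s1 = new String("Abba");
        String s2 = new String("Abba");
        return s1.equals(s2);     // compares contents
    }

    static boolean internedCopyIsLiteral() {
        // intern() maps the fresh copy back to the pooled literal instance.
        return new String("Abba").intern() == "Abba";
    }

    public static void main(String[] args) {
        System.out.println(freshCopiesAreIdentical()); // false
        System.out.println(freshCopiesAreEqual());     // true
        System.out.println(internedCopyIsLiteral());   // true
    }
}
```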
 
Donkey Hottie

The purpose of intern. But be careful about interning since interned
strings are only garbage collected in Sun's JVM in versions 5 and
above.

Thanks for this tip. I was already worried that there might be some
gotcha. I can't use intern() if this is the case.
 
J. Stoever

Patricia said:
'Compile-time constants of type String are always "interned" so as to
share unique instances, using the method String.intern.'

He very specifically said that his Strings are read from a file, and not
compile time constants.
 
Patricia Shanahan

J. Stoever said:
He very specifically said that his Strings are read from a file, and not
compile time constants.

Somewhere in the line of quoting, my comment got separated from its
context. As originally interleaved, the quoted comment was a reply to
the specific claim that "s1 == s2 will result to false." when s1 and s2
were references to identical String literals.

Patricia
 
Donkey Hottie

Do you need to do anything with your strings other than keep track of
sets of strings you have seen or use them as keys in maps?

If not, just go ahead and use the java.util collections. Except for
IdentityHashMap, they are based on .equals equality, not == identity.

Patricia

I did it with a HashMap and it works nicely. Great success ;)

A HashMap is easy to throw away when not needed, better than
String.intern()
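
The point about .equals equality versus == identity can be seen in a small sketch like the following. The class and method names are hypothetical; generics are used for brevity and would be replaced by raw types on Java 1.4.2.

```java
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Sketch: HashSet treats equal strings as one element, while
// IdentityHashMap treats every distinct object as a separate key.
class EqualsVersusIdentityDemo {
    static int countByEquals(String[] values) {
        Set<String> seen = new HashSet<String>();
        for (int i = 0; i < values.length; i++) {
            seen.add(values[i]);
        }
        return seen.size();
    }

    static int countByIdentity(String[] values) {
        Map<String, Boolean> seen = new IdentityHashMap<String, Boolean>();
        for (int i = 0; i < values.length; i++) {
            seen.put(values[i], Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String[] values = {
            new String("Abba"), new String("Abba"), new String("Beatles")
        };
        System.out.println(countByEquals(values));   // 2
        System.out.println(countByIdentity(values)); // 3
    }
}
```

Because the equals-based collections already ignore duplicate objects, a canonical instance is only needed when the goal is to save memory or to compare with ==.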
 
J. Stoever

Patricia said:
Somewhere in the line of quoting, my comment got separated from its
context. As originally interleaved, the quoted comment was a reply to
the specific claim that "s1 == s2 will result to false." when s1 and s2
were references to identical String literals.

After going back, I realize he really is at fault, since his OP provided
code that did one thing and a description that said another. Typically a
bad idea. In that spirit, both you and I are right, depending on which
part of his post we look at ;)
 
Daniel Pitts

Donkey said:
I'm parsing data from disk, and need to keep a collection of Strings in
memory.

Java does not consider

String s1 = "Abba" ;
String s2 = "Abba" ;

the same; their references in memory will probably be different. So will
"Abba" be allocated twice?

I may have tens of thousands of "Abba" read from disk to an "index"
object having a String containing the content.

I don't want tens of thousands of different "Abba" strings in the JVM
memory, but I want that

s1 == s2

My algorithm so far is that


SortedMap map = new TreeMap();
map.put("Abba", "Abba");

..

String parsedString ...
String s = (String)map.get(parsedString);

if (s == null)
{
    map.put(parsedString, parsedString);
    s = parsedString ;
}

myObject.set(s) ;

That way I get only one copy of the string in memory. I may have tens of
thousands of "records" read, but the parsed string is mostly the same.

MyObject is an index object, containing metadata about the info just read
(file name, position in the file, etc), and that string among others.

So my question is..

How do I create a private String table, as Java does not do it? Are there
better solutions than TreeMap?

TreeSet would be cool, but it has no suitable getter.

First, try it without doing anything special, you might be prematurely
worrying about a problem that isn't really a problem.

Second, if you find it really is a problem, you can try using intern().

String parsedString = getParsedString();
String s = parsedString.intern();
 
