TROVE – High Performance Collections for Java


TROVE is a Fast, lightweight implementations of the java.util Collections API. These implementations are designed to be pluggable replacements for their JDK equivalents.

Java – Download Free EBooks and Whitepapers

Gap in Java

Collections in Java accept only reference types as its element, not primitive datatypes. When trying to do so it produces a compile time error.  In java when we want to store primitive data types in collections we need to use wrapper classes.

All collection classes of java store memory location of the objects they collect. The primitive values do not fit in to the same definition.

To circumvent this problem, JDK5 and onwards have autoboxing – wherein the primitives are converted to appropriate objects and back when they are added or read from the collections.

Using java.util.HashMap, it is not possible to use Java language arrays as keys. For example, this code:

char[] foo, bar;
foo = new char[] {‘a’,’b’,’c’};
bar = new char[] {‘a’,’b’,’c’};
System.out.println(foo.hashCode() == bar.hashCode() ? “equal” : “not equal”);
System.out.println(foo.equals(bar) ? “equal” : “not equal”);
produces this output:

not equal
not equal

And so an entry stored in a java.util.HashMap with foo as a key could not be retrieved with bar, since there is no way to override hashCode() or equals() on language array objects.

Java™ Application Development on Linux® – Free 599 Page eBook

Enterprise Java Virtualization:

Understanding the TCO Implications

InfoWorld’s Java IDE Comparison Strategy Guide:

Java Essential Training

Apache Jakarta Commons: Reusable Java™ Components

Enabling Rapid ROI: With Java™ – Based Business Intelligence Applications:

Trove comes to Rescue

Trove provides collections for primitive types. Collections which store primitives directly will require less space and yield significant performance gains.

The Trove maps/sets use open addressing instead of the chaining approach taken by the JDK hashtables.

What is Open Addressing?

Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternate locations in the array (the probe sequence) until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table.

Open Addressing vs. Chaining

  Chaining Open addressing
Collision resolution Using external data structure Using hash table itself
Memory waste Pointer size overhead per entry (storing list heads in the table) No overhead
Performance dependence on table’s load factor Directly proportional Proportional to (loadFactor) / (1 – loadFactor)
Allow to store more items, than hash table size Yes No. Moreover, it’s recommended to keep table’s load factor below 0.7
Hash function requirements Uniform distribution Uniform distribution, should avoid clustering
Handle removals Removals are ok Removals clog the hash table with “DELETED” entries
Implementation Simple Correct implementation of open addressing based hash table is quite tricky

The size of the tables used in Trove’s maps/sets is always a prime number, improving the probability of an optimal distribution of entries across the table, and so reducing the likelihood of performance-degrading collisions. Trove sets are not backed by maps, and so using a THashSet does not result in the allocation of an unused “values” array.

In a gnu.trove.THashMap, however, you can implement a TObjectHashingStrategy to enable hashing on arrays:

class CharArrayStrategy implements TObjectHashingStrategy {
public int computeHashCode(Object o) {
char[] c = (char[])o;
// use the shift-add-xor class of string hashing functions
// cf. Ramakrishna and Zobel, “Performance in Practice
// of String Hashing Functions”
int h = 31; // seed chosen at random
for (int i = 0; i < c.length; i++) { // could skip invariants
h = h ^ ((h << 5) + (h >> 2) + c[i]); // L=5, R=2 works well for ASCII input
}
return h;
}

public boolean equals(Object o1, Object o2) {
char[] c1 = (char[])o1;
char[] c2 = (char[])o2;
if (c1.length != c2.length) { // could drop this check for fixed-length keys
return false;
}
for (int i = 0, len = c1.length; i < len; i++) { // could skip invariants
if (c1[i] != c2[i]) {
return false;
}
}
return true;
}
}

Single Sign-On for Java and Web Applications

Bulletproof Java Code: A Practical Strategy for Developing Functional, Reliable, and Secure Java Code

Transforming a Generic Java IDE to Your Application Specific IDE:

The Java Virtual Appliance—No OS Required

BEA WebLogic® Operations Control: Application Virtualization for Enterprise Java

Enabling Rapid ROI: With Java™ – Based Business Intelligence Applications:

References:

http://trove.starlight-systems.com/

http://en.wikipedia.org/wiki/Open_addressing

Advertisements

2 thoughts on “TROVE – High Performance Collections for Java”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s