Web Developer's Diary: Sets

A set basically represents a mathematical set where the elements are unique.In Java the uniqueness of the element (an object) is determined by the way you implement the equals method.Hence if you define your own custom class say Customer , and you want to create a set of customers then you must implement the equals() method of your Customer class.This will ensure that two Customer objects which are equal are not added to the set thereby not breaking the rule of set.All java predefined classes such as Integer,String etc have their equals method implemented already and hence you need not worry about implementing it(infact you cannot as these classes are marked final..more later :)).

1] HashSet - This set uses hashing technique to store elements.Hence one need to implement the hascode() method along with the equals() method.It internally uses hastable to store the elements.

e.g. 1:

Consider the example of storing the 26 english alphabates in a hashtable.Each alphabate represents a unique number say from 0 to 25(hashcodes).Each hascode is stored in a bucket.So there will be 26 buckets each storing one of the 26 hascodes.Each hascode has its corresponding value stored which can be fetched in one operation.

HashCodes 0 1 2 3 ............................ 23 24 25

-------------------------------------------------------------------------

HashValues a b c d ........................... x y z

Now if i want to search for a character say 'x' then the following would be the steps -

1] Hash the element to be searched i.e 'x' producing a hascode i.e 23 .

2] Fetch the corresponding value from the location 23 i.e the stored value 'x' in the hastable.

Hence it takes near constant time to insert or fetch any character into/from hashtable which is very fast.In the above example a hascode is mapped to maximum one character hence its faster.

e.g. 2:

This involves shrinking the total number of buckets to fit all the elements.Continiuing with the above example we would have to have less than 26 buckets to store the 26 alphabates.

But how do you do that?

You can do that by hashing multiple values to the same buckets i.e breaking the rule of one bucket-one hascode-one value to one bucket-one hascode-multiple values.

So if i want to have only 10 buckets to store a character then my hashtable would look like this -

HashCodes 0 1 2 3 4 5 .............................. 9

---------------------------------------------------------------------

HashValues a b c d e f .............................. j

k l m n o p .............................. t

u v w x y z

Now each bucket has one hascode mapped to multiple values instead of just one and if i want to search for a character say 'x' then the following would be the new steps -

1] Hash the element to be searched i.e 'x' producing a hascode i.e 23.

2] Divide 23 by the number of buckets i.e 10, the remainder being 3.Hence th 3rd bucket (starting from 0) is the bucket containing my searched element.

3] Search through the 3rd bucket (hascode 3) again to find the exact match (as 3rd bucket contains multiple values now i.e d,n and x usually called the chain implemented as a linkedlist).

The second step above is the extra step often referred to as the compression function to compress the possible total number of buckets to a subset of buckets.

The third step indicates the implementation of the equals method for searching within the chain.

Performance impact -

It gives near constant time performance for all the operations.The length of the chain also determines the performance while searching a particular element.If the length of the chain is longer than it would take more time to search a particular element using the equals() method.

Ordering -

There is no guarantee of the order of the elements since it uses hashing to store the elements.Hence if you insert the elements in the following order 1|2|3 then there is no guarantee that the elements will be fetched in that order.

2] LinkedHashSet -

This implementation is same as the HashSet implementation.The only difference being it internally uses linked list to store the order of insertion.

Performance impact -

It gives near constant time performance for all the operations.The length of the chain also determines the performance while searching a particular element.

Ordering -

The order of the elements is maintained by the linkedlist implementation.Hence if you insert the elements in the following order 1|2|3 then the elements will be fetched in the same order.

3] TreeSet -

This implementation internally uses the binary tree data structure to store the elements.Hence the elements of this implementation are always sorted either using the natural order or custom order.

e.g. 1:

Consider that the following elements are needed to be inserted - 5,4,7,3,1,9.The following will be the structure of the tree -

When the tree is printed out the order of the elements printed is 1,3,4,5,7,9 (from left most node to the right most node).The construction of the tree is important as it dictates the order of the elements that would be printed out later.

In the above example, the construction of the tree follows the following rules -

a) Consider the first element to be inserted as the head node (5)

b) Starting from the second element insert the node

i) to the left of the head element (5) if its less than head (5) or

ii) to the right of the head element (5) if its greater than the head (5).

c) Repeat the 2nd step for all subsequent elements.

Thus you need to compare every element to be inserted with the head element and subsequent subtree elements to find its exact position of insertion.Hence you will need to compare two elements at every point of time to find its position.

e.g.

If the tree is partially constructed 5,4,7,3 then the structure of the tree would be as follows -

Now the following steps would be taken to insert the next element which is 1 in the tree -

a) Compare 1 with the head element which is 5 (1.compareTo(5)).Since 1 is less than 5 the possible place of insertion is left.

b) But on the left there is already 4.So now compare 1 with 4 (1.compareTo(4)).Since 1 is less than 4 the possible place of insertion is left.

c) But on the left there is already 3.So now compare 1 with 3 (1.compareTo(3)).Since 1 is less than 3 and there is no element to the left of 3 insert 1 to the left.

The following will be the new structure after inserting 1 -

So at each step you need to compare 2 elements (objects).One is the element to be inserted and the other already existing (current element) in the tree.In the coding terms this would get translated to the method as follows.

public int compareTo(Object element) {

// Compare the element to be inserted with the current element in the tree.

// return -1 if current element is less than the element to be inserted

// return 0 if current element is equal to the element to be inserted

// return 1 if current element is greater than the element to be inserted

}

Java provides the comparable interface which has the method compareTo(Object o).Thus one must implement the comparable interface and provide the implementation code for the element which needs to be added to the TreeSet.Thus every element that needs to be added to the TreeSet must be comparable (implement Comparable interface) or else you will get exception.

However there is another way in which you can specify the ordering of elements in the TreeSet without implementing Comparable interface.This can be done by implementing the Compare(Object o1,Object o2) of the Comparator interface.More on this later.

Web Developer's Diary

Saturday, November 8, 2008

Sets

No comments:

About Me

Followers