sábado, 7 de mayo de 2016

My TODO list (to do)

As a common tag that you will find while using Eclipse IDE or perhaps in other IDEs, TODO (to do) means an incomplete code, something that must be done. We can't say that our program is over if we left TODOs behind. That being said, here is my TODO list:
  • TimSort: A very used sorting algorithm that you can find in Java, Python, Android and many other libraries.
  • Dual-pivot quicksort: As smart as it sounds, quicksort with two pivots is faster! It is used since the JRE7 update.

Library sort

Sorting algorithms are one of the most basic problems that modern computing had to deal with. That is why here I am going to present you one of the modern sorting algorithms implementation called Library sort. This algorithm was presented at the Third International Conference on FUN WITH ALGORITHMS as an improvement of the classic insertion sort algorithm, in which its implementation only runs on O(n log n) time.

In short, library sort works with an auxiliary array in which sorted elements will be inserted sequentially. This auxiliary array called S will however have gaps between the elements. These gaps are made out of empty elements (null in java objects, -1 in an array of indices, etc). The objective is to make place for new elements that, in their sorted position, should be located in the middle of previously inserted items. In the worst case, elements will be translated one place to the right or one place to the left in order to make place for that new item. Additionally, to insert elements as fast as possible, binary search will be used to locate an intermediate position where to put the next element.

Here we have a nice pseudocode:

  1. Rebalance(Array S, Integer iniLen, Integer finLen)
  2.     k = finLen-1
  3.     step = finLen/iniLen
  4.     for j=iniLen-1 to 0:
  5.         S[k] = S[j]
  6.         S[j] = NONE
  7.         k = k-step
  8.     end for
  9.  
  10. LibrarySort(Array A, Integer n, Float epsilon, Array S)
  11.     goal = 1
  12.     pos = 0
  13.     sLen = (Integer)(1+epsilon)*n
  14.     while pos<n://For each round do this:
  15.         for i=1 to goal://Insert 'goal' elements to the sorted array S
  16.             //Search a position to insert A[pos]
  17.             insPos = binarySearch(A[pos], S, sLen)
  18.             if not IS_EMPTY(S[insPos]):
  19.                 //Move elements to the right or the left in order to free insPos
  20.                 freeSpace(insPos, S, sLen)
  21.             end if
  22.             S[insPos] = A[pos]//Insert new element
  23.             pos = pos + 1
  24.             if pos>n://All elements have been inserted
  25.                 return LibrarySort
  26.             end if
  27.         end for
  28.         prevLen = sLen
  29.         sLen = min( (2+2*epsilon)*goal, (1+epsilon)*n )
  30.         //Rebalance array S
  31.         Rebalance(S, prevLen, sLen)
  32.         goal = goal * 2
  33.     end while

As a short complexity analysis we can state the following:
  • There are log2 n rounds
  • In each round (n / log2 n) elements are inserted into the sorted array S in average.
  • To insert an element it takes O (log2 n) time and a bit more because of binary search with intermediate gaps in between the array.
  • At the end of each round, a rebalance is made that takes at most (1+ε)n cycles, that means O(n) time.
  • Conclusions:
    • Total insertion time takes: O (log2 n * (n / log2 n) * log2 n) = O (n log n)
    • Total rebalance time takes: O (log2 n * (1+ε) n) = O (n log n)
    • Library sort takes O (n log n) time.
I have made my own C implementation of this algorithm that works as expected: no more than O (n log n) time.

  1. #define MAX 10000
  2. #define MAX2 20000
  3.  
  4. #define IS_EMPTY(e) (e<0)
  5. #define NONE -1
  6.  
  7.  
  8. void prepareLibrarySort(int epsilon, int n, int S[MAX2], int * sLen) {
  9.     int i;
  10.     *sLen = (1+epsilon)*n;
  11.     for(i=0;i<*sLen;i++) S[i] = NONE;
  12. }
  13.  
  14. int searchFree(int e, int sorted[MAX2], int last) {
  15.     int first = 0;
  16.     int middle;
  17.  
  18.     while(last>=0 && IS_EMPTY(sorted[last])) last --;
  19.     while(first<=last && IS_EMPTY(sorted[first])) first ++;
  20.     while(first<=last) {
  21.         middle = (first+last)/2;
  22.         if(IS_EMPTY(sorted[middle])) {
  23.             int tmp = middle + 1;
  24.             //Look to the right
  25.             while(tmp<last && IS_EMPTY(sorted[tmp])) tmp ++;
  26.             if(sorted[tmp]>e) {
  27.                 tmp = middle - 1;
  28.                 while(middle>first && IS_EMPTY(sorted[middle])) middle --;
  29.                 //Look to the left
  30.                 if(sorted[middle]<e)//Found intermediate position
  31.                     return middle;
  32.                 last = middle - 1;
  33.             } else first = tmp + 1;
  34.         } else if(sorted[middle]<e) {
  35.             first = middle + 1;
  36.         } else {
  37.             last = middle - 1;
  38.         }
  39.     }
  40.     //If no position was found return -1 or if a lower position was found, return that
  41.     if(last>=0 && IS_EMPTY(sorted[last])) last --;
  42.     return last;
  43. }
  44.  
  45. void libSort(int A[MAX], int N, int S[MAX2], int EPSILON) {
  46.     if(N==0) return;
  47.  
  48.     int j, k, step;
  49.  
  50.     // ------ BASE CASE ------
  51.     //Goal: We want 'goal' elements to be inserted into S, for now..
  52.     int goal = 1;
  53.     //How many elements have already been inserted, its 1 for efficiency
  54.     int pos = 1;
  55.  
  56.     S[0] = A[0];//We insert element 0 at position 0
  57.  
  58.     //Initial size of array S
  59.     int sLen = max((1+EPSILON), goal + 1);
  60.  
  61.     // ------ CONDITION -------
  62.     //What has already been read must be less than the total array size
  63.     while(pos<N) {
  64.         // ------ ROUND ------
  65.         //Each round i will end with goal=2^i sorted elements. i starts with 1
  66.         for(j=0;j<goal;j++) {
  67.             //Search where to insert A[pos] (with binary search)
  68.             int insPos = searchFree(A[pos], S, sLen-1);
  69.  
  70.             //Because our binary search returns us the location of an smaller item than the one we search...
  71.             insPos ++;
  72.             if(!IS_EMPTY(S[insPos])) {//There is no place where we wanted to insert that element
  73.                 int nextFree = insPos + 1;//Search a free space forward
  74.                 while(!IS_EMPTY(S[nextFree])) nextFree ++;
  75.                 //At 'nextFree' there is a place, translate all elements one position to the right
  76.                 if(nextFree>=sLen) {//Wait! nextFree is out of bounds
  77.                     insPos --;
  78.                     if(!IS_EMPTY(S[insPos])) {
  79.                         //Search backward
  80.                         nextFree = insPos - 1;
  81.                         while(!IS_EMPTY(S[nextFree])) nextFree --;
  82.                         //Now we translate all the elements to the left
  83.                         while(nextFree<insPos) {
  84.                             S[nextFree] = S[nextFree+1];
  85.                             nextFree ++;
  86.                         }
  87.                     }
  88.                 } else {
  89.                     //Now we translate all the elements to the right
  90.                     while(nextFree>insPos) {
  91.                         S[nextFree] = S[nextFree-1];
  92.                         nextFree --;
  93.                     }
  94.                 }
  95.                 //Now nextFree is insPos; in other words, insPos is free
  96.             } else if(insPos>=sLen) {//insPos is out of bounds
  97.                 //Search a free space backwards
  98.                 insPos --;//This place must be between the limits
  99.                 int nextFree = insPos - 1;
  100.                 while(!IS_EMPTY(S[nextFree])) nextFree --;
  101.                 //Now we translate all the elements to the left
  102.                 while(nextFree<insPos) {
  103.                     S[nextFree] = S[nextFree+1];
  104.                     nextFree ++;
  105.                 }
  106.                 //Now nextFree is insPos; in other words insPos is free
  107.             }
  108.  
  109.             S[insPos] = A[pos ++];//We insert the element and increment our counter
  110.  
  111.             if(pos>=N)
  112.                 return;//That element was the last, return from the function
  113.         }
  114.  
  115.         // ----- REBALANCE -----
  116.         //It takes linear time. Tries to spread the elements as much as possible
  117.         for(j=sLen-1, k=min(goal*(2+2*EPSILON), (1+EPSILON)*N)-1,
  118.                 step=(k+1)/(j+1);j>=0;j--, k-=step) {
  119.             S[k] = S[j];
  120.             S[j] = NONE;
  121.         }
  122.  
  123.         //In each round insert the double of elements to the sorted array
  124.         // because there will be the double of free spaces after the rebalance
  125.         sLen = min(goal*(2+2*EPSILON), N*(1+EPSILON));
  126.         goal <<= 1;//We increment i
  127.     }
  128. }
  129.  
  130. void librarySort(int A[MAX], int n) {
  131.     int epsilon = 1;
  132.     int S[MAX2];
  133.     int sLen, i, j;
  134.     //This takes linear time
  135.     prepareLibrarySort(epsilon, n, S, &sLen);
  136.     //O (n log n)
  137.     libSort(A, n, S, epsilon);
  138.     //This takes linear time
  139.     for(i=0, j=0;i<sLen && j<n;i++)
  140.         if(!IS_EMPTY(S[i])) A[j++] = S[i];
  141. }
In this example, I have used ε=1 because after a few tests it was obvious that there will be no more efficient than that for this implementation.

ε Time (ms)
1 0.054474
2 0.094737
3 0.088421
4 0.131053
5 0.185922
6 0.213159

To compare efficiency between library sort and insertion sort, I made 2 tests. The first was a test with random numbers with n=[10, 10000]. The second one was the worst case scenario of both algorithms: numbers in reverse order. In the case of Library sort there is another as bad case as this one that is when the array is already sorted. Both are terrible for library sort. Here are the results:



Note that library sort generates a strange pattern. Every power of 2 input data size there is a cliff. The reason of that is because the sorted array size grows in a rate of a power of 2 (actually 2k+1-2). Then when the input data size is far to the left from a power of 2, the array S must have many empty spaces since it has just grown its size (check the goal variable). In that moment it is easier to insert data into the array.

In conclusion, there is no need to use insertion sort in most of the cases. Memory required for the implementation of library sort is O(3n). Unless the memory limitation is a very important factor, library sort will be the best solution.