• Home
  • Contents
  • About
​​​​​​​​​​​​​​​​

Monday 30 July 2012

Run length encoding


Compression makes data lighter and at the same time occupy less disk space.on this post i will be talking about a widely used algorithm called "Run Length Encoding".The algorithm itself is easy to understand and can be implemented in a very terse way.The Algorithm is best suited for compressing images.Wiki gives us a fair amount of information about its origin and its applications in the past.


what is Run length encoding?


Imagine you have some random data with you that look like below.

RRRRREEEDDISSSSBBBBLLAAACCCKK.

If i count all the characters starting from left to right ,i have a total of  28 characters in it(not a big sized data,but will keep that way for the sake of simplicity).So how do we compress such data or how do we represent the data in  a much shorter form without losing its meaning.

Run length Encoding tells us "If you have multiple occurrences of a character/element in the data,just keep one instance of that character and specify how many times it has appeared on the data".As you can see this can greatly reduce the duplicates.

Taking the previous thought into consideration we will end up with a shorter form of the same data(without losing its meaning) as below.

REDISBLACK2

we had R appear 5 times,E 3 times,D 2 times I 1 time,S 4 times and so on.

One important thing to note is ,if a character reappear on the data with a different character(s) in between them.you will have to start recounting the occurences again.

for e.g compressing

BBOBBB becomes B2O1B3

Deflating or decompressing is very simple as you have to just represent a char ,the times being specified.

B2 --> BB
O3 --> OOO.

More examples can be found on the wikipedia  page.

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace RunLengthEncoder
{
class Program
{
    char[] dataArray;
    public void Run(String input)
    {
    int repititons = 0;
    int charIndex = 0;
    dataArray = input.ToCharArray();
    StringBuilder compressedData = new StringBuilder();

    while (charIndex < dataArray.Length)
    {
        var ch = dataArray[charIndex];
        while (charIndex < dataArray.Length 
               && !(ch != dataArray[charIndex]))
        {
            ch = dataArray[charIndex];
            repititons += 1;
            charIndex += 1;
        }
        compressedData.Append(repititons);
        compressedData.Append(ch);
        repititons = 0;
    }
    Console.WriteLine("Compressed data " + compressedData.ToString());
    Console.ReadLine();
    }

    static void Main(string[] args)
    {
        Console.WriteLine("Please enter some data");
        var input = Console.ReadLine();
        Program p = new Program();
        p.Run(input);
    }
}
}


Wednesday 25 July 2012

Radix Sort -Part II

In our previous post we looked at the fundamentals of Radix sorting .
we also ran few examples to see the algorithm at work.On today's post we will be looking at the low level implementation of this sorting mechanism.

Click here for PART -I  Fundamentals

What we need to do is to single out a digit from every number (could be a single digit) and then decide which bucket it belongs. The Algorithm makes an assumption that the bucket number cannot exceed number 9 (0<= buketNumber < 10). The most important player in the scheme of things is the modulo operator (%).

Before we dig in we need to look at what lsd's and msd's are.Could be trivial for most but is crucial.

What is an LSD and MSD?

They are least significant digit and most significant digit. In our example
LSD OF 435 is 5
MSD OF 435 is 4.
My algorithm starts from LSD although you can start from MSD and move towards right.

For a number 435 how do we move from left to right?


We do 435 % 10 = 5 so we place the number in bucket 5.

Now to move left on the next iteration you multiply the divider by 10.
435 %( 10* 10) = 435%100= 35

What we have got here is a 2 digit number (35) with one digit (5) already processed so we need to get rid 5 from it. To get rid you do below.
(35 – (35 %10))/10
35 -5 /10
30/10
3 (so you know you need to place the number in bucket 3)

Now in the next iteration,the divider is again multiplied with 10 to get below
435 % (10*10*10) =435%1000=435.

What we need is the leftmost digit 4 from 435.so we keep repeating the steps that we did earlier 
(435 – (435%10))/10 = 430/10= 43

(43-(43%10))/10=40/10 = 4

Finally we know we need to place the number in Bucket 4.

C# Implementation


       

       The sort method does all the hard work.


       Output




     Here is the complete working code :)


Cheers

POLYBIUS



Click here for other Topics





Radix Sort

Radix Sort - I

How it Works

Radix sort is one of the various sorting algorithms which is prevalently used to sort data.The algorithm is fairly simple and can be implemented on any language quite easily.In this post we will look at how to sort a set of numerical data and the program to achieve that would be in C#.

Lets say you want to sort the following numbers in sequential order using Radix

23  12  45   67  34

Radix sort employs a very simple way of looking at the unit of every element and then subsequently placing the element in a bucket that corresponds to the unit.What i mean is 

for a given number 43 i am suggesting here 4 and 3 are the 2 units that make up the whole number.so when you are looking at unit 4 the number 43 would belong to bucket 4 and for 3 it would be bucket 3.All this would be clear with a following example.

we will always start with the least significant digit(of every numeric element in this case) and gradually move towards left on every iteration.The numbers marked below are the least significant digits.we also maintain a bucket for every digit 0 - 9.

 23  12  45  13  67  34

After pass one our buckets and its containing elements will look like below. 

if we read out elements in the buckets 0 -9 we get the new sequence(may or may not be sorted).

12   23   13   34  45   67

Now we need to sort  this list again based on the next unit (move towards left).

12   23   13   34  45   67

After this second pass our Buckets  will look like below.















Now if we read out the elements we can see the elements are sorted

12   13   23   34   45   67

so to sort this list we needed two pass.But this may not be the case when you have mix of elements whose number of digits are different.we will see how we are handling this in second part where i also show the program written in C#.

More details on this algorithm can be found on this  page


Continue to Part -II (C# Implementation)