Compression makes data lighter and at the same time occupy less disk space.on this post i will be talking about a widely used algorithm called "Run Length Encoding".The algorithm itself is easy to understand and can be implemented in a very terse way.The Algorithm is best suited for compressing images.Wiki gives us a fair amount of information about its origin and its applications in the past.
Imagine you have some random data with you that look like below.
RRRRREEEDDISSSSBBBBLLAAACCCKK.
If i count all the characters starting from left to right ,i have a total of 28 characters in it(not a big sized data,but will keep that way for the sake of simplicity).So how do we compress such data or how do we represent the data in a much shorter form without losing its meaning.
Run length Encoding tells us "If you have multiple occurrences of a character/element in the data,just keep one instance of that character and specify how many times it has appeared on the data".As you can see this can greatly reduce the duplicates.
Taking the previous thought into consideration we will end up with a shorter form of the same data(without losing its meaning) as below.
R5 E3 D2 I1 S4 B4 L2 A3 C3 K2
we had R appear 5 times,E 3 times,D 2 times I 1 time,S 4 times and so on.
One important thing to note is ,if a character reappear on the data with a different character(s) in between them.you will have to start recounting the occurences again.
for e.g compressing
BBOBBB becomes B2O1B3
Deflating or decompressing is very simple as you have to just represent a char ,the times being specified.
B2 --> BB
O3 --> OOO.
More examples can be found on the wikipedia page.
what is Run length encoding?
RRRRREEEDDISSSSBBBBLLAAACCCKK.
If i count all the characters starting from left to right ,i have a total of 28 characters in it(not a big sized data,but will keep that way for the sake of simplicity).So how do we compress such data or how do we represent the data in a much shorter form without losing its meaning.
Run length Encoding tells us "If you have multiple occurrences of a character/element in the data,just keep one instance of that character and specify how many times it has appeared on the data".As you can see this can greatly reduce the duplicates.
Taking the previous thought into consideration we will end up with a shorter form of the same data(without losing its meaning) as below.
R5 E3 D2 I1 S4 B4 L2 A3 C3 K2
we had R appear 5 times,E 3 times,D 2 times I 1 time,S 4 times and so on.
One important thing to note is ,if a character reappear on the data with a different character(s) in between them.you will have to start recounting the occurences again.
for e.g compressing
BBOBBB becomes B2O1B3
Deflating or decompressing is very simple as you have to just represent a char ,the times being specified.
B2 --> BB
O3 --> OOO.
More examples can be found on the wikipedia page.
using System; using System.Collections.Generic; using System.Text; using System.IO; namespace RunLengthEncoder { class Program { char[] dataArray; public void Run(String input) { int repititons = 0; int charIndex = 0; dataArray = input.ToCharArray(); StringBuilder compressedData = new StringBuilder(); while (charIndex < dataArray.Length) { var ch = dataArray[charIndex]; while (charIndex < dataArray.Length && !(ch != dataArray[charIndex])) { ch = dataArray[charIndex]; repititons += 1; charIndex += 1; } compressedData.Append(repititons); compressedData.Append(ch); repititons = 0; } Console.WriteLine("Compressed data " + compressedData.ToString()); Console.ReadLine(); } static void Main(string[] args) { Console.WriteLine("Please enter some data"); var input = Console.ReadLine(); Program p = new Program(); p.Run(input); } } }