<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>anupom.toString( ); &#187; Algorithms</title>
	<atom:link href="http://anupom.wordpress.com/category/algorithms/feed/" rel="self" type="application/rss+xml" />
	<link>http://anupom.wordpress.com</link>
	<description>Web Development, Java, PHP, Ruby, Javascript and many more!</description>
	<lastBuildDate>Sat, 12 Sep 2009 05:44:32 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<cloud domain='anupom.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/3ff710d3ac1d9a52305c62be6e59c22a?s=96&#038;d=http://s.wordpress.com/i/buttonw-com.png</url>
		<title>anupom.toString( ); &#187; Algorithms</title>
		<link>http://anupom.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://anupom.wordpress.com/osd.xml" title="anupom.toString( );" />
		<item>
		<title>Huffman Coding</title>
		<link>http://anupom.wordpress.com/2006/10/09/huffman-coding/</link>
		<comments>http://anupom.wordpress.com/2006/10/09/huffman-coding/#comments</comments>
		<pubDate>Sun, 08 Oct 2006 20:33:54 +0000</pubDate>
		<dc:creator>anupom</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://anupom.wordpress.com/2006/10/09/huffman-coding/</guid>
		<description><![CDATA[David A. Huffman invented a greedy algorithm that constructs an optimal prefix code, called a Huffman code, for lossless data compression. Here’s the pseudocode for Huffman coding:
Let C be a set of n characters.
The frequency of a character k is denoted by f(k).
Q is a min-priority queue keyed on f (Q is a list [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>David A. Huffman invented a greedy algorithm that constructs an optimal prefix code, called a Huffman code, for lossless data compression. Here’s the pseudocode for Huffman coding:<br />
Let C be a set of n characters.<br />
The frequency of a character k is denoted by f(k).<br />
Q is a min-priority queue keyed on f (that is, Q always yields the element with the lowest frequency first, like a list kept sorted in ascending order of frequency).</p>
<p><code>HUFFMAN( C )<br />
1 n = | C |<br />
2 Q = C<br />
3<strong> for</strong> i = 1 <strong>to</strong> n-1<br />
4<strong>     do</strong> allocate new node z<br />
5         left[z]  = x = EXTRACT-MIN(Q)<br />
6         right[z] = y = EXTRACT-MIN(Q)<br />
7         f(z)     = f(x) + f(y)<br />
8         INSERT(Q , z)<br />
9<strong>     enddo</strong><br />
10<strong> return</strong> EXTRACT-MIN(Q)</code></p>
<p>Let's analyze the pseudocode line by line:</p>
<p>HUFFMAN( C ): C is a set of characters<br />
<strong>1</strong> &#8216;n&#8217; is the total number of characters in &#8216;C&#8217;<br />
<strong>2</strong> insert all the characters into the min-priority queue &#8216;Q&#8217;<br />
<strong>3</strong> repeat n-1 times<br />
<strong>4</strong> create a new node z<br />
<strong>5</strong> the left child of z is the least frequent node popped from Q<br />
<strong>6</strong> pop another node from Q to become the right child<br />
<strong>7</strong> the frequency of z is the sum of the frequencies of its children<br />
<strong>8</strong> insert the newly created node into the min-priority queue<br />
<strong>9</strong> the loop ends<br />
<strong>10</strong> return the root of the tree (the only node left in Q)</p>
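<p>The pseudocode can be sketched in Python roughly as follows. This is only one possible rendering: the function name <code>huffman</code>, the use of the standard <code>heapq</code> module as the min-priority queue, and the choice to represent internal nodes as <code>(left, right)</code> tuples are all assumptions made for this illustration. It also assumes the frequency table is not empty.</p>

```python
import heapq
from itertools import count


def huffman(freq):
    """Build a Huffman tree from a {character: frequency} mapping.

    Leaves are plain characters; internal nodes are (left, right) tuples.
    Returns (total_frequency, root). Assumes freq is not empty.
    """
    tie = count()  # tie-breaker so the heap never has to compare nodes
    # Line 2: put every character into the min-priority queue Q.
    q = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(q)
    # Lines 3-9: merge the two least frequent nodes, n-1 times.
    for _step in range(len(freq) - 1):
        fx, _tx, x = heapq.heappop(q)  # line 5: left child
        fy, _ty, y = heapq.heappop(q)  # line 6: right child
        heapq.heappush(q, (fx + fy, next(tie), (x, y)))  # lines 7-8
    # Line 10: the single remaining node is the root of the tree.
    total, _t, root = heapq.heappop(q)
    return total, root


total, root = huffman({"A": 45, "B": 15, "C": 14, "D": 16, "E": 4, "F": 6})
print(total)
print(root)
```

<p>The counter is there only to break ties between equal frequencies, so that Python never tries to compare a character with a tuple.</p>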
<p class="MsoNormal">Now we will see an example of compressing a particular text file using the Huffman Coding algorithm. Suppose we have a text file containing the words “COMPUTER SCIENCE AND ENGINEERING”. First, we have to calculate the frequencies of the different characters (ignoring the spaces) and sort them in ascending order.</p>
<table cellpadding="0" cellspacing="0" width="87">
<tr>
<td valign="top" width="78">O</td>
<td valign="top" width="12">1</td>
</tr>
<tr>
<td valign="top" width="78">M</td>
<td valign="top" width="12">1</td>
</tr>
<tr>
<td valign="top" width="78">P</td>
<td valign="top" width="12">1</td>
</tr>
<tr>
<td valign="top" width="78">U</td>
<td valign="top" width="12">1</td>
</tr>
<tr>
<td valign="top" width="78">T</td>
<td valign="top" width="12">1</td>
</tr>
<tr>
<td valign="top" width="78">S</td>
<td valign="top" width="12">1</td>
</tr>
<tr>
<td valign="top" width="78">A</td>
<td valign="top" width="12">1</td>
</tr>
<tr>
<td valign="top" width="78">D</td>
<td valign="top" width="12">1</td>
</tr>
<tr>
<td valign="top" width="78">R R</td>
<td valign="top" width="12">2</td>
</tr>
<tr>
<td valign="top" width="78">G G</td>
<td valign="top" width="12">2</td>
</tr>
<tr>
<td valign="top" width="78">C C C</td>
<td valign="top" width="12">3</td>
</tr>
<tr>
<td valign="top" width="78">I I I</td>
<td valign="top" width="12">3</td>
</tr>
<tr>
<td valign="top" width="78">N N N N N</td>
<td valign="top" width="12">5</td>
</tr>
<tr>
<td valign="top" width="78">E E E E E E</td>
<td valign="top" width="12">6</td>
</tr>
<tr>
<td colspan="2" valign="top">
<p align="right">29</p>
</td>
</tr>
</table>
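<p>As a quick check, the frequency table can be reproduced with a few lines of Python. This is a small sketch that assumes spaces are not counted, which is what the total of 29 implies; the variable names are chosen only for this example.</p>

```python
from collections import Counter

text = "COMPUTER SCIENCE AND ENGINEERING"
# Assumption: spaces are ignored, so only the 29 letters are counted.
freq = Counter(ch for ch in text if ch != " ")
# Sort in ascending order of frequency (ties keep first-seen order).
table = sorted(freq.items(), key=lambda item: item[1])
for ch, f in table:
    print(ch, f)
print(sum(freq.values()))
```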
<p class="MsoNormal">First, we take the two smallest frequencies and create a sub-tree as follows:</p>
<p class="MsoNormal" align="center">(2) OM<br />
/&nbsp;&nbsp;&nbsp;\<br />
[1]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[1]<br />
O&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;M</p>
<p class="MsoNormal" align="left">The 2 least frequent objects are merged, and the result of the merger is a new node whose frequency is the sum of the frequencies of the 2 merged objects. The objects that were added to the tree are then removed from the table, the newly created node is added to the table, and the table is sorted again.</p>
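<p>A single merge on the table might look like the sketch below. The helper name <code>merge_step</code> and the idea of labelling the new node by joining the two labels (as in &#8216;OM&#8217;) are assumptions for illustration; a real implementation would usually keep a priority queue instead of re-sorting a list.</p>

```python
def merge_step(table):
    """Merge the two least frequent entries of a (label, frequency) table."""
    table = sorted(table, key=lambda entry: entry[1])
    (x, fx), (y, fy) = table[0], table[1]
    merged = (x + y, fx + fy)  # the new node, for example ("OM", 2)
    rest = table[2:]           # the merged entries leave the table
    # Add the new node and sort the table again.
    return sorted(rest + [merged], key=lambda entry: entry[1])


print(merge_step([("O", 1), ("M", 1), ("P", 1), ("R", 2)]))
```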
<p class="MsoNormal" align="left">We continue this process until the frequency table has only one object, and that object is the root of our desired tree. The following is the final tree that we get after n-1 iterations:</p>
<p style="text-align:center;"><a href="http://anupom.files.wordpress.com/2006/10/huffman-tree.GIF" title="huffman-tree.GIF"><img src="http://anupom.files.wordpress.com/2006/10/tree-huffman.GIF" alt="huffman tree" /></a></p>
<p>&#8216;Leaves&#8217; are shown as rectangles [] and &#8216;internal nodes&#8217; are shown as circles (). From this tree we can read off the variable length code for each letter. Each right edge is labeled 1 and each left edge is labeled 0. The codeword for a letter is the sequence of labels on the edges connecting the root to the leaf for that letter. If we want the codeword for &#8216;E&#8217;, we traverse the edges from the root to the leaf of &#8216;E&#8217;; the sequence of labels on those edges is the codeword for &#8216;E&#8217;, which in our example is 01. If we check all the codewords, we will see that letters with higher frequencies get shorter codewords.</p>
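<p>Reading the codewords off a tree can be sketched as a short recursive walk. The tree used here is a small hand-made example, not the tree from the figure, and the nested-tuple representation and the name <code>codewords</code> are assumptions for this illustration.</p>

```python
def codewords(node, prefix=""):
    """Map each leaf character to its codeword.

    A node is either a character (leaf) or a (left, right) tuple.
    """
    if isinstance(node, str):
        # A tree made of a single leaf still needs one bit.
        return {node: prefix or "0"}
    left, right = node
    codes = codewords(left, prefix + "0")          # left edge is labeled 0
    codes.update(codewords(right, prefix + "1"))   # right edge is labeled 1
    return codes


# Hypothetical tree: E is the most frequent letter, so it sits nearest the root.
tree = ("E", ("N", ("C", "I")))
print(codewords(tree))
```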
<p>We interpret the binary codeword for a character as the path from the root to that character, where 0 means &#8216;go to the left child&#8217; and 1 means &#8216;go to the right child&#8217;. If we want to decode the codeword 11001, our path from the root will be right-right-left-left-right. Following this path we reach &#8216;U&#8217;, so 11001 is the codeword for &#8216;U&#8217;.</p>
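<p>Decoding a whole bit string works the same way: follow the edges until a leaf is reached, output that letter, and start again from the root. The sketch below uses a small hypothetical tree (not the one from the figure) stored as nested tuples, and it assumes the root is an internal node.</p>

```python
def decode(bits, root):
    """Decode a string of '0' and '1' characters by walking the tree."""
    out = []
    node = root
    for bit in bits:
        left, right = node
        node = left if bit == "0" else right  # 0 goes left, 1 goes right
        if isinstance(node, str):  # reached a leaf: emit it, restart at root
            out.append(node)
            node = root
    return "".join(out)


# Hypothetical tree with codewords E=0, N=10, C=110, I=111.
tree = ("E", ("N", ("C", "I")))
print(decode("0101100111", tree))  # ENCEI
```

<p>Because no codeword is a prefix of another, the decoder never needs a separator between codewords.</p>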
</div>]]></content:encoded>
			<wfw:commentRss>http://anupom.wordpress.com/2006/10/09/huffman-coding/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/093e82f19edc6281344aa3b707c6e2b5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">anupom</media:title>
		</media:content>

		<media:content url="http://anupom.files.wordpress.com/2006/10/tree-huffman.GIF" medium="image">
			<media:title type="html">huffman tree</media:title>
		</media:content>
	</item>
		<item>
		<title>Loss-less data compression using variable length code and greedy technique</title>
		<link>http://anupom.wordpress.com/2006/10/08/loss-less-data-compression-using-variable-length-code-and-greedy-technique/</link>
		<comments>http://anupom.wordpress.com/2006/10/08/loss-less-data-compression-using-variable-length-code-and-greedy-technique/#comments</comments>
		<pubDate>Sat, 07 Oct 2006 19:14:25 +0000</pubDate>
		<dc:creator>anupom</dc:creator>
				<category><![CDATA[Algorithms]]></category>

		<guid isPermaLink="false">http://anupom.wordpress.com/2006/10/08/loss-less-data-compression-using-variable-length-code-and-greedy-technique/</guid>
		<description><![CDATA[Suppose we have a 100-character data file that contains only 6 different characters, A to F. The frequencies of those characters are:
A = 45
B = 15
C = 14
D = 16
E = 4
F = 6
&#8212;&#8212;&#8212;&#8212;&#8211;
100
This means that A occurs 45 times in the file, and so on.
Fixed Length Code [...]]]></description>
			<content:encoded><![CDATA[<div class='snap_preview'><br /><p>Suppose we have a 100-character data file that contains only 6 different characters, A to F. The frequencies of those characters are:</p>
<p>A = 45<br />
B = 15<br />
C = 14<br />
D = 16<br />
E = 4<br />
F = 6<br />
&#8212;&#8212;&#8212;&#8212;&#8211;<br />
100</p>
<p>This means that A occurs 45 times in the file, and so on.</p>
<p><strong>Fixed Length Code Representation:</strong><br />
Many binary character codes, such as ASCII or the GSM 7-bit default alphabet, are fixed length codes (UTF-8, by contrast, is a variable length encoding). In ASCII we use 7 bits to represent 2^7 = 128 characters. For our file, to represent 6 characters using a fixed length code, we need at least 3 bits for each character, because 2 bits give only 2^2 = 4 distinct codes while 3 bits give 2^3 = 8. The coding may look as follows:</p>
<p>A = 000<br />
B = 001<br />
C = 010<br />
D = 011<br />
E = 100<br />
F = 101</p>
<p><strong>Using fixed length code, the size of our 100-character data file will be<br />
3*100  =  300 bits.</strong></p>
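<p>The same arithmetic in a few lines of Python, as a small sketch (the variable names are chosen only for this example):</p>

```python
from math import ceil, log2

freq = {"A": 45, "B": 15, "C": 14, "D": 16, "E": 4, "F": 6}
bits_per_char = ceil(log2(len(freq)))  # 3 bits are enough for 6 characters
total_chars = sum(freq.values())       # 100 characters in the file
fixed_size = bits_per_char * total_chars
print(fixed_size)  # 300
```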
<p>Now we will see how we can reduce the file size using variable length code.</p>
<p><strong>Variable Length Code Representation:</strong><br />
A prefix-free code (sometimes also called a prefix code) is a particular type of variable length code. To see what a prefix-free code is, we will first look at an example. The coding for our data file using a prefix-free code may look like the following:</p>
<p>A = 0<br />
B = 101<br />
C = 100<br />
D = 111<br />
E = 1101<br />
F = 1100</p>
<p>The special property of a prefix-free code is that no code-word is a prefix of any other code-word. As we can see here, ‘101’ is a code-word, and ‘101’ is not a prefix of any other code-word. The same is true for all the other code-words. To ensure data compression, we have followed another technique: we have assigned shorter code-words to the characters with higher frequencies. A nifty ‘Greedy Technique’! As ‘A’ occurs most often in our file, ‘A’ has got the shortest code-word.</p>
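<p>The prefix-free property is easy to check mechanically. The helper below is a hypothetical sketch (the name <code>is_prefix_free</code> is made up for this example); it relies on the fact that after sorting the code-words, a code-word that is a prefix of another one is always directly followed by a word that starts with it.</p>

```python
def is_prefix_free(codes):
    """Return True if no code-word is a prefix of another code-word."""
    words = sorted(codes.values())
    # After sorting, it is enough to compare each word with its neighbour.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


codes = {"A": "0", "B": "101", "C": "100", "D": "111", "E": "1101", "F": "1100"}
print(is_prefix_free(codes))                  # True
print(is_prefix_free({"A": "0", "B": "01"}))  # False: 0 is a prefix of 01
```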
<p><strong>Using prefix-free code and the greedy technique, the size of our 100-character data file will be<br />
1*45 + 3*15 + 3*14 + 3*16 + 4*4 + 4*6  =  220 bits<br />
(which is approximately 27% less than the 300 bits of the fixed length code).<br />
</strong><br />
Later we will see how to create a variable length code for a particular data file using Huffman Coding.</p>
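<p>The size calculation can be repeated in Python as a short sketch, using the frequencies and code-words from this post (the variable names are chosen only for this example):</p>

```python
freq = {"A": 45, "B": 15, "C": 14, "D": 16, "E": 4, "F": 6}
codes = {"A": "0", "B": "101", "C": "100", "D": "111", "E": "1101", "F": "1100"}

# Each character costs as many bits as its code-word is long.
variable_size = sum(freq[ch] * len(codes[ch]) for ch in freq)
fixed_size = 3 * sum(freq.values())  # 3 bits per character
saving = 1 - variable_size / fixed_size
print(variable_size)           # 220
print(round(saving * 100, 1))  # 26.7
```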
</div>]]></content:encoded>
			<wfw:commentRss>http://anupom.wordpress.com/2006/10/08/loss-less-data-compression-using-variable-length-code-and-greedy-technique/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/093e82f19edc6281344aa3b707c6e2b5?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">anupom</media:title>
		</media:content>
	</item>
	</channel>
</rss>