<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Manfredas Zabarauskas&#039; Blog &#187; Development</title>
	<atom:link href="http://blog.zabarauskas.com/category/development/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.zabarauskas.com</link>
	<description>We are what we repeatedly do; excellence, then, is not an act but a habit. -- Aristotle</description>
	<lastBuildDate>Tue, 12 Feb 2013 16:30:49 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>Expectation-Maximization Algorithm for Bernoulli Mixture Models (Tutorial)</title>
		<link>http://blog.zabarauskas.com/expectation-maximization-tutorial/</link>
		<comments>http://blog.zabarauskas.com/expectation-maximization-tutorial/#comments</comments>
		<pubDate>Tue, 12 Feb 2013 03:05:53 +0000</pubDate>
		<dc:creator>Manfredas Zabarauskas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[bernoulli mixture models]]></category>
		<category><![CDATA[clustering]]></category>
		<category><![CDATA[expectation maximization]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://blog.zabarauskas.com/?p=1465</guid>
		<description><![CDATA[Even though the title is quite a mouthful, this post is about two really cool ideas: A solution to the "chicken-and-egg" problem (known as the Expectation-Maximization method, described by A. Dempster, N. Laird and D. Rubin in 1977), and An application of this solution to automatic image clustering by similarity, using Bernoulli Mixture Models. For [...]]]></description>
			<content:encoded><![CDATA[<p>Even though the title is quite a mouthful, this post is about two really cool ideas:</p>
<ol>
<li>A solution to the "chicken-and-egg" problem (known as the Expectation-Maximization method, described by A. Dempster, N. Laird and D. Rubin in 1977), and</li>
<li>An application of this solution to automatic image clustering by similarity, using Bernoulli Mixture Models.</li>
</ol>
<p>For the curious, an implementation of the automatic image clustering is shown in the video below. The source code (C#, Windows x86/x64) is also <a href="https://github.com/manfredzab/bernoulli-mixture-models" target="_blank">available for download</a>!</p>
<p><small><div class="wp-caption alignleft" style="width: 614px"><iframe width="600" height="338" src="http://www.youtube.com/embed/y8aEPz_c0XM?rel=0" frameborder="0" allowfullscreen></iframe><p class="wp-caption-text">Automatic clustering of handwritten digits from the MNIST database using the Expectation-Maximization algorithm</p></div></small></p>
<p>While automatic image clustering nicely illustrates the E-M algorithm, E-M has been successfully applied in a number of other areas: I have seen it used for word alignment in automated machine translation, valuation of derivatives in financial models, and gene expression clustering/motif finding in bioinformatics.</p>
<p><i>As a side note, the notation used in this tutorial closely matches the one used in Christopher M. Bishop's "Pattern Recognition and Machine Learning". This should hopefully encourage you to check out his great book for a broader understanding of E-M, mixture models or machine learning in general.</i></p>
<p>Alright, let's dive in!</p>
<h4>1. Expectation-Maximization Algorithm</h4>
<p>Imagine the following situation. You observe some data set <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ca340abf4b48dc6d816137fbadf58b53.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\mathbf{X}" /></span><script type='math/tex'>\mathbf{X}</script> (e.g. a bunch of images). You hypothesize that these images are of <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a5f3c6a11b03839d46af9fb43c97c188.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="K" /></span><script type='math/tex'>K</script> different objects... but you don't know which images represent which objects.</p>
<p>Let <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ee4a26edc0110f441a40685aaad9ee97.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\mathbf{Z}" /></span><script type='math/tex'>\mathbf{Z}</script> be a set of <i>latent</i> (hidden) variables, which tell precisely that: which images represent which objects.</p>
<p>Clearly, if you knew <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ee4a26edc0110f441a40685aaad9ee97.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\mathbf{Z}" /></span><script type='math/tex'>\mathbf{Z}</script>, you could group images into the clusters (where each cluster represents an object), and vice versa, if you knew the groupings you could deduce <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ee4a26edc0110f441a40685aaad9ee97.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\mathbf{Z}" /></span><script type='math/tex'>\mathbf{Z}</script>. A classical "chicken-and-egg" problem, and a perfect target for an Expectation-Maximization algorithm.</p>
<p>Here's the general idea of how the E-M algorithm tackles it. First, all images are assigned to clusters arbitrarily. Then we use this assignment to modify the parameters of the clusters (e.g. we change what object is represented by each cluster) to <b>maximize</b> the clusters' ability to explain the data, after which we re-assign all images to the <b>expected</b> most likely clusters. Wash, rinse, repeat, until the assignment explains the data well enough (i.e. images from the same cluster are similar enough).</p>
<p><i>(Notice the words in bold in the previous paragraph: this is where the expectation and maximization stages in the E-M algorithm come from.)</i></p>
<p>To formalize (and generalize) this a bit further, say that you have a set of model parameters <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_857ab7c6046c342c686136a5b95e8e5a.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{\theta}" /></span><script type='math/tex'>\mathbf{\theta}</script> (in the example above, some sort of cluster descriptions).</p>
<p>To solve the problem of cluster assignments we effectively need to find model parameters <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_1bd8e4ede7c72faf4916ad4c22302007.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{\theta'}" /></span><script type='math/tex'>\mathbf{\theta'}</script> that maximize the likelihood of the observed data <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ca340abf4b48dc6d816137fbadf58b53.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\mathbf{X}" /></span><script type='math/tex'>\mathbf{X}</script>, or, equivalently, the model parameters that maximize the log likelihood<br />
<p style='text-align:center;'><span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a8fa912ebe4e7ce5e9473d3f42184119.gif' style='vertical-align: middle; border: none;' class='tex' alt=" \mathbf{\theta'} = \underset{\mathbf{\theta}}{\text{arg max }} \ln \,\text{Pr} (\mathbf{X} | \mathbf{\theta}). " /></span><script type='math/tex;  mode=display'> \mathbf{\theta'} = \underset{\mathbf{\theta}}{\text{arg max }} \ln \,\text{Pr} (\mathbf{X} | \mathbf{\theta}). </script></p></p>
<p>Using some simple algebra we can show that for any latent variable distribution <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d5a63c12ccf457c7896488e70bcc0f91.gif' style='vertical-align: middle; border: none; ' class='tex' alt="q(\mathbf{Z})" /></span><script type='math/tex'>q(\mathbf{Z})</script>, the log likelihood of the data can be decomposed as<br />
\begin{align}<br />
	\ln \,\text{Pr}(\mathbf{X} | \theta) = \mathcal{L}(q, \theta) + \text{KL}(q || p), \label{eq:logLikelihoodDecomp}<br />
\end{align}<br />
where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3b5991dc3e94af06894b8178cf425610.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\text{KL}(q || p)" /></span><script type='math/tex'>\text{KL}(q || p)</script> is the <a href="http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence" target="_blank">Kullback-Leibler divergence</a> between <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d5a63c12ccf457c7896488e70bcc0f91.gif' style='vertical-align: middle; border: none; ' class='tex' alt="q(\mathbf{Z})" /></span><script type='math/tex'>q(\mathbf{Z})</script> and the posterior distribution <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a716452c065a209e2425bd8d363c54e8.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta)" /></span><script type='math/tex'>\,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta)</script>, and<br />
\begin{align}<br />
	\mathcal{L}(q, \theta) := \sum_{\mathbf{Z}} q(\mathbf{Z}) \left( \mathcal{L}(\theta) - \ln q(\mathbf{Z}) \right)<br />
\end{align}<br />
with <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3a2eec0d4b83d9b3d4c91552e2e4e7c9.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathcal{L}(\theta) := \ln \,\text{Pr}(\mathbf{X}, \mathbf{Z}| \mathbf{\theta})" /></span><script type='math/tex'>\mathcal{L}(\theta) := \ln \,\text{Pr}(\mathbf{X}, \mathbf{Z}| \mathbf{\theta})</script> being the "complete-data" log likelihood (i.e. log likelihood of both observed and latent data).</p>
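<p>For readers who want the "simple algebra" spelled out, here is one way to sketch it. It assumes the standard definition <script type='math/tex'>\text{KL}(q || p) = -\sum_{\mathbf{Z}} q(\mathbf{Z}) \ln \frac{\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta)}{q(\mathbf{Z})}</script>, the product rule <script type='math/tex'>\text{Pr}(\mathbf{X}, \mathbf{Z} | \theta) = \text{Pr}(\mathbf{Z} | \mathbf{X}, \theta) \, \text{Pr}(\mathbf{X} | \theta)</script>, and the fact that <script type='math/tex'>q(\mathbf{Z})</script> sums to one:</p>

```latex
\begin{align*}
\mathcal{L}(q, \theta) + \text{KL}(q || p)
  &= \sum_{\mathbf{Z}} q(\mathbf{Z}) \left( \ln \text{Pr}(\mathbf{X}, \mathbf{Z} | \theta) - \ln q(\mathbf{Z}) \right)
   - \sum_{\mathbf{Z}} q(\mathbf{Z}) \ln \frac{\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta)}{q(\mathbf{Z})} \\
  &= \sum_{\mathbf{Z}} q(\mathbf{Z}) \left( \ln \text{Pr}(\mathbf{X}, \mathbf{Z} | \theta) - \ln \text{Pr}(\mathbf{Z} | \mathbf{X}, \theta) \right) \\
  &= \sum_{\mathbf{Z}} q(\mathbf{Z}) \ln \text{Pr}(\mathbf{X} | \theta) \\
  &= \ln \text{Pr}(\mathbf{X} | \theta).
\end{align*}
```

<p>The <script type='math/tex'>\ln q(\mathbf{Z})</script> terms cancel in the second line, and the product rule collapses the remaining difference of logarithms in the third.</p>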
<p>To understand what the E-M algorithm does in the expectation (E) step, observe that <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_487830afc9733ae778e8e10752132431.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\text{KL}(q || p) \geq 0" /></span><script type='math/tex'>\text{KL}(q || p) \geq 0</script> for any <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d5a63c12ccf457c7896488e70bcc0f91.gif' style='vertical-align: middle; border: none; ' class='tex' alt="q(\mathbf{Z})" /></span><script type='math/tex'>q(\mathbf{Z})</script> and hence <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3523c98642a0e1094107168839a64d4a.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathcal{L}(q, \theta)" /></span><script type='math/tex'>\mathcal{L}(q, \theta)</script> is a lower bound on <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_23792be11bf4cc4f2090b70f211b7d87.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\ln \,\text{Pr}(\mathbf{X} | \theta)" /></span><script type='math/tex'>\ln \,\text{Pr}(\mathbf{X} | \theta)</script>. </p>
<p>Then, in the E step, the gap between the <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3523c98642a0e1094107168839a64d4a.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathcal{L}(q, \theta)" /></span><script type='math/tex'>\mathcal{L}(q, \theta)</script> and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_23792be11bf4cc4f2090b70f211b7d87.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\ln \,\text{Pr}(\mathbf{X} | \theta)" /></span><script type='math/tex'>\ln \,\text{Pr}(\mathbf{X} | \theta)</script> is minimized by minimizing the Kullback-Leibler divergence <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3b5991dc3e94af06894b8178cf425610.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\text{KL}(q || p)" /></span><script type='math/tex'>\text{KL}(q || p)</script> with respect to <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d5a63c12ccf457c7896488e70bcc0f91.gif' style='vertical-align: middle; border: none; ' class='tex' alt="q(\mathbf{Z})" /></span><script type='math/tex'>q(\mathbf{Z})</script> (while keeping the parameters <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2554a2bb846cffd697389e5dc8912759.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\theta" /></span><script type='math/tex'>\theta</script> fixed).</p>
<p>Since <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3b5991dc3e94af06894b8178cf425610.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\text{KL}(q || p)" /></span><script type='math/tex'>\text{KL}(q || p)</script> is minimized at <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d464a460559a7a0a72328786a39ef0d4.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\text{KL}(q || p) = 0" /></span><script type='math/tex'>\text{KL}(q || p) = 0</script> when <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_940433b42ddfc9385cf89bddf4040aab.gif' style='vertical-align: middle; border: none; ' class='tex' alt="q(\mathbf{Z}) = \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta)" /></span><script type='math/tex'>q(\mathbf{Z}) = \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta)</script>, at the E step <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d5a63c12ccf457c7896488e70bcc0f91.gif' style='vertical-align: middle; border: none; ' class='tex' alt="q(\mathbf{Z})" /></span><script type='math/tex'>q(\mathbf{Z})</script> is set to the conditional distribution <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a716452c065a209e2425bd8d363c54e8.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta)" /></span><script type='math/tex'>\,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta)</script>.</p>
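<p>To make the role of the E step concrete, the decomposition can be checked numerically on a tiny example. The sketch below is only an illustration: the three-state latent variable and the numbers in <code>joint</code> are made up, and the function names are hypothetical. It shows that the lower bound plus the KL divergence equals the log likelihood for an arbitrary <code>q</code>, and that the gap closes when <code>q</code> is set to the posterior.</p>

```python
import math

# Hypothetical toy model: a latent variable Z with three states.
# joint[z] stands for Pr(X, Z = z | theta), evaluated at the observed X.
joint = [0.10, 0.25, 0.05]

evidence = sum(joint)                      # Pr(X | theta)
log_evidence = math.log(evidence)          # ln Pr(X | theta)
posterior = [p / evidence for p in joint]  # Pr(Z | X, theta)


def lower_bound(q):
    """L(q, theta) = sum_Z q(Z) * (ln Pr(X, Z | theta) - ln q(Z))."""
    return sum(q_z * (math.log(p_z) - math.log(q_z))
               for q_z, p_z in zip(q, joint))


def kl_divergence(q, p):
    """KL(q || p) = sum_Z q(Z) * ln(q(Z) / p(Z))."""
    return sum(q_z * math.log(q_z / p_z) for q_z, p_z in zip(q, p))


# An arbitrary distribution over Z leaves a positive gap...
q_arbitrary = [0.6, 0.3, 0.1]
gap_arbitrary = kl_divergence(q_arbitrary, posterior)

# ...while the E-step choice q(Z) = Pr(Z | X, theta) closes it.
gap_posterior = kl_divergence(posterior, posterior)

print(lower_bound(q_arbitrary) + gap_arbitrary, log_evidence)
print(lower_bound(posterior), log_evidence)
```

<p>Both printed pairs should agree up to floating-point rounding.</p>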
<p>To update the model parameters in the M step, the lower bound <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3523c98642a0e1094107168839a64d4a.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathcal{L}(q, \theta)" /></span><script type='math/tex'>\mathcal{L}(q, \theta)</script> is maximized with respect to the parameters <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2554a2bb846cffd697389e5dc8912759.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\theta" /></span><script type='math/tex'>\theta</script> (while keeping <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_940433b42ddfc9385cf89bddf4040aab.gif' style='vertical-align: middle; border: none; ' class='tex' alt="q(\mathbf{Z}) = \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta)" /></span><script type='math/tex'>q(\mathbf{Z}) = \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta)</script> fixed; notice that <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2554a2bb846cffd697389e5dc8912759.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\theta" /></span><script type='math/tex'>\theta</script> in this equation corresponds to the old set of parameters, hence, to avoid confusion, let <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_766e8651e6370ffa7ae83d4419757d83.gif' style='vertical-align: middle; border: none; ' class='tex' alt="q(\mathbf{Z}) = \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta^\text{old})" /></span><script type='math/tex'>q(\mathbf{Z}) = \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta^\text{old})</script>).</p>
<p>The function <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3523c98642a0e1094107168839a64d4a.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathcal{L}(q, \theta)" /></span><script type='math/tex'>\mathcal{L}(q, \theta)</script> that is being maximized w.r.t. <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2554a2bb846cffd697389e5dc8912759.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\theta" /></span><script type='math/tex'>\theta</script> at the M step can be re-written as<br />
\begin{align*}<br />
	\theta^\text{new} &#038;= \underset{\mathbf{\theta}}{\text{arg max }} \left. \mathcal{L}(q, \theta) \right|_{q(\mathbf{Z}) = \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta^\text{old})} \\<br />
	&#038;=  \underset{\mathbf{\theta}}{\text{arg max }} \left. \sum_{\mathbf{Z}} q(\mathbf{Z}) \left( \mathcal{L}(\theta) - \ln q(\mathbf{Z}) \right) \right|_{q(\mathbf{Z}) = \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta^\text{old})} \\<br />
	&#038;=  \underset{\mathbf{\theta}}{\text{arg max }} \sum_{\mathbf{Z}} \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta^\text{old}) \left( \mathcal{L}(\theta) - \ln \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta^\text{old}) \right) \\<br />
	&#038;= \underset{\mathbf{\theta}}{\text{arg max }} \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{old}} \left[ \mathcal{L}(\theta) \right] - \sum_{\mathbf{Z}} \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta^\text{old}) \ln \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \theta^\text{old}) \\<br />
	&#038;= \underset{\mathbf{\theta}}{\text{arg max }} \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{old}} \left[ \mathcal{L}(\theta) \right] - (C \in \mathbb{R}) \\<br />
	&#038;= \underset{\mathbf{\theta}}{\text{arg max }} \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{old}} \left[ \mathcal{L}(\theta) \right],<br />
\end{align*}</p>
<p>i.e. in the M step the expectation of the joint log likelihood of the complete data is maximized with respect to the parameters <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2554a2bb846cffd697389e5dc8912759.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\theta" /></span><script type='math/tex'>\theta</script>.</p>
<p>So, just to summarize,</p>
<ul>
<li><b>Expectation</b> step: <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_c46948457ba89dc3c71b958a192fc19c.gif' style='vertical-align: middle; border: none; ' class='tex' alt="q^{t + 1}(\mathbf{Z}) \leftarrow \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \mathbf{\theta}^t)" /></span><script type='math/tex'>q^{t + 1}(\mathbf{Z}) \leftarrow \,\text{Pr}(\mathbf{Z} | \mathbf{X}, \mathbf{\theta}^t)</script></li>
<li><b>Maximization</b> step: <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ce39f6d35750f5ced3727687ba5d72c3.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{\theta}^{t + 1} \leftarrow \underset{\mathbf{\theta}}{\text{arg max }} \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{t}} \left[ \mathcal{L}(\theta) \right]" /></span><script type='math/tex'>\mathbf{\theta}^{t + 1} \leftarrow \underset{\mathbf{\theta}}{\text{arg max }} \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{t}} \left[ \mathcal{L}(\theta) \right]</script> (where superscript <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_80d12415a9bdfb734dc90aed004c69d7.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{\theta}^t" /></span><script type='math/tex'>\mathbf{\theta}^t</script> indicates the value of parameter <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_857ab7c6046c342c686136a5b95e8e5a.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{\theta}" /></span><script type='math/tex'>\mathbf{\theta}</script> at time <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_e358efa489f58062f10dd7316b65649e.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="t" /></span><script type='math/tex'>t</script>).</li>
</ul>
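<p>The two steps above can be sketched as a generic loop. This is a minimal illustration rather than the implementation used in the video: the function names, the stopping rule and the toy problem (two coins with unknown biases, each session being 10 flips of one unknown coin) are all assumptions made for this example.</p>

```python
import math

FLIPS = 10
# Hypothetical data: number of heads in each 10-flip session.
heads = [9, 8, 2, 1, 9, 3, 8, 2]


def expectation_maximization(theta, e_step, m_step, log_likelihood,
                             tolerance=1e-8, max_iterations=200):
    """Alternate E and M steps until the log likelihood stops improving."""
    history = [log_likelihood(theta)]
    for _ in range(max_iterations):
        q = e_step(theta)   # q^{t+1}(Z) <- Pr(Z | X, theta^t)
        theta = m_step(q)   # theta^{t+1} <- arg max E_q[L(theta)]
        history.append(log_likelihood(theta))
        if history[-1] - history[-2] < tolerance:
            break
    return theta, history


def weighted_likelihoods(theta, h):
    """Mixing weight times Pr(h heads | coin) for coins A and B.

    The binomial coefficient is dropped: it is shared by both coins
    and does not depend on theta.
    """
    pi_a, p_a, p_b = theta
    like_a = pi_a * p_a ** h * (1.0 - p_a) ** (FLIPS - h)
    like_b = (1.0 - pi_a) * p_b ** h * (1.0 - p_b) ** (FLIPS - h)
    return like_a, like_b


def e_step(theta):
    """Posterior probability that each session used coin A."""
    q = []
    for h in heads:
        like_a, like_b = weighted_likelihoods(theta, h)
        q.append(like_a / (like_a + like_b))
    return q


def m_step(q):
    """Closed-form maximizer of the expected complete-data log likelihood."""
    weight_a = sum(q)
    weight_b = len(q) - weight_a
    p_a = sum(q_n * h for q_n, h in zip(q, heads)) / (FLIPS * weight_a)
    p_b = sum((1.0 - q_n) * h for q_n, h in zip(q, heads)) / (FLIPS * weight_b)
    return weight_a / len(q), p_a, p_b


def log_likelihood(theta):
    """ln Pr(X | theta), up to an additive constant independent of theta."""
    return sum(math.log(sum(weighted_likelihoods(theta, h))) for h in heads)


# theta = (mixing weight of coin A, bias of coin A, bias of coin B)
theta, history = expectation_maximization((0.5, 0.6, 0.4),
                                          e_step, m_step, log_likelihood)
print(theta)
print(history[0], history[-1])
```

<p>The values stored in <code>history</code> should never decrease from one iteration to the next, which is the practical consequence of the lower-bound argument given earlier.</p>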
<p>Phew. Let's go to the image clustering example, and see how all of this actually works. <span id="more-1465"></span></p>
<h4>2. Bernoulli Mixture Models for Image Clustering</h4>
<p>First of all, let's represent the image clustering problem in a more formal way.</p>
<p><b>2.1. Formal description</b></p>
<p>Say that we are given <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8d9c307cb7f3c4a32822a51922d1ceaa.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="N" /></span><script type='math/tex'>N</script> same-sized training images <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_62347ba2602023c7e5bb2dd6a90124b5.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{x_n} = (x_{n,1}, ..., x_{n,D})^T" /></span><script type='math/tex'>\mathbf{x_n} = (x_{n,1}, ..., x_{n,D})^T</script> for <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_69421821c977109b6300f0931b9792f0.gif' style='vertical-align: middle; border: none; ' class='tex' alt="n \in \{1, ..., N \}" /></span><script type='math/tex'>n \in \{1, ..., N \}</script>, each image containing <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_f623e75af30e62bbd73d6df5b50bb7b5.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="D" /></span><script type='math/tex'>D</script> binary pixels (i.e. <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_5ba4153151045361e1dac5ed4a7fc711.gif' style='vertical-align: middle; border: none; ' class='tex' alt="x_{n,i} \in \{ 0, 1 \}" /></span><script type='math/tex'>x_{n,i} \in \{ 0, 1 \}</script>).</p>
<p>Assuming that, given the cluster an image belongs to, its pixels are conditionally independent of each other (i.e. that <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ff0a6377fbfe2a11c02cbe9905dd10b3.gif' style='vertical-align: middle; border: none; ' class='tex' alt="x_{n, i}" /></span><script type='math/tex'>x_{n, i}</script> is conditionally independent of <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_98fe526d4520aea70a59aef2e0b11894.gif' style='vertical-align: middle; border: none; ' class='tex' alt="x_{n, j \neq i}" /></span><script type='math/tex'>x_{n, j \neq i}</script> for each <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_f1073288f8ab17b4837e406f354586f4.gif' style='vertical-align: middle; border: none; ' class='tex' alt="i, j \in \{ 1, ..., D \}" /></span><script type='math/tex'>i, j \in \{ 1, ..., D \}</script>), the probability distribution of pixel <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_865c0c0b4ab0e063e5caa3387c1a8741.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="i" /></span><script type='math/tex'>i</script> over all images belonging to a component <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8ce4b16b22b58894aa86c421e8759df3.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="k" /></span><script type='math/tex'>k</script> can be modelled using a <a href="http://en.wikipedia.org/wiki/Bernoulli_distribution" target="_blank">Bernoulli distribution</a> with a parameter <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_9b1b0870f018f029c70910e55bf646f6.gif' style='vertical-align: middle; border: none; ' class='tex' alt="0 \leq \mu_{k, i} \leq 1" /></span><script type='math/tex'>0 \leq \mu_{k, i} \leq 1</script>.</p>
<p>To incorporate some prior knowledge about the image assignment to <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a5f3c6a11b03839d46af9fb43c97c188.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="K" /></span><script type='math/tex'>K</script> clusters (e.g. the proportions of images in each cluster), the assignments can be treated as being sampled from a multinomial (categorical) distribution with the parameters <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_dd586116d0f01aaf0607a092f1e60f3c.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\pi_1, ..., \pi_K" /></span><script type='math/tex'>\pi_1, ..., \pi_K</script> (where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3cc677dff8854062e1567f1ad406209e.gif' style='vertical-align: middle; border: none; ' class='tex' alt="0 \leq \pi_i \leq 1" /></span><script type='math/tex'>0 \leq \pi_i \leq 1</script>, <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_f4dd64e79ff9bdabf71e83e0905bb841.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\sum_{i = 1}^K \pi_i = 1" /></span><script type='math/tex'>\sum_{i = 1}^K \pi_i = 1</script>).
Each <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8b7d5fed535e485e329547d73a395ba2.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\pi_i" /></span><script type='math/tex'>\pi_i</script> for <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2464953fbfd513653d729ab2e0879a7f.gif' style='vertical-align: middle; border: none; ' class='tex' alt="i \in \{1, ..., K\}" /></span><script type='math/tex'>i \in \{1, ..., K\}</script> is called a <i>mixing coefficient</i> of cluster <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_865c0c0b4ab0e063e5caa3387c1a8741.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="i" /></span><script type='math/tex'>i</script>.</p>
<p>Let's say that the model parameters include the pixel distributions of each cluster and the prior knowledge about the image assignments, i.e. <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a56087f553b68a35567dbb91778daeeb.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\theta = (\mathbf{\mu}, \mathbf{\pi})" /></span><script type='math/tex'>\theta = (\mathbf{\mu}, \mathbf{\pi})</script>, where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3a56617aefbfe943911dd26c8271af09.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\mathbf{\mu} := (\mathbf{\mu_1} \; \mathbf{\mu_2} \;... \;\mathbf{\mu_K} ) = \left( \begin{array}{cccc} \mu_{1, 1} & \mu_{2, 1} & ... & \mu_{K, 1} \\ \mu_{1, 2} & \mu_{2, 2} & ... & \mu_{K, 2} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{1, D} & \mu_{2, D} & ... & \mu_{K, D} \\ \end{array} \right)" /></span><script type='math/tex'>\mathbf{\mu} := (\mathbf{\mu_1} \; \mathbf{\mu_2} \;... \;\mathbf{\mu_K} ) = \left( \begin{array}{cccc} \mu_{1, 1} & \mu_{2, 1} & ... & \mu_{K, 1} \\ \mu_{1, 2} & \mu_{2, 2} & ... & \mu_{K, 2} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{1, D} & \mu_{2, D} & ... & \mu_{K, D} \\ \end{array} \right)</script> and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3eb51f1ea9a3e767348103a7b3eb7721.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{\pi} := ( \pi_1, ..., \pi_K )^T" /></span><script type='math/tex'>\mathbf{\pi} := ( \pi_1, ..., \pi_K )^T</script>.</p>
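<p>In code, these parameters could be held in two plain containers. The sketch below is one possible layout, not the one used in the linked C# source: it stores one row per cluster (so <code>mu[k][i]</code> plays the role of <script type='math/tex'>\mu_{k, i}</script>, the transpose of the matrix above), and the initialization ranges, the function name, and the choice of 10 clusters are assumptions made for illustration. The 28 × 28 pixel count matches MNIST digit images.</p>

```python
import random


def init_parameters(num_clusters, num_pixels, seed=0):
    """Return (mu, pi) for a Bernoulli mixture with K clusters and D pixels."""
    rng = random.Random(seed)
    # Assumption: start every mu_{k,i} strictly inside (0, 1), so that
    # neither ln(mu) nor ln(1 - mu) is ever evaluated at zero.
    mu = [[rng.uniform(0.25, 0.75) for _ in range(num_pixels)]
          for _ in range(num_clusters)]
    # Assumption: with no prior knowledge, use uniform mixing coefficients.
    pi = [1.0 / num_clusters] * num_clusters
    return mu, pi


mu, pi = init_parameters(num_clusters=10, num_pixels=28 * 28)
print(len(mu), len(mu[0]), sum(pi))
```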
<p>Then, the likelihood of a single training image <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_70e59a996bd69a0c21878b4093375e92.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\mathbf{x}" /></span><script type='math/tex'>\mathbf{x}</script> is<br />
\begin{align}<br />
	\,\text{Pr}(\mathbf{x} | \theta) = \,\text{Pr}(\mathbf{x} | \mathbf{\mu}, \mathbf{\pi}) = \sum_{k = 1}^K \pi_k \,\text{Pr}(\mathbf{x}|\mathbf{\mu_k})<br />
\end{align}<br />
where the probability that <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_70e59a996bd69a0c21878b4093375e92.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\mathbf{x}" /></span><script type='math/tex'>\mathbf{x}</script> is generated by cluster <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8ce4b16b22b58894aa86c421e8759df3.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="k" /></span><script type='math/tex'>k</script> can be written as<br />
\begin{align}<br />
	\,\text{Pr}(\mathbf{x}|\mathbf{\mu_k}) = \prod_{i = 1}^D \mu_{k, i}^{x_i} (1 - \mu_{k, i})^{1 - x_i}.<br />
\end{align}</p>
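<p>The two formulas above translate almost directly into code. The following is a small sketch with made-up parameters (two clusters, four pixels); the function names are hypothetical. The per-cluster probability is computed in the log domain because a product over several hundred pixels would quickly underflow in floating point.</p>

```python
import itertools
import math


def log_bernoulli(x, mu_k):
    """ln Pr(x | mu_k) = sum_i [x_i ln mu_{k,i} + (1 - x_i) ln(1 - mu_{k,i})].

    Assumes every mu_{k,i} lies strictly between 0 and 1.
    """
    return sum(math.log(m) if pixel == 1 else math.log(1.0 - m)
               for pixel, m in zip(x, mu_k))


def mixture_likelihood(x, mu, pi):
    """Pr(x | mu, pi) = sum_k pi_k * Pr(x | mu_k)."""
    return sum(pi_k * math.exp(log_bernoulli(x, mu_k))
               for pi_k, mu_k in zip(pi, mu))


# Hypothetical parameters: K = 2 clusters over D = 4 binary pixels.
mu = [[0.9, 0.9, 0.1, 0.1],   # cluster 1: first two pixels are usually on
      [0.2, 0.2, 0.8, 0.8]]   # cluster 2: last two pixels are usually on
pi = [0.5, 0.5]

x = [1, 1, 0, 0]
likelihood = mixture_likelihood(x, mu, pi)

# Sanity check: the likelihoods of all 2^D binary images sum to one.
total = sum(mixture_likelihood(list(image), mu, pi)
            for image in itertools.product([0, 1], repeat=4))
print(likelihood, total)
```

<p>For this <code>x</code>, cluster 1 contributes 0.5 × 0.9<sup>4</sup> and cluster 2 contributes 0.5 × 0.2<sup>4</sup>, so the image is far better explained by cluster 1.</p>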
<p>To model the assignment of images to clusters, associate a latent <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a5f3c6a11b03839d46af9fb43c97c188.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="K" /></span><script type='math/tex'>K</script>-dimensional binary random variable <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_178e6ad9c60074e5779898df7c669e30.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{z_i}" /></span><script type='math/tex'>\mathbf{z_i}</script> with each of the training examples <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_c782551757fc6d84716da8aeb1f891f7.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{x_i}" /></span><script type='math/tex'>\mathbf{x_i}</script>. Say that <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_178e6ad9c60074e5779898df7c669e30.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{z_i}" /></span><script type='math/tex'>\mathbf{z_i}</script> has a 1-of-<span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a5f3c6a11b03839d46af9fb43c97c188.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="K" /></span><script type='math/tex'>K</script> representation, i.e. 
for <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_09e245587036367af8f12446b14d97b8.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{z_i} := (z_{i, 1}, ..., z_{i, K})^T" /></span><script type='math/tex'>\mathbf{z_i} := (z_{i, 1}, ..., z_{i, K})^T</script> it must be the case that <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_f15cedfca74c7bcbe74888c8fdf694e4.gif' style='vertical-align: middle; border: none; ' class='tex' alt="z_{i, j} \in \{0, 1\}" /></span><script type='math/tex'>z_{i, j} \in \{0, 1\}</script> for <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_7bd2098b4df3b8f5a992c8c66d669b6c.gif' style='vertical-align: middle; border: none; ' class='tex' alt="i \in \{ 1, ..., N \}, j \in \{ 1, ..., K \}" /></span><script type='math/tex'>i \in \{ 1, ..., N \}, j \in \{ 1, ..., K \}</script> and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_044f48cd383ed3cdbfcc7e3cf0c77229.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\sum_{j = 1}^{K} z_{i, j} = 1" /></span><script type='math/tex'>\sum_{j = 1}^{K} z_{i, j} = 1</script>.</p>
<p>Furthermore, let the marginal distribution over <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_178e6ad9c60074e5779898df7c669e30.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{z_i}" /></span><script type='math/tex'>\mathbf{z_i}</script> be specified in terms of mixing coefficients <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_123008b3f7350bfe54c9616559cc267f.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\mathbf{\pi}" /></span><script type='math/tex'>\mathbf{\pi}</script> s.t. <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_dcd8be66664c2bedd8d66b966552e16d.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\,\text{Pr}(z_{i, j} = 1) = \pi_j" /></span><script type='math/tex'>\,\text{Pr}(z_{i, j} = 1) = \pi_j</script>, then<br />
\begin{align}<br />
	\,\text{Pr}(\mathbf{z_n} | \mathbf{\pi}) = \prod_{i = 1}^K \pi_i^{z_{n, i}}.<br />
\end{align}</p>
<p>Similarly, let <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d91b8f17bbe871fba2b089f0ef31d65e.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\,\text{Pr}(\mathbf{x_n} | z_{n, k} = 1) = \,\text{Pr}(\mathbf{x_n} | \mathbf{\mu_k})" /></span><script type='math/tex'>\,\text{Pr}(\mathbf{x_n} | z_{n, k} = 1) = \,\text{Pr}(\mathbf{x_n} | \mathbf{\mu_k})</script>, then<br />
\begin{align}<br />
	\,\text{Pr}(\mathbf{x_n} | \mathbf{z_n}, \mathbf{\mu}, \mathbf{\pi}) = \prod_{k = 1}^K \,\text{Pr}(\mathbf{x_n} | \mathbf{\mu_k})^{z_{n, k}}.<br />
\end{align}</p>
<p>By combining all latent variables <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_178e6ad9c60074e5779898df7c669e30.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{z_i}" /></span><script type='math/tex'>\mathbf{z_i}</script> into a set <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_5509946a5da4ab17e8df8907a4a69882.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{Z} := \{ \mathbf{z_1}, ..., \mathbf{z_N} \}" /></span><script type='math/tex'>\mathbf{Z} := \{ \mathbf{z_1}, ..., \mathbf{z_N} \}</script>, we can write<br />
\begin{equation} \label{eq:probZ}<br />
	\begin{split}<br />
	\,\text{Pr}(\mathbf{Z}|\mathbf{\pi}) &#038;= \prod_{n = 1}^N \,\text{Pr}(\mathbf{z_n}|\mathbf{\pi}) \\<br />
		 &#038;= \prod_{n = 1}^N \prod_{k = 1}^K \pi_k^{z_{n, k}},<br />
	\end{split}<br />
\end{equation}<br />
and, similarly, combining all training images <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_c782551757fc6d84716da8aeb1f891f7.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{x_i}" /></span><script type='math/tex'>\mathbf{x_i}</script> into a set <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_adc20c0586599d4f1bb3ed503f92f93e.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{X} := \{ \mathbf{x_1}, ..., \mathbf{x_N} \}" /></span><script type='math/tex'>\mathbf{X} := \{ \mathbf{x_1}, ..., \mathbf{x_N} \}</script>, we can express the conditional distribution of the training data given the latent variables as<br />
\begin{equation} \label{eq:probXgivZ}<br />
	\begin{split}<br />
	\,\text{Pr}(\mathbf{X}|\mathbf{Z}, \mathbf{\mu}, \mathbf{\pi}) &#038;= \prod_{n = 1}^N \,\text{Pr}(\mathbf{x_n}|\mathbf{z_n},\mathbf{\mu},\mathbf{\pi}) \\<br />
	&#038;= \prod_{n = 1}^N \prod_{k = 1}^K \,\text{Pr}(\mathbf{x_n} | \mathbf{\mu_k})^{z_{n, k}} \\<br />
	&#038;= \prod_{n = 1}^N \prod_{k = 1}^K \left( \prod_{i = 1}^D \mu_{k, i}^{x_{n, i}} (1 - \mu_{k, i})^{1 - x_{n, i}} \right)^{z_{n, k}}.<br />
	\end{split}<br />
\end{equation}</p>
<p>From the last two equations and the <a href="http://en.wikipedia.org/wiki/Chain_rule_(probability)" target="_blank">probability chain rule</a>, the complete data likelihood can be written as:<br />
	\begin{equation} \label{eq:probXandZ}<br />
		\begin{split}<br />
		\,\text{Pr}(\mathbf{X}, \mathbf{Z}| \mathbf{\mu}, \mathbf{\pi}) &#038;= \,\text{Pr}(\mathbf{X} | \mathbf{Z}, \mathbf{\mu}, \mathbf{\pi}) \,\text{Pr}(\mathbf{Z}| \mathbf{\mu}, \mathbf{\pi}) \\<br />
		&#038;= \prod_{n = 1}^N \prod_{k = 1}^K \left( \pi_k \prod_{i = 1}^D \mu_{k, i}^{x_{n, i}} (1 - \mu_{k, i})^{1 - x_{n, i}} \right)^{z_{n, k}},<br />
		\end{split}<br />
	\end{equation}	</p>
<p>and thus the complete data log likelihood <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_bf2125c11de7fe217910188c182f5288.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathcal{L}(\theta)" /></span><script type='math/tex'>\mathcal{L}(\theta)</script> can be obtained by taking a log of the equation above:<br />
\begin{equation}<br />
	\begin{split}<br />
	\mathcal{L}(\theta) &#038;= \ln \,\text{Pr}(\mathbf{X}, \mathbf{Z}| \mathbf{\mu}, \mathbf{\pi}) \\<br />
	&#038;= \sum_{n = 1}^N \sum_{k = 1}^K z_{n, k} \left( \ln \pi_k + \sum_{i = 1}^D \left[ x_{n, i} \ln \mu_{k, i} + (1 - x_{n, i}) \ln (1 - \mu_{k, i}) \right] \right).<br />
	\end{split}<br />
\end{equation}</p>
<p><i>(Still following? Great. Take five, and below we will derive the E and M step update equations.)</i></p>
<p><b>2.2. E-M update equations for BMMs</b></p>
<p>In order to update the latent variable distribution (i.e. image assignment to clusters) at the <b>expectation</b> step, we need to set the probability distribution of <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_478a3af607f667c68c7fc285afbcca28.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\textbf{Z}" /></span><script type='math/tex'>\textbf{Z}</script> to <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_5fd636609e41f7ece6f7f1bb0b31f5d6.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\,\text{Pr}(\mathbf{Z} | \mathbf{X}, \mathbf{\theta})" /></span><script type='math/tex'>\,\text{Pr}(\mathbf{Z} | \mathbf{X}, \mathbf{\theta})</script>. </p>
<p>However, since the true assignments are never observed, the best we can do is a "soft" assignment. A simple way of doing it is to replace the current values of <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_7c217b666d7254221603228d22b539db.gif' style='vertical-align: middle; border: none; ' class='tex' alt="z_{n, k}" /></span><script type='math/tex'>z_{n, k}</script> with their expected values under this distribution:</p>
<p>\begin{equation} \label{eq:z}<br />
\begin{split}<br />
z_{n, k}^\text{new} \leftarrow \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \mathbf{\mu}, \mathbf{\pi}}[z_{n, k}] &#038;= \sum_{z_{n, k}} \,\text{Pr}(z_{n,k} | \mathbf{x_n}, \mathbf{\mu}, \mathbf{\pi}) \, z_{n,k}\\<br />
&#038;= \frac{\pi_k \,\text{Pr}(\mathbf{x_n} |\mathbf{\mu_k})}{\sum_{m = 1}^K \pi_m \,\text{Pr}(\mathbf{x_n} | \mathbf{\mu_m})} \\<br />
&#038;= \frac{\pi_k \prod_{i = 1}^D \mu_{k, i}^{x_{n, i}} (1 - \mu_{k, i})^{1 - x_{n, i}} }{\sum_{m = 1}^K \pi_m \prod_{i = 1}^D \mu_{m, i}^{x_{n, i}} (1 - \mu_{m, i})^{1 - x_{n, i}}}.<br />
\end{split}<br />
\end{equation}</p>
<p>(Notice that after this update <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_fce58a4d5937c4f0608afaa1379a8cf4.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\mathbf{z_{n}}^\text{new}" /></span><script type='math/tex'>\mathbf{z_{n}}^\text{new}</script> is no longer a 1-of-<span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a5f3c6a11b03839d46af9fb43c97c188.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="K" /></span><script type='math/tex'>K</script> vector, i.e. the same image can be "partially" assigned to multiple clusters.)</p>
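<p>The expectation step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name <code>e_step</code>, the list-of-lists data layout and the clamping constant <code>eps</code> are choices made here. The posterior is evaluated in log space and normalized with the log-sum-exp trick, which gives the same values as the fraction above but avoids underflow.</p>

```python
import math

def e_step(X, mu, pi, eps=1e-10):
    """Return responsibilities Z[n][k] = E[z_{n,k}] for binary images X."""
    Z = []
    for x in X:
        # Unnormalized log posterior of each cluster: ln pi_k + ln Pr(x | mu_k).
        log_w = []
        for pi_k, mu_k in zip(pi, mu):
            s = math.log(max(pi_k, eps))
            for x_i, m in zip(x, mu_k):
                p = min(max(m, eps), 1.0 - eps)
                s += x_i * math.log(p) + (1 - x_i) * math.log(1.0 - p)
            log_w.append(s)
        # Subtract the maximum before exponentiating (log-sum-exp trick).
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]
        norm = sum(w)
        Z.append([v / norm for v in w])
    return Z

X = [[1, 1, 0], [0, 0, 1]]
mu = [[0.9, 0.9, 0.1], [0.1, 0.1, 0.9]]
pi = [0.5, 0.5]
for row in e_step(X, mu, pi):
    print([round(v, 4) for v in row])
```

<p>Each row of the result sums to one, and the first image is assigned almost entirely to the first cluster, the second image to the second.</p>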
<p>In the <b>maximization</b> step we need to find the model parameters (i.e. the mixing coefficients and the pixel distributions) that maximize the expected complete data log likelihood, using the update equation from earlier<br />
<p style='text-align:center;'><span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_7b765729a0303627aca0d776d313549f.gif' style='vertical-align: middle; border: none;' class='tex' alt=" \mathbf{\theta}^\text{new} \leftarrow \underset{\mathbf{\theta}}{\text{arg max }} \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{old}} \left[ \mathcal{L}(\theta) \right]. " /></span><script type='math/tex;  mode=display'> \mathbf{\theta}^\text{new} \leftarrow \underset{\mathbf{\theta}}{\text{arg max }} \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{old}} \left[ \mathcal{L}(\theta) \right]. </script></p></p>
<p>Observe that<br />
\begin{align}<br />
	\mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{old}} \left[ \mathcal{L}(\theta) \right] &#038;= \sum_{n = 1}^N \sum_{k = 1}^K \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \mathbf{\mu}^\text{old}, \mathbf{\pi}^\text{old}} \left[ z_{n, k} \right] \left( \ln \pi_k + \sum_{i = 1}^D \left[ x_{n, i} \ln \mu_{k, i} + (1 - x_{n, i}) \ln (1 - \mu_{k, i}) \right] \right).<br />
\end{align}<br />
The equation above can be maximized w.r.t. <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a318d33072fa86e16309eab1bb04f68a.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{\mu_k}" /></span><script type='math/tex'>\mathbf{\mu_k}</script> by simply setting its derivative to zero:<br />
\begin{align}<br />
\frac{\partial}{\partial \mu_{m, j}} \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{old}} \left[ \mathcal{L}(\theta) \right] &#038;= \sum_{n = 1}^N \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \mathbf{\mu}^\text{old}, \mathbf{\pi}^\text{old}} \left[ z_{n, m} \right] \left( \frac{x_{n, j}}{\mu_{m, j}} - \frac{1 - x_{n, j}}{1 - \mu_{m, j}} \right) \\<br />
&#038;= \sum_{n = 1}^N z_{n, m}^\text{new} \frac{x_{n, j} - \mu_{m, j}}{\mu_{m, j} (1 - \mu_{m, j})} = 0 \Leftrightarrow \\<br />
\mu_{m, j} &#038;= \frac{1}{N_m} \sum_{n = 1}^N x_{n, j} z_{n, m}^\text{new},<br />
\end{align}<br />
where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8c22adaa50be16e1e6d6be65240014f3.gif' style='vertical-align: middle; border: none; ' class='tex' alt="N_m = \sum_{n = 1}^N z_{n, m}^\text{new}" /></span><script type='math/tex'>N_m = \sum_{n = 1}^N z_{n, m}^\text{new}</script> is the effective number of images assigned to cluster <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_6f8f57715090da2632453988d9a1501b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="m" /></span><script type='math/tex'>m</script>.</p>
<p>Then the full cluster <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_6f8f57715090da2632453988d9a1501b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="m" /></span><script type='math/tex'>m</script> pixel distribution vector <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_aca5e0ac23ff5d1abf687318721a9aab.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{\mu_m}" /></span><script type='math/tex'>\mathbf{\mu_m}</script> can be written as<br />
<p style='text-align:center;'><span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_da7d65d23d179e764bab11130b24c2ea.gif' style='vertical-align: middle; border: none;' class='tex' alt=" \mathbf{\mu_m} = \mathbf{\bar{x}_m}, " /></span><script type='math/tex;  mode=display'> \mathbf{\mu_m} = \mathbf{\bar{x}_m}, </script></p><br />
where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_9f2c9cd71841770d5f19ab59abac7bdb.gif' style='vertical-align: middle; border: none; ' class='tex' alt=" \mathbf{\bar{x}_m} = \frac{1}{N_m} \sum_{n = 1}^N z_{n, m}^\text{new} \mathbf{x_n}" /></span><script type='math/tex'> \mathbf{\bar{x}_m} = \frac{1}{N_m} \sum_{n = 1}^N z_{n, m}^\text{new} \mathbf{x_n}</script> is the weighted mean of the images associated with cluster <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_6f8f57715090da2632453988d9a1501b.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="m" /></span><script type='math/tex'>m</script>.</p>
<p>To maximize <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2bdca63d7b1c6b52f12beb127b6707eb.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{old}} \left[ \mathcal{L}(\theta) \right]" /></span><script type='math/tex'>\mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{old}} \left[ \mathcal{L}(\theta) \right]</script> w.r.t. the mixing coefficients <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_123008b3f7350bfe54c9616559cc267f.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\mathbf{\pi}" /></span><script type='math/tex'>\mathbf{\pi}</script> (subject to the constraint <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ca751f69b539854a8ff61db1ac8d64a2.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\sum_{k = 1}^K \pi_k = 1" /></span><script type='math/tex'>\sum_{k = 1}^K \pi_k = 1</script>) we can use <a href="http://en.wikipedia.org/wiki/Lagrange_multiplier" target="_blank">Lagrange multipliers</a>, which gives the following Lagrangian to optimize:<br />
\begin{equation*}<br />
\Lambda(\theta, \lambda) := \mathbb{E}_{\mathbf{Z} | \mathbf{X}, \theta^\text{old}} \left[ \mathcal{L}(\theta) \right] + \lambda \left( \sum_{k = 1}^K \pi_k - 1 \right).<br />
\end{equation*}<br />
The optimizing solution can then be found again with simple partial derivatives:<br />
\begin{align}<br />
\frac{\partial}{\partial \pi_{m}} \Lambda(\theta, \lambda) &#038;= \frac{1}{\pi_m} \sum_{n = 1}^N z_{n,m}^\text{new} + \lambda = 0 \Leftrightarrow \\<br />
\pi_m &#038;= -\frac{N_m}{\lambda},<br />
\end{align}<br />
\begin{align}<br />
\frac{\partial}{\partial \lambda} \Lambda(\theta, \lambda) &#038;= \sum_{k = 1}^K \pi_k - 1 = 0 \Leftrightarrow \\<br />
\sum_{k = 1}^K \pi_k &#038;= 1.<br />
\end{align}<br />
Combining these two results gives <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_991fb25e41ade286085fc97eba9cba59.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\lambda = - \sum_{k = 1}^K N_k = - N" /></span><script type='math/tex'>\lambda = - \sum_{k = 1}^K N_k = - N</script>, and thus<br />
<p style='text-align:center;'><span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_4beae6eb6a80b777beba584dddd7ebc6.gif' style='vertical-align: middle; border: none;' class='tex' alt=" \pi_m = -\frac{N_m}{\lambda} = \frac{N_m}{N}." /></span><script type='math/tex;  mode=display'> \pi_m = -\frac{N_m}{\lambda} = \frac{N_m}{N}.</script></p></p>
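<p>The two closed-form updates derived in this section translate directly into code. The sketch below assumes that the responsibilities are stored as a list of rows <code>Z[n][m]</code> and that every cluster has a non-zero effective count; the function name <code>m_step</code> is made up for this example.</p>

```python
def m_step(X, Z):
    """Return (mu, pi) maximizing the expected complete data log likelihood."""
    N, D, K = len(X), len(X[0]), len(Z[0])
    mu, pi = [], []
    for m in range(K):
        # Effective number of images assigned to cluster m.
        N_m = sum(Z[n][m] for n in range(N))
        # Weighted mean of the images: mu_m = (1 / N_m) * sum_n z_{n,m} x_n.
        mu.append([sum(Z[n][m] * X[n][j] for n in range(N)) / N_m
                   for j in range(D)])
        # Mixing coefficient: pi_m = N_m / N.
        pi.append(N_m / N)
    return mu, pi

X = [[1, 0], [1, 1], [0, 0]]
Z = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
mu, pi = m_step(X, Z)
print(mu)
print(pi)
```

<p>In this toy input the second image is split evenly between the two clusters, so both clusters end up with an effective count of 1.5 and a mixing coefficient of 0.5.</p>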
<p>Done!</p>
<p><b>2.3. Summary</b></p>
<p>In summary, the update equations for Bernoulli Mixture Models using E-M are:</p>
<ul>
<li><b>Expectation</b> step:<br />
<p style='text-align:center;'><span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_6c2b98828e2bb17a0eb64788360b1189.gif' style='vertical-align: middle; border: none;' class='tex' alt=" z_{n, k} \leftarrow \frac{\pi_k \prod_{i = 1}^D \mu_{k, i}^{x_{n, i}} (1 - \mu_{k, i})^{1 - x_{n, i}} }{\sum_{m = 1}^K \pi_m \prod_{i = 1}^D \mu_{m, i}^{x_{n, i}} (1 - \mu_{m, i})^{1 - x_{n, i}}}. " /></span><script type='math/tex;  mode=display'> z_{n, k} \leftarrow \frac{\pi_k \prod_{i = 1}^D \mu_{k, i}^{x_{n, i}} (1 - \mu_{k, i})^{1 - x_{n, i}} }{\sum_{m = 1}^K \pi_m \prod_{i = 1}^D \mu_{m, i}^{x_{n, i}} (1 - \mu_{m, i})^{1 - x_{n, i}}}. </script></p></li>
<li><b>Maximization</b> step:<br />
<p style='text-align:center;'><span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d3a6d428e67fb826802b108f5ce1386b.gif' style='vertical-align: middle; border: none;' class='tex' alt=" \mathbf{\mu_m} \leftarrow \mathbf{\bar{x}_m}, " /></span><script type='math/tex;  mode=display'> \mathbf{\mu_m} \leftarrow \mathbf{\bar{x}_m}, </script></p> <p style='text-align:center;'><span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_c1e780079e874ffc31742b50e5f7ba9b.gif' style='vertical-align: middle; border: none;' class='tex' alt=" \pi_m \leftarrow \frac{N_m}{N}," /></span><script type='math/tex;  mode=display'> \pi_m \leftarrow \frac{N_m}{N},</script></p><br />
where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_12250011cd28a20486771b4c466b693d.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\mathbf{\bar{x}_m} = \frac{1}{N_m} \sum_{n = 1}^N z_{n, m} \mathbf{x_n}" /></span><script type='math/tex'>\mathbf{\bar{x}_m} = \frac{1}{N_m} \sum_{n = 1}^N z_{n, m} \mathbf{x_n}</script> and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d7b404a6da7ff595fb71e8c01e7a1dec.gif' style='vertical-align: middle; border: none; ' class='tex' alt="N_m = \sum_{n = 1}^N z_{n, m}" /></span><script type='math/tex'>N_m = \sum_{n = 1}^N z_{n, m}</script>.
</li>
</ul>
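<p>Putting the summary together, a complete (if naive) E-M loop might look like the sketch below. Several things here are assumptions made purely for illustration and do not come from the derivation: the function name <code>fit_bmm</code>, the fixed number of iterations, the clamping constant <code>eps</code>, and the initialization, which blends <code>K</code> evenly spaced training images towards 0.5.</p>

```python
import math

def fit_bmm(X, K, iterations=20, eps=1e-10):
    """Fit a K-component Bernoulli mixture to binary vectors X with E-M."""
    N, D = len(X), len(X[0])
    # Assumed initialization: blend K evenly spaced images towards 0.5.
    mu = [[0.25 + 0.5 * v for v in X[(k * N) // K]] for k in range(K)]
    pi = [1.0 / K] * K
    Z = []
    for _ in range(iterations):
        # Expectation step: Z[n][k] is proportional to pi_k * Pr(x_n | mu_k).
        Z = []
        for x in X:
            log_w = []
            for k in range(K):
                s = math.log(max(pi[k], eps))
                for x_i, m in zip(x, mu[k]):
                    p = min(max(m, eps), 1.0 - eps)
                    s += x_i * math.log(p) + (1 - x_i) * math.log(1.0 - p)
                log_w.append(s)
            top = max(log_w)
            w = [math.exp(v - top) for v in log_w]
            norm = sum(w)
            Z.append([v / norm for v in w])
        # Maximization step: mu_m is the weighted mean, pi_m = N_m / N.
        for m in range(K):
            N_m = max(sum(z[m] for z in Z), eps)
            mu[m] = [sum(z[m] * x[j] for z, x in zip(Z, X)) / N_m
                     for j in range(D)]
            pi[m] = N_m / N
    return mu, pi, Z

# Toy data: four copies each of two complementary 6-pixel "images".
X = [[1, 1, 1, 0, 0, 0]] * 4 + [[0, 0, 0, 1, 1, 1]] * 4
mu, pi, Z = fit_bmm(X, K=2)
print([round(p, 3) for p in pi])
print([[round(v, 3) for v in row] for row in mu])
```

<p>On this toy data the two clusters lock onto the two patterns within a couple of iterations. Real images would need a convergence test (e.g. on the change in log likelihood) instead of a fixed iteration count, and probably several restarts, since E-M only finds a local optimum.</p>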
<h4>3. References</h4>
<p>[Dempster et al, 1977] <i>A. P. Dempster, N. M. Laird, D. B. Rubin. "Maximum Likelihood from Incomplete Data via the EM Algorithm". Journal of the Royal Statistical Society. Series B (Methodological) 39 (1): 1–38, 1977.</i></p>
<p>[Bishop, 2006] <i>C. M. Bishop. "Pattern Recognition and Machine Learning". Springer, 2006. ISBN 9780387310732.</i></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zabarauskas.com/expectation-maximization-tutorial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>3D Display Simulation using Head-Tracking with Kinect</title>
		<link>http://blog.zabarauskas.com/3d-display-simulation-using-head-tracking-with-microsoft-kinect/</link>
		<comments>http://blog.zabarauskas.com/3d-display-simulation-using-head-tracking-with-microsoft-kinect/#comments</comments>
		<pubDate>Wed, 31 Oct 2012 20:15:18 +0000</pubDate>
		<dc:creator>Manfredas Zabarauskas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[2.5d display]]></category>
		<category><![CDATA[3d display]]></category>
		<category><![CDATA[camshift]]></category>
		<category><![CDATA[face detection]]></category>
		<category><![CDATA[face tracking]]></category>
		<category><![CDATA[Kinect]]></category>
		<category><![CDATA[motion parallax]]></category>
		<category><![CDATA[tetris]]></category>
		<category><![CDATA[vibe]]></category>
		<category><![CDATA[viola jones]]></category>

		<guid isPermaLink="false">http://blog.zabarauskas.com/?p=1258</guid>
		<description><![CDATA[During my final year in Cambridge I had the opportunity to work on the project that I wanted to implement for the last three years. It all started when I saw Johnny Lee's "Head Tracking for Desktop VR Displays using the Wii Remote" project in early 2008 (see below). He cunningly used the infrared camera [...]]]></description>
			<content:encoded><![CDATA[<p>During my final year in Cambridge I had the opportunity to work on the project that I wanted to implement for the last three years.</p>
<p>It all started when I saw Johnny Lee's "<a href="http://johnnylee.net/projects/wii/" target="_blank">Head Tracking for Desktop VR Displays using the Wii Remote</a>" project in early 2008 (see below). He cunningly used the infrared camera in the Nintendo Wii's remote and a head mounted sensor bar to track the location of the viewer's head and render view dependent images on the screen. He called it a "portal to the virtual environment".</p>
<div class="wp-caption alignleft" style="width: 630px"><iframe width="616" height="492" src="http://www.youtube.com/embed/Jd3-eiid-Uw" frameborder="0" allowfullscreen></iframe><p class="wp-caption-text">Johnny Lee's project &quot;<a href='http://johnnylee.net/projects/wii/' target='_blank'>Head Tracking for Desktop VR Displays using the Wii Remote</a>&quot;.</p></div>
<p>I always thought that it would be really cool to have this behaviour without having to wear anything on your head (and it was - see the video below!).</p>
<div class="wp-caption alignleft" style="width: 630px"><iframe width="616" height="492" src="http://www.youtube.com/embed/WN1ZAMaG0gI" frameborder="0" allowfullscreen></iframe><p class="wp-caption-text">My &quot;portal to the virtual environment&quot; which does not require head gear. And it has 3D Tetris!</p></div>
<p>I am a firm believer in three-dimensional displays, and I am certain that we do not see the widespread adoption of 3D displays simply because of a classic network effect (also known as the "chicken-and-egg" problem). The creation and distribution of three-dimensional content is inevitably much more expensive than that of regular, old-school 2D content. If there is no demand (i.e. no one has a 3D display at home/work), then the content providers do not have much of an incentive to bother creating 3D content. Vice versa, if there is no content then consumers do not see much incentive to invest in (inevitably more expensive) 3D displays.</p>
<p>A "portal to the virtual environment", or as I like to call it, a 2.5D display could effectively solve this. If we could enhance every 2D display to get what you see in Johnny's and my videos (and I mean every: LCD, CRT, you-name-it), then suddenly everyone can consume the 3D content even without having the "fully" 3D display. At that point it starts making sense to mass-create 3D content.</p>
<p>The terms "fully" and 2.5D, however, require a bit of explanation.</p>
<p> <span id="more-1258"></span></p>
<p><b>Human depth perception</b></p>
<p><small><div class="wp-caption alignright" style="width: 210px">
<a href="http://blog.zabarauskas.com/img/convergence_0.png" title="Eye convergence on a near and far target." class="thickbox" rel="singlepic81" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/81__200x116_convergence_0.png" alt="Eye Convergence" title="Eye Convergence" />
</a>
<p class="wp-caption-text">Eye &quot;convergence&quot; depth cue</p></div></small>You see, human depth perception comes from a variety of sensory cues. Some of them come from our ability to sense the position of our eyes and the tension in our eye muscles ("oculomotor" cues). For example, when the object of focus moves closer to the eye, we can feel the eye moving inwards (i.e. we feel the extraocular muscles stretching). This is the so-called "convergence" depth cue.</p>
<p>Another kinesthetic sensation arises from the change in the shape of the eye lens that occurs when the sight is focused on the objects at different distances (called "accommodation"). In this case, ciliary muscles stretch the lens making it thinner and changing the eye's focal length. These kinesthetic sensations (processed in the visual cortex) serve as the basic cues for distance interpretation.<small><div class="wp-caption alignleft" style="width: 210px">
<a href="http://blog.zabarauskas.com/img/accommodation_0.png" title="Right eye accommodation on a near and far target." class="thickbox" rel="singlepic80" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/80__200x107_accommodation_0.png" alt="Eye Accommodation" title="Eye Accommodation" />
</a>
<p class="wp-caption-text">Eye &quot;accommodation&quot; depth cue</p></div></small></p>
<p>Johnny's and my displays are not able to simulate these oculomotor cues. In fact, the increased depth perception seen in the videos above comes from a monocular (read: single eye) motion cue, called "<a href="http://en.wikipedia.org/wiki/Parallax" target="_blank">motion parallax</a>". Fancy name aside, motion parallax simply means that objects closer to the moving observer seem to move faster (and in the opposite direction to the movement of the observer), whereas objects farther away seem to move slower (and in the same direction). However, motion parallax alone is not enough to create a full 3D impression.</p>
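<p>For a head-tracked display, the same effect can be reproduced with simple pinhole geometry, taking the screen as the reference plane. The sketch below is not the rendering code from the project; the function name and the numbers are made up for illustration. It computes where on the screen a virtual point has to be drawn so that it appears fixed in space as the head moves.</p>

```python
def on_screen_x(eye_x, eye_dist, point_x, point_depth):
    """Screen-plane x coordinate at which the eye sees a virtual point.

    eye_dist is the eye's distance in front of the screen; point_depth is
    the point's distance behind the screen (negative means in front of it).
    All values share one unit, e.g. centimetres.
    """
    # Intersect the eye-to-point ray with the screen plane.
    t = eye_dist / (eye_dist + point_depth)
    return eye_x + t * (point_x - eye_x)

# Move the head 10 cm to the right while sitting 60 cm from the screen.
for depth in (-20.0, 0.0, 30.0, 300.0):
    before = on_screen_x(0.0, 60.0, 0.0, depth)
    after = on_screen_x(10.0, 60.0, 0.0, depth)
    print(depth, round(after - before, 2))
```

<p>The printed shifts are -5.0, 0.0, 3.33 and 8.33 cm: a point in front of the screen has to be drawn moving against the head movement, a point on the screen plane stays put, and points behind the screen are drawn moving with the head (a very distant point follows it almost completely, which is exactly what makes it look stationary in the virtual world).</p>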
<p>In the average adult human, the eyes are horizontally separated by about 6 cm, hence even when looking at the same scene, the images formed on the retinae are different. The difference in the images between the left and the right eyes (called "binocular disparity") is actually translated into depth perception in the brain (in striate cortex and higher up in the visual system), creating a "stereopsis" depth cue.</p>
<p>As you will see in the figure below (by Cutting and Vishton, 1995), stereopsis and motion parallax are the two most important depth cues in the near distance (&lt; 2 meters).</p>
<p><small><div class="wp-caption alignleft" style="width: 630px">
<a href="http://blog.zabarauskas.com/img/cutting_vishton_0.png" title="Ranking of depth cues in the observer’s space, obtained by integrating the area under each depth-threshold function from the figure above within each spatial region, and comparing relative areas.
Lower rank means higher importance, a dash indicates that data was not applicable to source depth cue. Based on Cutting and Vishton, 1995." class="thickbox" rel="singlepic82" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/82__620x341_cutting_vishton_0.png" alt="Ranking of Depth Cues in the Observer's Space" title="Ranking of Depth Cues in the Observer's Space" />
</a>
<p class="wp-caption-text">Ranking of Depth Cues in the Observer's Space. Based on Cutting and Vishton, 1995.</p></div></small></p>
<p>The fact that we are not able to simulate the stereopsis depth cue on a standard LCD/CRT/etc display is one of the main reasons why I am calling such displays 2.5-dimensional (nevertheless, they are still really exciting!).</p>
<p>So, how did I create my 2.5D display?</p>
<p><b>2.5D display implementation</b></p>
<p>Well, initially I thought of using just a standard webcam to try to infer the viewer's distance from the camera using a whole bunch of cunning calibration and computer vision techniques. However, my supervisor-to-be, Neil Dodgson (as it turns out, a chair of international Stereoscopic Displays &#038; Applications conferences in 2006, 2007, 2010 and 2011, and a pretty cool dude in general; you can <a href="http://www.neildodgson.com/NeilDodgson.com/Blog/Blog.html" target="_blank">check out his blog here</a>) suggested using Microsoft Kinect. </p>
<p><small><div class="wp-caption alignright" style="width: 297px">
<a href="http://blog.zabarauskas.com/img/kinectir.png" title="Kinect's projected infrared dot pattern." class="thickbox" rel="singlepic78" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/78__287x157_kinectir.png" alt="Kinect's projected infrared dot pattern." title="Kinect's projected infrared dot pattern." />
</a>
<p class="wp-caption-text">Kinect's projected IR dot pattern (taken from <a href='http://graphics.stanford.edu/~mdfisher/Kinect.html' target='_blank'>here</a>)</p></div></small>This suggestion proved to be tremendously useful.</p>
<p> The neat piece of hardware that actually makes Kinect exciting is an IR depth-finding camera. It means that for (almost) each pixel in the video stream, Kinect can determine its distance from the camera essentially by looking at the distortions of the projected IR dot pattern. Combined with some clever machine learning, this feature enables Kinect to track the positions of twenty major joints of the user's body in real-time (called skeletal tracking).</p>
<p>However, for skeletal tracking Kinect requires the whole person to be visible in the sensor's field of view - not a very realistic requirement, especially in desktop PC environments.</p>
<p>The final idea was simple:</p>
<ol>
<li>Use the Viola-Jones face detector to detect the viewer's face in the colour (RGB) image.</li>
<li>Use the enhanced CAMShift face tracker to track it until the first loss, then use the Viola-Jones detector again to re-detect the face.</li>
<li>Use the ViBe background subtractor to get rid of the nearly-static background to help with the tracking.</li>
<li>In parallel, to exploit the depth data coming from Kinect, use the Garstka and Peters depth-based head detector and a modified CAMShift tracker to track the head.</li>
<li>Merge the colour- and depth-based tracker predictions, filter the noise using some impulse/high-pass filters and... Bob's your uncle!</li>
</ol>
<p>(Seriously, though, if you are interested in the actual technology behind it, drop me an e-mail at <a href="mailto:manfredas@zabarauskas.com">manfredas@zabarauskas.com</a> and I might be able to provide you with my actual 167-page thesis, containing all the nitty-gritty details.)</p>
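<p>To give a flavour of the last step in the list above (merging the two tracker predictions and filtering the noise), here is a deliberately simplified sketch. It is not the method from the thesis: the class name, the weighted-average fusion rule and the sliding-window median filter (a common way to suppress impulse noise) are all assumptions chosen for illustration.</p>

```python
from collections import deque
from statistics import median

class HeadPositionFilter:
    """Fuse two (x, y) head estimates and median-filter the result."""

    def __init__(self, window=5):
        # Keep only the most recent fused positions.
        self.history = deque(maxlen=window)

    def update(self, colour_xy, depth_xy, colour_weight=0.5):
        # Hypothetical fusion rule: weighted average when both trackers
        # report a position, otherwise use whichever one is available.
        estimates = [e for e in (colour_xy, depth_xy) if e is not None]
        if len(estimates) == 2:
            w = colour_weight
            fused = tuple(w * c + (1.0 - w) * d
                          for c, d in zip(colour_xy, depth_xy))
        elif estimates:
            fused = tuple(estimates[0])
        else:
            fused = None
        if fused is not None:
            self.history.append(fused)
        if not self.history:
            return None
        # Per-coordinate median over the window suppresses isolated spikes.
        xs, ys = zip(*self.history)
        return (median(xs), median(ys))

f = HeadPositionFilter(window=3)
f.update((100, 50), (102, 52))
f.update((400, 50), (398, 52))          # a one-frame tracking glitch
print(f.update((104, 50), (102, 52)))   # (103.0, 51.0)
```

<p>The glitch in the second frame never reaches the output, because the median of the last three fused positions ignores a single outlier.</p>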
<p><small><div class="wp-caption alignleft" style="width: 150px">
<a href="http://blog.zabarauskas.com/img/falsepositives.png" title="Non-face images classified as faces by the final Viola-Jones cascade (42 images out of 32.9 million)." class="thickbox" rel="singlepic79" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/79__140x46_falsepositives.png" alt="Non-face images classified as faces" title="Non-face images classified as faces" />
</a>
<p class="wp-caption-text">Misclassified non-faces</p></div></small>Because I had decided to write everything from scratch, I also had to implement the distributed training framework for Viola-Jones using AsymBoost (in the image below you can see the Cambridge Computer Lab machines piling through more than 32 million non-face images in order to "learn" the differences between a human face and, say, a chair).</p>
<p>Since it's Halloween, the image on the left shows three spooky non-face images that were still misclassified as faces after the training; in total there were 42 misclassifications out of 32.9 million non-face images. </p>
<p>Also, a whole bunch of evaluation software had to be implemented: recording and replaying Kinect depth and video streams, tools to help with the ground-truth tagging of depth and colour evaluation videos, Viola-Jones framework evaluators, and so on.</p>
<p><small><div class="wp-caption alignleft" style="width: 630px">
<a href="http://blog.zabarauskas.com/img/distributedtraining.png" title="University of Cambridge Computer Laboratory running the distributed training of Viola-Jones classifier. Winter break (December, 2011)." class="thickbox" rel="singlepic83" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/83__620x361_distributedtraining.png" alt="University of Cambridge Computer Laboratory running the distributed training of Viola-Jones classifier." title="University of Cambridge Computer Laboratory running the distributed training of Viola-Jones classifier." />
</a>
<p class="wp-caption-text">University of Cambridge Computer Laboratory running distributed Viola-Jones training framework.</p></div></small></p>
<p><b>Final outcome</b></p>
<p>So, what was the result? <small><div class="wp-caption alignright" style="width: 233px"><a href="http://www.undergraduateawards.com/Highly_Commended.asp" target="_blank"><img src="http://blog.zabarauskas.com/img/UA.png" width="223" height="75"></a><p class="wp-caption-text">International Undergraduate Awards 2012</p></div></small></p>
<p>Well, during 10 minutes of evaluation recordings (containing unconstrained viewer’s head movement in six degrees of freedom, in the presence of occlusions, changing facial expressions, different backgrounds and varying lighting conditions) the combined head-tracker was able to predict the viewer’s head center location within less than 1/3 of the head's size from the actual head center on average! It was running at 28.24 FPS (limited only by Kinect's frame rate of 30 FPS) using 56.8% of a single core of an Intel i5-2410M CPU @ 2.30 GHz (with hyperthreading enabled).</p>
<p><small><div class="wp-caption alignleft" style="width: 150px">
<a href="http://blog.zabarauskas.com/img/computer_lab.png" title="Faculty of Computer Science and Technology News, University of Cambridge" class="thickbox" rel="singlepic77" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/77__140x128_computer_lab.png" alt="Computer Laboratory News, University of Cambridge" title="Computer Laboratory News, University of Cambridge" />
</a>
<p class="wp-caption-text">Computer Lab's News</p></div><div class="wp-caption alignright" style="width: 150px">
<a href="http://blog.zabarauskas.com/img/wolfson.png" title="Wolfson College News, University of Cambridge." class="thickbox" rel="singlepic76" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/76__140x86_wolfson.png" alt="Wolfson College News, University of Cambridge" title="Wolfson College News, University of Cambridge" />
</a>
<p class="wp-caption-text">Wolfson College's News</p></div></small>The project has been highly-commended by the University of Cambridge Faculty of Computer Science and Technology (a fancy name for the Computer Lab), and the <a href="http://www.undergraduateawards.com/2012_Highly_Commended.asp" target="_blank">international Undergraduate Awards 2012</a> (an achievement which has received a couple of mentions on my college's and the Computer Lab's websites).</p>
<p>All in all, I managed to accomplish something that I wanted to do for a long time. Eventually, I might publish the code and the details of the technology, but there is still work to be done, so don't hold your breath.</p>
<p>I firmly believe that, under the right circumstances, the capabilities of devices like Kinect could be world-changing. And to be honest, there is a good chance that I might have a small part in that effort in the near future. <i>But that is a story for another blog post.</i></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zabarauskas.com/3d-display-simulation-using-head-tracking-with-microsoft-kinect/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Backpropagation Tutorial</title>
		<link>http://blog.zabarauskas.com/backpropagation-tutorial/</link>
		<comments>http://blog.zabarauskas.com/backpropagation-tutorial/#comments</comments>
		<pubDate>Sun, 17 Apr 2011 23:16:25 +0000</pubDate>
		<dc:creator>Manfredas Zabarauskas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[applet]]></category>
		<category><![CDATA[backpropagation]]></category>
		<category><![CDATA[derivation]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[linear classifier]]></category>
		<category><![CDATA[multiple layer]]></category>
		<category><![CDATA[neural network]]></category>
		<category><![CDATA[perceptron]]></category>
		<category><![CDATA[single layer]]></category>
		<category><![CDATA[training]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://blog.zabarauskas.com/?p=848</guid>
		<description><![CDATA[The PhD thesis of Paul J. Werbos at Harvard in 1974 described backpropagation as a method of teaching feed-forward artificial neural networks (ANNs). In the words of Wikipedia, it led to a "renaissance" in ANN research in the 1980s. As we will see later, it is an extremely straightforward technique, yet most of the tutorials [...]]]></description>
			<content:encoded><![CDATA[<p><script type="text/javascript">// <![CDATA[
 function show_multiplelayer_applet() { var html_element, body_element, p_element, text_node; html_element = document.documentElement; body_element = html_element.lastChild; applet_element = document.createElement("applet"); text_node = document.createTextNode("Cannot start the applet! Please install the Java Runtime Environment."); applet_element.appendChild(text_node); applet_element.setAttribute("code", "com.zabarauskas.ai1.MultipleLayerApplet"); applet_element.setAttribute("archive", "http://www.zabarauskas.com/downloads/ANNs/multilayer.jar"); applet_element.setAttribute("height", "0"); applet_element.setAttribute("width", "0"); body_element.appendChild(applet_element); }
// ]]&gt;</script><script type="text/javascript">// <![CDATA[
 function show_singlelayer_applet() { var html_element, body_element, p_element, text_node; html_element = document.documentElement; body_element = html_element.lastChild; applet_element = document.createElement("applet"); text_node = document.createTextNode("Cannot start the applet! Please install the Java Runtime Environment."); applet_element.appendChild(text_node); applet_element.setAttribute("code", "com.zabarauskas.ai1.SingleLayerApplet"); applet_element.setAttribute("archive", "http://www.zabarauskas.com/downloads/ANNs/singlelayer.jar"); applet_element.setAttribute("height", "0"); applet_element.setAttribute("width", "0"); body_element.appendChild(applet_element); }
// ]]&gt;</script>The PhD thesis of <a href="http://en.wikipedia.org/wiki/Paul_Werbos" target="_blank">Paul J. Werbos</a> at Harvard in 1974 described backpropagation as a method of teaching <a href="http://en.wikipedia.org/wiki/Feedforward_neural_network" target="_blank">feed-forward artificial neural networks</a> (ANNs). In the words of Wikipedia, it led to a "renaissance" in ANN research in the 1980s.</p>
<p>As we will see later, it is an extremely straightforward technique, yet most of the tutorials online seem to skip a fair number of details. Here's a simple (yet still thorough and mathematical) tutorial on how backpropagation works from the ground up, together with a couple of example applets. Feel free to play with them (and watch the videos) to get a better understanding of the methods described below!</p>
<p><input type="submit" name="sub_button" onclick="javascript:show_singlelayer_applet()" style="width: 305px; float: left;" value="Launch the single-layer neural network applet!" width="305"><input type="submit" name="sub_button" style="width: 305px; float: right;" onclick="javascript:show_multiplelayer_applet()" value="Launch the multilayer neural network applet!" width="305"></p>
<p><small><div class="wp-caption alignleft" style="width: 304px"><iframe title="YouTube video player" width="293" height="336" src="http://www.youtube.com/embed/D8iMDH5va9M" frameborder="0" allowfullscreen></iframe><p class="wp-caption-text">Training a single perceptron (linear classifier)</p></div> <div class="wp-caption alignright" style="width: 304px"><iframe title="YouTube video player" width="293" height="336" src="http://www.youtube.com/embed/fAKwocta2wM" frameborder="0" allowfullscreen></iframe><p class="wp-caption-text">Training a multilayer neural network</p></div></small><br />
&nbsp;<br />
&nbsp; </p>
<p><strong>1. Background</strong></p>
<p>To start with, imagine that you have gathered some empirical data relevant to the situation that you are trying to predict - be it fluctuations in the stock market, chances that a tumour is benign, likelihood that the picture that you are seeing is a face or (like in the applets above) the coordinates of red and blue points.</p>
<p>We will call this data <em>training examples</em> and we will describe <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_865c0c0b4ab0e063e5caa3387c1a8741.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="i" /></span><script type='math/tex'>i</script><sup>th</sup> training example as a tuple <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_9fa7e4618286f6e36503148496f5ff43.gif' style='vertical-align: middle; border: none; ' class='tex' alt="(\vec{x_i}, y_i)" /></span><script type='math/tex'>(\vec{x_i}, y_i)</script>, where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_9b1887deda101b7968f4904585f46d25.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{x_i} \in \mathbb{R}^n" /></span><script type='math/tex'>\vec{x_i} \in \mathbb{R}^n</script> is a vector of inputs and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_355bc7b5aa3d744fe7315364bd990167.gif' style='vertical-align: middle; border: none; ' class='tex' alt="y_i \in \mathbb{R}" /></span><script type='math/tex'>y_i \in \mathbb{R}</script> is the observed output.</p>
<p>Ideally, our neural network should output <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8d62e469fb30ed435a668eb5c035b1f6.gif' style='vertical-align: middle; border: none; ' class='tex' alt="y_i" /></span><script type='math/tex'>y_i</script> when given <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_e343d4fea0f908c0f9bc3623ca715d5b.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{x_i}" /></span><script type='math/tex'>\vec{x_i}</script> as an input. In case that does not always happen, let's define the <em>error </em>measure as a simple squared distance between the actual observed output and the prediction of the neural network: <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_55f8b7b87cc07f1f3d1521b42c22f143.gif' style='vertical-align: middle; border: none; ' class='tex' alt="E := \sum_i (h(\vec{x_i}) - y_i)^2" /></span><script type='math/tex'>E := \sum_i (h(\vec{x_i}) - y_i)^2</script>, where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2f5dcc9ded6abc6ee28fa716e64e0793.gif' style='vertical-align: middle; border: none; ' class='tex' alt="h(\vec{x_i})" /></span><script type='math/tex'>h(\vec{x_i})</script> is the output of the network.</p>
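<p>To make the error measure concrete, here is a minimal Python sketch of it. The toy examples and the two stand-in "networks" (<code>perfect</code> and <code>constant</code>) are made up purely for illustration; any function from inputs to a number could play the role of the network.</p>

```python
def squared_error(h, examples):
    """Sum of squared differences between predictions h(x) and observed outputs y."""
    return sum((h(x) - y) ** 2 for x, y in examples)


# Hypothetical toy data: the observed output is the sum of the two inputs.
examples = [((0.0, 1.0), 1.0), ((1.0, 1.0), 2.0), ((2.0, 0.5), 2.5)]

perfect = lambda x: x[0] + x[1]   # reproduces every observed output exactly
constant = lambda x: 2.0          # always predicts 2.0

print(squared_error(perfect, examples))   # 0.0
print(squared_error(constant, examples))  # 1.25
```

<p>A network that predicts every training output exactly has zero error; the further the predictions drift from the observations, the larger the error becomes.</p>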
<p><strong>2. Perceptrons (building-blocks)</strong></p>
<p>The simplest classifiers out of which we will build our neural network are <a href="http://en.wikipedia.org/wiki/Perceptron" target="_blank"><em>perceptrons</em></a> (fancy name thanks to <a href="http://en.wikipedia.org/wiki/Frank_Rosenblatt" target="_blank">Frank Rosenblatt</a>). In reality, a perceptron is a plain-vanilla linear classifier which takes a number of inputs <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_1850f5f395348fa21a7a0909ca424cf5.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="a_1, ..., a_n" /></span><script type='math/tex'>a_1, ..., a_n</script>, scales them using some weights <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a7e4afc05feb8fcb6bfea1a9c5fe867e.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="w_1, ..., w_n" /></span><script type='math/tex'>w_1, ..., w_n</script>, adds them all up (together with some bias <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_92eb5ffee6ae2fec3ad71c777531578f.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="b" /></span><script type='math/tex'>b</script>) and feeds everything through an <em>activation function</em> <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_43cb5486196e387eba4314683dc9e95f.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\sigma \in \mathbb{R} \rightarrow \mathbb{R}" /></span><script type='math/tex'>\sigma \in \mathbb{R} \rightarrow \mathbb{R}</script>.</p>
<p>A picture is worth a thousand equations:</p>
<p><small><div class="wp-caption aligncenter" style="width: 244px"><img title="Perceptron (linear classifier)" src="http://blog.zabarauskas.com/img/perceptron.gif" alt="Perceptron (linear classifier)" width="234" height="140" /><p class="wp-caption-text">Perceptron (linear classifier)</p></div></small></p>
<p>To slightly simplify the equations, define <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_26bcf5c00de75e30593794a4d4bd56cb.gif' style='vertical-align: middle; border: none; ' class='tex' alt="w_0 := b" /></span><script type='math/tex'>w_0 := b</script> and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_eae9cbf8913e6749f5e48003dbae2c0e.gif' style='vertical-align: middle; border: none; ' class='tex' alt="a_0 := 1" /></span><script type='math/tex'>a_0 := 1</script>. Then the behaviour of the perceptron can be described as <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_550ef4183bcc9d0927d1b6d27e7442a7.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\sigma(\vec{a} \cdot \vec{w})" /></span><script type='math/tex'>\sigma(\vec{a} \cdot \vec{w})</script>, where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_e7ffc93c2e8c1d20899776bd6b740ae1.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{a} := (a_0, a_1, ..., a_n)" /></span><script type='math/tex'>\vec{a} := (a_0, a_1, ..., a_n)</script> and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_b1d3a91e3fa40c8feeff1c8077d76013.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{w} := (w_0, w_1, ..., w_n)" /></span><script type='math/tex'>\vec{w} := (w_0, w_1, ..., w_n)</script>.</p>
<p>To complete our definition, here are a few examples of typical activation functions:</p>
<ul>
<li><em>sigmoid:</em> <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a0184952b395b6300d9d6170263166d2.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\sigma(x) = \frac{1}{1 + \exp(-x)}" /></span><script type='math/tex'>\sigma(x) = \frac{1}{1 + \exp(-x)}</script>,</li>
<li><em>hyperbolic tangent:</em> <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_cd2fd30de7d809634de992122556fa8a.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\sigma(x) = \tanh(x)" /></span><script type='math/tex'>\sigma(x) = \tanh(x)</script>,</li>
<li>plain <em>linear</em> <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_80c82eff5f53c54a5f953e3d4ffda504.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\sigma(x) = x" /></span><script type='math/tex'>\sigma(x) = x</script> and so on.</li>
</ul>
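<p>The perceptron definition and the activation functions listed above can be sketched in a few lines of Python. The function names and the example numbers are illustrative assumptions, not taken from the applets.</p>

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(x):
    return x


def perceptron(inputs, weights, bias, activation):
    """Compute activation(a . w) with a_0 := 1 and w_0 := b prepended."""
    a = [1.0] + list(inputs)
    w = [bias] + list(weights)
    return activation(sum(a_k * w_k for a_k, w_k in zip(a, w)))


# Hypothetical inputs and weights, chosen only for illustration.
inputs, weights, bias = [0.5, -1.0], [2.0, 1.0], 0.0

print(perceptron(inputs, weights, bias, linear))     # weighted sum: 0.0
print(perceptron(inputs, weights, bias, sigmoid))    # sigmoid(0) = 0.5
print(perceptron(inputs, weights, bias, math.tanh))  # tanh(0) = 0.0
```

<p>Swapping the activation function changes only the last step; the weighted sum is the same in all three cases.</p>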
<p>Now we can finally start building neural networks.<span id="more-848"></span> The simplest kind of network that we can build is... exactly, one perceptron! Here's how we can train it to classify things!</p>
<p><strong>3. Single-layer neural network</strong></p>
<p>We defined the <em>error</em> earlier as <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_55f8b7b87cc07f1f3d1521b42c22f143.gif' style='vertical-align: middle; border: none; ' class='tex' alt="E := \sum_i (h(\vec{x_i}) - y_i)^2" /></span><script type='math/tex'>E := \sum_i (h(\vec{x_i}) - y_i)^2</script>. Obviously, since we are using a single perceptron both our error and the output of the network (<span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_607e61e74b88eecc76e7a5a78a21deb5.gif' style='vertical-align: middle; border: none; ' class='tex' alt="h_{\vec{w}}(\vec{x_i}) = \sigma(\vec{w} \cdot \vec{x_i})" /></span><script type='math/tex'>h_{\vec{w}}(\vec{x_i}) = \sigma(\vec{w} \cdot \vec{x_i})</script>) depend on the weights vector <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_1b6e7fc2252f4d67d24020cf8067b313.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\vec{w}" /></span><script type='math/tex'>\vec{w}</script>.</p>
<p>Incorporating those observations into the updated error measure we obtain <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_703a55a7787b76f03594a9f04916b38f.gif' style='vertical-align: middle; border: none; ' class='tex' alt="E(\vec{w}) := \sum_i (h_{\vec{w}}(\vec{x_i}) - y_i)^2" /></span><script type='math/tex'>E(\vec{w}) := \sum_i (h_{\vec{w}}(\vec{x_i}) - y_i)^2</script>.</p>
<p>Our goal is to find such a vector of weights <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_1b6e7fc2252f4d67d24020cf8067b313.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\vec{w}" /></span><script type='math/tex'>\vec{w}</script> that <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_49bbda23027c997f3ee32b4cdd0b2674.gif' style='vertical-align: middle; border: none; ' class='tex' alt="E(\vec{w})" /></span><script type='math/tex'>E(\vec{w})</script> is minimised - that way our perceptron will correctly predict the output for all inputs of our training examples!</p>
<p>We will do that by applying the <em>gradient descent</em> algorithm: in essence, we will treat the error as a surface in <em>n</em>-dimensional space, find the steepest downward slope at the current point <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_c4afede5c030c33b344c5a27a525f59e.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{w_t}" /></span><script type='math/tex'>\vec{w_t}</script> and move in that direction to obtain <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a674d89af2ed4975565408d7a93bf47d.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{w}_{t+1}" /></span><script type='math/tex'>\vec{w}_{t+1}</script>. This way, we will hopefully reach a minimum point on the error surface, and we will use the coordinates of that point as the final weight vector.</p>
<p>Skipping a great deal of maths on whether the minimum point exists, whether it is unique and global, whether we can "overjump" it by accident, what the conditions are for the following partial derivatives to exist, and so on, we will dive straight in hoping for the best and calculate the <em><a href="http://en.wikipedia.org/wiki/Gradient" target="_blank">gradient</a></em> of the error surface at <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_c4afede5c030c33b344c5a27a525f59e.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{w_t}" /></span><script type='math/tex'>\vec{w_t}</script>. Then we will take a step in the opposite direction of the gradient (i.e. in the direction of the fastest decreasing slope on the error surface) to obtain <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_0e024bfb942f67511e5384ecf8c2b148.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{w}_{t + 1}" /></span><script type='math/tex'>\vec{w}_{t + 1}</script>.</p>
<p>To express it in a slightly more mathematical way, we will start with some <em>randomized (!) </em>weight vector <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2b01af2697eb36e727106562d5f262a0.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{w_0}" /></span><script type='math/tex'>\vec{w_0}</script> and will train our perceptron by updating the weights</p>
<p>\begin{align} \vec{w}_{t+1} := \vec{w_t} - \eta \frac{\partial E(\vec{w})}{\partial \vec{w}} \bigg|_{\vec{w_t}}, \end{align}</p>
<p>where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ffe9f913124f345732e9f00fa258552e.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\eta" /></span><script type='math/tex'>\eta</script> is known as the <em>learning rate</em> (a simple scaling factor that typically ranges between zero and one).</p>
<p>Observe that</p>
<p>\begin{align} \frac{\partial E(\vec{w})}{\partial \vec{w}} = \left( \frac{\partial E(\vec{w})}{\partial w_0},\frac{\partial E(\vec{w})}{\partial w_1}, ... ,\frac{\partial E(\vec{w})}{\partial w_n} \right), \end{align}</p>
<p>and we can calculate</p>
<p>\begin{align} \frac{\partial E(\vec{w})}{\partial w_j} &#038;= \frac{\partial}{\partial w_j} \sum_i (h_{\vec{w}}(\vec{x_i}) - y_i)^2 \\ &#038;= \sum_i 2(h_{\vec{w}}(\vec{x_i}) - y_i) \frac{\partial}{\partial w_j} (h_{\vec{w}}(\vec{x_i}) - y_i) \\ &#038;= \sum_i 2(h_{\vec{w}}(\vec{x_i}) - y_i) \frac{\partial}{\partial w_j} \sigma(\vec{x_i} \cdot \vec{w}) \\ &#038;= \sum_i 2(h_{\vec{w}}(\vec{x_i}) - y_i) \; \sigma ' (\vec{x_i} \cdot \vec{w}) \frac{\partial}{\partial w_j} \vec{x_i} \cdot \vec{w} \\ &#038;= \sum_i 2(h_{\vec{w}}(\vec{x_i}) - y_i) \; \sigma ' (\vec{x_i} \cdot \vec{w}) \frac{\partial}{\partial w_j} \sum_{k=0}^n a_k w_k \\ &#038;= 2 a_j \sum_i (h_{\vec{w}}(\vec{x_i}) - y_i) \; \sigma ' (\vec{x_i} \cdot \vec{w}) \end{align}</p>
<p>for each <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_24f0dbdf37d0c208e6d8f43b99fc4947.gif' style='vertical-align: middle; border: none; ' class='tex' alt="0 \leq j \leq n" /></span><script type='math/tex'>0 \leq j \leq n</script>.</p>
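<p>A quick way to gain confidence in a derived gradient is to compare it against finite differences. The sketch below does that for a sigmoid perceptron, under a few assumptions of mine: the toy examples and starting weights are made up, each input vector already includes the constant component a_0 = 1, and the input component a_j is taken from the i-th example and therefore multiplied inside the sum over examples.</p>

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    return sigmoid(x) * (1.0 - sigmoid(x))


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def error(w, examples):
    return sum((sigmoid(dot(x, w)) - y) ** 2 for x, y in examples)


def gradient(w, examples):
    """Analytic gradient: component j sums 2 (h - y) sigma'(x . w) x[j] over examples."""
    grad = [0.0] * len(w)
    for x, y in examples:
        s = dot(x, w)
        common = 2.0 * (sigmoid(s) - y) * sigmoid_prime(s)
        for j in range(len(w)):
            grad[j] += common * x[j]
    return grad


def numeric_gradient(w, examples, eps=1e-6):
    """Central finite differences, used only to sanity-check the formula."""
    grad = []
    for j in range(len(w)):
        up = list(w)
        up[j] += eps
        down = list(w)
        down[j] -= eps
        grad.append((error(up, examples) - error(down, examples)) / (2.0 * eps))
    return grad


# Hypothetical examples; each input already starts with the constant a_0 = 1.
examples = [((1.0, 0.2, 0.7), 1.0), ((1.0, 0.9, 0.1), 0.0)]
w = [0.1, -0.3, 0.5]

print(gradient(w, examples))
print(numeric_gradient(w, examples))  # should agree to several decimal places
```

<p>If the two printed vectors disagree noticeably, the analytic formula (or its implementation) is the first thing to re-check.</p>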
<p><strong>3.1. <em>Example single-layer neural network</em></strong></p>
<p><input type="submit" name="sub_button" onclick="javascript:show_singlelayer_applet()" style="width: 600px;" value="Launch the example single-layer neural network applet" width="600"></p>
<p>In this applet, a perceptron takes two inputs (normalized <em>x</em> and <em>y</em> coordinates, i.e. <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ab7d66c9cd926fd921aca9ea32ae061b.gif' style='vertical-align: middle; border: none; ' class='tex' alt="a_1 = in_x" /></span><script type='math/tex'>a_1 = in_x</script>, <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_1e82e0dececa88c9575f10dc18b9d420.gif' style='vertical-align: middle; border: none; ' class='tex' alt="a_2 = in_y" /></span><script type='math/tex'>a_2 = in_y</script>) and uses sigmoid as an activation function with the learning rate <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_289fdd600332beddaa14e213d52f2e5f.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\eta = 0.1" /></span><script type='math/tex'>\eta = 0.1</script>.</p>
<p>Then, using a previous general result</p>
<p>\begin{align} \frac{\partial E(\vec{w})}{\partial w_j} &#038;= 2 a_j \sum_i (h_{\vec{w}}(\vec{x_i}) - y_i) \; \sigma ' (\vec{x_i} \cdot \vec{w}) \\ &#038;= 2 a_j \sum_i (\sigma(\vec{w} \cdot \vec{x_i}) - y_i) \sigma(\vec{x_i} \cdot \vec{w}) (1 - \sigma(\vec{x_i} \cdot \vec{w})), \end{align}</p>
<p>(since for the sigmoid activation function <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_18e25966b14ae9646d7077b5ef86c50b.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\sigma ' (x) = \sigma(x) (1 - \sigma(x))" /></span><script type='math/tex'>\sigma ' (x) = \sigma(x) (1 - \sigma(x))</script>); and thus</p>
<p>\begin{align} \frac{\partial E(\vec{w})}{\partial \vec{w}} = 2 \vec{a} \sum_i (\sigma(\vec{w} \cdot \vec{x_i}) - y_i) \sigma(\vec{x_i} \cdot \vec{w}) (1 - \sigma(\vec{x_i} \cdot \vec{w})), \end{align}</p>
<p>where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_cbc99f02654c3f05cb0b0cee31fef480.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{a} = (1, in_x, in_y)" /></span><script type='math/tex'>\vec{a} = (1, in_x, in_y)</script>.</p>
<p>The final algorithm to update the weight vector <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_bdce68d791b1a6e4d6b206eb54e47e3b.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{w} = (w_0, w_1, w_2)" /></span><script type='math/tex'>\vec{w} = (w_0, w_1, w_2)</script> (which is initially randomized) then is</p>
<p>\begin{align} \vec{w}_{t+1} := \vec{w_t} - 0.2 \vec{a} \sum_i (h_{\vec{w}_t}(\vec{x_i}) - y_i) h_{\vec{w}_t}(\vec{x_i}) (1 - h_{\vec{w}_t}(\vec{x_i})), \end{align}</p>
<p>where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_76017e0dde11219671c63a5e2ed7e993.gif' style='vertical-align: middle; border: none; ' class='tex' alt="h_{\vec{w}_t}(\vec{x_i}) = \sigma(\vec{w}_t \cdot \vec{x_i})" /></span><script type='math/tex'>h_{\vec{w}_t}(\vec{x_i}) = \sigma(\vec{w}_t \cdot \vec{x_i})</script>.</p>
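<p>The update rule above can be sketched as a short batch gradient descent loop. This is not the applet's code: the training points, the number of epochs and the random seed are illustrative assumptions; only the sigmoid activation, the learning rate of 0.1 and the randomized starting weights come from the text.</p>

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(w, x):
    """h_w(x) = sigmoid(w . a) with a = (1, in_x, in_y)."""
    return sigmoid(w[0] + w[1] * x[0] + w[2] * x[1])


def total_error(w, examples):
    return sum((predict(w, x) - y) ** 2 for x, y in examples)


def train(examples, eta=0.1, epochs=5000, seed=0):
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in range(3)]  # randomized initial weights
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, y in examples:
            h = predict(w, x)
            common = 2.0 * (h - y) * h * (1.0 - h)  # 2 (h - y) sigma'(w . a)
            a = (1.0, x[0], x[1])
            for j in range(3):
                grad[j] += common * a[j]
        # Step against the gradient, scaled by the learning rate.
        w = [w_j - eta * g_j for w_j, g_j in zip(w, grad)]
    return w


# Hypothetical, linearly separable points: label 1 lies towards the top right.
examples = [((0.1, 0.2), 0.0), ((0.2, 0.1), 0.0), ((0.3, 0.3), 0.0),
            ((0.8, 0.9), 1.0), ((0.9, 0.7), 1.0), ((0.7, 0.8), 1.0)]

w = train(examples)
print(w, total_error(w, examples))
print([round(predict(w, x)) for x, _ in examples])
```

<p>Rounding the perceptron's output to 0 or 1 turns it into a classifier; on separable data like this, the rounded predictions should match the labels once training has run long enough.</p>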
<p>However, a single perceptron is extremely limited in the sense that different classes of examples must be separable with a hyperplane (hence the name, <em>linear </em>classifier), which is usually not the case in real-life applications.</p>
<p>Time to bump things up a notch: let's connect a few of them together to obtain a multilayer feed-forward neural network!</p>
<p><strong>4. Multilayer neural network</strong></p>
<p>Let's consider a general case first: a completely unrestricted feed-forward structure (with the only condition being that there are no loops between the perceptrons to avoid general madness and chaos). </p>
<p>Since it is structurally more complex than just a single perceptron, take a look at the following figure that explains some more notation:</p>
<p><small><div class="wp-caption aligncenter" style="width: 625px"><img title="Multilayer neural network" src="http://blog.zabarauskas.com/img/multilayer.gif" alt="Multilayer neural network" width="615" height="291" /><p class="wp-caption-text">Multilayer neural network</p></div></small></p>
<p>Here the weight <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_95a18152c7531a35532e8f28a9d36cea.gif' style='vertical-align: middle; border: none; ' class='tex' alt="w_{i \rightarrow j}" /></span><script type='math/tex'>w_{i \rightarrow j}</script> connects perceptrons <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_865c0c0b4ab0e063e5caa3387c1a8741.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="i" /></span><script type='math/tex'>i</script> and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_363b122c528f54df4a0446b6bab05515.gif' style='vertical-align: middle; border: none; ' class='tex' alt="j" /></span><script type='math/tex'>j</script>, the sum of the weighted inputs of perceptron <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_363b122c528f54df4a0446b6bab05515.gif' style='vertical-align: middle; border: none; ' class='tex' alt="j" /></span><script type='math/tex'>j</script> is denoted by <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_7e6517f05d3bc22f68e2611260c2e3a6.gif' style='vertical-align: middle; border: none; ' class='tex' alt="s_j := \sum_k z_k w_{k \rightarrow j}" /></span><script type='math/tex'>s_j := \sum_k z_k w_{k \rightarrow j}</script> where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8ce4b16b22b58894aa86c421e8759df3.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="k" /></span><script type='math/tex'>k</script> iterates over all perceptrons connected to <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_363b122c528f54df4a0446b6bab05515.gif' style='vertical-align: middle; border: none; ' 
class='tex' alt="j" /></span><script type='math/tex'>j</script>, and the output of <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_363b122c528f54df4a0446b6bab05515.gif' style='vertical-align: middle; border: none; ' class='tex' alt="j" /></span><script type='math/tex'>j</script> is written as <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_48c04f078918d72f669556bd44ab92c9.gif' style='vertical-align: middle; border: none; ' class='tex' alt="z_j := \sigma(s_j)" /></span><script type='math/tex'>z_j := \sigma(s_j)</script>, where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_a2ab7d71a0f07f388ff823293c147d21.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\sigma" /></span><script type='math/tex'>\sigma</script> is <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_363b122c528f54df4a0446b6bab05515.gif' style='vertical-align: middle; border: none; ' class='tex' alt="j" /></span><script type='math/tex'>j</script>'s activation function.</p>
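<p>The notation above translates directly into a forward pass over a loop-free network. In this sketch the network shape, the node names and the weights are hypothetical, every perceptron uses a sigmoid, and biases are left out for brevity (a bias could be modelled as a weight from a constant input node with value 1).</p>

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, weights, order):
    """Return the output z_j of every perceptron in a loop-free network.

    inputs  -- dict mapping input node names to their values
    weights -- dict mapping (k, j) to the weight on the connection from k to j
    order   -- non-input nodes listed so that each node follows all of its sources
    """
    z = dict(inputs)
    for j in order:
        # s_j is the weighted sum of the outputs of all nodes feeding into j.
        s_j = sum(z[k] * w for (k, target), w in weights.items() if target == j)
        z[j] = sigmoid(s_j)
    return z


# Hypothetical network: two inputs, two hidden perceptrons, one output perceptron.
inputs = {"x1": 1.0, "x2": 0.0}
weights = {
    ("x1", "h1"): 0.5, ("x2", "h1"): -0.5,
    ("x1", "h2"): -0.5, ("x2", "h2"): 0.5,
    ("h1", "out"): 1.0, ("h2", "out"): 1.0,
}

z = forward(inputs, weights, order=["h1", "h2", "out"])
print(z["h1"], z["h2"], z["out"])
```

<p>Because there are no loops, processing the perceptrons in such an order guarantees that every output needed for a weighted sum has already been computed.</p>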
<p>We will use the same error measure <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_703a55a7787b76f03594a9f04916b38f.gif' style='vertical-align: middle; border: none; ' class='tex' alt="E(\vec{w}) := \sum_i (h_{\vec{w}}(\vec{x_i}) - y_i)^2" /></span><script type='math/tex'>E(\vec{w}) := \sum_i (h_{\vec{w}}(\vec{x_i}) - y_i)^2</script>, except now the weights vector <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_1b6e7fc2252f4d67d24020cf8067b313.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\vec{w}" /></span><script type='math/tex'>\vec{w}</script> will contain all the weights in the network, i.e. <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_48555d482d1093cbba9e1282f8c51c49.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\vec{w} = (\;\;w_{i \rightarrow j}\;\;)" /></span><script type='math/tex'>\vec{w} = (\;\;w_{i \rightarrow j}\;\;)</script> for all <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_f540942e195ca3ac12148363180a7912.gif' style='vertical-align: middle; border: none; ' class='tex' alt="i, j" /></span><script type='math/tex'>i, j</script>.</p>
<p>To find <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_1b6e7fc2252f4d67d24020cf8067b313.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\vec{w}" /></span><script type='math/tex'>\vec{w}</script> that minimizes <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_49bbda23027c997f3ee32b4cdd0b2674.gif' style='vertical-align: middle; border: none; ' class='tex' alt="E(\vec{w})" /></span><script type='math/tex'>E(\vec{w})</script> using gradient descent we have to calculate <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_dac69924b44d5d3a6a7a25bb29837e4f.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\frac{\partial E(\vec{w})}{\partial \vec{w}}" /></span><script type='math/tex'>\frac{\partial E(\vec{w})}{\partial \vec{w}}</script> (again). However, this time it is (very slightly) more involved.</p>
<p>First of all let's separate the contributions of individual training examples to the overall error using the following observation:<br />
\begin{align} \frac{\partial E(\vec{w})}{\partial \vec{w}} = \sum_i \frac{\partial E_i(\vec{w})}{\partial \vec{w}}, \end{align}<br />
where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_cf180140312bb942df2cf23df80366a0.gif' style='vertical-align: middle; border: none; ' class='tex' alt="E_i(\vec{w}) = (h_{\vec{w}}(\vec{x_i}) - y_i)^2" /></span><script type='math/tex'>E_i(\vec{w}) = (h_{\vec{w}}(\vec{x_i}) - y_i)^2</script>.</p>
<p>Then</p>
<p>\begin{align} \frac{\partial E_i(\vec{w})}{\partial w_{j \rightarrow k}} &#038;= \frac{\partial}{\partial w_{j \rightarrow k}} (h_{\vec{w}}(\vec{x_i}) - y_i)^2 \\ &#038;= 2 (h_{\vec{w}}(\vec{x_i}) - y_i) \frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial w_{j \rightarrow k}} \\ &#038;=  2 (h_{\vec{w}}(\vec{x_i}) - y_i) \frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial s_k} \frac{\partial s_k}{\partial w_{j \rightarrow k}} \\ &#038;= 2 (h_{\vec{w}}(\vec{x_i}) - y_i) \frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial s_k} z_j. \end{align}</p>
<p>If <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8ce4b16b22b58894aa86c421e8759df3.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="k" /></span><script type='math/tex'>k</script> is an output node, then<br />
\begin{align} \frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial s_k} = \frac{d \;\; \sigma(s_k)}{d \; s_k}  = \sigma' (s_k)\end{align}<br />
and thus<br />
\begin{align} \frac{\partial E_i(\vec{w})}{\partial w_{j \rightarrow k}} &#038;= 2 (h_{\vec{w}}(\vec{x_i}) - y_i) \; \sigma ' (s_k)\; z_j. \end{align}</p>
<p>However, if <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8ce4b16b22b58894aa86c421e8759df3.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="k" /></span><script type='math/tex'>k</script> is not an output node, then a change in <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2a237e54504442e3d483a39f75df7bfa.gif' style='vertical-align: middle; border: none; ' class='tex' alt="s_k" /></span><script type='math/tex'>s_k</script> can affect all the nodes which are connected to <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8ce4b16b22b58894aa86c421e8759df3.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="k" /></span><script type='math/tex'>k</script>'s output, i.e.<br />
\begin{align} \frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial s_k} &#038;= \frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial z_k} \frac{\partial z_k}{\partial s_k} \\ &#038;= \frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial z_k} \sigma ' (s_k) \\ &#038;= \sum_{o \in \{ v \; | \; v \text{ is connected to } k \}} \frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial s_o} \frac{\partial s_o}{\partial z_k} \sigma ' (s_k) \\ &#038;= \sum_{o \in \{ v \; | \; v \text{ is connected to } k \}} \frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial s_o} w_{k \rightarrow o} \; \sigma ' (s_k), \end{align}<br />
... and we are almost done! All that is left to do is to place the <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_865c0c0b4ab0e063e5caa3387c1a8741.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="i" /></span><script type='math/tex'>i</script><sup>th</sup> example at the inputs of our neural network, calculate <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2a237e54504442e3d483a39f75df7bfa.gif' style='vertical-align: middle; border: none; ' class='tex' alt="s_k" /></span><script type='math/tex'>s_k</script> and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d07d279eb93bd454926dc7edd51be217.gif' style='vertical-align: middle; border: none; ' class='tex' alt="z_k" /></span><script type='math/tex'>z_k</script> for all the nodes (the <em>forward-propagation</em> step) and then work our way backwards from the output node, calculating <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_302a73007dc8d9300c664b7f241e2e23.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial s_k}" /></span><script type='math/tex'>\frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial s_k}</script> for each node along the way (hence the name, <em>backpropagation</em>).</p>
<p>To summarize, if <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8ce4b16b22b58894aa86c421e8759df3.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="k" /></span><script type='math/tex'>k</script> is an output node, then</p>
<p>\begin{align} \frac{\partial E_i(\vec{w})}{\partial w_{j \rightarrow k}} &#038;= 2 (h_{\vec{w}}(\vec{x_i}) - y_i) \; \sigma ' (s_k)\; z_j, \end{align}</p>
<p>otherwise</p>
<p>\begin{align} \frac{\partial E_i(\vec{w})}{\partial w_{j \rightarrow k}} &#038;= 2 (h_{\vec{w}}(\vec{x_i}) - y_i) \; \sigma ' (s_k)\; z_j \sum_{o \in \{ v \; | \; v \text{ conn. to } k \}} \frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial s_o} w_{k \rightarrow o}. \end{align}</p>
<p>Once every component of the gradient has been computed, giving<br />
\begin{align} \frac{\partial E_i(\vec{w})}{\partial \vec{w}} = \left( \; \; \frac{\partial E_i(\vec{w})}{\partial w_{j \rightarrow k}}   \; \; \right), \forall j, k \end{align}<br />
the weight vector can either be updated in one go (<em>batch</em> update)<br />
\begin{align} \vec{w}_{t+1} := \vec{w_t} - \eta \frac{\partial E(\vec{w})}{\partial \vec{w}} \bigg|_{\vec{w_t}} =  \vec{w_t} - \eta \sum_i \frac{\partial E_i(\vec{w})}{\partial \vec{w}}\bigg|_{\vec{w_t}}, \end{align}<br />
or it can be updated <em>sequentially</em> using one training example at a time:<br />
\begin{align} \vec{w}_{t+1} := \vec{w_t} - \eta \frac{\partial E_i(\vec{w})}{\partial \vec{w}} \bigg|_{\vec{w_t}}.\end{align}</p>
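<p>The forward-propagation and backpropagation steps above can be sketched in a few lines of Python. This is only a minimal illustration under some assumptions that are not part of the derivation: the activation function is taken to be the sigmoid, the network is stored as a dictionary of edge weights, and all the names (<code>forward</code>, <code>gradient</code>, the toy node names <code>"one"</code>, <code>"x"</code>, <code>"a"</code>, <code>"b"</code>, <code>"out"</code>) are hypothetical.</p>

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(weights, order, inputs):
    """Forward-propagation: compute s_k and z_k for every non-input node.

    weights: dict mapping an edge (j, k) to the weight w_{j->k}
    order:   non-input nodes in topological order (the last one is the output)
    inputs:  dict mapping each input node to its value (including the bias input)
    """
    z = dict(inputs)
    s = {}
    for k in order:
        s[k] = sum(w * z[j] for (j, kk), w in weights.items() if kk == k)
        z[k] = sigmoid(s[k])
    return s, z

def gradient(weights, order, inputs, y):
    """Backpropagation: dE_i/dw_{j->k} for E_i = (h - y)^2, for every edge."""
    s, z = forward(weights, order, inputs)
    out = order[-1]
    h = z[out]
    dh_ds = {}  # dh/ds_k, filled in from the output node backwards
    for k in reversed(order):
        sigma_prime = z[k] * (1.0 - z[k])  # derivative of the sigmoid at s_k
        if k == out:
            dh_ds[k] = sigma_prime
        else:
            dh_ds[k] = sigma_prime * sum(
                dh_ds[o] * w for (kk, o), w in weights.items() if kk == k)
    return {(j, k): 2.0 * (h - y) * dh_ds[k] * z[j] for (j, k) in weights}

def error(weights, order, inputs, y):
    """E_i(w) for a single training example."""
    _, z = forward(weights, order, inputs)
    return (z[order[-1]] - y) ** 2

# Hypothetical toy network: bias input "one", input "x", hidden nodes "a" and "b".
weights = {("one", "a"): 0.1, ("x", "a"): -0.4,
           ("one", "b"): 0.3, ("x", "b"): 0.7,
           ("one", "out"): -0.2, ("a", "out"): 0.5, ("b", "out"): -0.6}
order = ["a", "b", "out"]
inputs = {"one": 1.0, "x": 0.8}

grad = gradient(weights, order, inputs, y=1.0)

# One sequential update with an assumed learning rate eta.
eta = 0.5
updated = {edge: w - eta * grad[edge] for edge, w in weights.items()}
```

<p>A quick way to gain confidence in such a sketch is to compare each entry of <code>grad</code> against a finite-difference estimate of the same partial derivative computed with <code>error</code>; the two should agree up to numerical precision.</p>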
<p><strong>4.1. <em>Example multilayer network</em></strong></p>
<p><input type="submit" name="sub_button" onclick="javascript:show_multiplelayer_applet()" style="width: 600px;" value="Launch the example multilayer neural network applet" width="600"></p>
<p>If you launch and play with the applet above, you will see that it is able to separate classes non-linearly (indicating that it's using more than one perceptron). It is built using this two-layer neural network:</p>
<p><small><div class="wp-caption aligncenter" style="width: 440px"><img title="Two-layer neural network example" src="http://blog.zabarauskas.com/img/multilayer_example.gif" alt="Two-layer neural network example" width="430" height="297" /><p class="wp-caption-text">Two-layer neural network example</p></div></small></p>
<p>The weights vector <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_1b6e7fc2252f4d67d24020cf8067b313.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\vec{w}" /></span><script type='math/tex'>\vec{w}</script> contains all the weights in the network, i.e.<br />
\begin{align} \vec{w} = ( w_{in_1 \rightarrow 1}, w_{in_x \rightarrow 1}, w_{in_y \rightarrow 1}, w_{in_1 \rightarrow 2}, ..., w_{in_y \rightarrow 5}, w_{in_1 \rightarrow 6}, w_{1 \rightarrow 6}, w_{2 \rightarrow 6}, ..., w_{5 \rightarrow 6}). \end{align}</p>
<p>Each perceptron is using <i>sigmoid</i> as its activation function and the output of the perceptron <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_1679091c5a880faf6fb5e6087eb1b2dc.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="6" /></span><script type='math/tex'>6</script> is the output for the whole network, i.e. <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_0b205b393b46e96670a36edc23fa56c5.gif' style='vertical-align: middle; border: none; ' class='tex' alt="h_{\vec{w}}(\vec{x_i}) = z_6" /></span><script type='math/tex'>h_{\vec{w}}(\vec{x_i}) = z_6</script>.</p>
<p>Then an individual point <i>i</i> (with <i>x</i> and <i>y</i> coordinates normalized) is considered as an <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_865c0c0b4ab0e063e5caa3387c1a8741.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="i" /></span><script type='math/tex'>i</script><sup>th</sup> training example and fed through the network. While it's being propagated, each <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_e406ac4d7c470823a8619c13dd7101be.gif' style='vertical-align: middle; border: none; ' class='tex' alt="s_i" /></span><script type='math/tex'>s_i</script> and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_5a5ae0760dc3dac91e546c0ea25586b0.gif' style='vertical-align: middle; border: none; ' class='tex' alt="z_i" /></span><script type='math/tex'>z_i</script> for <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ec3775092d23fe7fb5dcd3f7ed176cc6.gif' style='vertical-align: middle; border: none; ' class='tex' alt="i = 1, ..., 6" /></span><script type='math/tex'>i = 1, ..., 6</script> are stored.</p>
<p>Then the gradient of an <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_865c0c0b4ab0e063e5caa3387c1a8741.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="i" /></span><script type='math/tex'>i</script><sup>th</sup> error surface is calculated as follows:<br />
\begin{align}<br />
\frac{\partial E_i(\vec{w})}{\partial \vec{w}} &#038;= \left( \frac{\partial E_i(\vec{w})}{\partial w_{in_1 \rightarrow 1}},\frac{\partial E_i(\vec{w})}{\partial w_{in_x \rightarrow 1}}, ..., \frac{\partial E_i(\vec{w})}{\partial w_{in_y \rightarrow 5}},\frac{\partial E_i(\vec{w})}{\partial w_{in_1 \rightarrow 6}},\frac{\partial E_i(\vec{w})}{\partial w_{1 \rightarrow 6}},\frac{\partial E_i(\vec{w})}{\partial w_{2 \rightarrow 6}}, ..., \frac{\partial E_i(\vec{w})}{\partial w_{5 \rightarrow 6}} \right) , \end{align}<br />
where<br />
\begin{align} \frac{\partial E_i(\vec{w})}{\partial w_{in_1 \rightarrow 1}} &#038;= 2 (h_{\vec{w}}(\vec{x_i}) - y_i) \; \sigma ' (s_1)\; \frac{\partial h_{\vec{w}}(\vec{x_i})}{\partial s_6} w_{1 \rightarrow 6} \\<br />
&#038;= 2 (z_6 - y_i) \; \sigma (s_1) \; (1 -  \sigma (s_1)) \; \sigma (s_6) \; (1 - \sigma (s_6)) \; w_{1 \rightarrow 6}, \\<br />
\frac{\partial E_i(\vec{w})}{\partial w_{in_x \rightarrow 1}} &#038;= 2 (z_6 - y_i) \; \sigma (s_1) \; (1 -  \sigma (s_1)) \; {in}_x \; \sigma (s_6) \; (1 - \sigma (s_6)) \; w_{1 \rightarrow 6}, \\<br />
&#038; \vdots \\<br />
\frac{\partial E_i(\vec{w})}{\partial w_{in_y \rightarrow 5}} &#038;= 2 (z_6 - y_i) \; \sigma (s_5) \; (1 -  \sigma (s_5)) \; {in}_y \; \sigma (s_6) \; (1 - \sigma (s_6)) \; w_{5 \rightarrow 6}, \\<br />
\frac{\partial E_i(\vec{w})}{\partial w_{in_1 \rightarrow 6}} &#038;= 2 (h_{\vec{w}}(\vec{x_i}) - y_i) \; \sigma ' (s_6) \\<br />
&#038;= 2 (z_6 - y_i) \; \sigma (s_6) \; (1 -  \sigma (s_6)) , \\<br />
\frac{\partial E_i(\vec{w})}{\partial w_{1 \rightarrow 6}} &#038;= 2 (z_6 - y_i) \; \sigma (s_6) \; (1 -  \sigma (s_6)) \; z_1, \\<br />
\frac{\partial E_i(\vec{w})}{\partial w_{2 \rightarrow 6}} &#038;= 2 (z_6 - y_i) \; \sigma (s_6) \; (1 -  \sigma (s_6)) \; z_2, \\<br />
&#038; \vdots \\<br />
\frac{\partial E_i(\vec{w})}{\partial w_{5 \rightarrow 6}} &#038;= 2 (z_6 - y_i) \; \sigma (s_6) \; (1 -  \sigma (s_6)) \; z_5.<br />
\end{align}</p>
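<p>The expansions above rely on the fact that the derivative of the sigmoid can be written in terms of the sigmoid itself. Assuming the sigmoid here is the standard logistic function (which is consistent with the expressions above), the step can be spelled out as<br />
\begin{align} \sigma(s) &#038;= \frac{1}{1 + e^{-s}}, \\ \sigma ' (s) &#038;= \frac{e^{-s}}{(1 + e^{-s})^2} = \frac{1}{1 + e^{-s}} \cdot \frac{e^{-s}}{1 + e^{-s}} = \sigma (s) \; (1 - \sigma (s)). \end{align}<br />
Since <script type='math/tex'>z_k = \sigma(s_k)</script>, every <script type='math/tex'>\sigma ' (s_k)</script> can be obtained as <script type='math/tex'>z_k (1 - z_k)</script> from the values stored during the forward pass.</p>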
<p>Finally, the network is sequentially trained with the learning rate <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_9ce685799750707610f87c6907146736.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\eta = 0.5" /></span><script type='math/tex'>\eta = 0.5</script> (starting with a random initial weight vector <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ac1052c8c41fa0e8d67714e0723a068b.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="w_0" /></span><script type='math/tex'>w_0</script>)<br />
\begin{align} \vec{w}_{t+1} := \vec{w_t} - 0.5 \frac{\partial E_i(\vec{w})}{\partial \vec{w}} \bigg|_{\vec{w_t}}.\end{align}</p>
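<p>For the curious, the whole example can be approximated with a short Python sketch. It is not the applet's actual source code: the function names, the weight layout, the initialisation range and the four training points are all assumptions made for illustration. Only the network shape (five hidden sigmoid perceptrons feeding perceptron 6), the gradient expressions and the learning rate of 0.5 come from the text above.</p>

```python
import math
import random

ETA = 0.5  # learning rate used in the example above

def sigmoid(s):
    # Numerically stable form of the logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def init_weights(rng):
    """Random w_0. hidden[k] = [w_in1->k, w_inx->k, w_iny->k] for nodes 1..5,
    out = [w_in1->6, w_1->6, ..., w_5->6]. The range (-1, 1) is an assumption."""
    hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
    out = [rng.uniform(-1, 1) for _ in range(6)]
    return hidden, out

def forward(hidden, out, px, py):
    """Forward pass for the point (px, py): returns ([z_1..z_5], z_6)."""
    z = [sigmoid(w[0] + w[1] * px + w[2] * py) for w in hidden]
    z6 = sigmoid(out[0] + sum(out[k + 1] * z[k] for k in range(5)))
    return z, z6

def gradient(hidden, out, px, py, label):
    """Gradient of E_i = (z_6 - label)^2 with respect to all 21 weights."""
    z, z6 = forward(hidden, out, px, py)
    common = 2.0 * (z6 - label) * z6 * (1.0 - z6)  # 2 (z_6 - y_i) sigma'(s_6)
    g_out = [common] + [common * zk for zk in z]
    g_hidden = []
    for k in range(5):
        # common * w_{k->6} * sigma'(s_k), then times the input on each edge
        d = common * out[k + 1] * z[k] * (1.0 - z[k])
        g_hidden.append([d, d * px, d * py])
    return g_hidden, g_out

def train_step(hidden, out, px, py, label, eta=ETA):
    """One sequential update: w := w - eta * dE_i/dw (modifies weights in place)."""
    g_hidden, g_out = gradient(hidden, out, px, py, label)
    for k in range(5):
        for i in range(3):
            hidden[k][i] -= eta * g_hidden[k][i]
    for i in range(6):
        out[i] -= eta * g_out[i]

# Hypothetical, non-linearly separable training set: (x, y, class label).
points = [(0.1, 0.1, 0.0), (0.9, 0.9, 0.0), (0.1, 0.9, 1.0), (0.9, 0.1, 1.0)]

rng = random.Random(0)
hidden, out = init_weights(rng)
for epoch in range(2000):
    for px, py, label in points:
        train_step(hidden, out, px, py, label)

for px, py, label in points:
    print(px, py, label, forward(hidden, out, px, py)[1])
```

<p>The printed network outputs can be compared against the class labels by eye; how well they match will depend on the random starting weights and on the number of passes over the data.</p>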
<p>That's it, I hope it sheds some light on backpropagation!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zabarauskas.com/backpropagation-tutorial/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Halfway There</title>
		<link>http://blog.zabarauskas.com/halfway-there/</link>
		<comments>http://blog.zabarauskas.com/halfway-there/#comments</comments>
		<pubDate>Tue, 12 Apr 2011 12:57:39 +0000</pubDate>
		<dc:creator>Manfredas Zabarauskas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Life]]></category>
		<category><![CDATA[cambridge university]]></category>
		<category><![CDATA[formal]]></category>
		<category><![CDATA[internship]]></category>
		<category><![CDATA[lent]]></category>
		<category><![CDATA[microsoft]]></category>
		<category><![CDATA[redmoind]]></category>
		<category><![CDATA[sdet]]></category>
		<category><![CDATA[seattle]]></category>
		<category><![CDATA[software development engineer in test]]></category>
		<category><![CDATA[truemobilecoverage]]></category>

		<guid isPermaLink="false">http://blog.zabarauskas.com/?p=729</guid>
		<description><![CDATA[Another term in Cambridge has gone by - four out of nine to go. In the meantime, here's a quick update of what I've been up to in the past few months. 1. Microsoft internship In January I had the opportunity to visit Microsoft's headquarters in Redmond, WA, to interview for the Software Development Engineer [...]]]></description>
			<content:encoded><![CDATA[<p><em>Another term in Cambridge has gone by - four out of nine to go. In the meantime, here's a quick update of what I've been up to in the past few months.</em></p>
<p><strong>1. Microsoft internship</strong></p>
<p><small><div class="wp-caption alignright" style="width: 160px">
<a href="http://blog.zabarauskas.com/img/seattle2.jpg" title="Redmond, WA, 2011" class="thickbox" rel="singlepic39" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/39__150x113_seattle2.jpg" alt="Redmond, WA, 2011" title="Redmond, WA, 2011" />
</a>
<p class="wp-caption-text">Redmond, WA, 2011</p></div></small></p>
<p>In January I had the opportunity to visit <a href="http://www.microsoft.com" target="_blank">Microsoft</a>'s headquarters in Redmond, WA, to interview for the <i>Software Development Engineer in Test</i> intern position in the <a href="http://office.microsoft.com" target="_blank">Office</a> team. In short - a great trip, in every aspect.</p>
<p>I left London Heathrow on January 11th at 2:20 PM and landed at Seattle-Tacoma at 4:10 PM (I suspect that there might have been a few time zones in between those two points). I arrived at the Marriott Redmond roughly an hour later, which meant that because of my anti-jetlag technique (<i>"do not go to bed until 10-11 PM in the new timezone no matter what"</i>) I had a few hours to kill. Ample time to unpack, grab dinner in the Marriott's restaurant and go for a short stroll around Redmond before going to sleep.</p>
<p>On the next day I had four interviews arranged. The interviews themselves were absolutely stress-free; they felt more like a chance to meet and have a chat with some properly smart (and down-to-earth) folks.<small><div class="wp-caption alignright" style="width: 310px">
<a href="http://blog.zabarauskas.com/img/seattle1_full.jpg" title="Top of the Space Needle. Seattle, WA, 2011" class="thickbox" rel="singlepic38" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/38__300x225_seattle1_full.jpg" alt="Top of the Space Needle. Seattle, WA, 2011" title="Top of the Space Needle. Seattle, WA, 2011" />
</a>
<p class="wp-caption-text">Top of the <a href='http://en.wikipedia.org/wiki/Space_Needle' target='_blank'>Space Needle</a>. Seattle, WA, 2011</p></div></small>  The structure of the interviews seemed fairly typical: each interview consisted of some algorithm/data structure problems, a short discussion about past experience and the opportunity to ask questions (obviously a great chance to learn more about the team/company/company culture, etc.). Since this was my third round of summer internship applications (I worked as a software engineer for <a href="http://www.wolfsonmicro.com" target="_blank">Wolfson Microelectronics</a> in '09 and <a href="http://www.ms.com" target="_blank">Morgan Stanley</a> in '10), everything made sense and was pretty much what I expected.</p>
<p>My trip ended with a quick visit to Seattle on the next day: a few pictures of the Space Needle, a cup of Seattle's Best Coffee and there I was on my flight back to London, having spent $0.00 (yep, Microsoft paid for everything - flights, hotel, meals, taxis, etc.). Even so, the best thing about Microsoft definitely seemed to be the people working there; since I have received and accepted the offer, we'll see if my opinion remains unchanged after this summer!</p>
<p><strong>2. Lent term v2.0</strong></p>
<p><small><div class="wp-caption alignleft" style="width: 130px"><a href="http://truemobilecoverage.com" target="_blank"><img alt="TrueMobileCoverage group project" src="http://blog.zabarauskas.com/img/tmc1.jpg" title="TrueMobileCoverage group project" width="120" height="120" /></a><p class="wp-caption-text"><a href='http://truemobilecoverage.com' target='_blank'>TrueMobileCoverage</a> group project</p></div></small></p>
<p>Well, things are still picking up speed. Seven courses with twenty-eight supervisions in under two months, plus managing a <a href="http://truemobilecoverage.com" target="_blank">group project</a> (crowd-sourcing mobile network signal strength, the link is on the left), a few basketball practices each week on top of that and you'll see why this blog has not been updated for a couple of months.</p>
<p>It's not all doom and gloom, of course. Courses themselves are great, lecturers make some decently convoluted material understandable in minutes and an occasional <a href="http://en.wikipedia.org/wiki/Formal_%28university%29" target="_blank">formal hall</a> (e.g. below) also helps. </p>
<p>All in all, my opinion, that Cambridge provides a great opportunity to learn a huge amount of material in a very short timeframe, remains unchanged.</p>
<p><i>There will be more to come about some cool things that I've learnt in separate posts, but now speaking of learning - it's revision time... &#58;-)</i></p>
<p><small><div class="wp-caption alignleft" style="width: 610px"><img alt="Me and Ada at the CompSci formal. Cambridge, England, 2011" src="http://blog.zabarauskas.com/img/formal1.jpg" title="Me and Ada at the CompSci formal hall. Cambridge, England, 2011" width="600" height="500" /><p class="wp-caption-text">Me and Ada at the CompSci formal. Cambridge, England, 2011</p></div></small></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zabarauskas.com/halfway-there/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Conway&#039;s Game of Life (cont.)</title>
		<link>http://blog.zabarauskas.com/conways-game-of-life-cont/</link>
		<comments>http://blog.zabarauskas.com/conways-game-of-life-cont/#comments</comments>
		<pubDate>Fri, 12 Nov 2010 16:40:30 +0000</pubDate>
		<dc:creator>Manfredas Zabarauskas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[altera]]></category>
		<category><![CDATA[conway]]></category>
		<category><![CDATA[de2]]></category>
		<category><![CDATA[fpga]]></category>
		<category><![CDATA[game of life]]></category>

		<guid isPermaLink="false">http://blog.zabarauskas.com/?p=697</guid>
		<description><![CDATA[&#34;Beauty in things exists in the mind which contemplates them.&#34; - David Hume (1711-1776) Conway's Game of Life theme continues. Here is a short video with the Game of Life, this time running on Altera DE2 FPGA board with custom soft MIPS CPU.]]></description>
			<content:encoded><![CDATA[<p><i>
<p style="text-align: right;">&quot;Beauty in things exists in the mind which contemplates them.&quot;<br />
- David Hume (1711-1776)</p>
<p></i></p>
<p>The <a href="http://conwaylife.com/wiki/index.php?title=Main_Page" target="_blank">Conway's Game of Life</a> theme <a href="http://blog.zabarauskas.com/conways-game-of-life/" target="_blank">continues</a>. Here is a short video of the Game of Life, this time running on an Altera DE2 <a href="http://en.wikipedia.org/wiki/Field-programmable_gate_array" target="_blank">FPGA</a> board with a custom soft MIPS CPU.</p>
<p><small><div class="wp-caption alignleft" style="width: 610px"><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="600" height="344" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://www.youtube.com/v/FRU_v3cGjmo?fs=1&amp;hl=en_US&amp;rel=0" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="600" height="344" src="http://www.youtube.com/v/FRU_v3cGjmo?fs=1&amp;hl=en_US&amp;rel=0" allowscriptaccess="always" allowfullscreen="true"></embed></object><p class="wp-caption-text">Game of Life running on Altera DE2 FPGA board.</p></div></small></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zabarauskas.com/conways-game-of-life-cont/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Morgan Stanley Internship</title>
		<link>http://blog.zabarauskas.com/morgan-stanley-internship/</link>
		<comments>http://blog.zabarauskas.com/morgan-stanley-internship/#comments</comments>
		<pubDate>Sat, 23 Oct 2010 21:39:55 +0000</pubDate>
		<dc:creator>Manfredas Zabarauskas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Life]]></category>
		<category><![CDATA[globalaxe]]></category>
		<category><![CDATA[internship]]></category>
		<category><![CDATA[majorca]]></category>
		<category><![CDATA[mallorca]]></category>
		<category><![CDATA[morgan stanley]]></category>

		<guid isPermaLink="false">http://blog.zabarauskas.com/?p=649</guid>
		<description><![CDATA[Last week I received a short e-mail from my former manager at Morgan Stanley: "Hi Manfred, Just to let you know that GlobalAxe all went live last week and so far no issues at all." Since the people on the trading floor started using my system and it seems to be standing on its feet [...]]]></description>
			<content:encoded><![CDATA[<p><small><div class="wp-caption alignright" style="width: 310px">
<a href="http://blog.zabarauskas.com/img/ms2.jpg" title="After work (Canary Wharf, 2010)" class="thickbox" rel="singlepic17" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/17__300x400_ms2.jpg" alt="After work (Canary Wharf, 2010)" title="After work (Canary Wharf, 2010)" />
</a>
<p class="wp-caption-text">After work (Canary Wharf, 2010)</p></div></small>Last week I received a short e-mail from my former manager at Morgan Stanley:</p>
<p><em>"Hi Manfred,</em></p>
<p><em>Just to let you know that GlobalAxe all went live last week and so far no issues at all."</em></p>
<p>Since the people on the trading floor have started using my system and it seems to be standing on its own feet so far, it is probably a good time to recap what happened over my ten-week internship at Morgan Stanley.</p>
<p>I was working as a technology analyst in the <a href="http://en.wikipedia.org/wiki/Repurchase_agreement" target="_blank">repo</a> trading team (in the institutional securities group). My task was to develop and integrate a new screen into the trading software, to create an associated e-mail subsystem generating daily/weekly reports for senior executives and to code a website which would provide access to the data for executives/sales people without the trading software on their machines.</p>
<p>Development-wise it involved working with quite a wide range of technologies, such as C# and <a href="http://en.wikipedia.org/wiki/Composite_UI_Application_Block" target="_blank">CAB</a> for UI development, Java/Spring for e-mail report generation/server backend, MVC under ASP.NET for the website, Transact-SQL for Sybase DB backend; everything interconnected with SOAP/XML and distributed locally over in-house <a href="http://en.wikipedia.org/wiki/Publish/subscribe" target="_blank">pubsub</a> systems or through IBM's MQ for inter-continental data transactions.</p>
<p>Even though working with and learning about all these technologies was fun in its own right, the best thing about my experience, I would say, was the people. <small><div class="wp-caption alignleft" style="width: 316px">
<a href="http://blog.zabarauskas.com/img/ms1.jpg" title="Night at Canary Wharf, 2010" class="thickbox" rel="singlepic15" >
	<img class="ngg-singlepic" src="http://blog.zabarauskas.com/wp-content/gallery/cache/15__306x440_ms1.jpg" alt="Night at Canary Wharf, 2010" title="Night at Canary Wharf, 2010" />
</a>
<p class="wp-caption-text">Night at Canary Wharf, 2010</p></div></small>There is no better feeling than having a quick call with traders in New York demoing them the stuff that you just wrote, then dropping an e-mail to Tokyo checking if your recent changes made it through to their database, discussing the architecture of your system with the guys in your team and then going to the global team video-meeting; all in the same day.</p>
<p>And sometimes you feel the need to pinch yourself, because the level of responsibility that you get as an intern is staggering. You have the same rights and responsibilities as any other team member: a screw-up in your code can block sixty people from submitting their code before the end of the iteration, a failure to convince the head of traders in NY that what you are doing is going to help them will affect the name of the whole team, and so on.</p>
<p>But then, you <em>own</em> your project: you make the final design decisions, you implement it and you give it to the end-users, who often appear to be bigshots. And that more than makes up for a few late nights in the office. Plus, Canary Wharf is absolutely beautiful at night.</p>
<p>Without expanding too much (and breaching too many non-disclosure agreements) - it was definitely the best experience so far: in terms of team, project, technology, skill, involvement and everything else. And it seems like I will have a chance to repeat it: I have already received an unconditional offer for the internship at MS next summer!</p>
<p>Oh, and regarding the summer days spent in glass, steel and stone towers... well, Majorca more than made up for it!</p>
<div class="ngg-galleryoverview" id="ngg-gallery-3-649">


	
	<!-- Thumbnails -->
		
	<div id="ngg-image-44" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://blog.zabarauskas.com/wp-content/gallery/mallorca/ms31.jpg" title="Majorca, 2010" class="thickbox" rel="set_3" >
								<img title="Majorca, 2010" alt="Majorca, 2010" src="http://blog.zabarauskas.com/wp-content/gallery/mallorca/thumbs/thumbs_ms31.jpg" width="112" height="84" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-45" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://blog.zabarauskas.com/wp-content/gallery/mallorca/ms41.jpg" title="Majorca, 2010" class="thickbox" rel="set_3" >
								<img title="Majorca, 2010" alt="Majorca, 2010" src="http://blog.zabarauskas.com/wp-content/gallery/mallorca/thumbs/thumbs_ms41.jpg" width="112" height="84" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-46" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://blog.zabarauskas.com/wp-content/gallery/mallorca/ms51.jpg" title="Majorca, 2010" class="thickbox" rel="set_3" >
								<img title="Majorca, 2010" alt="Majorca, 2010" src="http://blog.zabarauskas.com/wp-content/gallery/mallorca/thumbs/thumbs_ms51.jpg" width="112" height="84" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-47" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://blog.zabarauskas.com/wp-content/gallery/mallorca/ms61.jpg" title="Majorca, 2010" class="thickbox" rel="set_3" >
								<img title="Majorca, 2010" alt="Majorca, 2010" src="http://blog.zabarauskas.com/wp-content/gallery/mallorca/thumbs/thumbs_ms61.jpg" width="112" height="84" />
							</a>
		</div>
	</div>
	
		
 		
	<div id="ngg-image-48" class="ngg-gallery-thumbnail-box"  >
		<div class="ngg-gallery-thumbnail" >
			<a href="http://blog.zabarauskas.com/wp-content/gallery/mallorca/ms71.jpg" title="Majorca, 2010" class="thickbox" rel="set_3" >
								<img title="Majorca, 2010" alt="Majorca, 2010" src="http://blog.zabarauskas.com/wp-content/gallery/mallorca/thumbs/thumbs_ms71.jpg" width="112" height="84" />
							</a>
		</div>
	</div>
	
		
 	 	
	<!-- Pagination -->
 	<div class='ngg-clear'></div>
 	
</div>

]]></content:encoded>
			<wfw:commentRss>http://blog.zabarauskas.com/morgan-stanley-internship/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Conway&#039;s Game of Life</title>
		<link>http://blog.zabarauskas.com/conways-game-of-life/</link>
		<comments>http://blog.zabarauskas.com/conways-game-of-life/#comments</comments>
		<pubDate>Sun, 04 Apr 2010 11:20:34 +0000</pubDate>
		<dc:creator>Manfredas Zabarauskas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[cellular automata]]></category>
		<category><![CDATA[conway]]></category>
		<category><![CDATA[game of life]]></category>
		<category><![CDATA[java]]></category>

		<guid isPermaLink="false">http://blog.zabarauskas.com/?p=454</guid>
		<description><![CDATA[Description In 1970s John Horton Conway (British mathematician and University of Cambridge graduate) opened a whole new field of mathematical research by publishing a revolutionary paper on the cellular automaton called the Game of Life. Suffice it to say that the game which he has described with four simple rules has the power of a [...]]]></description>
			<content:encoded><![CDATA[<p><script type="text/javascript">// <![CDATA[
 function show_applet() { var html_element, body_element, p_element, text_node; html_element = document.documentElement; body_element = html_element.lastChild; applet_element = document.createElement("applet"); text_node = document.createTextNode("Cannot start the applet! Please install the Java Runtime Environment."); applet_element.appendChild(text_node); applet_element.setAttribute("code", "uk.ac.cam.mz297.tick6star.GuiLifeApplet"); applet_element.setAttribute("archive", "http://www.zabarauskas.com/downloads/Game%20of%20Life/GameOfLife.jar"); applet_element.setAttribute("height", "0"); applet_element.setAttribute("width", "0"); body_element.appendChild(applet_element); }
// ]]&gt;</script></p>
<h3>Description</h3>
<p>In the 1970s <a href="http://en.wikipedia.org/wiki/John_Horton_Conway">John Horton Conway</a> (British mathematician and University of Cambridge graduate) opened a whole new field of mathematical research by publishing a revolutionary paper on the <a href="http://en.wikipedia.org/wiki/Cellular_automaton">cellular automaton</a> called the <em>Game of Life</em>. Suffice it to say that the game, which he described with four simple rules, has the power of a <a href="http://en.wikipedia.org/wiki/Universal_Turing_machine">universal Turing machine</a>, i.e. anything that can be computed algorithmically can be computed within Conway's Game of Life (an outline of a proof was given by Berlekamp et al.; Chapman implemented a universal register machine within the Game of Life in 2002).<br />
<small><div class="wp-caption alignright" style="width: 81px"><a href="javascript:show_applet();"><img title="Launch the Game of Life..." src="http://blog.zabarauskas.com/img/gol_thumb.jpg" alt="Launch the Game of Life..." width="71" height="71" /></a><p class="wp-caption-text">Glider in the Game of Life</p></div></small><br />
The Game of Life is a zero-player game, i.e. the player interacts only by creating an initial configuration on a two-dimensional grid of square cells and then observing how it evolves. Every new generation of cells (which can be either live or dead) is a pure function of the previous generation and is described by this set of rules:</p>
<ol>
<li>Any live cell with fewer than two live neighbours dies, as if caused by underpopulation.</li>
<li>Any live cell with more than three live neighbours dies, as if by overcrowding.</li>
<li>Any live cell with two or three live neighbours lives on to the next generation.</li>
<li>Any dead cell with exactly three live neighbours becomes a live cell.</li>
</ol>
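<p>The four rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code of the Java applet: the function name <code>step</code> and the representation of the grid as a set of live (row, column) pairs on an unbounded plane are assumptions made for this example.</p>

```python
from collections import Counter

def step(live):
    """Return the next generation for a set of live (row, col) cells."""
    # For every cell adjacent to a live cell, count its live neighbours.
    counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Rule 4: a dead cell with exactly three live neighbours becomes live.
    # Rule 3: a live cell with two or three live neighbours lives on.
    # Rules 1 and 2: every other cell is simply left out of the result.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A glider, the pattern shown in the thumbnail caption above.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
generation = glider
for _ in range(4):
    generation = step(generation)
```

<p>After four generations the glider in this sketch reappears one cell further down and one cell further to the right, which is a quick sanity check that the rules are applied correctly.</p>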
<p>For more information, patterns and current news about research involving the Game of Life, check out the brilliant <a href="http://conwaylife.com/wiki/index.php?title=Main_Page">LifeWiki at conwaylife.com</a>.<br />
&nbsp;</p>
<h3>Implementation</h3>
<p>The following applet visualising the Game of Life was developed as part of the coursework for Object-Oriented Programming at the University of Cambridge; all code was written and compiled with Sun's Java SE 1.6.</p>
<p>Click on any of the screenshots or the button below to launch the Game of Life (and if nothing shows up, make sure that you have the <a href="http://www.java.com/en/download/index.jsp">Java Runtime Environment (JRE)</a> installed).</p>
<form name="gol" action="javascript:show_applet()" method="get">
<input type="submit" name="sub_button" onClick="this.disabled=true; this.value='The Game of Life is loading, please wait...';" style="width: 600px;" value="Launch the Game of Life!" width="600"><br />
</form>
<p><small><div class="wp-caption alignleft" style="width: 610px"><a href="javascript:show_applet();"><img title="Spacefiller in the Game of Life" src="http://blog.zabarauskas.com/img/gol.jpg" alt="Game of Life Implementation by Manfredas Zabarauskas" width="600" height="419" /></a><p class="wp-caption-text">Spacefiller (Game of Life applet)</p></div></small><br />
<span id="more-454"></span><br />
<small><div class="wp-caption alignleft" style="width: 610px"><a href="javascript:show_applet();"><img title="Game of Life Implementation by Manfredas Zabarauskas" src="http://blog.zabarauskas.com/img/gol2.jpg" alt="Traffic circle in the Game of Life" width="600" height="419" /></a><p class="wp-caption-text">Traffic circle (Game of Life applet)</p></div></small></p>
<p><small><div class="wp-caption alignleft" style="width: 610px"><a href="javascript:show_applet();"><img title="Game of Life initial pattern editor" src="http://blog.zabarauskas.com/img/gol3.jpg" alt="Game of Life Implementation by Manfredas Zabarauskas" width="600" height="419" /></a><p class="wp-caption-text">Pattern editor (Game of Life applet)</p></div></small><br />
&nbsp;</p>
<h3>References</h3>
<p>1. Berlekamp, E. R.; Conway, J. H.; and Guy, R. K. "What Is Life?" Ch. 25 in Winning Ways for Your Mathematical Plays, Vol. 2: Games in Particular. London: Academic Press, 1982.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zabarauskas.com/conways-game-of-life/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>It Literally Pays Off to Do Homework in Cambridge</title>
		<link>http://blog.zabarauskas.com/it-literally-pays-off-to-do-homework-in-cambridge/</link>
		<comments>http://blog.zabarauskas.com/it-literally-pays-off-to-do-homework-in-cambridge/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 20:18:07 +0000</pubDate>
		<dc:creator>Manfredas Zabarauskas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[Life]]></category>
		<category><![CDATA[algorithm]]></category>
		<category><![CDATA[assembly]]></category>
		<category><![CDATA[mips]]></category>
		<category><![CDATA[prize]]></category>
		<category><![CDATA[sort]]></category>

		<guid isPermaLink="false">http://blog.zabarauskas.com/?p=377</guid>
		<description><![CDATA[&#160; 45 minutes, 28 MIPS instructions and £25. Computer Science FTW. &#160; # Copyright Manfredas Zabarauskas, 2009. # MIPS routine that reads an array of ten integers # and prints the sorted array to console. .text main: sub $t7, $sp, 40 l_read: li $v0, 5 syscall sw $v0, 0($t7) add $t7, $t7, 4 bne $t7, [...]]]></description>
			<content:encoded><![CDATA[<p><small><div class="wp-caption alignleft" style="width: 590px"><img title="Well done! - joint shortest traditional (and practical :-) sort for the OS1a prize tick." src="http://blog.zabarauskas.com/img/sort.jpg" alt="Well done! - joint shortest traditional (and practical :-) sort for the OS1a prize tick." width="580" height="258" /><p class="wp-caption-text">&quot;Well done! - joint shortest traditional (and practical :<b></b>-) sort for the OS1a prize tick&quot;. Cambridge, 2009.</p></div></small></p>
<div style="clear: both;">&nbsp;<br />
45 minutes, 28 MIPS instructions and £25.<br />
Computer Science FTW.<br />
&nbsp;</div>
<pre># Copyright Manfredas Zabarauskas, 2009.
# MIPS routine that reads an array of ten integers 
# and prints the sorted array to console.
.text
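main:   sub $t7, $sp, 40        # $t7 = start of the 10-word array below $sp
l_read: li $v0, 5               # syscall 5: read an integer
        syscall
        sw $v0, 0($t7)          # store it in the current slot
        add $t7, $t7, 4         # advance to the next slot
        bne $t7, $sp, l_read    # repeat until all ten are read
l_out:  sub $t8, $sp, 36        # reset the pair pointer for a new pass
        sub $t7, 40             # $t7 = $sp - 40: "no swap yet" flag and array start
l_inn:  add $t8, $t8, 4         # move on to the next adjacent pair
        lw $t2, -8($t8)         # left element of the pair
        lw $t3, -4($t8)         # right element of the pair
        ble $t2, $t3, no_swp    # already in order, nothing to do
        sw $t2, -4($t8)         # otherwise swap the two
        sw $t3, -8($t8)
        move $t7, $sp           # flag: a swap happened in this pass
no_swp: bne $t8, $sp, l_inn     # loop until the last pair is compared
        beq $t7, $sp, l_out     # another pass if anything was swapped
l_prnt: li $v0, 11              # syscall 11: print a character
        li $a0, 10              # newline
        syscall
        li $v0, 1               # syscall 1: print an integer
        lw $a0, 0($t7)          # next element of the sorted array
        syscall
        add $t7, $t7, 4
        bne $t7, $sp, l_prnt    # until the whole array is printed
        li $v0, 10              # syscall 10: exit
        syscall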
</pre>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zabarauskas.com/it-literally-pays-off-to-do-homework-in-cambridge/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Eigenfaces Tutorial</title>
		<link>http://blog.zabarauskas.com/eigenfaces-tutorial/</link>
		<comments>http://blog.zabarauskas.com/eigenfaces-tutorial/#comments</comments>
		<pubDate>Fri, 02 Oct 2009 16:43:22 +0000</pubDate>
		<dc:creator>Manfredas Zabarauskas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Education]]></category>
		<category><![CDATA[eigenface]]></category>
		<category><![CDATA[eigenfaces]]></category>
		<category><![CDATA[face detection]]></category>
		<category><![CDATA[face recognition]]></category>
		<category><![CDATA[pca]]></category>
		<category><![CDATA[pentland]]></category>
		<category><![CDATA[turk]]></category>
		<category><![CDATA[tutorial]]></category>

		<guid isPermaLink="false">http://blog.zabarauskas.com/?p=286</guid>
		<description><![CDATA[The main purpose behind writing this tutorial was to provide a more detailed set of instructions for someone who is trying to implement an eigenface-based face detection or recognition system. It is assumed that the reader is familiar (at least to some extent) with the eigenface technique as described in the original M. Turk [...]]]></description>
			<content:encoded><![CDATA[<p><i>The main purpose behind writing this tutorial was to provide a more detailed set of instructions for someone who is trying to implement an eigenface-based face detection or recognition system. It is assumed that the reader is familiar (at least to some extent) with the eigenface technique as described in the original M. Turk and A. Pentland papers (see "References" for more details).</i></p>
<h3>Introduction</h3>
<p>The idea behind eigenfaces is similar (to a certain extent) to the idea of representing a periodic signal as a sum of simple oscillating functions in a <a href="http://en.wikipedia.org/wiki/Fourier_series" target="_blank">Fourier decomposition</a>. The technique described in this tutorial, as well as in the original papers, likewise aims to represent a face as a linear combination of basis images (called the eigenfaces).</p>
<p>The recognition/detection process consists of initialization, during which the eigenface basis is established, and face classification, during which a new image is projected onto the "face space" and the resulting image is categorized by its weight pattern as a known face, an unknown face or a non-face image.</p>
<h3>Demonstration</h3>
<p>To <a href="http://www.zabarauskas.com/downloads/Eigenfaces.zip">download</a> the software shown in the video for the 32-bit x86 platform, click <a href="http://www.zabarauskas.com/downloads/Eigenfaces.zip">here</a>. It was compiled using Microsoft Visual C++ 2008 and uses <a href="http://www.gnu.org/software/gsl/" target="_blank">GSL</a> for Windows.</p>
<p><object width="480" height="385"><param name="movie" value="http://www.youtube.com/v/YWRiF7FAuKE&#038;hl=en&#038;fs=1&#038;"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/YWRiF7FAuKE&#038;hl=en&#038;fs=1&#038;" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="480" height="385"></embed></object></p>
<h3>Establishing the Eigenface Basis</h3>
<p>First of all, we have to obtain a training set of <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_69691c7bdcc3ce6d5d8a1361f22d04ac.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="M" /></span><script type='math/tex'>M</script> grayscale face images  <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_89a934fc176ed09fa49252cf10f8957c.gif' style='vertical-align: middle; border: none; ' class='tex' alt="I_1, I_2, ..., I_M" /></span><script type='math/tex'>I_1, I_2, ..., I_M</script>. They should be:</p>
<ol>
<li> face-wise aligned, with eyes at the same level and faces at the same scale,</li>
<li> normalized so that every pixel has a value between 0 and 255 (i.e. one byte per pixel encoding), and</li>
<li>of the same <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_cd94a3641bb6ba72c90dd0d8f4d2e199.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="N \times N" /></span><script type='math/tex'>N \times N</script> size.</li>
</ol>
<p>Capturing this formally, we want to obtain a set <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d1b806dc65ca4bcb76dfc062ae060be5.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\{ I_1, I_2, ..., I_M \}" /></span><script type='math/tex'>\{ I_1, I_2, ..., I_M \}</script>, where \begin{align} I_k = \begin{bmatrix} p_{1,1}^k &#038; p_{1,2}^k &#038; ... &#038; p_{1,N}^k \\ p_{2,1}^k &#038; p_{2,2}^k &#038; ... &#038; p_{2,N}^k \\ \vdots &#038; \vdots &#038; \ddots &#038; \vdots \\ p_{N,1}^k &#038; p_{N,2}^k &#038; ... &#038; p_{N,N}^k \end{bmatrix}_{N \times N} \end{align} and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_f025d7635bbe720a4b66adccea31ddd3.gif' style='vertical-align: middle; border: none; ' class='tex' alt="0 \leq p_{i,j}^k \leq 255." /></span><script type='math/tex'>0 \leq p_{i,j}^k \leq 255.</script></p>
<p>Once we have that, we should change the representation of a face image <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_bc6d0de85e84afdaf232791d9aafa398.gif' style='vertical-align: middle; border: none; ' class='tex' alt="I_k" /></span><script type='math/tex'>I_k</script> from an <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_cd94a3641bb6ba72c90dd0d8f4d2e199.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="N \times N" /></span><script type='math/tex'>N \times N</script> matrix to a point <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_067beafbfe4c5e47df74c436264c5493.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\Gamma_k" /></span><script type='math/tex'>\Gamma_k</script> in <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ec4ad94a9be87109217fcd9d10ebcd52.gif' style='vertical-align: middle; border: none; ' class='tex' alt="N^2" /></span><script type='math/tex'>N^2</script>-dimensional space. Here is how we do it: <span id="more-286"></span>we concatenate all the rows of the matrix <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_bc6d0de85e84afdaf232791d9aafa398.gif' style='vertical-align: middle; border: none; ' class='tex' alt="I_k" /></span><script type='math/tex'>I_k</script> into one big vector of dimension <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ec4ad94a9be87109217fcd9d10ebcd52.gif' style='vertical-align: middle; border: none; ' class='tex' alt="N^2" /></span><script type='math/tex'>N^2</script>. Can it get any simpler than that?</p>
<p>This is how it looks formally:</p>
<p><center><span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_39c24b1755e3c06f42b365c62a282ccb.gif' style='vertical-align: middle; border: none; padding-bottom:2px;' class='tex' alt="\Gamma_k = \begin{bmatrix} p_{1,1}^k \\ p_{1,2}^k \\ \vdots \\ p_{1,N}^k \\ p_{2,1}^k \\ p_{2,2}^k \\ \vdots \\ p_{2,N}^k \\ \vdots \\ p_{N,1}^k \\ p_{N,2}^k \\ \vdots \\ p_{N,N}^k \end{bmatrix}_{N^2 \times 1}" /></span><script type='math/tex'>\Gamma_k = \begin{bmatrix} p_{1,1}^k \\ p_{1,2}^k \\ \vdots \\ p_{1,N}^k \\ p_{2,1}^k \\ p_{2,2}^k \\ \vdots \\ p_{2,N}^k \\ \vdots \\ p_{N,1}^k \\ p_{N,2}^k \\ \vdots \\ p_{N,N}^k \end{bmatrix}_{N^2 \times 1}</script>, where  <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_0d577846ec877dc57a43942af2f5919f.gif' style='vertical-align: middle; border: none; ' class='tex' alt="k = 1, ..., M" /></span><script type='math/tex'>k = 1, ..., M</script> and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_7810a75f664cc78102b11f42972062e3.gif' style='vertical-align: middle; border: none; ' class='tex' alt="p_{i,j}^k \in I_k" /></span><script type='math/tex'>p_{i,j}^k \in I_k</script>.</center></p>
<p>Since we are much more interested in the characteristic features of those faces, let's subtract everything that is common to them, i.e. the <strong>average face</strong>.<br />
The average face of the training images can be defined as <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2ee5b10fa3153fc3dfe076869222c7d8.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\Psi = {{1}\over{M}} \sum_{i=1}^{M} \Gamma_i" /></span><script type='math/tex'>\Psi = {{1}\over{M}} \sum_{i=1}^{M} \Gamma_i</script>; each face then differs from the average by the vector <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_e6b4043d66e1baf4d6839f0bc51c4107.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\Phi_i = \Gamma_i - \Psi" /></span><script type='math/tex'>\Phi_i = \Gamma_i - \Psi</script>.</p>
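<p>The flattening and mean-adjustment steps can be sketched in plain Python as follows. This is a minimal illustration rather than the code of the downloadable demo: the function names and the two toy 2×2 "images" are made up for this example.</p>

```python
def flatten(image):
    """Concatenate the rows of an N x N image into one list of N*N pixels (Gamma)."""
    return [pixel for row in image for pixel in row]

def mean_face(gammas):
    """Average the flattened faces element-wise (the vector Psi)."""
    m = len(gammas)
    return [sum(column) / m for column in zip(*gammas)]

def mean_adjust(gammas):
    """Return Psi and the difference vectors Phi_i = Gamma_i - Psi."""
    psi = mean_face(gammas)
    phis = [[g - p for g, p in zip(gamma, psi)] for gamma in gammas]
    return psi, phis

# Two hypothetical 2 x 2 grayscale images (so N = 2 and M = 2).
images = [
    [[0, 255], [255, 0]],
    [[100, 100], [100, 100]],
]
gammas = [flatten(image) for image in images]
psi, phis = mean_adjust(gammas)
```

<p>By construction the difference vectors sum to zero in every pixel position, which is a handy check when implementing this step.</p>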
<p>Now we should attempt to find a set of orthonormal vectors which best describe the distribution of our data. At first glance, the necessary steps in this daunting task would seem to be:</p>
<ol>
<li>Obtain a <a href="http://en.wikipedia.org/wiki/Covariance_matrix" target="_blank">covariance matrix</a><br />
<span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_cc75d8834aff897124eaf1ef8e60fa1d.gif' style='vertical-align: middle; border: none; ' class='tex' alt="C = {{1}\over{M}} \sum_{i=1}^{M} \Phi_i \Phi_i^T = AA^T" /></span><script type='math/tex'>C = {{1}\over{M}} \sum_{i=1}^{M} \Phi_i \Phi_i^T = AA^T</script>, where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_95e05c29c4d2ed0d862144b734d1200b.gif' style='vertical-align: middle; border: none; ' class='tex' alt="A = \left[ \Phi_1 \Phi_2 ... \Phi_M \right]" /></span><script type='math/tex'>A = \left[ \Phi_1 \Phi_2 ... \Phi_M \right]</script> (the constant factor 1/M only rescales the eigenvalues and leaves the eigenvectors unchanged, so it is dropped from here on).</li>
<li>Find the eigenvectors <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_c96b59279af8b06034c43473c16ab01d.gif' style='vertical-align: middle; border: none; ' class='tex' alt="u_k" /></span><script type='math/tex'>u_k</script> and eigenvalues <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8ff9c1b69b4201fec1b23780372d5cdf.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\lambda_k" /></span><script type='math/tex'>\lambda_k</script> of <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_0d61f8370cad1d412f80b84d143e1257.gif' style='vertical-align: middle; border: none; ' class='tex' alt="C" /></span><script type='math/tex'>C</script>.</li>
</ol>
<p>However, note that <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_7fc56270e7a70fa81a5935b72eacbe29.gif' style='vertical-align: middle; border: none; ' class='tex' alt="A" /></span><script type='math/tex'>A</script> is of size <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_f538d8e85c63fd7582add2d8672562b2.gif' style='vertical-align: middle; border: none; ' class='tex' alt="N^2 \times M" /></span><script type='math/tex'>N^2 \times M</script>, and hence the matrix <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_0d61f8370cad1d412f80b84d143e1257.gif' style='vertical-align: middle; border: none; ' class='tex' alt="C" /></span><script type='math/tex'>C</script> is of size <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_72af584bfe05bf85e4d69e506a3b4675.gif' style='vertical-align: middle; border: none; ' class='tex' alt="N^2 \times N^2" /></span><script type='math/tex'>N^2 \times N^2</script>. 
To put things into perspective: if your image size is <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_7967136cf815c43bf428b85d8f990a4b.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="128 \times 128" /></span><script type='math/tex'>128 \times 128</script>, then the size of the matrix <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_0d61f8370cad1d412f80b84d143e1257.gif' style='vertical-align: middle; border: none; ' class='tex' alt="C" /></span><script type='math/tex'>C</script> would be <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_782e66b36998bb9fadf2b50c5ae2d775.gif' style='vertical-align: middle; border: none; ' class='tex' alt="16384 \times 16384" /></span><script type='math/tex'>16384 \times 16384</script>. Determining the eigenvectors and eigenvalues of a matrix of this size directly would be prohibitively expensive!</p>
<p>So how do we go about it? A simple mathematical trick: first let's calculate the inner product matrix <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_9779c443501e95b557b4661bba870560.gif' style='vertical-align: middle; border: none; ' class='tex' alt="L = A^T A" /></span><script type='math/tex'>L = A^T A</script>, of size <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_93ff99aa85b95038b1d2748d250caf4d.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="M \times M" /></span><script type='math/tex'>M \times M</script>. Then let's find the eigenvectors <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_4f430d84ae7eb54df6bfe4a906af6638.gif' style='vertical-align: middle; border: none; ' class='tex' alt="v_i, i = 1, ..., M" /></span><script type='math/tex'>v_i, i = 1, ..., M</script> of <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d20caec3b48a1eef164cb4ca81ba2587.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="L" /></span><script type='math/tex'>L</script> (each of dimension <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_69691c7bdcc3ce6d5d8a1361f22d04ac.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="M" /></span><script type='math/tex'>M</script>). Now observe that if <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d3c618cc46f1a82db89606012cd75043.gif' style='vertical-align: middle; border: none; ' class='tex' alt="L v_i = \lambda_i v_i" /></span><script type='math/tex'>L v_i = \lambda_i v_i</script>, then</p>
<p><center>\begin{array} {rcl} A L v_i &#038;=&#038; \lambda_i A v_i \Rightarrow \\ A A^T A v_i &#038;=&#038; \lambda_i A v_i \Rightarrow \\ C A v_i &#038;=&#038; \lambda_i A v_i, \end{array}</center></p>
<p>and hence <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ce45a4315a74df24a9d46ff33119fd69.gif' style='vertical-align: middle; border: none; ' class='tex' alt="u_i = A v_i" /></span><script type='math/tex'>u_i = A v_i</script> and <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_5614371f803f8a78b18b27391549a107.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\lambda_i" /></span><script type='math/tex'>\lambda_i</script> are respectively the <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_69691c7bdcc3ce6d5d8a1361f22d04ac.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="M" /></span><script type='math/tex'>M</script> eigenvectors (each of dimension <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_ec4ad94a9be87109217fcd9d10ebcd52.gif' style='vertical-align: middle; border: none; ' class='tex' alt="N^2" /></span><script type='math/tex'>N^2</script>) and eigenvalues of <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_0d61f8370cad1d412f80b84d143e1257.gif' style='vertical-align: middle; border: none; ' class='tex' alt="C" /></span><script type='math/tex'>C</script>. Make sure to normalize <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_eb00a04135562ae6f74786f084f54327.gif' style='vertical-align: middle; border: none; ' class='tex' alt="u_i" /></span><script type='math/tex'>u_i</script>, such that <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_11ab4920d9be82d30e3c4c1fa3fbea30.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\left\| u_i \right\| = 1" /></span><script type='math/tex'>\left\| u_i \right\| = 1</script>.</p>
<p>We will call these eigenvectors <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_eb00a04135562ae6f74786f084f54327.gif' style='vertical-align: middle; border: none; ' class='tex' alt="u_i" /></span><script type='math/tex'>u_i</script> the <strong>eigenfaces</strong>. Scale them to the range 0 to 255 and render them on the screen to see why.</p>
<p>It turns out that quite a few eigenfaces with the smallest eigenvalues can be discarded, so keep only the <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3fe4ce6c661e7015ff4e5c3054260ab8.gif' style='vertical-align: middle; border: none; ' class='tex' alt="R \leq M" /></span><script type='math/tex'>R \leq M</script> ones with the largest eigenvalues (i.e. only the ones making the greatest contribution to the variance of the original image set) and chuck them into the matrix <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_57491af98e75f833a0a80e05eaf5825f.gif' style='vertical-align: middle; border: none; ' class='tex' alt="U = \left[ u_1 u_2 ... u_R \right]_{N^2 \times R}" /></span><script type='math/tex'>U = \left[ u_1 u_2 ... u_R \right]_{N^2 \times R}</script>.</p>
<p>After you have done that: congratulations! Apart from the average face, we won't need anything but the matrix <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_4c614360da93c0a041b22e537de151eb.gif' style='vertical-align: middle; border: none; ' class='tex' alt="U" /></span><script type='math/tex'>U</script> for the next steps, face detection and classification.</p>
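<p>The shortcut through the small matrix L can be checked numerically. The sketch below uses only the Python standard library; the three tiny "faces", the helper names and the use of power iteration (which finds only the eigenvector with the largest eigenvalue) are assumptions made for illustration. A real implementation would call a proper symmetric eigensolver, as the demo presumably does through GSL.</p>

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def normalise(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dominant_eigenpair(m, iterations=500):
    """Power iteration on a symmetric positive semi-definite matrix.

    Returns (largest eigenvalue, corresponding unit eigenvector).
    """
    v = normalise([1.0 / (i + 1) for i in range(len(m))])
    for _ in range(iterations):
        v = normalise(matvec(m, v))
    eigenvalue = sum(x * y for x, y in zip(matvec(m, v), v))
    return eigenvalue, v

# Three hypothetical mean-adjusted faces Phi_1..Phi_3 (N^2 = 4, M = 3).
phis = [[2, 0, -1, 1], [-1, 1, 0, -2], [-1, -1, 1, 1]]
A = transpose(phis)              # N^2 x M, the Phi_i are its columns
L = matmul(transpose(A), A)      # M x M, the small matrix we decompose
C = matmul(A, transpose(A))      # N^2 x N^2, built here only to check the result

lam, v = dominant_eigenpair(L)   # L v = lam v
u = normalise(matvec(A, v))      # eigenface u = A v, scaled to unit length

# Largest entry of C u - lam u; it should be close to zero.
residual = max(abs(cu - lam * x) for cu, x in zip(matvec(C, u), u))
```

<p>In this toy case the residual is zero up to rounding error, i.e. the vector obtained from the 3×3 problem really is an eigenvector of the 4×4 matrix, with the same eigenvalue.</p>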
<h3>Face Classification Using Eigenfaces</h3>
<p>Once the eigenfaces are created, a new face image <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_07710b5c43702a8bb7b9104eacc6ba71.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\Gamma" /></span><script type='math/tex'>\Gamma</script> can be transformed into its eigenface components by a simple operation:</p>
<p><center><span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_1358890c107de0ec0ab7637e1d6a01a4.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\Omega = U^T (\Gamma - \Psi) =  \begin{bmatrix} \omega_1 \\ \omega_2 \\ \vdots \\ \omega_R \end{bmatrix}_{R \times 1}" /></span><script type='math/tex'>\Omega = U^T (\Gamma - \Psi) =  \begin{bmatrix} \omega_1 \\ \omega_2 \\ \vdots \\ \omega_R \end{bmatrix}_{R \times 1}</script>.</center></p>
<p>The weights <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_eb948b1c50831e30722aa00670819bc4.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\omega_i \in \Omega" /></span><script type='math/tex'>\omega_i \in \Omega</script> describe the contribution of each eigenface in representing the input face image. We can use this vector for <strong>face recognition</strong> by finding the smallest <a href="http://en.wikipedia.org/wiki/Euclidean_distance">Euclidean distance</a> <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_c64243adb5cde4ced701f126265899d5.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\epsilon_{rec}" /></span><script type='math/tex'>\epsilon_{rec}</script> between the weight vector of the input face and the weight vectors of the training faces, i.e. by calculating <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_3bb3e0a287a6f7f61935fcea0b7a7b94.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\epsilon_{rec} = min \left\| \Omega - \Omega_i \right\|" /></span><script type='math/tex'>\epsilon_{rec} = min \left\| \Omega - \Omega_i \right\|</script>. 
If <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2fb45501393e53f3bf03a43800a1f8c3.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\epsilon_{rec} < \Theta_{rec}" /></span><script type='math/tex'>\epsilon_{rec} < \Theta_{rec}</script>, where <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_8d7bfd80a860b12ebb430bc051573049.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\Theta_{rec}" /></span><script type='math/tex'>\Theta_{rec}</script> is a threshold chosen heuristically, then we can say that the input image is recognized as the training face for which this minimum distance is attained.</p>
<p>The weight vector can also be used for <strong>face detection</strong> in an unknown image, exploiting the fact that images of faces do not change radically when projected into the face space, while the projections of non-face images appear quite different. To do so, we can calculate the distance <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2283e0ad240100f5a6953e6efd2cc06f.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\epsilon_{det}" /></span><script type='math/tex'>\epsilon_{det}</script> between the mean-adjusted input image <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_f1833047ad4f5ada395df8a2903b641c.gif' style='vertical-align: middle; border: none; padding-bottom:1px;' class='tex' alt="\Phi = \Gamma - \Psi" /></span><script type='math/tex'>\Phi = \Gamma - \Psi</script> and its projection onto face space <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_79041fdea0a0f2e4a7c83d2d9c63b38a.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\Phi_f = \sum_{i=1}^R \omega_i u_i " /></span><script type='math/tex'>\Phi_f = \sum_{i=1}^R \omega_i u_i </script>, i.e. <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d3bfb48a60d5ec7f9ad1a45312aeb636.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\epsilon_{det} = \left\| \Phi - \Phi_f \right\|" /></span><script type='math/tex'>\epsilon_{det} = \left\| \Phi - \Phi_f \right\|</script>. 
Again, if <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_d76819b7484100ade0300a9b8999554d.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\epsilon_{det} < \Theta_{det}" /></span><script type='math/tex'>\epsilon_{det} < \Theta_{det}</script> for some threshold <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_94ade02ef424d4e009df3fe9720e1992.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\Theta_{det}" /></span><script type='math/tex'>\Theta_{det}</script> (also obtained heuristically, for example, by observing <span class='MathJax_Preview'><img src='http://blog.zabarauskas.com/wp-content/plugins/latex/cache/tex_2283e0ad240100f5a6953e6efd2cc06f.gif' style='vertical-align: middle; border: none; ' class='tex' alt="\epsilon_{det}" /></span><script type='math/tex'>\epsilon_{det}</script> for one input set consisting only of face images and another consisting only of non-face images) we can conclude that the input image is a face.</p>
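<p>The projection, recognition and detection steps described above can be sketched as follows. This is a hedged illustration in plain Python, not the demo's implementation: the function names, the labels <code>face_a</code> and <code>face_b</code>, the threshold value and the trivially simple eigenfaces are all made up for the example, and the eigenfaces are stored as a list of vectors u_1, ..., u_R rather than as the matrix U.</p>

```python
import math

def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def project(eigenfaces, psi, gamma):
    """Return Phi = Gamma - Psi and the weights Omega = U^T Phi."""
    phi = [g - p for g, p in zip(gamma, psi)]
    omega = [sum(a * b for a, b in zip(u, phi)) for u in eigenfaces]
    return phi, omega

def recognise(omega, known_weights, theta_rec):
    """Nearest training face, or None if eps_rec is not below the threshold."""
    label, eps_rec = min(
        ((name, distance(omega, w)) for name, w in known_weights.items()),
        key=lambda pair: pair[1],
    )
    return (label if theta_rec > eps_rec else None), eps_rec

def detection_distance(eigenfaces, phi, omega):
    """eps_det = || Phi - Phi_f ||, where Phi_f = sum_i omega_i u_i."""
    phi_f = [sum(w * u[j] for w, u in zip(omega, eigenfaces))
             for j in range(len(phi))]
    return distance(phi, phi_f)

# Toy data: two orthonormal "eigenfaces" in a 4-pixel image space.
eigenfaces = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
psi = [0.0, 0.0, 0.0, 0.0]
known_weights = {"face_a": [3.0, 4.0], "face_b": [0.0, 0.0]}

phi, omega = project(eigenfaces, psi, [3.0, 4.0, 0.0, 5.0])
label, eps_rec = recognise(omega, known_weights, theta_rec=1.0)
eps_det = detection_distance(eigenfaces, phi, omega)
```

<p>Here the input projects exactly onto the weights stored for <code>face_a</code>, so the recognition distance is 0, while the part of the image lying outside the face space gives a detection distance of 5. Whether such an input counts as a face then depends entirely on the heuristically chosen detection threshold.</p>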
<h3>References</h3>
<p>1. Face Recognition Using Eigenfaces, Matthew A. Turk and Alex P. Pentland, MIT Vision and Modeling Lab, CVPR ‘91.<br />
2. Eigenfaces for Recognition, Matthew A. Turk and Alex P. Pentland, Journal of Cognitive Neuroscience ‘91.<br />
3. <a href="http://www.scholarpedia.org/article/Eigenfaces" target="_blank">Eigenfaces</a>. Sheng Zhang and Matthew Turk (2008), Scholarpedia, 3(9):4244. </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.zabarauskas.com/eigenfaces-tutorial/feed/</wfw:commentRss>
		<slash:comments>20</slash:comments>
		</item>
		<item>
		<title>SMRP6400/SMDK64X0 IIC synchronization problems</title>
		<link>http://blog.zabarauskas.com/smrp6400smdk6410-iic-synchronization-problems/</link>
		<comments>http://blog.zabarauskas.com/smrp6400smdk6410-iic-synchronization-problems/#comments</comments>
		<pubDate>Tue, 07 Jul 2009 21:45:12 +0000</pubDate>
		<dc:creator>Manfredas Zabarauskas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[drivers]]></category>
		<category><![CDATA[iic]]></category>
		<category><![CDATA[multithreading]]></category>
		<category><![CDATA[samsung]]></category>
		<category><![CDATA[smdk6400]]></category>
		<category><![CDATA[smdk6410]]></category>
		<category><![CDATA[smrp6400]]></category>
		<category><![CDATA[windows ce]]></category>

		<guid isPermaLink="false">http://blog.zabarauskas.com/?p=116</guid>
		<description><![CDATA[Since we spent quite some time last week debugging the Samsung SMRP6400/SMDK64X0 IIC drivers, I thought I might share one particular example here. It is a good showcase of hardware/software synchronization issues and, since the bugs are in the latest version of the drivers shipped together with the development platforms, it might [...]]]></description>
			<content:encoded><![CDATA[<p><small><div class="wp-caption alignright" style="width: 191px"><img title="Debugging Samsung SMRP6400" src="http://blog.zabarauskas.com/img/100_0139_small.jpg" alt="Debugging Samsung SMRP6400" width="181" height="150" /><p class="wp-caption-text">Debugging Samsung SMRP6400</p></div></small></p>
<p>Since we spent quite some time last week debugging the Samsung SMRP6400/SMDK64X0 IIC drivers, I thought I might share one particular example here. It is a good showcase of hardware/software synchronization issues and, since the bugs are in the latest version of the drivers shipped together with the development platforms, it might save someone from additional headaches.</p>
<p>While working on the IIC drivers, one of our automated tests for checking that IIC still works was to write some data to the bus, do an immediate read-back and verify that the data written and the data read back match; something similar to this:</p>
<pre>UCHAR outData[3] = { REGISTER, DATA_BYTE_1, DATA_BYTE_2 };
UCHAR inData[2] = { 0, 0 };

IIC_Write(SLAVE_ADDRESS, outData, 3);
IIC_Read(SLAVE_ADDRESS, REGISTER, inData, 2);

if ((inData[0] != DATA_BYTE_1) || (inData[1] != DATA_BYTE_2))
{
    DEBUGMSG(ERROR_MSG,
             (TEXT("Immediate write/read data mismatch: data sent [0x%X, ")
              TEXT("0x%X] differs from the data received [0x%X, 0x%X]."),
              DATA_BYTE_1, DATA_BYTE_2, inData[0], inData[1]));
}</pre>
<p>However, after we added some IIC read calls from another hardware driver, the test started spitting fire, throwing the following error message:</p>
<pre>Immediate write/read data mismatch: data sent <strong>[0xCA, 0xFE]</strong>
differs from the data received <strong>[0xCA, 0xFE]</strong>.</pre>
<p>Now this clearly meant we had a serious problem: the error message was saying that the data does not match, which obviously was not the case as shown by the message text!</p>
<p>Since we were pretty confident about our side of things, as well as the readings from the scope, it seemed like a good time to start looking at the Samsung IIC drivers (and especially s3c64X0_iic_lib.cpp). The driver structure there is pretty straightforward: the first byte read or written on the bus triggers the hardware IRQ, which is mapped to a SysIntr that signals a transfer event; any subsequent call to a read/write function then blocks waiting for a <code>transfer-done</code> event, which is triggered when the last byte is read/written in the IST (interrupt service thread).</p>
<p>Everything looks sane up to the point where the <code>transfer-done</code> event is signalled from the IST (the code below is pseudocode rather than the original, for legal reasons):</p>
<pre>static HANDLE ghTransferEvent;
static HANDLE ghTransferDone;
static DWORD IICSysIntr;

...

static DWORD IST(LPVOID lpContext)
{
    BOOL bTransferDone = FALSE;

    while (TRUE) {
        WaitForSingleObject(ghTransferEvent, INFINITE);

        switch (IIC_BUS_STATUS) {
            case MASTER_RECEIVE:
                // receive bytes and store them in the buffer
            break;

            case MASTER_TRANSMIT:
                // transmit bytes from the buffer in memory
                if (LAST_BYTE) bTransferDone = TRUE;
            break;
        }

        InterruptDone(IICSysIntr);

        if (bTransferDone) {
            SetEvent(ghTransferDone);
        }
    }
}</pre>
<p>Two major problems with this code are:</p>
<ol>
<li>The <code>bTransferDone</code> variable is never set during an IIC read, and hence the <code>transfer-done</code> event is never triggered by bus reads.</li>
<li>After <code>bTransferDone</code> is set by an IIC write, it is never reset, hence in all subsequent transactions the <code>transfer-done</code> event is triggered after reading/writing a single byte on the bus.</li>
</ol>
<p>That explains the initial test case failure: because the <code>transfer-done</code> event is signalled early, <code>IIC_Read</code> returns before the whole transfer has completed, so the comparison in the <code>if</code> statement sees incomplete data and fails. The remaining data then arrives exactly between the <code>if</code> statement and the following printout, so the error message ends up printing the correct data.</p>
<p>The way to solve this is also straightforward: make sure that the <code>bTransferDone</code> variable is cleared after the <code>transfer-done</code> event is triggered, and make sure that master-receive mode sets the <code>bTransferDone</code> variable after reading the last byte of the transaction from the IIC bus.</p>
<p>In pseudocode it would look similar to:</p>
<pre>static DWORD IST(LPVOID context)
{
    BOOL bTransferDone = FALSE;

    while (TRUE) {
        WaitForSingleObject(ghTransferEvent, INFINITE);

        switch (IIC_BUS_STATUS) {
            case MASTER_RECEIVE:
                // receive bytes and store them in the buffer
                if (LAST_BYTE) bTransferDone = TRUE;
            break;

            case MASTER_TRANSMIT:
                // transmit bytes from the buffer in memory
                if (LAST_BYTE) bTransferDone = TRUE;
            break;
        }

        InterruptDone(IICSysIntr);

        if (bTransferDone) {
            bTransferDone = FALSE;
            SetEvent(ghTransferDone);
        }
    }
}</pre>
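<p>A possible alternative to clearing the flag by hand, which we did not try on the real driver, is to give the flag per-interrupt scope so that it cannot leak from one transaction into the next. The sketch below is a host-side model in plain C, again assuming one simulated interrupt per byte; <code>handle_interrupt</code> and <code>first_signal</code> are hypothetical names and no Windows CE calls are involved.</p>

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SKETCH_RECEIVE, SKETCH_TRANSMIT } sketch_mode_t;

/* Handles one simulated interrupt and returns true if transfer-done should
 * be signalled. The flag is a fresh local on every call, so nothing carries
 * over between interrupts or transactions. */
bool handle_interrupt(sketch_mode_t mode, bool last_byte)
{
    bool transfer_done = false;

    switch (mode) {
    case SKETCH_RECEIVE:
        /* receive a byte and store it in the buffer */
        if (last_byte) transfer_done = true;
        break;

    case SKETCH_TRANSMIT:
        /* transmit a byte from the buffer in memory */
        if (last_byte) transfer_done = true;
        break;
    }
    return transfer_done;
}

/* Replays a transfer of `total` bytes and returns the 1-based byte count at
 * which transfer-done is first signalled, or 0 if it never is. */
int first_signal(sketch_mode_t mode, int total)
{
    assert(total > 0);

    for (int byte = 1; byte <= total; ++byte) {
        if (handle_interrupt(mode, byte == total))
            return byte;
    }
    return 0;
}
```

<p>In this model both reads and writes are signalled only on their last byte, regardless of what was transferred before. Keeping the decision in a small function without side effects also makes it testable on a desktop machine, away from the hardware.</p>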
<p>The lesson of the day (quoting my colleague) is: "<em>The first rule about multithreading - you're wrong</em>".</p>
<div class="wp-caption alignleft" style="width: 652px"><img title="At work. Wolfson Microelectronics PLC, 2009" src="http://blog.zabarauskas.com/img/100_0138_small.jpg" alt="At work. Wolfson Microelectronics PLC, 2009" width="642" height="496" /><p class="wp-caption-text">At work. Wolfson Microelectronics PLC, 2009</p></div>
]]></content:encoded>
			<wfw:commentRss>http://blog.zabarauskas.com/smrp6400smdk6410-iic-synchronization-problems/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

