## Friday, November 30, 2012

### Big Data in Small Folder

"There is nothing like the extortion that is textbooks," said Prof. Mehran Sahami in one of his Stanford classes. Textbooks are expensive, and it's tragic that it is mostly students who have to buy them.

We buy textbooks whenever we can afford them. But when we can't, we are entirely at the mercy of those researchers who have been generous enough to share their papers, tutorials, books, presentations and handouts for free. So poor students can use their high-speed connections to Google like crazy and spend a lot of time simply downloading these texts off the internet.

Even without saving the same content multiple times, the pdfs keep piling up in my downloads folder. It doesn't help that I'm too lazy to change the default file names, so when I browse through the folder I find a bunch of files with weird names.

So I want to invent a naming convention that helps me organize my downloaded research papers, tutorials, course handouts, ebooks, etc. The naming convention should clearly indicate:

1. The broad subject of the pdf. A mere glance should suffice to decide whether the paper is about research in signal processing or solid state physics. Many papers are interdisciplinary, so maybe I'll use my best judgement to decide what the paper helps me learn and choose the subject accordingly.
2. The source of the file - Did I download something off IEEEXplore? Did I get it from the authors' website? Or did it just appear as a pdf link in Google search results?
3. The name (and perhaps affiliation) of the author.
4. A comprehensive reading order - I have a bunch of files in a folder named "backpropagation learning" which contains papers, tutorials and presentations. The order in which these files are to be read should be clear from their names. I shouldn't end up reading a research paper on backpropagation before going through a tutorial first.

If nothing else, such a naming standard will at least save a lot of the time I otherwise spend simply opening files, realizing that they are not what I was looking for, and closing them. Suggestions on the design of such a convention are welcome.
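For what it's worth, such a convention is easy to enforce with a few lines of Python. Here is a sketch of one possible scheme; the field order, the underscore-separated layout, and the example names are only my assumptions, not a settled design:

```python
import re

# Hypothetical pattern: <order>_<subject>_<source>_<author>_<title>.pdf
# e.g. "01_sigproc_ieee_mallat_a-wavelet-tour.pdf"
PATTERN = re.compile(
    r'(?P<order>\d{2})_(?P<subject>[a-z]+)_(?P<source>[a-z]+)_'
    r'(?P<author>[a-z]+)_(?P<title>[a-z0-9-]+)\.pdf')

def make_name(order, subject, source, author, title):
    """Build a file name from the convention's fields."""
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')
    return '%02d_%s_%s_%s_%s.pdf' % (order, subject, source, author, slug)

def parse_name(filename):
    """Recover the fields from a file name, or None if it doesn't conform."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None
```

Since the reading order is the leading field, a plain alphabetical sort of the folder already lists the files in the order they should be read.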

TL;DR?

Just save the page and read it later. Whenever.

## Thursday, November 1, 2012

### Proposal for SciPy.in 2012 - Time-Frequency Analysis with Python

This work is in collaboration with my co-authors Stuart Mumford and Nabil Freij, from the University of Sheffield.

This is a sequel to my talk at SciPy India 2011 - “pyhht: A Python Toolbox for the Hilbert Huang Transform” (http://goo.gl/uswB7). The talk led to many interesting discussions on blogs and the scipy-dev mailing list about the signal processing capabilities of the scipy ecosystem. The pyhht project itself has come a long way from a simple barebones implementation to a sophisticated module with features for nonlinear and nonstationary time-series analysis and visualization.

The first couple of minutes of the talk (not more than a slide) will be a summary of the signal processing features that developers and users would like to see in Python, and which of these are immediately doable. The focus of the talk, however, is time-frequency analysis. We will examine time and frequency characteristics of speech recordings and see where conventional signal processing tools in Python fall short. We will be able to see that time-frequency analysis is a central problem in signal processing and provides a clear motivation for a signal processing stack in any programming language.

Examining both time and frequency characteristics of a dataset is difficult for a number of reasons, mostly because conventional processing tools like the Fourier and wavelet transforms trade off between the time and the frequency resolution obtained from a signal. Adjusting these tradeoffs is a cornerstone of time-frequency analysis. We will have a preliminary look at the mathematical problems associated with these tradeoffs, which can be conveniently expressed as numerical optimization problems in numpy. While numpy and scipy (and some scikits) have a wide variety of linear transforms, many routine functions in time-frequency analysis, like the Wigner-Ville distribution or the short-time Fourier transform, are not present. We will see how these distributions can be obtained by application of the Hilbert-Huang transform.
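As an aside, the short-time Fourier transform itself takes only a few lines of numpy. The sketch below is my own illustration of the kind of routine a signal processing stack would package properly; the frame length, hop size and window are arbitrarily chosen:

```python
import numpy as np

def stft(x, fs, frame_len=256, hop=128):
    """Short-time Fourier transform: Hann-windowed, half-overlapping frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.array([x[i*hop:i*hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, axis=1)       # one spectrum per frame
    freqs = np.fft.rfftfreq(frame_len, 1.0/fs)   # frequency axis in Hz
    return spectrum, freqs

# A 1 kHz tone sampled at 8 kHz: every frame's spectrum peaks at 1 kHz.
fs = 8000
t = np.arange(fs) / float(fs)
spec, freqs = stft(np.sin(2*np.pi*1000*t), fs)
print(freqs[np.abs(spec[3]).argmax()])  # -> 1000.0
```

The magnitude of the spectrum array, plotted frame by frame, is exactly the familiar spectrogram, and the frame/hop parameters are the time-frequency resolution tradeoff in its crudest form.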

The talk will also feature some interactive time-frequency applications built using the Enthought tool suite. Although the methods that fall under time-frequency analysis are vast, most of them have a lot of common numerical procedures. This enables us to make basic time-frequency plotting applications using the ETS which can be extended and made more elaborate on demand to suit specific requirements. Particularly, we will be able to harness the interactive abilities of Chaco and Enaml in order to make time-frequency plots richer and more detailed.

## Saturday, October 27, 2012

### Curlews and Other Extinctions

For as long as I can remember, my father has had a large collection of Reader's Digest select editions. It was this collection that introduced me to some pearls of English literature that are now all but lost. Authors like Nevil Shute, Georges Blond and A J Cronin are nowadays available only on demand (at least in Indian bookstores). There were much more famous writers too, like John Steinbeck and Pearl Buck, but even with these Nobel Prize winning authors it is difficult to locate specific books. It is a rare experience to find references to these authors or their books in contemporary literature, as happened with Richard Bach and Nevil Shute. When I read Richard Bach's piece in Illusions on how Nevil Shute inspired him, I called up my father and told him about it. Such growth from one generation of readers and writers to another is a very happy thing. But Bach is one of the few so influenced. Most of us pride ourselves on not being inspired.

I don't know if it's because of this paradigm, but some really good literature lies in tatters in my father's closet. It should make a good weekend exercise to track down the provenance of these publications and see if they could be revived. A lot of these books are about animals, and one I remember particularly well is Fred Bodsworth's Last of the Curlews, a fictionalized account of the life of the last Eskimo curlew. Towards the end of the book he writes a statement that convinces you just how horrible the idea of extinction can be: "An entire universe must come into creation once again, to recreate a curlew." The thought that the books in that closet might soon be extinct too is equally appalling.

Then, today, out of the blue, I was shortening a URL using goo.gl and the captcha consisted of 'curlew'. I have never seen that word anywhere except in Bodsworth's book. I had some idea that captchas get their data from digitally scanned books. Although this only means that at least one book, with at least one mention of a curlew, got scanned somewhere, I can't help feeling greatly optimistic about the digitization of books. This idea is very beautifully captured in the new The Time Machine (2002) movie, where the time traveller, played by Guy Pearce, finds a very sophisticated AI which stores all of human knowledge. In the climax, the AI is shown reciting a story from Tom Sawyer to a group of young Eloi. I hope to God it doesn't come to that.

## Thursday, October 4, 2012

### Compressed Sensing with sklearn - A DTMF Example

Magic Reconstruction

I first read about compressed sensing about two years ago on Cleve's Corner, a column which Cleve Moler, the founder of Mathworks, writes annually. The 2010 article was called "Magic Reconstruction: Compressed Sensing". Compressed (or compressive) sensing is immediately appealing for a number of reasons. It has all the features of a cool signal processing problem. It seems to beat the Shannon-Nyquist sampling theorem by making use of sparsity in real signals, and this sparsity raises the question - sparsity in which basis? That question in turn calls for some prior information about the data, which potentially leads to a wide range of other mathematical problems. (Perhaps a machine learning problem for detecting which bases are appropriate for a given set of signals, especially when prior information about the signals is not available.)

But even without all this, as Moler points out, the real appeal of compressed sensing lies in the underlying matrix problem. This problem requires a great deal of rigour in linear algebra and convex optimization. I did get to those problems eventually, but initially I was more curious as to whether compressed sensing really worked, and I tried out the example in his post in Python. That's what this post is about.

Python Implementation of the DTMF Example

The problem consists simply of creating a touchtone signal, trying to compress and decompress it, and checking whether the original signal and the reconstructed signal match acoustically. Let's walk through the program step by step.

```python
# Imports
from sklearn.linear_model import Lasso
from scipy.fftpack import dct
from scipy.sparse import coo_matrix
from matplotlib.pyplot import plot, show, figure, title
import numpy as np
```


We pick any two touchtone frequencies $f_1$ and $f_2$ (in this case 697 Hz and 1336 Hz, corresponding to the '5' key on the keypad), and play the following signal for an eighth of a second.
$f(t) = \sin(2\pi f_1 t) + \sin(2\pi f_2 t)$
At a sampling rate of 40 kHz, an eighth of a second comes to 5000 samples. As per the example, we take 500 random samples of this signal.

```python
# Initializing constants and signals
N = 5000
FS = 4e4
M = 500
f1, f2 = 697, 1336  # the two touchtone frequencies for the '5' key
duration = 1./8
t = np.linspace(0, duration, int(duration*FS))
f = np.sin(2*np.pi*f1*t) + np.sin(2*np.pi*f2*t)
f = np.reshape(f, (len(f), 1))

# Displaying the test signal
plot(t, f)
title('Original Signal')
show()

# Randomly sampling the test signal
k = np.random.permutation(N)[:M]  # M distinct sample locations
k = np.sort(k)  # making sure the random samples are monotonic
b = f[k]
plot(t, f, 'b', t[k], b, 'r.')
title('Original Signal with Random Samples')
show()
```


Since this is a simple, almost stationary signal, a simple basis like discrete cosines should suffice to bring out the sparsity.

```python
D = dct(np.eye(N))
A = D[k, :]
```
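Before solving anything, we can sanity-check the sparsity assumption itself. The following snippet is my own aside, not part of Moler's example: it measures how much of the two-tone signal's energy is captured by the largest 10% of its DCT coefficients.

```python
import numpy as np
from scipy.fftpack import dct

N, FS = 5000, 4e4
t = np.linspace(0, 1./8, N)
f = np.sin(2*np.pi*697*t) + np.sin(2*np.pi*1336*t)

coefs = dct(f, norm='ortho')
energy = np.sort(coefs**2)[::-1]  # coefficient energies, largest first
# Fraction of the signal energy in the largest 10% of DCT coefficients
print(energy[:N//10].sum() / energy.sum())
```

The printed fraction is very close to 1: almost all of the energy lives in a small number of coefficients, which is exactly the property the reconstruction below exploits.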


Here A is a matrix which contains a subset of 500 discrete cosine bases, and we need to solve $Ax=b$ for $x$. The system is underdetermined and has infinitely many solutions, but it turns out that the one that minimizes the $L_1$ norm of the solution gives the best estimate of the original signal. Since this is an optimization problem, it can be solved with many of the methods in scipy.optimize, say by taking the least squares solution of the equation (the $L_2$ norm solution) as the first guess and minimizing iteratively. But I took the easier approach and used the Lasso estimator in the sklearn package, which is essentially a linear estimator that penalizes (regularizes) its weights in the $L_1$ sense. (A really cool demonstration of compressed sensing for images using Lasso is here).

```python
lasso = Lasso(alpha=0.001)
lasso.fit(A, b.reshape((M,)))

# Plotting the reconstructed coefficients and the signal
plot(lasso.coef_)
title('IDCT of the Reconstructed Signal')
recons = dct(lasso.coef_.reshape((N, 1)), axis=0)
figure()
plot(t, recons)
title('Reconstructed Signal')
show()
```


As can be seen through the plots, most of the coefficients of the lasso estimator are zeros. It is the discrete cosine transform of these coefficients that is the reconstructed signal. Since the coefficients are sparse, they can be compressed into a scipy.sparse matrix.

```python
recons_sparse = coo_matrix(lasso.coef_)
sparsity = 1 - float(recons_sparse.getnnz())/len(lasso.coef_)
print(sparsity)
```


As it turns out, the coefficient vector is about 90% sparse. And recall that we reconstructed the signal from only 500 of its 5000 samples, i.e. 10%.

Validating the Reconstruction

A reasonably reliable method of validating the compression and reconstruction is to listen to the original signal and check whether the reconstructed signal sounds similar. The scikits.audiolab package can be used to play sound straight from numpy arrays. (I couldn't make audiolab work, so I validated this by saving the reconstructed array into a .mat file and playing it out in Octave.) The Python code used here is available as an ipython notebook and a python script in this repository.
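If audiolab refuses to work, another option is to write the array to a WAV file and play it with any media player. This is a sketch of mine using scipy.io.wavfile rather than the .mat-and-Octave route of the post, and the two-tone signal below is a stand-in for the reconstructed array:

```python
import numpy as np
from scipy.io import wavfile

FS = 40000
t = np.linspace(0, 1./8, 5000)
recons = np.sin(2*np.pi*697*t) + np.sin(2*np.pi*1336*t)  # stand-in signal

# Scale to 16-bit integers and write a playable WAV file
scaled = np.int16(recons / np.abs(recons).max() * 32767)
wavfile.write('touchtone.wav', FS, scaled)
```

If the reconstruction worked, the file should sound exactly like pressing '5' on a telephone keypad.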

Further...

The entire reconstruction depends on sparsity and prior information about the signal. That's probably why wavelets are a popular choice of bases in compressed sensing applications, since using wavelets is equivalent to imposing a prior basis on signals. The important question to answer in all compressed sensing problems is - in which basis is the signal sparse? An interesting follow-up question would be - what if the bases are intrinsic mode functions? Prior information is exactly what is absent in intrinsic mode functions! How does one perform empirical mode decomposition such that the decomposition is sparse? Examining IMFs as bases for sparsity in signals should be an interesting problem, given that they complement wavelets nicely. And that's only the tip of the iceberg.

## Monday, June 4, 2012

### Graphical Application for Scientific Programming Using TraitsUI & Chaco

I wrote about using ETS Traits for building UIs in Python in the last post. I continue in this post, using Chaco to embed scientific graphical applications in a Traits UI. For the most part, this post is an adaptation of Gael Varoquaux's tutorial on TraitsUI, except here Chaco, a 2-D plotting package fully integrated with Traits, is used, instead of matplotlib.

The Application
The tutorial has a simple application which simulates the process of a camera acquiring an image, and performs some processing on that image interactively, i.e. at the click of a button. The results of the processing and the variables associated with the image are displayed on the GUI continuously, as one image after another is acquired and processed. For simplicity, let us divide the classes and functions in this application into three categories:
1. those that deal with the acquisition (or in this case simulating the acquisition) and the processing of the image
2. those that deal with displaying the interface
3. those that are used to handle the event loops in the application.

As one would expect, there is some overlap between the last two categories. The difference has been made simply to demonstrate the major features of this application - performing the data intensive work, rendering plots with Chaco and embedding them in a Traits UI, and finally deploying the UI interactively. A wide variety of scientific GUI applications can be covered between these three categories, and each of them uniquely demonstrates the advantages of working with Python and ETS tools.

Image Acquisition and Processing
The following are the functions and classes required to perform the acquisition and processing in the application.
```python
from traits.api import HasTraits, Float, Enum
from traitsui.api import View, Item
from numpy import indices, exp, sqrt, sum
from numpy.random import rand

class Experiment(HasTraits):
    width = Float(30, label='width', desc='width of the cloud')
    x = Float(50, label='X', desc='X position of the center')
    y = Float(50, label='Y', desc='Y position of the center')

class Results(HasTraits):
    """ Object used to display the results.
    """
    width = Float(30, label="Width", desc="width of the cloud")
    x = Float(50, label="X", desc="X position of the center")
    y = Float(50, label="Y", desc="Y position of the center")

    view = View(Item('width', style='readonly'),
                Item('x', style='readonly'),
                Item('y', style='readonly'))

class Camera(HasTraits):
    """ Camera objects. Implements both the camera parameters controls, and
        the picture acquisition.
    """
    exposure = Float(1, label="Exposure", desc="exposure, in ms")
    gain = Enum(1, 2, 3, label="Gain", desc="gain")

    def acquire(self, experiment):
        X, Y = indices((100, 100))
        Z = exp(-((X - experiment.x)**2 + (Y - experiment.y)**2)
                / experiment.width**2)
        Z += 1 - 2*rand(100, 100)
        Z *= self.exposure
        Z[Z > 2] = 2
        Z = Z**self.gain
        return Z

def process(image, results_obj):
    """ Function called to do the processing """
    X, Y = indices(image.shape)
    x = sum(X*image)/sum(image)
    y = sum(Y*image)/sum(image)
    width = sqrt(abs(sum(((X - x)**2 + (Y - y)**2)*image)/sum(image)))
    results_obj.x = x
    results_obj.y = y
    results_obj.width = width
```


Among these classes only the Results class needs special attention. While the rest of the classes and the process function simply carry data attributes and process them, the Results class contains a 'view' attribute. This attribute enables the interface to display the results.

There are three threads in the application:
1. The thread handling the UI event loop. This is the top-level thread which is open at the start of the program. It triggers threads that are responsible for acquiring and processing the image, and displays data that it has gathered from those threads.
2. The acquisition thread, which keeps waiting for the camera to be triggered (through the GUI loop), acquires the image, and invokes the processing thread. Processing may take more time than acquisition, and hence a major role of this thread is to keep two different processing threads from running simultaneously. It does not trigger a new processing thread if one is still running.
3. The processing thread, which performs the numerically intensive work on the data, and dies when it is done.

Of these threads, the GUI event loops are handled by the .configure_traits() method of the HasTraits subclass which we use to make the main window of the GUI. So that leaves only the acquisition and the processing threads to be handled safely. The acquisition thread is subclassed from the threading.Thread class as follows (note that the imports from the previous snippet are still valid):
```python
from threading import Thread
from time import sleep

class AcquisitionThread(Thread):

    wants_abort = False

    def process(self, image):
        """ Spawns the processing job, unless one is already running. """
        try:
            if self.processing_job.isAlive():
                self.display("Processing too slow")
                return
        except AttributeError:
            pass
        self.processing_job = Thread(target=process,
                                     args=(image, self.results))
        self.processing_job.start()

    def run(self):
        self.display('Camera started')
        n_img = 0
        while not self.wants_abort:
            n_img += 1
            img = self.acquire(self.experiment)
            self.display('%d image captured' % n_img)
            self.image_show(img)
            self.process(img)
            sleep(1)
        self.display('Camera stopped')
```

The GUI Classes
The widget should look something like this:

The left half of the GUI is a Chaco image plot which displays the acquired image. The right half is a tabbed layout consisting of a general tab which has the Start/Stop button, an experiment tab that displays the variables of the experiment and the camera tab which displays the parameters of the acquisition process. The right half of the GUI is written in the following ControlPanel class, which is a container for the other classes.
```python
from chaco.api import ArrayPlotData, Plot
from traits.api import HasTraits, Instance, String, Button
from traitsui.api import View, Item, Group

class ControlPanel(HasTraits):
    experiment = Instance(Experiment, ())
    camera = Instance(Camera, ())
    plotdata = Instance(ArrayPlotData, ())
    results = Instance(Results, ())
    results_string = String()
    acquisition_thread = Instance(AcquisitionThread)
    start_stop_acquisition = Button('Start / Stop Acquisition')

    view = View(Group(Group(Item('start_stop_acquisition', show_label=False),
                            Item('results_string', show_label=False,
                                 style='custom'),
                            label="Control", dock='tab'),
                      Group(Group(Item('experiment', style='custom',
                                       show_label=False), label='Input'),
                            Group(Item('results', style='custom',
                                       show_label=False), label='Results'),
                            label='Experiment', dock='tab'),
                      Item('camera', style='custom', show_label=False,
                           dock='tab'),
                      layout='tabbed'))

    def _start_stop_acquisition_fired(self):
        """
        Method that gets called when the 'Start/Stop' button is clicked.
        It starts the acquisition thread, or aborts a running one.
        """
        if self.acquisition_thread and self.acquisition_thread.isAlive():
            self.acquisition_thread.wants_abort = True
        else:
            self.acquisition_thread = AcquisitionThread()
            self.acquisition_thread.display = self.add_line
            self.acquisition_thread.acquire = self.camera.acquire
            self.acquisition_thread.experiment = self.experiment
            self.acquisition_thread.image_show = self.image_show
            self.acquisition_thread.results = self.results
            self.acquisition_thread.start()

    def add_line(self, string):
        """
        Writes a line to the control panel indicating the acquisition of the
        nth image
        """
        self.results_string = (string + "\n" + self.results_string)

    def image_show(self, image):
        """
        Method for rendering a 2D array as an image.
        """
        self.plotdata.set_data('imagedata', image)
```
Note that although the ControlPanel class does not render the plot itself, it contains an ArrayPlotData attribute. ArrayPlotData is a class in chaco.api which is used to associate numpy arrays with Chaco plots. Since a new array is created every time the Start/Stop Acquisition button is clicked, we need to set the contents of the Chaco plot to the new array. This is done with the set_data() method of ArrayPlotData, which displays the new image by default every time it is called. ArrayPlotData is the only Chaco class that needs to be instantiated in the ControlPanel class, because it is all that is required to make the plot respond to the GUI event loop, while the plotting itself is done in a separate class.

The Chaco Plot and ComponentEditor
It is now that we come to the most important distinction from the original tutorial. As emphasized in section 4.1 (Making a Traits Editor for Matplotlib), Traits does not provide an editor for everything. The tutorial therefore goes on to make a somewhat complex traits editor for matplotlib plots. Although matplotlib is a fairly simple tool, making this editor requires some knowledge of matplotlib's backends and wxPython. This requirement is eliminated due to the use of ComponentEditor. A Chaco plot can be used in a TraitsUI with ComponentEditor, which is a class for wxPython editors for traits. The MPLFigureEditor can simply be replaced by ComponentEditor when using Chaco.

We first make a TraitsUI Handler subclass which processes the GUI event loop safely.

```python
from traitsui.api import Handler
from pyface.api import GUI

class MainWindowHandler(Handler):

    def close(self, info, is_OK):
        sleep(0.1)
        GUI.process_events()
        return True
```

The pyface module is used by the traits GUIs to implement views and editors. The GUI.process_events() method processes any current GUI events.

Finally we make the main window class which combines the Chaco plot with the control panel.

```python
from numpy import zeros
from traitsui.api import HSplit
from traitsui.menu import NoButtons
from enable.component_editor import ComponentEditor

class MainWindow(HasTraits):

    plot = Instance(Plot)
    plotdata = Instance(ArrayPlotData, ())
    panel = Instance(ControlPanel)

    def _panel_default(self):
        return ControlPanel(plotdata=self.plotdata)

    def _plot_default(self):
        self.plotdata = ArrayPlotData(imagedata=zeros((100, 100)))
        plot = Plot(self.plotdata)
        plot.img_plot('imagedata')
        return plot

    view = View(HSplit(Item('plot', editor=ComponentEditor(),
                            dock='vertical'),
                       Item('panel', style='custom'),
                       show_labels=False),
                resizable=True, handler=MainWindowHandler(),
                buttons=NoButtons)
```

Note that the editor for the Chaco plot object is ComponentEditor and the MainWindowHandler object is used as the handler for the traits UI view. Calling the configure_traits() method on a MainWindow object will start the GUI. The entire script is available here.

This simple application amply demonstrates the kind of modularity one can reach by using traits. The data intensive processes, GUI events and their lower threads can be comfortably divided into different classes, and we have a wide variety of editors and handlers to make them interactive.

Special thanks to Puneeth Chaganti and Pankaj Pandey who did most of the debugging, and to the people on the enthought-dev mailing list.

## Wednesday, May 9, 2012

### TraitsUI - Python for Quick and Efficient GUIs

A highlight of the Enthought Tool Suite is the Traits package. Folks at Enthought eat and sleep Traits, and those who regularly use ETS will be familiar with it, as almost all of ETS builds on top of Traits. The TraitsUI tutorial by Gael Varoquaux is an excellent introduction to using Traits to create powerful UIs in Python. The tutorial uses Traits and matplotlib to create a basic interactive GUI. Last month I started as an intern at Enthought, Inc. My first task was to convert Gael's tutorial to use Chaco instead of matplotlib for the plotting. Chaco is an interactive 2D plotting package that is already well integrated with TraitsUI (there is a module for 3D plotting too - Mayavi). Although I have only seen Traits being used in the context of making UIs, it is basically a sophisticated type definition system that can go well beyond UIs. In this post I will attempt to give the reader a cursory feel of TraitsUI specifically and Traits in general.

Traits as Type Definitions in Python
As a beginner in programming, and especially in Python, I have found it quite easy to create new types by simply writing new classes. One could, of course, take a lower-level approach and extend Python using the C/C++ API, but that would miss the point of programming in Python: to write readable, clean code, and to write it fast. The Traits package makes the former approach even more palatable. It does so by adding a number of characteristics to Python class attributes: initializing default values for attributes, restricting attributes by their respective properties, notifying the user about changes in attributes, etc. These advantages make Traits particularly effective for GUI design - users might want to interactively modify attributes, visualize the return values of different methods, etc. The visualization of traits is supported by the TraitsUI package.

Visual Representation of Traits
To demonstrate the graphical and interactive modification of traits, consider the following snippet:
```python
from traits.api import HasTraits, CInt, Enum

class Camera(HasTraits):

    gain = Enum(1, 2, 3, desc='The gain index of the camera',
                label='Gain')
    exposure = CInt(10, desc='Exposure in ms',
                    label='Exposure')

    def capture(self):
        print("capturing an image at %i ms exposure, gain: %i"
              % (self.exposure, self.gain))

Camera().configure_traits()
```

Let's consider the trait attributes in the Camera class. The HasTraits class can be subclassed to create the most commonly used traits required in UI applications. The gain of a camera can be (virtually) any real number, but in this case we wish to assign the gain trait a predefined set of integers. For this we use the Enum trait, which is used to assign a predefined set of values to a trait, and they do not have to be of the same type. The default value of an Enum trait is the first value passed to it - in this case, 1. The next trait, CInt, is called a casting trait. Casting traits can be used to cast the type of the value to the type that is required by the trait. When the class is instantiated and the configure_traits() method is called on it, it automatically creates the widgets required by the UI. The user does not have to worry about the layers between writing algorithms and rendering widgets.

A great variety of traits are available in the package, most of them with characteristic visual properties.

Interactive GUIs with Traits
A major advantage of using traits is its notification property. This is indispensable in making interactive GUIs. There are many ways of making a Python programme react to changes in variables. Trait notifications are particularly effective in accomplishing this graphically. Consider the following code snippet:
```python
from traits.api import HasTraits, Str

class EchoBox(HasTraits):

    input = Str
    output = Str

    def _input_changed(self):
        self.output = self.input

EchoBox().configure_traits()
```

For any trait foo, a notification for its change can be created by writing a method _foo_changed(). If we use a Button object in a UI, then the notification is made by writing a _bar_fired() method in the class (where bar is the name of the Button object). This is illustrated in the following snippet:
```python
from traits.api import HasTraits, Button, Int
from traitsui.api import View, Item

class ButtonClick(HasTraits):
    value = Int()
    add_one = Button('Add One')

    def _add_one_fired(self):
        self.value += 1

    view = View('value', Item('add_one', show_label=False))

ButtonClick().configure_traits()
```

The View and Item classes from the traitsui.api module are used to add different panels to one TraitsUI window.
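The _foo_changed hook can be mimicked in plain Python. The sketch below is only an analogy of the notification mechanism, written by me for illustration (Traits is not actually implemented this way): it intercepts attribute assignment with __setattr__ and looks up a correspondingly named hook.

```python
class Notifier(object):
    """Calls self._<name>_changed() after an attribute is set, if defined."""
    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        hook = getattr(self, '_%s_changed' % name, None)
        if callable(hook):
            hook()

class EchoBox(Notifier):
    def _input_changed(self):
        self.output = self.input  # mirror input, as in the Traits example

box = EchoBox()
box.input = 'hello'
print(box.output)  # prints 'hello'
```

Of course, Traits adds much more on top of this pattern: type validation, default values and automatic UI generation all hang off the same attribute machinery.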

More with Traits
I have taken a very minimal approach to Traits so far. In the next post I will write about embedding 2-D plots in TraitsUI, for which I will use ETS Chaco. Chaco is a very convenient package for adding 2-D plotting functionality to a TraitsUI, as it is fully integrated with Traits. Even standalone Chaco plot widgets require Traits to be rendered. I have omitted a few parts of the original tutorial - especially the parts on executing event loops via threads and ensuring safety of these threads. I will rewrite the tutorial when I have played around with Traits a bit.

## Monday, April 2, 2012

### GSoC Notes: Image Segmentation by Clustering

This photograph was taken by my cousin somewhere near Onyx, California. An interesting problem would be trying to differentiate the clouds from the snow in this picture. In this post, I shall attempt to do that with the help of k-means clustering. The heuristics of all the methodology and algorithms have been deliberately kept simple in order to see the least these algorithms can do.

Generally, in performing clustering on the information in images, the feature vectors consist of readily available data, like the intensities of pixels and the coordinates of those intensity values. Let's make a naive assumption that this image has no discernible spatial patterns, and omit the coordinates from our feature vectors. The dimensions of the image are 960 x 720 x 3. So we have 960*720 feature vectors, each having three components, the red, green and blue values of the respective pixel. It is also possible to do some pre-processing and come up with more elaborate feature vectors, but we shall leave that for later.

For initializing a k-means algorithm we need to decide on the number of clusters we want. Let's say we want to cluster the pixels into three groups: one representing the snow, one the clouds, and everything else goes into the third. This is a highly relaxed assumption and a lot can go wrong with this kind of reasoning. For instance, whatever we intend to put in the third group might itself belong to a number of classes (whatever isn't snow or clouds might be the earth, the tree cover, etc.) and these classes might not have sufficiently common characteristics to fall into one cluster. A popular way to address this is one-vs-all classification, which identifies objects belonging to one class and rejects all others, but that would require knowledge of all possible classes, and we don't know how many classes there are other than snow and clouds. Unsupervised clustering comes in handy in such cases.
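To make the procedure concrete, here is a minimal k-means sketch in plain numpy, written by me for illustration; it is not the scikit-learn demo code this post adapts, and the random array stands in for the actual photograph:

```python
import numpy as np

def kmeans(X, k, init=None, n_iter=20, seed=0):
    """Minimal k-means clustering of the rows of X (n_samples, n_features)."""
    rng = np.random.RandomState(seed)
    if init is None:
        init = X[rng.choice(len(X), k, replace=False)]
    centers = np.array(init, dtype=float)
    for _ in range(n_iter):
        # Assign each feature vector to its nearest center
        dists = ((X[:, None, :] - centers[None, :, :])**2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its cluster
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Flatten an H x W x 3 image into (H*W, 3) RGB feature vectors, then cluster
img = np.random.rand(60, 40, 3)  # stand-in for the photograph
features = img.reshape(-1, 3)
labels, centers = kmeans(features, 3)
segmented = labels.reshape(60, 40)  # cluster index per pixel
```

Drawing the contours of the segmented array over the image gives exactly the kind of segmentation overlay discussed below.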

Here is the code I used for generating these contours on the image. It is an adaptation of Vincent Michel and Alexander Gramfort's demo of segmentation, modified to use k-means clustering.

While the results are far from satisfactory, even this crude clustering of the pixel values provides a rough segmentation of the image. In retrospect, the result tells us how ambitious the problem was, and that it calls for better preprocessing. It can be clearly seen that being ambiguous about the third cluster has caused only the land to be well segmented, whereas in many places the snow and the clouds fall in the same cluster.

Let's try a more comfortable image now. This one is from the CASIA Iris dataset. We initialize three clusters here too. Detecting the pupil is the easiest part of segmenting an iris image, because the pupil is easily the darkest part of the image and is also spatially very convenient. The real goal is to be able to isolate the iris region within exactly two contours. As the results show, the pupil and the eyelids can be easily identified. But even the eyelashes have to be accounted for. In such a case, simple clustering will not do. It will have to be combined with some confidence measures on fitting contours to the iris. I don't know how to do that yet. In a later post I will write about using clustering as pre-processing for other machine learning based segmentation algorithms.

(Snow and Clouds photo by Sandeep Hardas)