Tuesday 10 January 2012

I feel all fuzzy inside...

Just kidding.

I've not been around much recently as I've been working hard on my MSc project.  For part of this I've had to implement something called a "Fuzzy Transform", which was initially presented by Irina Perfilieva and has since been developed by the Institute for Research and Applications of Fuzzy Modeling (IRAFM) at the University of Ostrava in the Czech Republic.

The way I'm presently using the Fuzzy Transform, it provides a numerical analysis method that is a cross between fuzzy logic and a filtering function (a filtering function is used to remove noise [unwanted information] from a signal; such functions are in use everywhere in our modern world, from cellphones to missile guidance systems).  It's a numerical method because it doesn't try to "guess" what gave rise to the original data (methods that do try to work out what gave rise to the data in the first place are "estimators", and they tend to use "mathematical models" to represent whatever is generating the data from a given input).  If you wanted to, you could feed a numerical method a totally random stream of numbers and it would still have a good go at approximating them "with arbitrary accuracy" (which means "as well as you make it"; more on that below).
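
As an aside, probably the simplest example of a filtering function is a plain moving average.  This isn't part of the Fuzzy Transform at all, just a quick sketch (in Python with NumPy) of the general idea of smoothing noise out of a signal:

```python
import numpy as np

def moving_average(signal, window=5):
    # Each output sample is the mean of the `window` samples around it,
    # which smooths away the fast, noisy wiggles while keeping the slow trend.
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")
```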

One of the things that the fuzzy transform provides, as well as its filtering capability, is a property called "Universal Approximation": given any series of data, a correctly set-up fuzzy transform can be used to approximate that data.  Universal Approximators are useful since they allow you to do all kinds of fun stuff, such as pattern matching (for example, character recognition and face recognition), data compression and forecasting... but more on this later.
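
If you like things a bit more formal, the property can be stated roughly like this (this is my own paraphrase and notation, not a quote from the original papers): for a continuous function f on an interval [a, b],

$$\forall\, \varepsilon > 0 \;\; \exists\, n_\varepsilon \;\text{ such that }\; \bigl|\, f(x) - \hat{f}_n(x) \,\bigr| < \varepsilon \quad \text{for all } x \in [a,b] \text{ and all } n \ge n_\varepsilon,$$

where $\hat{f}_n$ is the result of taking the F-Transform with n partitions and then inverting it.  In plain English: pick any accuracy you like, and there is a number of partitions that achieves it.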

So, let's take a varying series of data:

[Plot: the noisy data series (red)]

This isn't anything very exciting, just something that provides a nice wiggly line (if you're really interested, it's y = sin(x.π) + x with added zero-mean, normally distributed noise of variance 0.01).
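
If you want to play along at home, here's a rough sketch of how you might generate a series like this.  This assumes Python with NumPy, and the x range, the number of samples and the random seed are my guesses rather than the values I actually used:

```python
import numpy as np

rng = np.random.default_rng(42)     # fixed seed so the "random" noise is repeatable
x = np.linspace(0.0, 2.0, 200)      # assumed x range and number of samples
clean = np.sin(np.pi * x) + x       # the underlying wiggly line: y = sin(x.pi) + x
noisy = clean + rng.normal(0.0, np.sqrt(0.01), size=x.shape)  # zero-mean noise, variance 0.01
```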

So what happens if we use the Fuzzy Transform?  We get a plot a little like this:

[Plot: inverse F-Transform with 10 partitions (blue) over the noisy data (red)]

The blue line in the above plot is the result of the Fuzzy Transform.  It was obtained by dividing the original wiggly line into 10 horizontally measured sections, or partitions, taking the Fuzzy Transform of the data, and then inverting (reversing) the transform.  You can see that it kind of matches the original wavy line, but as well as removing all the noise, the F-Transform has also removed a lot of the shape of the original line (we've lost information).
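
For the curious, here's a sketch of what that forward-and-inverse process can look like, using a uniform triangular fuzzy partition.  This follows the standard textbook formulation rather than being my exact project code, and it assumes every partition covers at least one data point:

```python
import numpy as np

def triangular_basis(x, centre, half_width):
    # Triangular membership function: 1 at `centre`, falling to 0 at centre +/- half_width.
    return np.maximum(0.0, 1.0 - np.abs(x - centre) / half_width)

def f_transform(x, y, n):
    # Direct (discrete) F-Transform of the samples (x, y) using n partitions.
    a, b = x.min(), x.max()
    centres = np.linspace(a, b, n)   # evenly spaced partition centres across the data
    h = (b - a) / (n - 1)            # spacing, which is also each triangle's half-width
    components = np.empty(n)
    for k, c in enumerate(centres):
        w = triangular_basis(x, c, h)              # how strongly each sample belongs to partition k
        components[k] = np.sum(w * y) / np.sum(w)  # weighted average of y over that partition
    return centres, h, components

def inverse_f_transform(x, centres, h, components):
    # Inverse F-Transform: rebuild a smooth approximation of y at the points x.
    weights = np.array([triangular_basis(x, c, h) for c in centres])
    return np.sum(components[:, None] * weights, axis=0)  # the triangles sum to 1, so no renormalising
```

With the noisy series from the earlier sketch, something like inverse_f_transform(x, *f_transform(x, noisy, 10)) would give you a line similar to the blue one above.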

Using 20 partitions gives this:

[Plot: inverse F-Transform with 20 partitions (blue) over the noisy data (red)]

This uses the same colours for the noisy data and the approximation as before, and it is a better fit, but it still leaves a few gaps.  Using 30 partitions gives this:

[Plot: inverse F-Transform with 30 partitions (blue) over the noisy data (red)]

This again uses the same colours for the noisy data and the approximation, and it almost exactly matches the original function, which I've plotted below (in green) along with the 30-partition approximation to allow an easy comparison:

[Plot: the original, noise-free function (green) alongside the 30-partition approximation (blue)]

So what does this mean?  It means that the Fuzzy Transform can approximate the data "with arbitrary accuracy", which we can now see is a way of saying that it can match the data we feed into it (the red line) as closely as we want it to.  Why might we not want an exact match?  Firstly, the more accurate we make the approximation, the more calculations are required, and so it takes longer to produce.  Secondly, if we make the approximation too accurate it will also start to approximate the noise (the Fuzzy Transform doesn't know where the data has come from, so it can't tell what is "noise" and what is "signal").  This means we need to find a balance between ease of calculation and quality of approximation, while still "filtering out" the noise.

In the example above, doubling our "effort" by going from 10 to 20 partitions gives quite a large benefit, but putting in half as much effort again to get from 20 to 30 partitions doesn't give a very big improvement (so if the end use could tolerate the inaccuracy of the 20-partition transform then we'd use that rather than "wasting effort").
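
If you wanted to see this trade-off for yourself, you could reuse the sketches above and measure how far each approximation is from the noise-free signal (again, just a sketch; the exact numbers will depend on the noise you happened to generate):

```python
for n in (10, 20, 30):
    centres, h, components = f_transform(x, noisy, n)
    approx = inverse_f_transform(x, centres, h, components)
    rmse = np.sqrt(np.mean((approx - clean) ** 2))   # error against the noise-free signal
    print(f"{n} partitions: RMSE = {rmse:.4f}")
```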

Finally, I suppose you want to know how this can be used.  Well, IRAFM and others have shown how the Fuzzy Transform can be used to detect patterns and relationships that underlie quite complex data.  For example, in one paper published in 2008 [subscription required] the Fuzzy Transform was used to provide a model of the GDP of the Czech Republic based on a set of other data, such as unemployment, the rate of inflation, etc.  The Fuzzy Transform has also been used to automatically combine multiple images to produce an image that is of better quality overall than any of the constituent images [PDF].  Pretty impressive, eh?

If you want to learn more about the Fuzzy Transform, or about the other things that IRAFM are researching and developing, they have an extensive list of publications that are publicly available in PDF format.

This may be a bit technical, so please ask questions; I'll do my best to explain, or to point you somewhere you can find more information!