Saturday, 23 March 2013

Artificial Neural Networks Part 1: Why are they important?

Some products have algorithms that seem to learn over time, improving at their task. For example, Furbies.

As always in theoretical computer science, Furbies are by far the most demonstrative example.

How does that happen? If you have any algorithms experience, you’ll know that computers don’t ‘learn’ like people do. How could you get bad output the first time you do something, yet good output a couple of tries later? How could the Furby slowly learn to say the things you say to it? Is it, perhaps, a demon possessing powers far beyond those of modern machinery? Or is it using a neural network?


Hopefully, the latter. An artificial neural network (ANN) is a model for solving problems in the same way that a brain might. It uses many tiny components that mimic the work of neurons: each reads a small piece of input, processes it, and passes the result on to other neurons in the network. Each processing step nudges the input closer to the correct output, so the system gradually converges on a solution.

ANNs have a large number of these neuron-like components, each doing a very small amount of work, which has advantages over the typical computer model:

  • ANNs can be incredibly fast, because they share the workload over so many components.
  • ANNs can be incredibly robust. Since no single component is doing all of the work, the algorithm can still produce the correct output with a few components missing or working incorrectly.
  • ANNs can be comparatively simple to implement because you don’t have to encode information about the specific problem you’re solving; you just make your model and take it through a ‘teaching’ process where it learns on its own to solve that problem. This means you can solve problems with ANNs that you don’t even understand the solution to.
A diagram of this process, from this great French theoretical computer science blog.

This may all seem magical and/or impossible, but it’s so not. In future posts I’ll explain myself by going through ANN implementation details, but I think it’s more fun to start off knowing how awesome they are before careening into the math. Because they are very, very awesome.
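To give a tiny taste of those implementation details, here's a sketch of what one neuron-like component might look like in code. This is not a real ANN: the weights, bias, and inputs are all made up for illustration, and a real network would wire many of these together and adjust the weights through training.

```python
# One artificial neuron: take a weighted sum of the inputs, then squash
# it through an activation function. All numbers here are invented.
import math

def neuron(inputs, weights, bias):
    # Each input is scaled by its weight, summed with the bias...
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...then 'activated'; the sigmoid maps any sum into (0, 1).
    return 1.0 / (1.0 + math.exp(-total))

output = neuron([0.5, 0.2], [0.8, -0.4], bias=0.1)
print(output)  # a value between 0 and 1
```

Teaching the network means nudging those weights until the outputs come out right, which is where the math in later posts comes in.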

Some neurons, a.k.a. furby guts. Look at all that science.

End of Hiatus

Internet, prepare for my triumphant return to this blog after a summer of radventuring!

Topics to come:
  • Artificial neural networks
  • Formal logic
  • Artificial intelligence
  • Harlem shake
  • Babin'
  • Java

And here are two pictures I want you to have because I love you deeply:


When I have Google glass, I will consider this fair play.

Friday, 19 October 2012

Introduction to Hidden Markov Models

A Markov model is called ‘hidden’ when you don’t always know the current state that the system is in.

Just to remind ourselves: when I say ‘state’, I’m talking about the current value of the randomly changing variable being modeled as a Markov process. If the variable is ‘weather’, the states are things like ‘rainy’, ‘sunny’, ‘cloudy’, etc. This weather system is a Markov process, but NOT a hidden one, because we can know what the weather is like by looking outside.

state = 'angry deity'

What systems have states that you can’t see?

The classic example is a person talking. When you’re listening to someone speak, you can hear the sounds they are making, but you can’t be completely certain what words those sounds correspond to. Why not? Well, they might misspeak accidentally, or you might mishear them because there was noise in the background, or you might misinterpret them because of a grammatical ambiguity.

This is exactly the sort of thing that happens when people don't understand HMMs.

In this example, the system’s states are the words the person intends to say. What you observe, called the system’s ‘outputs’, are the sounds you are hearing. The two do not always match up, so it’s as if the states (words) are partially hidden from you. However, if we know:

  •  the transition probabilities of the underlying states, and
  •  the likelihood of each state producing each output

… then we can make some good estimates about which underlying states the system went through, given a particular list of outputs. We can even find the most likely series of states using the Viterbi algorithm.

So, what’s the difference between a Markov model and a hidden Markov model? A Markov model is part of a hidden Markov model; its set of underlying states is always a Markov process. Hidden Markov models just have the extra property that the current state in the Markov process isn’t known for certain, so we have to guess it based on outputs of the system that are known for certain.
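Given those two ingredients (transition probabilities and output likelihoods), here's a minimal sketch of recovering the most likely state sequence. The two-word model, the sounds, and every probability below are invented purely for illustration:

```python
# Minimal Viterbi sketch: given transition and emission (output)
# probabilities, find the most likely hidden state sequence.
# All probabilities below are made up for illustration.
def viterbi(observations, states, start_p, trans_p, emit_p):
    # best[s] = probability of the best path ending in state s
    best = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    paths = {s: [s] for s in states}
    for obs in observations[1:]:
        new_best, new_paths = {}, {}
        for s in states:
            # pick the previous state that maximizes the path probability
            prev = max(states, key=lambda p: best[p] * trans_p[p][s])
            new_best[s] = best[prev] * trans_p[prev][s] * emit_p[s][obs]
            new_paths[s] = paths[prev] + [s]
        best, paths = new_best, new_paths
    winner = max(states, key=lambda s: best[s])
    return paths[winner]

states = ['hello', 'yellow']                  # words the speaker intends
start_p = {'hello': 0.6, 'yellow': 0.4}
trans_p = {'hello': {'hello': 0.3, 'yellow': 0.7},
           'yellow': {'hello': 0.6, 'yellow': 0.4}}
emit_p = {'hello': {'heh': 0.7, 'yeh': 0.3},  # sounds each word produces
          'yellow': {'heh': 0.2, 'yeh': 0.8}}
print(viterbi(['heh', 'yeh'], states, start_p, trans_p, emit_p))
# → ['hello', 'yellow']
```

Given the sounds ‘heh’ then ‘yeh’, the best guess is that the speaker intended ‘hello’ followed by ‘yellow’, even though neither sound pins down its word with certainty.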

Next Post: The Viterbi Formula!

Wednesday, 10 October 2012

What ISN’T a Markov Process?

Not a Markov Process

Here is the classic example of a non-Markov process:

You have two red counters and one blue counter in a bag. The sequence of states

S = {s0, s1, …, sN}

where sn = the counter taken out of the bag at time n, is NOT a Markov process.

Why not?

Because if all you know is the current state, you don’t have enough information to give the probabilities of transitioning to each next state. 

For example, what are the transition probabilities when you have just removed a red counter?

Well, the possibilities for the next state are:

  1. Drawing another red counter,
  2. Drawing the blue counter, or
  3. The bag is empty.

But the probability of drawing the other red counter depends on whether this is the second red we have drawn, i.e. whether the other red counter is still in the bag. If this is the first red counter we’ve drawn, the transition probability to drawing another red counter is greater than zero; if it’s the second, that probability is zero. The same goes for drawing the blue counter or finding the bag empty. How could we know which states we can transition to just from knowing that our current state is ‘a red counter’? There isn’t enough information, so this isn’t a Markov process.
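You can watch the Markov property fail in a quick simulation: estimate the probability that a red counter is followed by another red, split by whether it was the first or the second red drawn. (The simulation setup is mine, not from the original problem statement.)

```python
# If drawing counters were a Markov process, P(next is red | current is
# red) would be the same no matter how we reached the current state.
# It isn't: the answer depends on whether this is our first or second red.
import random

first_red_then_red = [0, 0]   # [red follow-ups, trials] after a FIRST red
second_red_then_red = [0, 0]  # same, after a SECOND red

for _ in range(100_000):
    bag = ['red', 'red', 'blue']
    random.shuffle(bag)           # a random order of draws
    reds_seen = 0
    for i, counter in enumerate(bag[:-1]):  # last draw has no successor
        if counter == 'red':
            reds_seen += 1
            stats = first_red_then_red if reds_seen == 1 else second_red_then_red
            stats[1] += 1
            if bag[i + 1] == 'red':
                stats[0] += 1

print(first_red_then_red[0] / first_red_then_red[1])    # roughly 2/3
print(second_red_then_red[0] / second_red_then_red[1])  # exactly 0.0
```

Two different probabilities for the ‘same’ current state, which is exactly what the Markov property forbids.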


“But”, you ask, “Couldn’t we define the set of states differently in a way that does have the Markov property? Would that make this a Markov process?” That is an excellent question, you total genius. 

Hey, I found this picture of you!

Coincidentally, I had the same question, which I asked in this Stack Exchange Mathematics post and will now summarize!

Why can’t we redefine the state space for the counters-in-a-bag problem to be:

sn = the counters remaining in the bag

Surely counters-in-a-bag would then be a Markov process, since we would know the probability of transition to each state if we know the current state?

The first place this thinking is wrong is in trying to define whether ‘taking counters out of a bag’ is a Markov process. A Markov process isn’t ever an activity – it’s a set of states that can be transitioned through. For any randomized activity, you could define several different state sequences to represent it, each of which may or may not be a Markov process. For example:

        sn = the counter taken out of the bag at time n
        e.g. {RED, BLUE, RED}

        sn = the counters remaining in the bag at time n
        e.g. {RED+BLUE, RED, EMPTY}

        sn = the list of all previously drawn counters
        e.g. {RED, RED+BLUE, RED+BLUE+RED}

        sn = the amount of fun and excitement you experience when drawing a counter

No reason you can’t draw coins from a bag WHILE listening to Gangnam Style!

So while actually getting counters and taking them out of a bag may be a raging Saturday night in the math department, it is neither a Markov process nor a non-Markov process. This is why redefining the state space to make 'sn' the set of counters remaining in the bag isn’t a solution; it’s just a different problem.

Great. Then, is the first state space we described a Markov process? Clearly not.

What about the second one, with the state describing all remaining counters? Technically it is, since you can give each transition probability at each state. However, creating states that artificially include information about previous states counts as actually including those states, so it is definitely not a first-order Markov model, which is the really useful kind.

Processes whose states have to carry a lot of history lose most of the benefits of a Markov model (its simplicity). So people assume that Markov models should be first order; otherwise you should be representing your system differently. The counters-in-a-bag problem can’t be represented with an order small enough to exclude previous states, so it isn’t suited to representation by a Markov process.


Sunday, 7 October 2012

The Markov Assumptions

Basically all Markov models have two special properties called the 'Markov assumptions'.

  1.  The model only takes the current state into account when determining transition probabilities, not any previous states.

    This is called the ‘limited horizon principle’, and the Markov models that adhere to it are called ‘first order’ models. These are the most useful and the most common. However, some texts say that they are the only valid Markov models. I find this dubious, firstly because any 'higher order' model can be converted into a first order model, and secondly because lots of people are using higher order models anyway.

    For people who find this confusing because they don't know what horizons are.

  2. The transition probabilities of each state do not change with time.

    This is called the ‘stationary process assumption’. Same deal as the limited horizon assumption – many websites say that this is a necessary criterion for every Markov model, but researchers still use non-stationary models.
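On that first point, here's how a higher-order model can be folded into a first-order one: make each state carry its own recent history. A sketch with invented weather probabilities:

```python
# Converting a second-order Markov model to first order by folding the
# previous state into the state itself: each new state is a pair
# (previous, current). All probabilities below are made up.

# Second-order model: P(next | previous, current)
second_order = {
    ('sunny', 'sunny'): {'sunny': 0.8, 'rainy': 0.2},
    ('sunny', 'rainy'): {'sunny': 0.4, 'rainy': 0.6},
    ('rainy', 'sunny'): {'sunny': 0.6, 'rainy': 0.4},
    ('rainy', 'rainy'): {'sunny': 0.3, 'rainy': 0.7},
}

# Equivalent first-order model over pair-states: the transition
# (prev, cur) -> (cur, next) inherits the probability P(next | prev, cur),
# and now only the current (pair-)state is needed to predict the next one.
first_order = {}
for (prev, cur), nexts in second_order.items():
    first_order[(prev, cur)] = {(cur, nxt): p for nxt, p in nexts.items()}

print(first_order[('sunny', 'rainy')])
# → {('rainy', 'sunny'): 0.4, ('rainy', 'rainy'): 0.6}
```

The cost is a bigger state space (every pair of old states), which is exactly the loss-of-simplicity trade-off that makes first-order models the popular ones.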



This woman is upset because her megaphone is so tiny. She should probably just get a blog.
A few of the resources I've found on Markov models have included these two properties in the definition of what constitutes a valid model. While I'm sure their authors are all more qualified than me, I find it very counterintuitive to think of the assumptions as being so rigid. Just because you change your transition probabilities over time (or look back two states instead of one) doesn't mean you aren't still using the underlying logic of Markov models: predicting future states based on current information.

I would treat both these assumptions as very good guidelines; your model will almost certainly be more elegant and more useful if you conform to them. Also, it seems convenient to assume these properties are true of a Markov model unless told otherwise (like, in an exam). But what if the state sequence you're working with is such that breaking an assumption makes the model better? Of course you should break the assumption, and the resulting model will still be extremely Markovian.

There’s no concrete reason why you couldn’t have transition probabilities be a function of more than one previous state and/or a function of time, if it most accurately describes the system you’re trying to represent.

I just hope The Man is ready for my maverick views.

P.S. If you have any thoughts on this I’d love to discuss it with you (leave a comment)!

Next Post: What ISN'T a Markov process?


Due to my forthcoming exams on the subject, I will also be posting about object-oriented design patterns, Objective-C for Java developers, and SQL. I trust that this will not upset my vast audience (currently zero, although if you’re reading this I guess not! Exciting! Leave a comment or something!).

Hugs and kisses.

Only if the second accessor is called inside the first. Otherwise that's coupling.

Saturday, 6 October 2012

Introduction to Markov Processes


Many natural systems can have their behavior described nicely as a series of states. For example: insects are born in an ‘inside an egg’ state. Eventually most will transition to a ‘gross mucus-y larva’ state, followed by an ‘annoying flying thing’ state, and ending in a ‘dead’ state. If we know the series of states and the order they occur in, we can represent the process as a chain of states, called a Markov chain:

The numbers on each transition represent the probability of an insect in the previous state moving to the next state at the next time step. We know there’s random variation between insects in a species, so we represent that variation by giving the different transitions an insect might make different probabilities. The time step is some discrete unit you choose based on the context; for an insect with a short life span, a useful time step might be about an hour. So if we have 200 eggs of insects whose life cycles are represented by the above chain, we’d expect about 100 of them to hatch within an hour, and about 50 more to hatch in the second hour.

The crucial thing about this process is that the only thing we need to know to predict the likelihood of an insect transitioning into a particular next state is the current state. This property of a process is called the Markov property, and any random process that adheres to it is a Markov process.
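A chain like this is easy to simulate: at each time step, pick the next state using only the current state's transition probabilities. The 0.5 hatching probability comes from the example above; the other numbers are placeholders I've made up, since the diagram itself isn't reproduced here:

```python
# Simulating the insect life-cycle Markov chain, one hour per time step.
# The 0.5 egg-hatching probability is from the post; the rest are
# invented stand-ins for the numbers on the diagram.
import random

transitions = {
    'egg':    {'egg': 0.5, 'larva': 0.5},
    'larva':  {'larva': 0.3, 'flying': 0.7},
    'flying': {'flying': 0.9, 'dead': 0.1},
    'dead':   {'dead': 1.0},   # no zombies in this simple chain
}

def step(state):
    # Only the current state matters here: that's the Markov property.
    r = random.random()
    cumulative = 0.0
    for next_state, p in transitions[state].items():
        cumulative += p
        if r < cumulative:
            return next_state
    return state

state = 'egg'
for hour in range(50):
    state = step(state)
print(state)  # very likely 'dead' after 50 hours
```

Notice that `step` never looks at how the insect got to its current state, which is the whole point of the Markov property.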

When I first started looking into Markov processes, I was confused about why Markov models limited themselves to the Markov property. Why not just say that we can take other factors into account than the current state, such as previous states or external variables (like, for the insects, an external variable could be seasons; they aren’t expressed in the insect’s state, but they affect their life cycles)? Couldn't we then describe many more processes with Markov-like models, and much more accurately? Well, we could. But the beauty of a Markov model is its ability to describe many different things that occur in nature even though it’s very simple. This simplicity makes data from Markov processes much easier to manipulate. A good example of cool things you can do when you assume the Markov property is use of the Viterbi algorithm, which I will post about later.

Markov models can't account for wasps' likelihood of death increasing as they get closer to me and my vacuum cleaner.


The above chain is very simple, because we know that insects basically always go from the first to the last state in order. But what if they didn’t? What if there were a species of insect that could sometimes go back into their eggs, or sometimes hatched already in the flying stage? I would find that deeply alarming, but we can still represent it with a Markov chain by adding in arrows for the new transition possibilities:

Here are a couple of things that can be inferred from the above Markov chain:

  • Half of the insects hatch out of their eggs into larvae, while the other half go directly to the flying state. If we have one of the eggs, there is a 50% chance it will hatch into a larva.
  • Eighty percent of the larvae will turn into flying things, and ten percent (presumably the losers) will go back into their eggs for a while instead of getting on with their lives.
  • After the insects die, five percent of them rise again, returning to the flying state as zombies.

It’s also good to note that the time period expressed by the Markov chain doesn’t really ‘end’. So it’s not that 5% of the above insects return to the flying state and then definitely die; any insect in the dead state has a 5% chance of zombification at the next time step, no matter how many times it has been dead before. Since there’s no alternative dead state whose only outgoing arrow loops back to itself (as in the previous example), these insects can’t be truly killed. I hope you can now appreciate the seriousness of Markov processes.

Having understood his species' Markov chain, he rejoices in his immortality.
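You can check the immortality claim numerically. Tracking just the flying and dead populations, with the 5% zombification from the chain above and a made-up 20% death rate, the population settles into a steady state rather than everyone ending up permanently dead:

```python
# Two-state slice of the zombie chain: flying <-> dead.
# The 5% dead->flying 'zombification' is from the chain above;
# the 20% death rate is invented for illustration.
p_die, p_rise = 0.20, 0.05
flying, dead = 0.0, 1.0   # start with the whole population dead

for _ in range(200):      # run the chain for 200 time steps
    flying, dead = (flying * (1 - p_die) + dead * p_rise,
                    flying * p_die + dead * (1 - p_rise))

print(round(flying, 3), round(dead, 3))  # steady state: 0.2 0.8
```

Even starting from total extinction, 20% of the population ends up flying in the long run: with a nonzero escape route out of ‘dead’, the state stops being absorbing.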

Bonus links!

Next Post: (probability 0.85 of being about) Markov Assumptions (0.15 of being about my sordid personal life - based on the current post only!)