
I'm a software developer who keeps trying (and failing) to get my head around AI and neural networks. There is one area that sparked my interest recently - simulating a mouse "homing in" on a piece of cheese by following the smell. Based on the rule that moving closer to the cheese = stronger smell = good, it feels like it should be quite a simple problem to solve - in theory at least!

My thought process was to start by placing the mouse and cheese in random positions on the screen. I would then move the mouse one step in a random direction and measure its distance to the cheese, and if it's closer than before (stronger smell) then that's good. This is where I come unstuck on the theory - this "feedback" somehow needs to modify the mechanism used to move the mouse, gradually refining it until the mouse is able to head straight towards the cheese. Once "trained", I should be able to reposition the cheese and expect the mouse to travel to it more quickly. Note I'm also keeping things simple by not having obstacles for the mouse to negotiate around.
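Just to pin down the setup I'm describing, here's a minimal sketch in Python (all the names are mine, purely illustrative): random positions, one random step, and a "reward" based on whether the smell got stronger.

```python
import math
import random

def distance(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def step(pos, direction, step_size=1.0):
    """Move one step in the given direction (radians)."""
    return (pos[0] + step_size * math.cos(direction),
            pos[1] + step_size * math.sin(direction))

random.seed(0)
cheese = (random.uniform(0, 100), random.uniform(0, 100))
mouse = (random.uniform(0, 100), random.uniform(0, 100))

before = distance(mouse, cheese)
direction = random.uniform(0, 2 * math.pi)   # a random step, as described
mouse = step(mouse, direction)
after = distance(mouse, cheese)

# The "feedback": positive if the smell got stronger (we moved closer).
reward = before - after
```

The open question is then exactly what I describe below: what mechanism should this reward signal actually modify?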

How on earth would this be implemented with a NN? I understand the basic concepts, but I find that things unravel once I start looking at real code! The examples I've seen typically start by training the NN from a data set, but this doesn't seem to apply here as it feels like the only training available is "on the fly" as the mouse moves around (i.e. closer = good, further away = bad). I'm assuming the brain has some kind of "reward mechanism" triggered by a stronger smell of cheese.
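For what it's worth, the "closer = good, further away = bad" feedback can be exploited with nothing fancier than hill climbing - no network at all - which is a sketch of the behaviour I'm after (again, names are illustrative): the mouse keeps a preferred heading and reinforces trial directions that reduced the distance.

```python
import math
import random

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

random.seed(1)
cheese = (80.0, 60.0)
mouse = (10.0, 10.0)
heading = random.uniform(0, 2 * math.pi)   # initially clueless

for _ in range(500):
    trial = heading + random.gauss(0, 1.0)  # explore around the current heading
    candidate = (mouse[0] + math.cos(trial), mouse[1] + math.sin(trial))
    if distance(candidate, cheese) < distance(mouse, cheese):
        mouse = candidate                    # stronger smell: take the step
        heading = trial                      # ...and reinforce that direction

final = distance(mouse, cheese)
```

This converges, but it doesn't "train" anything reusable - reposition the cheese and it starts from scratch - which is why I'm wondering how a NN would fit in.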

Am I barking up the wrong tree - either with my thought process, or NN not being a good fit for this problem? This isn't homework btw, just something that I've been puzzling over in the back of my mind.

DukeZhou
    You have stumbled on reinforcement learning. I’ll give a better answer later but searching for ‘gridworld reinforcement learning’ will head you in the right direction. – Jaden Travnik Mar 09 '18 at 13:27

1 Answer


I would go with a NEAT algorithm for this one. Basically, you "breed" the network that performs best each generation, then rinse and repeat. It doesn't require a training database - the bots compete and evolve as they go.
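To make the idea concrete, here's a heavily simplified neuroevolution sketch in Python (real NEAT, as in the Neataptic.js demos below, also evolves the network *topology* - this toy version only evolves the weights of a fixed two-neuron net mapping the offset to the cheese onto a step):

```python
import math
import random

random.seed(2)

def new_genome():
    """Four weights: (dx, dy) -> (step_x, step_y)."""
    return [random.uniform(-1, 1) for _ in range(4)]

def act(genome, dx, dy):
    # One tanh-squashed linear neuron per output axis.
    return (math.tanh(genome[0] * dx + genome[1] * dy),
            math.tanh(genome[2] * dx + genome[3] * dy))

def fitness(genome, steps=50):
    """Run one bot; fitness = negative final distance to the cheese."""
    mouse, cheese = (0.0, 0.0), (30.0, 20.0)
    for _ in range(steps):
        dx, dy = cheese[0] - mouse[0], cheese[1] - mouse[1]
        sx, sy = act(genome, dx, dy)
        mouse = (mouse[0] + sx, mouse[1] + sy)
    return -math.hypot(cheese[0] - mouse[0], cheese[1] - mouse[1])

population = [new_genome() for _ in range(30)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                      # keep the best third
    children = [[w + random.gauss(0, 0.2) for w in g] # mutate the survivors
                for g in survivors for _ in range(2)]
    population = survivors + children                # rinse and repeat

best = max(population, key=fitness)
```

Because the network's input is the *offset* to the cheese rather than absolute positions, the evolved bot keeps working when you reposition the cheese - which is exactly the "trained" behaviour the question asks for.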

For example, this demonstration shows how Neataptic.js teaches a bunch of bots to follow a target: https://wagenaartje.github.io/neataptic/articles/targetseeking/

Here is a more complex problem - a game of Agar.io: https://wagenaartje.github.io/neataptic/articles/agario/

Alexus