Learning Alarm: Simple RL Alarm
The first version of Learning Alarm is now live. Using TensorFlow, I implemented a very simple neural network that monitors an alarm clock application and uses reinforcement learning to improve its behavior. The app works as follows.
Step 1: The user enters the time he/she wants to wake up.
Step 2: Each day, the application starts up an hour before the target time.
Step 3: The neural network decides when to sound the alarm. There is a 20% chance that the action is chosen at random, which allows the network to explore various possibilities. Otherwise, it executes what it currently believes is the optimal time to wake the user up at the target time.
Step 4: When the user finally wakes up, he/she presses a big red button to tell the application to stop.
Step 5: The app sends a reward to the network based on how close to the target time the button is pressed. The closer the user wakes up to the target time, the greater the reward. If the user wakes up after the target time, the penalty is doubled.
Step 6: The network takes the reward and updates its value function for that given action.
Step 7: Each day, the alarm learns more and more about the user and over time finds the optimal way to wake him/her up.
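The loop above can be sketched as a simple tabular value-function learner. This is only an illustrative sketch, not the app's actual code: the variable names, the `EPSILON`/`ALPHA` hyperparameters, and the choice of one-minute offsets are all assumptions for the example.

```python
import random

EPSILON = 0.2   # 20% chance of a random (exploratory) alarm time, as in Step 3
ALPHA = 0.1     # hypothetical learning rate for the value update

# One value estimate per minute in the hour before the target time (Step 2).
values = [0.0] * 60

def choose_alarm_offset(values):
    """Step 3: pick how many minutes before the target to sound the alarm."""
    if random.random() < EPSILON:
        return random.randrange(len(values))                      # explore
    return max(range(len(values)), key=lambda i: values[i])       # exploit

def reward(minutes_after_target):
    """Step 5: closer to the target is better; oversleeping is penalized twice.

    A negative argument means the user woke up before the target time.
    """
    m = minutes_after_target
    return -2 * m if m > 0 else m

def update(values, action, r):
    """Step 6: nudge the chosen action's value toward the observed reward."""
    values[action] += ALPHA * (r - values[action])
```

Each morning the app would call `choose_alarm_offset`, observe when the button is pressed, compute the reward, and apply `update` to the action it took.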
Currently, the only variable is the time the alarm will sound. I would like to add more variety by introducing different sounds the alarm can play, the volume it plays them at, and having the ability to play alarms at multiple times. However, the added variables will mean that the alarm will take longer to optimize and the user might be long gone before it becomes even remotely useful.
To reduce complexity, I can have the alarm start only half an hour before the target time and calculate a value for every 5 minutes. This will reduce the number of time values from 60 to 6 and will offset the addition of new variables. I would also like to implement more advanced techniques, such as adding hidden layers of weights and training them with backpropagation. My goal is to have the network learn more quickly by updating its value function for more than one action at a time.
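The coarser discretization works out as follows; the constant names here are hypothetical, but the arithmetic matches the 60-to-6 reduction described above.

```python
WINDOW_MINUTES = 30  # alarm window shrinks from 60 minutes to 30
BIN_MINUTES = 5      # one candidate alarm time every 5 minutes

def candidate_offsets():
    """Minutes before the target time at which the alarm may sound."""
    return list(range(0, WINDOW_MINUTES, BIN_MINUTES))
```

With a 30-minute window and 5-minute bins, `candidate_offsets()` yields six actions instead of sixty, so each action is tried, and its value updated, ten times as often for the same number of mornings.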
I am excited to see how far I can push this. Below is a link to the code if you want to test it yourself. Please let me know if you have any ideas/suggestions.