Knowing vs Understanding

I came across this video recording of Richard Feynman titled "Knowing versus Understanding" when I was a PhD student.

In the video, Feynman told a hypothetical story about how ancient people, by observing the locations of the sun and the moon, were able to come up with a series of calculation steps that predicted the positions of the sun and the moon in the sky, and phenomena like eclipses, with high accuracy. Then a young man came along and proposed a theory that the sun, the earth and the moon are spheres moving around each other. What greeted him was the question: "How accurately can you predict the eclipse?" And people waved him off when he said his theory was still under development.

In my field of legged robot control, reinforcement learning (RL) has taken over. Every other week, there is a new video of a humanoid robot doing cool stuff that I thought was still decades away when I was a PhD student, such as dancing, playing table tennis and parkour. Behind the scenes is a technology called sim-to-real, where reinforcement learning is used to train policies on thousands of simulated humanoids, and the policies are then deployed to the real world.

It is remarkable how similar the situation is to Feynman's story. We have a neural network trained with reinforcement learning, which, under the hood, is a model that takes a bunch of calculation steps and comes up with an action. While we have a pretty good theory of reinforcement learning itself, we don't really have a good theory of the policy that it comes up with, i.e., an analogue of the sun-earth-moon system.

Model predictive control (MPC) used to be the state of the art in legged robot control. But now, in terms of demos that can impress people, RL is eating MPC alive. A guy who used to work on MPC even joked during last year's ICRA that 2025 might be the last year where we can still see an MPC paper. And the phrase "bitter lesson" is being repeated more and more often.

However, even as an RL practitioner (I did my PhD thesis on RL for legged robots), I believe methods that can provide some explainability of what the policy is doing are still valuable. They would be the analogue of the young man's theory in Feynman's story, that the sun, the earth and the moon are spheres moving around each other. I believe that instead of abandoning the good old model-based control approach, this is the perfect time to pick it up. First, there is a performance gap between model-based and RL-based methods, which means the room for improvement and innovation is huge. Second, where before we had only a rough idea of how to control these robots well, now with reinforcement learning we actually have a computational model (albeit a black-box one) that can achieve this goal. We just need to find out how to turn that into a white-box model (inverse optimal control?).

To summarize, while RL is hard to beat, we should be patient and not give up on model-based control. In reality, it took hundreds of years, from Kepler to Newton to Einstein, to arrive at the theory of general relativity that can describe astronomical phenomena. And the theory turned out to do more than just provide accurate predictions: it enables practical technologies such as GPS. For legged robot control, we probably won't need hundreds of years. I hope that there are still enough people working on this, and that they eventually come up with a breakthrough in my lifetime.


Questions or corrections? Reach me at zxieaa@gmail.com.