What I want to work on in 2026

April 27, 2026

Every now and then I am asked what I want to work on next. Almost every time I give a slightly different answer based on what is in my head at the moment, and I often regret not having mentioned this or that. So I decided to write down what I think are exciting research directions right now. With the technology advancing so fast, the list is likely to be outdated soon, with some problems being solved within the next few weeks, or exciting new ideas popping up. So here is my list, for the year 2026.

1. Inverse optimal control

With reinforcement learning being the dominant paradigm for synthesizing athletic control for legged robots, should we abandon model-based control approaches such as model predictive control (MPC)? I actually feel this is a unique opportunity for model-based methods. Before, we had no idea what the correct controller was, and given the complexity of the problem, researchers had to rely on their intuition to come up with heuristics and simplifications to make MPC work. Now we have this amazing tool called reinforcement learning that can produce super capable controllers. Instead of treating the learned controller as a black box (which is totally fine if all we care about is having it do whatever we ask it to do), it would be interesting to see whether we can come up with an optimal control formulation whose behavior matches the learned controller. Besides satisfying scientific curiosity, this may also lead to more robust and interpretable controllers.
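As a toy illustration of the inverse problem, the sketch below assumes a scalar linear system with a quadratic cost, which is far simpler than any legged robot. It treats an observed feedback gain as a stand-in for the learned controller and searches for the state weight q (with the input weight fixed at 1) whose LQR solution reproduces that gain. The system parameters, function names, and numbers are all hypothetical.

```python
def lqr_gain(q, a=1.1, b=1.0, r=1.0, iters=500):
    """Optimal gain k (u = -k x) for x' = a x + b u with cost sum(q x^2 + r u^2)."""
    p = q  # value-function weight, refined by fixed-point iteration of the Riccati equation
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def recover_state_weight(k_observed, q_lo=1e-6, q_hi=1e3, iters=100):
    """Bisection on q: in this scalar problem the LQR gain increases monotonically with q."""
    for _ in range(iters):
        q_mid = 0.5 * (q_lo + q_hi)
        if lqr_gain(q_mid) < k_observed:
            q_lo = q_mid
        else:
            q_hi = q_mid
    return 0.5 * (q_lo + q_hi)


# Stand-in for a learned controller: a gain generated with a weight we pretend not to know.
k_observed = lqr_gain(2.5)
q_recovered = recover_state_weight(k_observed)
print(f"observed gain {k_observed:.4f}, recovered q {q_recovered:.4f}")
```

In this toy case the cost is identifiable only up to scale, which is why the input weight is pinned to 1. For a real learned policy, the model, the cost structure, and the matching criterion would all have to be chosen, and the sketch says nothing about how.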

2. Robust optimal control

One reason reinforcement learning controllers are so much more robust is the randomization scheme used during training. In contrast, as far as I know, optimal control formulations for legged robots often assume a deterministic model. In regions where the model is inaccurate, the controller easily fails. I am not sure why we haven't seen much work on robust optimal control formulations for legged robots, but I think that with the right formulation, we can have an optimal controller that matches the robustness of an RL-trained controller.
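To make the contrast concrete, here is a minimal sketch under strong assumptions: a scalar unstable system, a linear feedback law, and an uncertain actuator gain b drawn from a small hypothetical set. The nominal design minimizes cost on a single deterministic model, while the robust design minimizes the worst-case cost over the sampled models. This min-max form is only one of many possible robust formulations, and all parameters are made up for illustration.

```python
def rollout_cost(k, b, a=1.2, r=0.1, x0=1.0, horizon=50):
    """Quadratic cost of the feedback law u = -k x on the system x' = a x + b u."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += x * x + r * u * u
        x = a * x + b * u
    return cost


def worst_case_cost(k, gains):
    """Largest rollout cost over a set of possible actuator gains."""
    return max(rollout_cost(k, b) for b in gains)


candidate_k = [i * 0.01 for i in range(301)]  # feedback gains on a grid over [0, 3]
nominal_b = 1.0                               # the single model a deterministic design trusts
sampled_b = [0.5, 1.0, 1.5]                   # hypothetical actuator-gain uncertainty

k_nominal = min(candidate_k, key=lambda k: rollout_cost(k, nominal_b))
k_robust = min(candidate_k, key=lambda k: worst_case_cost(k, sampled_b))

print(f"nominal design: k={k_nominal:.2f}, worst case {worst_case_cost(k_nominal, sampled_b):.2f}")
print(f"robust design:  k={k_robust:.2f}, worst case {worst_case_cost(k_robust, sampled_b):.2f}")
```

By construction, the robust gain cannot have a higher worst-case cost than the nominal gain on the sampled set, and it cannot beat the nominal gain on the nominal model. Replacing the max with an average over sampled models would be closer in spirit to the randomization used in RL training.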

3. Energy-based models and reinforcement learning

In most reinforcement learning work on legged robots, the policy outputs a Gaussian distribution during training, which limits it to learning uni-modal behavior. On the other hand, energy-based models, and variants such as diffusion models, can learn multi-modal distributions. It would be interesting to see how to employ such models in RL to enable multi-modal behavior learning. There is some recent work in this direction, but I think the results are still not very satisfactory. Another closely related direction is sampling-based MPC, which can also generate multi-modal behavior. There is also a nice connection between sampling-based MPC, energy-based models, and RL, based on the paradigm known as Control as Inference. I believe the unification of these approaches can lead to more powerful control policies and more efficient learning.
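A small sketch of why the uni-modal assumption can hurt, using a hypothetical one-dimensional action with a double-well cost (think of passing an obstacle on the left or on the right). Candidate actions are weighted by exp(-energy / temperature), the same kind of weighting used in MPPI-style sampling-based MPC and the density an energy-based policy would represent. The energy function, the temperature, and the action grid are all assumptions made for illustration.

```python
import math


def energy(action):
    """Hypothetical double-well cost: low near action = -1 and +1, high at 0."""
    return (action * action - 1.0) ** 2


def boltzmann_weights(actions, temperature):
    """Weights proportional to exp(-energy / temperature), normalized to sum to 1."""
    energies = [energy(a) for a in actions]
    e_min = min(energies)  # shift by the minimum for numerical stability
    unnormalized = [math.exp(-(e - e_min) / temperature) for e in energies]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


# A fixed grid of candidate actions stands in for random samples, to keep the result deterministic.
actions = [-2.0 + 0.01 * i for i in range(401)]
weights = boltzmann_weights(actions, temperature=0.05)

mass_left = sum(w for w, a in zip(weights, actions) if a < 0.0)
mass_right = sum(w for w, a in zip(weights, actions) if a > 0.0)
gaussian_mean = sum(w * a for w, a in zip(weights, actions))  # mean of a moment-matched Gaussian

print(f"mass near -1: {mass_left:.3f}, mass near +1: {mass_right:.3f}")
print(f"Gaussian mean: {gaussian_mean:.3f}, energy there: {energy(gaussian_mean):.3f}")
```

The weighted distribution splits its mass between the two wells, whereas a single Gaussian fitted to it centers on the ridge between them, which is the highest-cost action in the neighborhood. A multi-modal policy class can keep both options instead of averaging them.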

4. Animal-like athletic intelligence

While RL has achieved great success in controlling robots and virtual characters, I believe it is fair to say that what they learn is nowhere close to what animals can or will do. For example, a squirrel can pass through a complex parkour course within a few tries, while it will probably take an RL agent millions of trials to achieve something similar. Not to mention that animals are probably more energy efficient, with far less sensing and actuation capability. A phrase I often hear is that flat-ground walking is solved. That may be true if all we care about is having legged robots walk reliably on flat ground, regardless of how they do it. But I think truly achieving animal-level athletic intelligence, even in simple scenarios such as flat-ground walking, is still far from solved. My dream is to have a simulated squirrel that can cross a parkour course with the same agility as a real squirrel.

Questions or corrections? Reach me at zxieaa@gmail.com.