What does a world model learn?
April 6, 2026
When I was doing my post-doc at Stanford, Tom Van Wouwe, a fellow post-doc, was working on learning a dynamic model for a musculoskeletal system. If I remember correctly, back then he was using OpenSim, which was a popular simulation tool for biomechanics but was notoriously slow to run. The hope was that a learned model could speed up the simulation by orders of magnitude. One day he asked me (I am paraphrasing here): why should we expect the learned model to run faster than the original simulation?
Now that world models have become a hot topic in robotics and AI, I have thought about that conversation a lot. In the context of articulated rigid body simulation, which is my field of interest, what does a world model trained on simulation data really learn? I can think of three possibilities:
Possibility 1: The model learns a lookup table of the training data, with some compression. In this case, with enough training data, the model interpolates well and can probably extrapolate a bit. It can run much faster than the simulator, but I doubt it will generalize to unseen states that are far away from the training data.
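To make the lookup-table picture concrete, here is a minimal sketch under my own assumptions: a frictionless pendulum stands in for the simulator, and a nearest-neighbor table stands in for the learned model. The function names, the grid size, and the query states are all hypothetical choices for illustration, and a real learned model would compress and interpolate far better than nearest-neighbor lookup does.

```python
import math

def pendulum_step(theta, omega, dt=0.01, g=9.81, length=1.0):
    """One semi-implicit Euler step of a frictionless pendulum (the 'simulator')."""
    omega_next = omega - (g / length) * math.sin(theta) * dt
    theta_next = theta + omega_next * dt
    return theta_next, omega_next

def build_table(theta_range, omega_range, n=41):
    """Sample the simulator on a regular n-by-n grid: the 'training data'."""
    table = []
    for i in range(n):
        theta = theta_range[0] + (theta_range[1] - theta_range[0]) * i / (n - 1)
        for j in range(n):
            omega = omega_range[0] + (omega_range[1] - omega_range[0]) * j / (n - 1)
            table.append(((theta, omega), pendulum_step(theta, omega)))
    return table

def lookup_predict(table, theta, omega):
    """Predict the next state by returning the stored output of the nearest sample."""
    _, nearest_output = min(
        table,
        key=lambda entry: (entry[0][0] - theta) ** 2 + (entry[0][1] - omega) ** 2,
    )
    return nearest_output

def prediction_error(table, theta, omega):
    """Euclidean distance between the table's prediction and the simulator's answer."""
    predicted = lookup_predict(table, theta, omega)
    true = pendulum_step(theta, omega)
    return math.hypot(predicted[0] - true[0], predicted[1] - true[1])

# Training data covers only small angles and low velocities.
table = build_table(theta_range=(-0.5, 0.5), omega_range=(-1.0, 1.0))

inside_error = prediction_error(table, 0.123, 0.456)   # query inside the sampled region
outside_error = prediction_error(table, 2.0, 3.0)      # query far outside it

print(f"error inside training region:  {inside_error:.4f}")
print(f"error outside training region: {outside_error:.4f}")
```

Inside the sampled region the error is bounded by the grid spacing; far outside it, the table can only return the closest state it has seen, so the error grows with the distance from the data.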
Possibility 2: The model learns the underlying physics of the system. In this case, we can conjecture that the model is effectively implementing the Featherstone algorithm inside the neural network. If so, should we expect the neural network to run faster than the simulator itself? One possibility is that the machine learning algorithm is smarter than us and is able to find an implementation that beats our hand-coded one. But that seems unlikely.
Possibility 3: The model discovers another explanation of the data, i.e., forget about Newton's second law, there is another law of physics that is better and can be implemented faster. Again, this seems unlikely.
I am less interested in the first possibility (which I think is the most likely one), since a learned lookup table, while computationally more efficient, does not seem usable for real-world planning that requires generalization. Now suppose the second or third possibility is actually what happens. To me personally, it would be more fun to extract that more efficient implementation or brand-new law of physics than to use the learned black-box model for some downstream task.
So those are my thoughts on world models, again purely in the context of learning a dynamic model for articulated rigid body simulation, which can be useful for research in legged robots or digital humans. While I am skeptical about using a learned model to do planning (outside the region of the training data), a learned model does have other applications, e.g., as a smooth local approximation of dynamics that are inherently non-smooth due to contacts, which enables gradient-based optimization (again, valid only locally). There might also be other possibilities that I haven't thought of. I am looking forward to being proven wrong, though, and to seeing what world models can bring to the field of robotics.
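As a rough illustration of the smoothing point, here is a minimal sketch under my own assumptions: a one-dimensional penalty contact force, with a hand-written softplus surrogate standing in for what a learned smooth model might provide. The stiffness, the smoothing width, and all function names are hypothetical and not taken from any particular simulator.

```python
import math

def contact_force(distance, stiffness=1000.0):
    """Penalty contact: zero force when separated, linear spring in penetration."""
    return stiffness * max(0.0, -distance)

def contact_force_grad(distance, stiffness=1000.0):
    """Derivative w.r.t. distance: a step function, zero whenever there is no contact."""
    return -stiffness if distance < 0.0 else 0.0

def smooth_contact_force(distance, stiffness=1000.0, width=0.01):
    """Softplus surrogate of the penalty force; approaches contact_force as width -> 0."""
    z = -distance / width
    # Numerically stable softplus: log(1 + exp(z)).
    softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    return stiffness * width * softplus

def smooth_contact_force_grad(distance, stiffness=1000.0, width=0.01):
    """Derivative of the surrogate: -stiffness * sigmoid(-distance / width)."""
    z = -distance / width
    # sigmoid(z) written with tanh to avoid overflow for large |z|.
    return -stiffness * 0.5 * (1.0 + math.tanh(0.5 * z))

# Just before touchdown (5 mm of clearance) the exact model gives an optimizer
# no gradient signal, while the smooth surrogate already does.
clearance = 0.005
print("exact gradient: ", contact_force_grad(clearance))
print("smooth gradient:", smooth_contact_force_grad(clearance))

# Deep in penetration the two models agree closely.
print("exact force:    ", contact_force(-0.1))
print("smooth force:   ", smooth_contact_force(-0.1))
```

The surrogate trades physical accuracy near the contact boundary for a usable gradient, which is why I would only trust it locally.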
Questions or corrections? Reach me at zxieaa@gmail.com.