As asked
Explain the math behind DDPM. What is the forward process doing to the data, what distribution does it converge to, and how does the reverse process reconstruct samples? Where does the neural network actually sit in this picture?
Sample answer outline
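The forward process is a fixed (not learned) Markov chain that adds a small amount of Gaussian noise at each of T steps according to a variance schedule β_1, ..., β_T: q(x_t | x_{t-1}) = N(sqrt(1 - β_t) x_{t-1}, β_t I). With a suitable schedule, x_T is approximately distributed as an isotropic standard normal N(0, I), regardless of the data distribution. The reverse process is a learned Markov chain that starts from N(0, I) and denoises step by step. The network sits inside each reverse transition: it takes (x_t, t) and predicts the noise that was added (equivalently, up to scaling, the score), and the transition mean is computed from that prediction. A strong answer mentions the closed-form marginal q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 - ᾱ_t) I), where ᾱ_t is the product of (1 - β_s) for s ≤ t. Reparameterized as x_t = sqrt(ᾱ_t) x_0 + sqrt(1 - ᾱ_t) ε, it lets you jump to any noise level in one shot, which is what the L_simple loss exploits.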
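The one-shot jump and the noise-prediction loss can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the linear schedule (1e-4 to 0.02 over T = 1000 steps) is one common choice, and the names `q_sample`, `l_simple`, and `zero_model` are hypothetical. `zero_model` is only a stand-in for the trained network.

```python
import numpy as np

# Assumed linear variance schedule; other schedules (e.g. cosine) are also used.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s)


def q_sample(x0, t, eps):
    """Draw x_t from q(x_t | x_0) in one shot (t is a 0-based step index)."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def l_simple(eps_pred, eps):
    """Unweighted mean squared error between predicted and true noise."""
    return np.mean((eps_pred - eps) ** 2)


def zero_model(x_t, t):
    """Hypothetical placeholder for the network: always predicts zero noise."""
    return np.zeros_like(x_t)


rng = np.random.default_rng(0)

# Toy "data" far from N(0, 1): every sample equals 5.
x0 = np.full(100_000, 5.0)
eps = rng.standard_normal(x0.shape)

# At the last step almost all signal is gone, so x_T looks like N(0, 1).
x_T = q_sample(x0, T - 1, eps)
print("abar_T:", alpha_bars[-1], "mean:", x_T.mean(), "std:", x_T.std())

# One loss evaluation: pick a random step, noise the data, score the prediction.
t = int(rng.integers(0, T))
x_t = q_sample(x0, t, eps)
loss = l_simple(zero_model(x_t, t), eps)
print("step:", t, "loss of the zero predictor:", loss)
```

In a real training loop `zero_model` would be replaced by a network conditioned on `t`, and the random step would be drawn per example rather than once per batch.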
Expect these follow-ups
- Why does predicting the noise work better than predicting the original image directly in practice?
- What happens if your variance schedule is too aggressive at the early timesteps of the forward process?