ML101

This is a loose collection of questions that I stumbled upon during my studies.

Probability Theory

What is a score function estimator (REINFORCE estimator)?

Why do we use the cross-entropy loss in classification?

What is the ELBO?

How is linear regression connected to maximum likelihood?

What is the difference between Pearson and Spearman correlation?

How can the variance of the score function estimator be decreased?

Engineering

How does the attention mechanism ("Attention Is All You Need") work?

How can numbers in an interval \([a, b]\) be mapped onto another interval \([c, d]\)?

How can the average loss be calculated with batch means?

Why should an MSE loss be avoided after a sigmoid layer?