Viktor Stein<p>My first preprint is online: <a href="https://arxiv.org/abs/2402.04613" target="_blank" rel="nofollow noopener noreferrer" translate="no"><span class="invisible">https://</span><span class="">arxiv.org/abs/2402.04613</span><span class="invisible"></span></a> :)</p><p>We define and analyse Maximum Mean Discrepancy (MMD) regularized \(f\)-divergences \( D_{f, \nu} \) (for \( \mu \ll \nu \), \( D_{f, \nu}(\mu) = \int_{\mathbb R^d} f\bigl(\tfrac{\textrm{d}\mu}{\textrm{d}\nu}\bigr) \, \textrm{d}\nu \)) and their <a href="https://mathstodon.xyz/tags/Wasserstein" class="mention hashtag" rel="tag">#<span>Wasserstein</span></a> gradient flows. </p><p>We define the \(\lambda\)-regularized \(f\)-divergence for \(\lambda>0\) as<br />\[D_{f, \nu}^\lambda(\mu) := \min_{\sigma\in M_+(\mathbb R^d)} D_{f, \nu}(\sigma) + \frac{1}{2 \lambda} d_K(\mu, \sigma)^2\]<br />(yes, the min is attained!), where \( d_K \) is the kernel metric<br />\[ d_K(\mu, \sigma) := \| m(\mu - \sigma) \|_{\mathcal H_K}. \]<br />Here \( (\mathcal H_K, \| \cdot \|_{\mathcal H_K}) \) is the Reproducing Kernel Hilbert Space (RKHS) of the kernel \( K \colon \mathbb R^d \times \mathbb R^d \to \mathbb R \), and<br />\[ m \colon M(\mathbb R^d) \to \mathcal H_K, \qquad \mu \mapsto \int_{\mathbb R^d} K(x, \cdot) \, \textrm{d}\mu(x)\]<br />is the kernel mean embedding (KME) of finite signed measures into the RKHS.<br />One can imagine the KME as a generalization of the <a href="https://mathstodon.xyz/tags/KernelTrick" class="mention hashtag" rel="tag">#<span>KernelTrick</span></a> from points in \( \mathbb R^d \) to measures on \( \mathbb R^d \).</p><p>We then show that for every \( \nu \in M_+(\mathbb R^d) \) there exists a proper, convex, lower semicontinuous functional \( G_{f, \nu} \colon \mathcal H_K \to (- \infty, \infty] \) such that \[ D_{f, \nu}^{\lambda} = G_{f, \nu}^{\lambda} \circ m,\] where \(F^{\lambda}\) denotes the usual Hilbert space <a href="https://mathstodon.xyz/tags/MoreauEnvelope" class="mention hashtag" rel="tag">#<span>MoreauEnvelope</span></a> of a functional \(F\).</p><p>We can now use standard convex analysis in Hilbert spaces to compute the (\(\frac{1}{\lambda}\)-Lipschitz continuous) gradient of \(D_{f, \nu}^{\lambda}\) and to determine the limits as \( \lambda \to 0 \) and \( \lambda \to \infty \) (pointwise and in the sense of Mosco), showing that \( D_{f, \nu}^{\lambda}\) interpolates between \( D_{f, \nu}\) and \( d_K(\cdot, \nu)^2 \).</p>
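<p>For context, the Hilbert space Moreau envelope appearing here is the standard one: for a proper, convex, lower semicontinuous \( F \colon \mathcal H_K \to (-\infty, \infty] \),<br />\[ F^{\lambda}(h) := \min_{g \in \mathcal H_K} F(g) + \frac{1}{2 \lambda} \| h - g \|_{\mathcal H_K}^2, \]<br />and since \( m \) is linear, \( d_K(\mu, \sigma) = \| m(\mu) - m(\sigma) \|_{\mathcal H_K} \). The identity \( D_{f, \nu}^{\lambda} = G_{f, \nu}^{\lambda} \circ m \) can thus be read as: the minimization over measures above is pushed through the embedding into \( \mathcal H_K \). The Lipschitz gradient then follows from the classical formula \( \nabla F^{\lambda} = \frac{1}{\lambda} (\mathrm{Id} - \mathrm{prox}_{\lambda F}) \).</p>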
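<p>To make \( d_K \) concrete: for empirical measures, expanding \( \| m(\mu) - m(\sigma) \|_{\mathcal H_K}^2 \) via the reproducing property gives the familiar MMD formula. A minimal NumPy sketch (the Gaussian kernel and the names <code>gauss_kernel</code>, <code>kernel_metric</code> are my own toy choices, not from the paper):</p><pre><code>import numpy as np

def gauss_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); any symmetric positive definite kernel works.
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2 * sigma**2))

def kernel_metric(x, y, sigma=1.0):
    # d_K between the empirical measures (1/n) sum_i delta_{x_i} and (1/m) sum_j delta_{y_j}.
    # Reproducing property: ||m(mu) - m(nu)||^2 = E K(x, x') + E K(y, y') - 2 E K(x, y).
    kxx = gauss_kernel(x, x, sigma).mean()
    kyy = gauss_kernel(y, y, sigma).mean()
    kxy = gauss_kernel(x, y, sigma).mean()
    return np.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))   # sample of mu
y = rng.normal(1.0, 1.0, size=(300, 2))   # sample of nu
print(kernel_metric(x, y))                # positive; shrinks to 0 as mu approaches nu
</code></pre>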
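<p>And since the post is about gradient flows: below is a toy explicit Euler particle scheme, but only for the \( \lambda \to \infty \) endpoint \( d_K(\cdot, \nu)^2 \), that is, the plain MMD flow. The regularized flows in the paper additionally involve the proximal mapping of \( G_{f, \nu} \), which this sketch does not implement. Each particle follows the negative gradient of the witness function \( x \mapsto \int K(x, y) \, \textrm{d}\mu(y) - \int K(x, y) \, \textrm{d}\nu(y) \):</p><pre><code>import numpy as np

def witness_grad(x, samples, sigma=1.0):
    # Gradient in x_i of (1/m) sum_j K(x_i, s_j) for the Gaussian kernel,
    # using grad_x K(x, s) = -(x - s) / sigma^2 * K(x, s).
    diff = x[:, None, :] - samples[None, :, :]                 # (n, m, d)
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * sigma**2))   # (n, m)
    return -(diff * k[..., None]).mean(axis=1) / sigma**2      # (n, d)

def mmd_flow(x, target, steps=500, tau=0.2, sigma=1.0):
    # Explicit Euler discretization: each particle moves along the negative
    # gradient of the witness function of the current particles vs. the target.
    for _ in range(steps):
        x = x - tau * (witness_grad(x, x, sigma) - witness_grad(x, target, sigma))
    return x

rng = np.random.default_rng(1)
target = rng.normal(1.0, 0.5, size=(300, 2))    # samples of nu
x0 = rng.normal(-1.0, 0.5, size=(200, 2))       # initial particles of mu
print(mmd_flow(x0, target).mean(axis=0))        # should drift towards target.mean(axis=0)
</code></pre><p>(Kernel width and step size here are arbitrary toy choices; plain MMD flows are known to stall when the measures are far apart relative to the kernel width.)</p>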