Multilayer (vanilla) RNN.

hₜˡ = tanh[ Wˡ ⋅ (hₜˡ⁻¹, hₜ₋₁ˡ)ᵀ ].
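A minimal sketch of that recurrence in plain Python. Some assumptions that are not in the formula itself: the bottom layer reads an input vector x_t in place of hₜ⁰, there are no bias terms, and the weights are random. The names `rnn_step` and `init_weights` are made up for this illustration.

```python
import math
import random

def rnn_step(weights, x_t, h_prev):
    """Advance every layer of a vanilla RNN by one time step.

    weights[l] is the matrix W^l (a list of rows); h_prev[l] is layer l's
    hidden state at time t-1. Assumption: layer 0 reads x_t, no biases.
    """
    h_new = []
    below = x_t  # output of the layer below at time t
    for W, h_same_layer in zip(weights, h_prev):
        stacked = below + h_same_layer  # concatenation (h_t^{l-1}, h_{t-1}^l)
        h_l = [math.tanh(sum(w * s for w, s in zip(row, stacked))) for row in W]
        h_new.append(h_l)
        below = h_l  # feeds the next layer up
    return h_new

def init_weights(input_size, hidden_size, num_layers, rng):
    """Random W^l of shape hidden_size x (size of layer below + hidden_size)."""
    weights = []
    for layer in range(num_layers):
        in_size = input_size if layer == 0 else hidden_size
        weights.append([[rng.uniform(-0.5, 0.5) for _ in range(in_size + hidden_size)]
                        for _ in range(hidden_size)])
    return weights

rng = random.Random(0)
weights = init_weights(input_size=3, hidden_size=4, num_layers=2, rng=rng)
h = [[0.0] * 4 for _ in range(2)]  # initial hidden states, one per layer
for x_t in ([1.0, 0.0, -1.0], [0.5, 0.5, 0.5]):
    h = rnn_step(weights, x_t, h)
print(h[-1])  # top-layer state after two time steps
```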

The Beta distribution is the probability distribution of a probability.

For example, let p be the probability of some event happening. Assume that we don't know p exactly, but know that p should lie within approximately [0.1, 0.35], and is most likely about 0.2. Then we may use the Beta(20, 80) distribution to represent this knowledge, because its mean value is 20/(20+80) = 0.2 and almost all of its probability mass lies within [0.1, 0.35].

More along these lines: stats.stackexchange.com/a/4778
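A quick way to sanity-check the Beta(20, 80) example above is to draw samples and look at the mean and at the fraction that falls in [0.1, 0.35]. This is only a sketch using `random.betavariate` from the standard library; the sample size and seed are arbitrary choices.

```python
import random

a, b = 20, 80
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.betavariate(a, b) for _ in range(100_000)]

mean = sum(samples) / len(samples)
inside = sum(0.1 <= s <= 0.35 for s in samples) / len(samples)

print(f"sample mean: {mean:.3f} (exact mean: {a / (a + b):.3f})")
print(f"fraction of samples in [0.1, 0.35]: {inside:.3f}")
```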

Oh, actually I meant to attach this figure.

Source: Strang (1993) The Fundamental Theorem of Linear Algebra

Let A be an n × m matrix with n > m that has linearly independent columns.
Consider the equation Ax = b, where b is *not* in the column space of A. Then Ax = b has no exact solution. Instead we can aim to minimize the error (b - Ax).
The vector b can be decomposed as b = p + e, where p is in the column space of A and e is in the nullspace of Aᵀ.
Now we can approximate the "solution" to Ax = b by solving Ax = p. In fact, the solution to Ax = p minimizes the squared error ||b - Ax||².
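A small worked example of the decomposition b = p + e, in plain Python. The 3 × 2 matrix and the vector b are hand-picked for illustration (not from the post), and x is found from the normal equations AᵀAx = Aᵀb, which is one standard way to solve Ax = p when the columns are independent.

```python
A = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0]]          # 3 x 2, independent columns
b = [6.0, 0.0, 0.0]       # not in the column space of A

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

At = [list(col) for col in zip(*A)]              # transpose of A
AtA = [[dot(ci, cj) for cj in At] for ci in At]  # 2 x 2 Gram matrix
Atb = [dot(ci, b) for ci in At]

# Solve the 2 x 2 normal equations with Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x = [(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1]) / det,
     (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det]

p = matvec(A, x)                       # projection of b onto the column space
e = [bi - pi for bi, pi in zip(b, p)]  # error, lies in the nullspace of A^T

print("x =", x)                  # [5.0, -3.0]
print("p =", p)                  # [5.0, 2.0, -1.0]
print("e =", e)                  # [1.0, -2.0, 1.0]
print("A^T e =", matvec(At, e))  # [0.0, 0.0]
```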

Fig. from Strang (1993)

Grid Search no more!

Here is a very nice illustration from Bergstra & Bengio (2012) of why Random Search is often superior to Grid Search for hyperparameter selection -- with the same number of trials, Random Search tries far more distinct values of each important parameter.
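The point can be sketched with a toy problem: 9 trials over two parameters, where only x matters. The objective `score`, the 3 × 3 grid, and the seed are hypothetical choices for illustration, not taken from the paper.

```python
import random

def score(x, y):
    # Hypothetical objective: only x is important, y has no effect.
    return -(x - 0.37) ** 2

grid_axis = [0.0, 0.5, 1.0]
grid_trials = [(x, y) for x in grid_axis for y in grid_axis]  # 9 trials

rng = random.Random(0)
random_trials = [(rng.random(), rng.random()) for _ in range(9)]  # 9 trials

for name, trials in (("grid", grid_trials), ("random", random_trials)):
    distinct_x = len({x for x, _ in trials})
    best = max(score(x, y) for x, y in trials)
    print(f"{name:>6}: {distinct_x} distinct x values, best score {best:.4f}")
```

The grid spends its 9 trials on only 3 distinct values of x, while the random trials probe 9.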

Turns out an ancient paper(*) has the answer.
If z = u₁ + iv₁ and w = u₂ + iv₂, where u₁, u₂, v₁, v₂ ~ N(0,1) (and independent), then the probability density of
r := |wz|
is given by
rK₀(r),
where K₀ denotes the modified Bessel function of the second kind with order 0.

(*) Wells, Anderson, Cell (1962) "The Distribution of the Product of Two Central or Non-Central Chi-Square Variates"
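A rough simulation check of the rK₀(r) density, standard library only. Since `math` has no Bessel functions, K₀ is approximated here with the integral representation K₀(r) = ∫₀^∞ exp(-r cosh t) dt and a trapezoidal rule; the helper `k0`, its truncation point, the sample size, and the bin are all ad hoc choices.

```python
import math
import random

def k0(r, t_max=10.0, steps=2000):
    """Approximate K_0(r) = integral_0^inf exp(-r cosh t) dt (trapezoidal rule).

    Ad hoc helper: fine for moderate r, not tuned for r close to 0.
    """
    h = t_max / steps
    total = 0.5 * (math.exp(-r) + math.exp(-r * math.cosh(t_max)))
    total += sum(math.exp(-r * math.cosh(i * h)) for i in range(1, steps))
    return h * total

rng = random.Random(0)
n = 100_000
r_samples = []
for _ in range(n):
    z = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    w = complex(rng.gauss(0, 1), rng.gauss(0, 1))
    r_samples.append(abs(z * w))

# Empirical density near r = 1: share of samples in a narrow bin / bin width.
lo, hi = 0.9, 1.1
empirical = sum(lo <= r < hi for r in r_samples) / (n * (hi - lo))
theoretical = 1.0 * k0(1.0)  # r * K_0(r) at r = 1

mean_r = sum(r_samples) / n  # should be near E|z| * E|w| = pi / 2

print(f"density near r = 1: empirical {empirical:.3f}, r*K0(r) {theoretical:.3f}")
print(f"mean of |zw|: {mean_r:.3f}, pi/2 = {math.pi / 2:.3f}")
```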

Consider two random complex numbers
z = u₁ + iv₁ and
w = u₂ + iv₂,
where u₁, v₁, u₂, v₂ are independent standard normal random variables (N(0,1)).
Then what is the probability distribution of the absolute value of the product |zw|?
Some empirical investigation (simulation) shows that the distribution looks like this:

Consider a matrix $$A\in\mathbb{R}^{m\times n}$$. Then $$x \mapsto Ax$$ is a linear mapping from $$\mathbb{R}^n$$ to $$\mathbb{R}^m$$. There is an $$r\leq \min(m, n)$$ (the rank of $$A$$), such that:

1. $$A$$ maps an $$r$$-dimensional subspace of $$\mathbb{R}^n$$ (the row space of $$A$$) to an $$r$$-dimensional subspace of $$\mathbb{R}^m$$ (the column space or image of $$A$$).

2. The other $$(n-r)$$-dimensional subspace of $$\mathbb{R}^n$$, called the null space of $$A$$, is mapped to 0.

Figure from Strang (1993) "The Fundamental Theorem of Linear Algebra".
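A tiny hand-picked example of the two statements above, with m = 2, n = 3 and rank r = 1. The matrix and the basis vectors are chosen for illustration only and were worked out by hand rather than computed.

```python
A = [[1, 2, 3],
     [2, 4, 6]]  # second row = 2 * first row, so the rank is r = 1

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# n - r = 2 independent vectors in the null space of A:
null_basis = [[-2, 1, 0], [-3, 0, 1]]
null_images = [matvec(A, v) for v in null_basis]

# A vector spanning the r = 1 dimensional row space:
row_vec = [1, 2, 3]
row_image = matvec(A, row_vec)

print(null_images)  # [[0, 0], [0, 0]]
print(row_image)    # [14, 28], a multiple of (1, 2), which spans the column space
```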
