Transforms the density function of a continuous random variable into a discrete probability distribution with minimal distortion, using the Lloyd-Max algorithm.
Arguments
- density_fn
probability density function of the continuous random variable.
- n_levels
number of levels, i.e. the cardinality of the set of all possible outcomes of the discrete random variable.
- eps
convergence threshold; iteration stops once the improvement in MSE falls below this value.
Value
A list containing:
- prob
discrete probability distribution.
- endp
endpoints of intervals that partition the continuous domain.
- repr
representation points of the intervals.
- dist
distortion measured as the mean-squared error (MSE).
Details
The function addresses the problem of transforming a continuous random variable \(X\) into a discrete random variable \(Y\) with minimal distortion. Distortion is measured as mean-squared error (MSE): $$ \text{E}\left[ (X - Y)^2 \right] = \sum_{k=1}^{K} \int_{x_{k-1}}^{x_{k}} f_{X}(x) \left( x - r_{k} \right)^2 \, dx $$ where:
- \(f_{X}\)
is the probability density function of \(X\),
- \(K\)
is the number of possible outcomes of \(Y\),
- \(x_{k}\)
are endpoints of intervals that partition the domain of \(X\),
- \(r_{k}\)
are representation points of the intervals.
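To make the objective concrete, the distortion of any fixed quantizer can be evaluated directly by numerical integration. The following is a minimal sketch, not part of the package; the helper name quantizer_mse is illustrative:

# Distortion E[(X - Y)^2] of a quantizer specified by interval
# endpoints and representation points.
quantizer_mse <- function(density_fn, endp, repr) {
  sum(vapply(seq_along(repr), function(k) {
    stats::integrate(function(x) density_fn(x) * (x - repr[k])^2,
                     endp[k], endp[k + 1])$value
  }, numeric(1)))
}

# Evaluated at the solution from the first example below, this
# reproduces its dist component (about 0.0799).
quantizer_mse(stats::dnorm,
              endp = c(-Inf, -1.2463707, -0.3831349, 0.3831349, 1.2463707, Inf),
              repr = c(-1.7264716, -0.7662698, 0, 0.7662698, 1.7264716))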
This problem is solved using the following iterative procedure:
- \(1.\)
Start with an arbitrary initial set of representation points: \(r_{1} < r_{2} < \dots < r_{K}\).
- \(2.\)
Repeat the following steps until the improvement in MSE falls below a given \(\varepsilon\).
- \(3.\)
Calculate endpoints as \(x_{k} = (r_{k+1} + r_{k})/2\) for each \(k = 1, \dots, K-1\) and set \(x_{0}\) and \(x_{K}\) to \(-\infty\) and \(\infty\), respectively.
- \(4.\)
Update representation points by setting \(r_{k}\) equal to the conditional mean of \(X\) given \(X \in (x_{k-1}, x_{k})\) for each \(k = 1, \dots, K\).
Each execution of steps \((3)\) and \((4)\) either decreases the MSE or leaves it unchanged. Since the MSE is nonnegative and nonincreasing, it approaches a limit, so the improvement between successive iterations eventually falls below any given \(\varepsilon > 0\) and the algorithm terminates after a finite number of iterations.
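A plain R sketch of this iteration may clarify the procedure; it assumes a density supported on the whole real line, is an illustration of the algorithm rather than the package's implementation, and the helper name lloyd_max_sketch is made up:

lloyd_max_sketch <- function(density_fn, n_levels, eps = 1e-8) {
  # Step 1: arbitrary increasing initial representation points.
  r <- seq(-1, 1, length.out = n_levels)
  mse <- function(x, r) {
    sum(vapply(seq_len(n_levels), function(k) {
      stats::integrate(function(t) density_fn(t) * (t - r[k])^2,
                       x[k], x[k + 1])$value
    }, numeric(1)))
  }
  x <- c(-Inf, (r[-1] + r[-n_levels]) / 2, Inf)
  d_old <- mse(x, r)
  repeat {
    # Step 3: endpoints are midpoints of adjacent representation points.
    x <- c(-Inf, (r[-1] + r[-n_levels]) / 2, Inf)
    # Step 4: each representation point becomes the conditional mean of X
    # given X in (x[k], x[k + 1]).
    r <- vapply(seq_len(n_levels), function(k) {
      stats::integrate(function(t) t * density_fn(t), x[k], x[k + 1])$value /
        stats::integrate(density_fn, x[k], x[k + 1])$value
    }, numeric(1))
    # Step 2: stop once the improvement in MSE falls below eps.
    d_new <- mse(x, r)
    if (d_old - d_new < eps) break
    d_old <- d_new
  }
  prob <- vapply(seq_len(n_levels), function(k) {
    stats::integrate(density_fn, x[k], x[k + 1])$value
  }, numeric(1))
  list(prob = prob, endp = x, repr = r, dist = d_new)
}

lloyd_max_sketch(stats::dnorm, n_levels = 5)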
This procedure is known as the Lloyd-Max algorithm; it was originally developed for scalar quantization and is closely related to the k-means algorithm. Kieffer (1983) proved local convergence for log-concave density functions. Many common probability distributions are log-concave, including the normal distribution and the skew normal distribution introduced by Azzalini (1985).
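For reference, the skew normal density with shape parameter \(\alpha\) is $$ f(x) = 2 \, \phi(x) \, \Phi(\alpha x) $$ where \(\phi\) and \(\Phi\) denote the standard normal density and distribution function. The second example below evaluates this density with \(\alpha = 0.5\) as 2 * stats::dnorm(x) * stats::pnorm(0.5 * x).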
References
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics 12(2), 171–178.
Kieffer, J. (1983). Uniqueness of locally optimal quantizer for log-concave density and convex error function. IEEE Transactions on Information Theory 29(1), 42–47.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory 28(2), 129–137.
Examples
discretize_density(density_fn = stats::dnorm, n_levels = 5)
#> $prob
#> [1] 0.1063142 0.2444957 0.2983803 0.2444957 0.1063142
#>
#> $endp
#> [1] -Inf -1.2463707 -0.3831349 0.3831349 1.2463707 Inf
#>
#> $repr
#> [1] -1.7264716 -0.7662698 0.0000000 0.7662698 1.7264716
#>
#> $dist
#> [1] 0.07994185
#>
discretize_density(density_fn = function(x) {
2 * stats::dnorm(x) * stats::pnorm(0.5 * x)
}, n_levels = 4)
#> $prob
#> [1] 0.1647781 0.3380772 0.3359810 0.1611637
#>
#> $endp
#> [1] -Inf -0.5537368 0.3597728 1.2808536 Inf
#>
#> $repr
#> [1] -1.04423728 -0.06323628 0.78278182 1.77892538
#>
#> $dist
#> [1] 0.1026681
#>
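The returned endpoints and representation points can also be used to quantize draws of the continuous variable. A minimal sketch for the standard normal case of the first example; since endp already contains -Inf and Inf, findInterval returns a valid level index for every draw:

set.seed(1)
q <- discretize_density(density_fn = stats::dnorm, n_levels = 5)
x <- stats::rnorm(1e5)
# Map each draw to the representation point of its interval.
y <- q$repr[findInterval(x, q$endp)]
mean((x - y)^2)  # empirical MSE, close to q$dist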