Class
NumCosmoMathStatsDist
Description
abstract class NumCosmoMath.StatsDist : GObject.Object
{
/* No available fields */
}
Base class for implementing N-dimensional probability distributions.
Abstract class to reconstruct an arbitrary N-dimensional probability distribution. This class provides the tools to perform a radial basis interpolation of a multidimensional function and then generate a new sample using the interpolation function as the kernel. The resulting sample follows the original distribution but is simpler to generate, since the kernels used are easier to sample from. For more information about radial basis interpolation, see [Radial Basis Function Interpolation, Wilna du Toit]. A brief description of the radial basis interpolation method is given below.
Given a d-dimensional function $g(x): \mathbf{R}^d \rightarrow \mathbf{R}$, a radial basis function $\phi(x, \Sigma)$ is used such that \begin{align} \label{Interpolation_eq} s(x) = \sum_{i=1}^n \lambda_i \phi(|x-x_i|, \Sigma_i), \quad x \in \mathbf{R}^d . \end{align} The variables $\lambda_i$ are the weights, found such that \begin{align} \label{eqnnls1} s(x_i) = g(x_i) , \end{align} where $x_i$ are the sample points. The values generated by $\phi(|x-x_i|, \Sigma_i)$ at the sample points are arranged in a symmetric $n \times n$ matrix $\Phi$. This function depends on the distance between the points and on the covariance matrix $\Sigma_i$ associated with each point. The weights $\lambda_i$ are also organized in matrix form, so that equation \eqref{eqnnls1} becomes \begin{align} \label{eqnnls} G = \lambda \times \Phi ,\end{align} where $G$ is a matrix containing all the function values $g(x_i)$. Once the $\lambda$ matrix is found, one may use $s(x)$ to sample values from $g(x)$, which is easier since $s(x)$ is a weighted sum of kernels that are simple to sample from.
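The interpolation condition can be illustrated with a minimal one-dimensional sketch. It is not NumCosmo code: the Gaussian kernel with a single fixed width `sigma`, the target function `g`, and the helpers `gauss_kernel` and `solve_linear` are assumptions made for illustration. The sketch builds $\Phi$, solves for the weights without any positivity constraint, and defines $s(x)$ so that $s(x_i) = g(x_i)$ can be checked.

```python
import math

def gauss_kernel(r, sigma):
    """Hypothetical Gaussian radial basis function of the distance r."""
    return math.exp(-0.5 * (r / sigma) ** 2)

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x

def g(x):
    """Target function: a standard normal density (illustrative choice)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

points = [-2.0, -1.0, 0.0, 1.0, 2.0]  # sample points x_i
sigma = 0.8                           # assumed common kernel width

# Phi[i][j] = phi(|x_i - x_j|); symmetric here because sigma is shared.
phi = [[gauss_kernel(abs(xi - xj), sigma) for xj in points] for xi in points]
G = [g(xi) for xi in points]

# Since Phi is symmetric, lambda * Phi = G and Phi * lambda = G coincide.
lam = solve_linear(phi, G)

def s(x):
    """Interpolant s(x) = sum_i lambda_i phi(|x - x_i|)."""
    return sum(l * gauss_kernel(abs(x - xi), sigma) for l, xi in zip(lam, points))
```

Nothing in this unconstrained solve prevents individual weights from being negative, so the result is not automatically usable as a probability mixture.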
We want $s(x)$ to be a probability distribution so we can sample from it. Therefore, the weights in the $\lambda$ matrix are interpreted as probabilities, and the fit must be constrained so that they are never negative and sum to one. To this end, the class solves equation \eqref{eqnnls} for $\lambda$ as a non-negative least-squares problem using the NNLS method, which can be found in the nnls.c file. The algorithm can then randomly choose a kernel $\phi(|x-x_i|, \Sigma_i)$ with the probability given by its weight in $\lambda$ and sample a point from it.
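A minimal sketch of the constrained fit follows. It is an illustration under stated assumptions, not the routine in nnls.c: it uses projected gradient descent as a simple stand-in for NNLS, a one-dimensional Gaussian kernel with a fixed width, and a final renormalization of the weights to unit sum, which is an assumption about how the sum-to-one condition is imposed. All function and variable names are hypothetical.

```python
import math

def nnls_projected_gradient(phi, g, n_iter=5000):
    """Minimize ||phi * lam - g||^2 subject to lam >= 0.

    Projected gradient descent; a stand-in for a true NNLS solver.
    """
    n_rows, n_cols = len(phi), len(phi[0])
    # Step 1/L, with L bounded above by the squared Frobenius norm of phi.
    step = 1.0 / sum(v * v for row in phi for v in row)
    lam = [0.0] * n_cols
    for _ in range(n_iter):
        resid = [sum(phi[i][j] * lam[j] for j in range(n_cols)) - g[i]
                 for i in range(n_rows)]
        grad = [sum(phi[i][j] * resid[i] for i in range(n_rows))
                for j in range(n_cols)]
        # Gradient step followed by projection onto lam >= 0.
        lam = [max(0.0, l - step * d) for l, d in zip(lam, grad)]
    return lam

points = [-2.0, -1.0, 0.0, 1.0, 2.0]
sigma = 0.8  # assumed common kernel width
phi = [[math.exp(-0.5 * ((xi - xj) / sigma) ** 2) for xj in points]
       for xi in points]
G = [math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi) for xi in points]

lam = nnls_projected_gradient(phi, G)

# Residual norm of the constrained fit.
rnorm = math.sqrt(sum(
    (sum(phi[i][j] * lam[j] for j in range(len(lam))) - G[i]) ** 2
    for i in range(len(G))))

# Assumed normalization step: turn the weights into kernel probabilities.
total = sum(lam)
weights = [l / total for l in lam]
```

The normalized `weights` can then serve as the probabilities with which a kernel is selected during sampling.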
In this object, the radial basis interpolation function is not completely defined. One must choose one of the kernel implementations,
NcmStatsDistKernelST or NcmStatsDistKernelGauss, which use a multivariate Student’s t function and a Gaussian function as the kernel, respectively.
After initializing the desired object for the interpolation function, one may use the methods of this class to generate the interpolation and to
sample from the new interpolated function.
The user must provide the following input values: over_smooth - ncm_stats_dist_set_over_smooth(), split_frac - ncm_stats_dist_set_split_frac(),
$v(x)$ - ncm_stats_dist_prepare_interp(). The other parameters
must be provided when the NcmStatsDistKDE or the NcmStatsDistVKDE object is initialized. To perform a calculation with this class, one
must instantiate one of its subclasses (NcmStatsDistKDE or NcmStatsDistVKDE), together with a child object of the class
NcmStatsDistKernel (NcmStatsDistKernelGauss or NcmStatsDistKernelST). For more information about the algorithm, see the description below.
- Since this class does not define what type of kernel will be used in the calculation (the fixed kernel in the NcmStatsDistKDE class or the variable kernel in the NcmStatsDistVKDE class),
one cannot compute the sample using this class alone. The function to be used as the kernel must also be provided; it is implemented in the children of the class NcmStatsDistKernel.
When initializing the NcmStatsDistKDE or NcmStatsDistVKDE classes, the function to be used as the kernel is defined in the object initialization function.
- This class also needs a child object to compute the interpolation matrix $IM$ and the covariance matrices stored in cov_decomp, which are needed to perform the interpolation.
These are kernel dependent and therefore also computed by the child objects.
- For the kernel types based on the radial basis function, $\phi(|x-x_i|)$, and how the sample points in ncm_stats_dist_sample() are generated,
see the different implementations of NcmStatsDistKernel, e.g., NcmStatsDistKernelGauss and NcmStatsDistKernelST.
- For how the functions ncm_stats_dist_eval() and ncm_stats_dist_eval_m2lnp() are implemented, see
the different implementations of NcmStatsDist, i.e., NcmStatsDistKDE and NcmStatsDistVKDE. These objects also
compute the covariance matrix of each sample point and other objects needed for the least-squares problem when
computing the weights matrix ($\lambda$).
Instance methods
ncm_stats_dist_add_obs
Adds a new point y to the sample with weight 1.0.
This function must be called to insert an initial sample into the object, so the interpolation can be computed.
ncm_stats_dist_eval
Evaluates the distribution at $\vec{x}=$x. The method ncm_stats_dist_eval_m2lnp()
can be used to avoid underflow.
ncm_stats_dist_eval_m2lnp
Evaluates the distribution at $\vec{x}=$x. This method is more
stable than ncm_stats_dist_eval() since it avoids underflows
and overflows.
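The difference between the two evaluations can be sketched for a one-dimensional Gaussian mixture. This is an illustration, not the library's implementation: the log-sum-exp formulation, the mixture parameters, and the function names `mixture_eval` and `mixture_eval_m2lnp` are assumptions. It shows that direct evaluation underflows to zero far from the kernels while $-2\ln p$ stays finite.

```python
import math

centers = [-1.0, 0.0, 1.0]   # kernel locations (illustrative)
weights = [0.2, 0.5, 0.3]    # kernel weights, positive and summing to one
sigma = 1.0                  # assumed common kernel width

def log_terms(x):
    """Logarithm of each weighted Gaussian kernel evaluated at x."""
    ln_norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    return [math.log(w) - 0.5 * ((x - c) / sigma) ** 2 - ln_norm
            for c, w in zip(centers, weights)]

def mixture_eval(x):
    """Direct evaluation of p(x); each exp() may underflow to 0.0."""
    return sum(math.exp(t) for t in log_terms(x))

def mixture_eval_m2lnp(x):
    """Evaluation of -2 ln p(x) using the log-sum-exp trick."""
    terms = log_terms(x)
    t_max = max(terms)
    # Factoring out the largest term keeps at least one exp() equal to 1.
    return -2.0 * (t_max + math.log(sum(math.exp(t - t_max) for t in terms)))

p_far = mixture_eval(100.0)            # underflows
m2lnp_far = mixture_eval_m2lnp(100.0)  # remains finite
p_near = mixture_eval(0.5)
m2lnp_near = mixture_eval_m2lnp(0.5)
```

Near the kernels both functions agree, in the sense that `exp(-0.5 * m2lnp_near)` reproduces `p_near` up to rounding.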
ncm_stats_dist_get_n_kernels
After the prepare call, this function returns the number of kernels used in the interpolation.
ncm_stats_dist_get_rnorm
Gets the value of the last $\chi^2$ fit obtained when computing the interpolation through ncm_stats_dist_prepare_interp().
ncm_stats_dist_get_sample_size
After the prepare call, this function returns the size of the sample used in the interpolation.
ncm_stats_dist_get_shrink
Gets the shrink factor, which is used to shrink the weights of the sample points in the interpolation.
ncm_stats_dist_kernel_choose
Using the pseudo-random number generator rng, chooses
a random kernel based on the computed weights.
ncm_stats_dist_peek_full_cov_decomp
Gets the full covariance matrix decomposition. This is the Cholesky decomposition of the covariance matrix of the whole sample.
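As a minimal sketch of what such a decomposition contains, the example below computes the covariance matrix of a small two-dimensional sample and its lower-triangular Cholesky factor $L$, with $L L^T$ equal to the covariance. The sample values and the helpers `sample_covariance` and `cholesky_lower` are hypothetical, and whether the library stores the lower or the upper factor is not stated here.

```python
import math

def sample_covariance(sample):
    """Unbiased covariance matrix of a list of d-dimensional points."""
    n, d = len(sample), len(sample[0])
    mean = [sum(p[k] for p in sample) / n for k in range(d)]
    return [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in sample) / (n - 1)
             for b in range(d)] for a in range(d)]

def cholesky_lower(a):
    """Lower-triangular L with L L^T = a, for positive-definite a."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - acc)
            else:
                low[i][j] = (a[i][j] - acc) / low[j][j]
    return low

# Illustrative sample of five points in two dimensions.
sample = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (3.0, 2.0), (4.0, 4.5)]
cov = sample_covariance(sample)
L = cholesky_lower(cov)
```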
ncm_stats_dist_prepare
Prepares the object for calculations. This function prepares the weight matrix and sets all the weights to 1.0/sample size. It also calls the prepare_kernel function, implemented by a child, and calls the get_href function.
ncm_stats_dist_prepare_interp
Prepares the object for calculations using the distribution values at the sample points. This function calls the prepare function and prepares the objects needed to solve the least-squares problem. The interpolation matrix IM is computed by a child object, which is called in this function. Then, depending on the cross-validation method, the function solves the least-squares problem using the ncm_nnls object.
ncm_stats_dist_prepare_kernel
Prepares the object for computations of the individual kernels.
It is usually called as part of ncm_stats_dist_prepare() and should not
be called directly.
ncm_stats_dist_sample
Using the pseudo-random number generator rng, generates a
point from the distribution and copies it to x.
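The two-step sampling described in this page (choose a kernel according to its weight, then draw a point from that kernel) can be sketched as follows. This is an illustration for one-dimensional Gaussian kernels; the mixture parameters and the functions `choose_kernel` and `sample_point` are hypothetical and do not reproduce the library's code.

```python
import random

centers = [-1.0, 0.0, 2.0]   # kernel locations (illustrative)
weights = [0.2, 0.5, 0.3]    # kernel probabilities, summing to one
sigma = 0.5                  # assumed common kernel width

def choose_kernel(weights, rng):
    """Pick a kernel index with probability given by its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def sample_point(rng):
    """Draw one point: select a kernel, then sample from it."""
    i = choose_kernel(weights, rng)
    return rng.gauss(centers[i], sigma)

rng = random.Random(12345)   # seeded generator for reproducibility
draws = [sample_point(rng) for _ in range(20000)]

# The sample mean should approach sum_i w_i * c_i = 0.4 for this mixture.
sample_mean = sum(draws) / len(draws)
```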
ncm_stats_dist_set_cv_type
Sets the cross-validation method to cv_type.
If the selected method is none, all the sample points
are used to compute the interpolation. If cv_type is cv_split,
a fraction of the points (the split fraction) is randomly excluded and the interpolation
is computed as a best fit to the remaining sample points,
which makes the interpolation less dependent on individual points.
ncm_stats_dist_set_kernel
Sets the kernel to be used in the interpolation. The available kernels are the Gaussian kernel and the Student’s t kernel, implemented in the files ncm_stats_dist_kernel_gauss.c and ncm_stats_dist_kernel_st.c, respectively.
ncm_stats_dist_set_shrink
Sets the shrink factor to shrink. The shrink factor is used to shrink the weights of
the sample points in the interpolation. With a shrink factor of 0.0 no shrinkage is applied;
with a shrink factor of 1.0 full shrinkage is applied.
ncm_stats_dist_set_split_frac
Sets the cross-validation split fraction to split_frac.
This method should be used when cv_type is cv_split.
The split fraction determines the fraction of sample points
that are left out when using the cross-validation method.
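A minimal sketch of such a split is shown below. It only illustrates how a fraction of the sample indices can be set aside at random; the function `cv_split`, the rounding of the number of excluded points, and the use of Python's `random` module are assumptions and may differ from what the library does.

```python
import random

def cv_split(n_points, split_frac, rng):
    """Randomly split point indices into (kept, left_out).

    A fraction split_frac of the indices is left out; the rest is kept
    for computing the fit. The rounding rule is an assumption.
    """
    n_out = int(round(split_frac * n_points))
    idx = list(range(n_points))
    rng.shuffle(idx)
    return sorted(idx[n_out:]), sorted(idx[:n_out])

rng = random.Random(1)
kept, left_out = cv_split(10, 0.3, rng)
```

The weights would then be fitted using only the `kept` points, with the `left_out` points available to check how well the fit generalizes.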
Signals
Signals inherited from GObject (1)
GObject::notify
The notify signal is emitted on an object when one of its properties has its value set through g_object_set_property(), g_object_set(), et al.
Class structure
struct NumCosmoMathStatsDistClass {
void (* set_dim) (
NcmStatsDist* sd,
const guint dim
);
gdouble (* get_href) (
NcmStatsDist* sd
);
void (* prepare_kernel) (
NcmStatsDist* sd,
GPtrArray* sample_array
);
void (* prepare) (
NcmStatsDist* sd
);
void (* prepare_interp) (
NcmStatsDist* sd,
NcmVector* m2lnp
);
void (* compute_IM) (
NcmStatsDist* sd,
NcmMatrix* IM
);
NcmMatrix* (* peek_cov_decomp) (
NcmStatsDist* sd,
guint i
);
NcmMatrix* (* peek_full_cov_decomp) (
NcmStatsDist* sd
);
NcmMatrix* (* peek_full_cov) (
NcmStatsDist* sd
);
gdouble (* get_lnnorm) (
NcmStatsDist* sd,
guint i
);
gdouble (* eval_weights) (
NcmStatsDist* sd,
NcmVector* weights,
NcmVector* x
);
gdouble (* eval_weights_m2lnp) (
NcmStatsDist* sd,
NcmVector* weights,
NcmVector* x
);
void (* reset) (
NcmStatsDist* sd
);
}
No description available.
Class members
set_dim: void (* set_dim) ( NcmStatsDist* sd, const guint dim ) No description available.
get_href: gdouble (* get_href) ( NcmStatsDist* sd ) No description available.
prepare_kernel: void (* prepare_kernel) ( NcmStatsDist* sd, GPtrArray* sample_array ) No description available.
prepare: void (* prepare) ( NcmStatsDist* sd ) No description available.
prepare_interp: void (* prepare_interp) ( NcmStatsDist* sd, NcmVector* m2lnp ) No description available.
compute_IM: void (* compute_IM) ( NcmStatsDist* sd, NcmMatrix* IM ) No description available.
peek_cov_decomp: NcmMatrix* (* peek_cov_decomp) ( NcmStatsDist* sd, guint i ) No description available.
peek_full_cov_decomp: NcmMatrix* (* peek_full_cov_decomp) ( NcmStatsDist* sd ) No description available.
peek_full_cov: NcmMatrix* (* peek_full_cov) ( NcmStatsDist* sd ) No description available.
get_lnnorm: gdouble (* get_lnnorm) ( NcmStatsDist* sd, guint i ) No description available.
eval_weights: gdouble (* eval_weights) ( NcmStatsDist* sd, NcmVector* weights, NcmVector* x ) No description available.
eval_weights_m2lnp: gdouble (* eval_weights_m2lnp) ( NcmStatsDist* sd, NcmVector* weights, NcmVector* x ) No description available.
reset: void (* reset) ( NcmStatsDist* sd ) No description available.
Virtual methods
NumCosmoMath.StatsDistClass.peek_cov_decomp
Gets the covariance matrix associated with the i-th kernel.
NumCosmoMath.StatsDistClass.peek_full_cov_decomp
Gets the full covariance matrix decomposition. This is the Cholesky decomposition of the covariance matrix of the whole sample.
NumCosmoMath.StatsDistClass.prepare
Prepares the object for calculations. This function prepares the weight matrix and sets all the weights to 1.0/sample size. It also calls the prepare_kernel function, implemented by a child, and calls the get_href function.
NumCosmoMath.StatsDistClass.prepare_interp
Prepares the object for calculations using the distribution values at the sample points. This function calls the prepare function and prepares the objects needed to solve the least-squares problem. The interpolation matrix IM is computed by a child object, which is called in this function. Then, depending on the cross-validation method, the function solves the least-squares problem using the ncm_nnls object.
NumCosmoMath.StatsDistClass.prepare_kernel
Prepares the object for computations of the individual kernels.
It is usually called as part of ncm_stats_dist_prepare() and should not
be called directly.