Cross Validation Utilities
CPPLS.cv_classification — Function
cv_classification(; weighted::Bool=true)

Return a named tuple containing score_fn, predict_fn, select_fn, and flag_fn suited to CPPLS classification with one-hot response matrices. The scoring rule is based on nearest-mean classification and assumes models were fit in :discriminant mode.
The returned callbacks implement the following defaults:

- score_fn(Y_true, Y_pred) = 1 - nmc(Y_true, Y_pred, weighted), so larger scores are better; weighted=true applies inverse-frequency class weighting.
- predict_fn(model, X, k) = onehot(model, X, k), which obtains one-hot class predictions from a discriminant CPPLS fit.
- select_fn = argmax, so inner cross-validation chooses the component count with the largest score.
- flag_fn(Y_true, Y_pred) = sampleclasses(Y_pred) .≠ sampleclasses(Y_true), which returns a per-sample misclassification mask.
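As a concrete illustration of the scoring rule, here is a rough standalone sketch of the weighted score on one-hot matrices. The function name weighted_score and the argmax-based class recovery are assumptions for illustration only; CPPLS.nmc is the authoritative implementation and may differ in detail.

```julia
# Sketch of the weighted score that score_fn computes as
# 1 - nmc(Y_true, Y_pred, weighted). Rows are samples, columns are
# one-hot class indicators; weighted=true weights each sample by the
# inverse frequency of its true class.
function weighted_score(Y_true, Y_pred)
    classes_true = map(argmax, eachrow(Y_true))
    classes_pred = map(argmax, eachrow(Y_pred))
    counts = Dict(c => count(==(c), classes_true) for c in unique(classes_true))
    w = [1.0 / counts[c] for c in classes_true]   # inverse-frequency weights
    w ./= sum(w)                                  # normalize to sum to one
    return 1 - sum(w .* (classes_pred .!= classes_true))
end

weighted_score([1 0; 0 1; 0 1], [0 1; 0 1; 0 1])  # 0.5, matching the REPL example
```

With these inputs the lone sample of class 1 carries weight 0.5 after normalization, so its misclassification alone drives the score down to 0.5; an unweighted error rate would give 1 - 1/3 instead.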
This helper is meant to supply the callback interface expected by nestedcv, nestedcvperm, and outlierscan. In particular, score_fn, predict_fn, and select_fn can be passed directly to nestedcv and nestedcvperm, while flag_fn is used by outlierscan to count misclassified samples across repeated outer folds.
The higher-level DA workflow built on these callbacks is exposed through cvda and permda. In ordinary discriminant-analysis use, those wrappers are the preferred public entry points, while cv_classification remains the lower-level helper for direct calls to nestedcv, nestedcvperm, or related internals.
See also cvda, permda, outlierscan, CPPLS.cv_regression, invfreqweights, CPPLS.nmc, nestedcv, nestedcvperm, predict
julia> cb = CPPLS.cv_classification();
julia> Y_true = [1 0; 0 1; 0 1];
julia> Y_pred = [0 1; 0 1; 0 1];
julia> cb.score_fn(Y_true, Y_pred)
0.5
julia> cb.flag_fn(Y_true, Y_pred)
3-element BitVector:
1
0
0
julia> cb.select_fn([0.2, 0.6, 0.4])
2

CPPLS.cv_regression — Function
cv_regression(;
    score_fn=(Y_true, Y_pred) -> sqrt(mean((Y_true .- Y_pred) .^ 2)),
    select_fn=argmin,
)

Return a named tuple containing score_fn, predict_fn, and select_fn suited to CPPLS regression. By default, predictions are scored with root mean squared error and the selected number of components minimizes that loss.
The returned callbacks implement the following defaults:

- score_fn(Y_true, Y_pred) = sqrt(mean((Y_true .- Y_pred) .^ 2)), so smaller scores are better.
- predict_fn(model, X, k) = predict(model, X, k)[:, :, end], which extracts the prediction matrix for the requested component count from the 3-dimensional array returned by predict.
- select_fn = argmin, so inner cross-validation chooses the component count with the smallest score.
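The slicing convention behind predict_fn can be seen with a stand-in array. Here P plays the role of the samples × responses × components array returned by predict; the name P and the dimensions are illustrative only.

```julia
# Stand-in for the 3-dimensional array returned by predict(model, X, k):
# dimensions are samples × responses × component counts (2 × 2 × 3 here).
P = reshape(collect(1.0:12.0), 2, 2, 3)

# predict_fn keeps only the slice for the last component count, i.e. k = 3 here.
Y_hat = P[:, :, end]   # 2×2 Matrix{Float64}
```

Because predict accumulates predictions per component count along the third dimension, taking the final slice yields the prediction matrix for exactly the requested number of components.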
This helper supplies the callback interface expected by nestedcv and nestedcvperm for regression problems.
The higher-level regression workflow built on these callbacks is exposed through cvreg and permreg. In ordinary regression use, those wrappers are the preferred public entry points, while CPPLS.cv_regression remains the lower-level helper for direct calls to nestedcv, nestedcvperm, or related internals.
See also CPPLSFitLight, cv_classification, cvreg, nestedcv, nestedcvperm, permreg, predict
julia> cb = CPPLS.cv_regression();
julia> Y_true = reshape([1.0, 2.0], :, 1);
julia> Y_pred = reshape([1.0, 3.0], :, 1);
julia> cb.score_fn(Y_true, Y_pred) ≈ sqrt(0.5)
true
julia> B = reshape([2.0], 1, 1, 1);
julia> X_mean = [0.0];
julia> X_std = [1.0];
julia> Yprim_std = [1.0];
julia> model = CPPLSFitLight(B, X_mean, X_std, Yprim_std, :regression);
julia> X = reshape([1.0, 2.0], :, 1);
julia> cb.predict_fn(model, X, 1)
2×1 Matrix{Float64}:
2.0
4.0
julia> cb.select_fn([0.3, 0.2])
2