Title: | Conditional Density Estimation Network Construction and Evaluation |
---|---|
Description: | Parameters of a user-specified probability distribution are modelled by a multi-layer perceptron artificial neural network. This framework can be used to implement probabilistic nonlinear models including mixture density networks, heteroscedastic regression models, zero-inflated models, etc. following Cannon (2012) <doi:10.1016/j.cageo.2011.08.023>. |
Authors: | Alex J. Cannon |
Maintainer: | Alex J. Cannon <[email protected]> |
License: | GPL-2 |
Version: | 1.2.5 |
Built: | 2024-11-20 02:48:22 UTC |
Source: | https://github.com/cran/CaDENCE |
A conditional density estimation network (CDEN) is a probabilistic extension of the standard multi-layer perceptron neural network (MLP) (Neuneier et al., 1994). A CDEN model allows users to estimate parameters of a specified probability distribution conditioned upon values of a set of predictors using the MLP architecture. The result is a flexible nonlinear model that can be used to calculate the conditional mean, variance, prediction intervals, etc. based on the specified distribution. Because the CDEN is based on the MLP, nonlinear relationships, including those involving complicated interactions between predictors, can be described by the modelling framework. The CaDENCE (Conditional Density Estimation Network Creation & Evaluation) package provides routines for creating and evaluating CDEN models in the R programming language.
Procedures for fitting CaDENCE models are provided by cadence.fit, which relies on the standard optim function, the CaDENCE rprop function, or, optionally, the psoptim function from the pso package. Once a model has been developed, cadence.predict is used to evaluate the distribution parameters as a function of predictors.
The package also provides a variety of zero-inflated distributions, including the Bernoulli-gamma (bgamma), Bernoulli-Weibull (bweibull), Bernoulli-Pareto 2 (bpareto2), and Bernoulli-lognormal (blnorm), for use in the CaDENCE models.
gam.style, dummy.code, xval.buffer, and rbf are helper functions that may be useful for data preprocessing, model evaluation, and interpretation of fitted relationships.
Most other functions are used internally and should not normally need to be called directly by the user.
Cannon, A.J., 2012. Neural networks for probabilistic environmental prediction: Conditional Density Estimation Network Creation & Evaluation (CaDENCE) in R. Computers & Geosciences 41: 126-135. doi:10.1016/j.cageo.2011.08.023
Neuneier, R., F. Hergert, W. Finnoff, and D. Ormoneit, 1994. Estimation of conditional densities: a comparison of neural network approaches. In: M. Marinaro and P. Morasso (eds.), Proceedings of ICANN 94, Berlin, Springer, p. 689-692.
Functions implementing the Bernoulli-gamma distribution, in which zero values occur with probability 1-prob and non-zero values follow a gamma distribution with scale and shape parameters. dbgamma gives a probability density function (pdf), pbgamma gives the cumulative distribution function (cdf), qbgamma gives the quantile function (inverse cdf), and rbgamma is used for generating random variates.
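The mixed discrete-continuous structure described above can be sketched in base R. The function names with the `.sketch` suffix are hypothetical and are not the package's own code; the block only illustrates one way a point mass at zero can be combined with a gamma distribution, assuming the same argument order as the documented functions.

```r
# Hypothetical sketch of a Bernoulli-gamma distribution (not the package source).
# prob is the probability of a non-zero value, as in the documented arguments.

dbgamma.sketch <- function(x, prob, scale, shape) {
  # Point mass of 1 - prob at zero; gamma density weighted by prob elsewhere
  ifelse(x == 0, 1 - prob,
         prob * dgamma(x, shape = shape, scale = scale))
}

pbgamma.sketch <- function(q, prob, scale, shape) {
  # For q >= 0: P(X <= q) = (1 - prob) + prob * (gamma cdf at q)
  ifelse(q < 0, 0,
         (1 - prob) + prob * pgamma(q, shape = shape, scale = scale))
}

rbgamma.sketch <- function(n, prob, scale, shape) {
  # Draw a 0/1 occurrence indicator, then scale a gamma variate by it
  occurs <- rbinom(n, size = 1, prob = prob)
  occurs * rgamma(n, shape = shape, scale = scale)
}
```

With prob = 0.2, roughly 80 percent of the simulated values are exactly zero, which is the behaviour the zero-inflated distributions are meant to capture.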
dbgamma(x, prob, scale, shape)
pbgamma(q, prob, scale, shape)
qbgamma(p, prob, scale, shape)
rbgamma(n, prob, scale, shape)
x , q
|
vector of quantiles. |
p |
vector of cumulative probabilities. |
n |
number of random samples. |
prob |
probability of a non-zero value. |
scale |
scale parameter of the gamma distribution. |
shape |
shape parameter of the gamma distribution. |
dbgamma gives the pdf, pbgamma gives the cdf, qbgamma gives the inverse cdf (or quantile function), and rbgamma generates random deviates.
Cannon, A.J., 2008. Probabilistic multi-site precipitation downscaling by an expanded Bernoulli-gamma density network. Journal of Hydrometeorology, 9(6): 1284-1300.
dgamma, bweibull, bpareto2, blnorm
plot(rbgamma(365, prob = 0.2, scale = 1, shape = 1), type = "h")
Functions implementing the Bernoulli-lognormal distribution, in which zero values occur with probability 1-prob and non-zero values follow a lognormal distribution with meanlog and sdlog parameters. dblnorm gives a probability density function (pdf), pblnorm gives the cumulative distribution function (cdf), qblnorm gives the quantile function (inverse cdf), and rblnorm is used for generating random variates.
dblnorm(x, prob, meanlog, sdlog)
pblnorm(q, prob, meanlog, sdlog)
qblnorm(p, prob, meanlog, sdlog)
rblnorm(n, prob, meanlog, sdlog)
x , q
|
vector of quantiles. |
p |
vector of cumulative probabilities. |
n |
number of random samples. |
prob |
probability of a non-zero value. |
meanlog |
meanlog parameter of the lognormal distribution. |
sdlog |
sdlog parameter of the lognormal distribution. |
dblnorm gives the pdf, pblnorm gives the cdf, qblnorm gives the inverse cdf (or quantile function), and rblnorm generates random variates.
dlnorm, bweibull, bpareto2, bgamma
plot(rblnorm(365, prob = 0.2, meanlog = 1, sdlog = 1), type = "h")
Functions implementing the Bernoulli-Weibull distribution, in which zero values occur with probability 1-prob and non-zero values follow a Weibull distribution with scale and shape parameters. dbweibull gives a probability density function (pdf), pbweibull gives the cumulative distribution function (cdf), qbweibull gives the quantile function (inverse cdf), and rbweibull is used for generating random variates.
dbweibull(x, prob, scale, shape)
pbweibull(q, prob, scale, shape)
qbweibull(p, prob, scale, shape)
rbweibull(n, prob, scale, shape)
x , q
|
vector of quantiles. |
p |
vector of cumulative probabilities. |
n |
number of random samples. |
prob |
probability of a non-zero value. |
scale |
scale parameter of the Weibull distribution. |
shape |
shape parameter of the Weibull distribution. |
dbweibull gives the pdf, pbweibull gives the cdf, qbweibull gives the inverse cdf (or quantile function), and rbweibull generates random variates.
dweibull, bgamma, bpareto2, blnorm
plot(rbweibull(365, prob = 0.2, scale = 1, shape = 1), type = "h")
The maximum likelihood cost function used for CDEN model fitting. Calculates the negative of the logarithm of the likelihood. A normal distribution prior can be set for the magnitude of the input-hidden layer weights, thus leading to weight penalty regularization.
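As a rough illustration of the idea, the following base R sketch computes a penalized negative log-likelihood for a normal predictand. The function name `nll.penalized.sketch` and its arguments are hypothetical, and the exact scaling of the penalty term in cadence.cost may differ; the block only assumes a zero-mean normal prior with standard deviation sd.norm on the input-hidden weights.

```r
# Hypothetical sketch of a penalized negative log-likelihood (not cadence.cost).
# y, mean, sd: predictand values and predicted normal parameters
# W1: input-hidden layer weights; sd.norm: prior standard deviation
nll.penalized.sketch <- function(y, mean, sd, W1, sd.norm = Inf) {
  # Negative log-likelihood of the data under the predicted distribution
  nll <- -sum(dnorm(y, mean = mean, sd = sd, log = TRUE))
  # Normal prior on the weights acts as a weight penalty;
  # sd.norm = Inf corresponds to no penalty
  penalty <- if (is.finite(sd.norm)) {
    -sum(dnorm(W1, mean = 0, sd = sd.norm, log = TRUE))
  } else {
    0
  }
  nll + penalty
}
```

Smaller values of sd.norm pull the weights more strongly toward zero, which is why the description refers to weight penalty regularization.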
cadence.cost(weights, x, y, n.hidden, hidden.fcn, distribution, sd.norm, valid)
weights |
weight vector of length returned by |
x |
matrix with number of rows equal to the number of samples and number of columns equal to the number of predictor variables. |
y |
column matrix of predictand values with number of rows equal to the number of samples. |
n.hidden |
number of hidden nodes in the CDEN model. |
hidden.fcn |
hidden layer transfer function. |
distribution |
a list that describes the probability density function associated with the predictand. |
sd.norm |
|
valid |
|
Fit a CDEN model via nonlinear optimization of the maximum likelihood cost function.
cadence.fit(x, y, iter.max = 500, n.hidden = 2, hidden.fcn = tanh,
            distribution = NULL, sd.norm = Inf,
            init.range = c(-0.5, 0.5),
            method = c("optim", "psoptim", "Rprop"), n.trials = 1,
            trace = 0, maxit.Nelder = 2000, trace.Nelder = 0,
            swarm.size = NULL, vectorize = TRUE, delta.0 = 0.1,
            delta.min = 1e-06, delta.max = 50, epsilon = 1e-08,
            range.mult = 2, step.tol = 1e-08, f.target = -Inf,
            f.cost = cadence.cost, max.exceptions = 500)
x |
matrix with number of rows equal to the number of samples and number of columns equal to the number of predictor variables. |
y |
column matrix of predictand values with number of rows equal to the number of samples. |
iter.max |
maximum number of iterations of the optimization function. |
n.hidden |
number of hidden nodes in the CDEN model; can be a vector indicating a range of values to fit. |
hidden.fcn |
hidden layer transfer function. |
distribution |
a list that describes the probability density function associated with the predictand. |
sd.norm |
|
init.range |
range for random weights on [ |
method |
specifies the optimization method used to minimize |
n.trials |
number of repeated trials used to avoid shallow local minima during optimization. |
trace |
the level of printing which is done during optimization. A value of |
maxit.Nelder |
maximum number of iterations of the Nelder-Mead optimization function prior to main calling |
trace.Nelder |
the level of printing which is done during Nelder-Mead optimization. A value of |
swarm.size |
|
vectorize |
|
delta.0 |
size of the initial update-value if |
delta.min |
minimum value for the adaptive update-value if |
delta.max |
maximum value for the adaptive update-value if |
epsilon |
step-size used in the finite difference calculation of the gradient if |
range.mult |
if |
step.tol |
convergence criterion if |
f.target |
target value of |
f.cost |
cost function to be optimized. |
max.exceptions |
maximum number of repeated exceptions allowed during optimization. |
Fit a CDEN model by optimizing the maximum likelihood cost function f.cost, which is set by default to cadence.cost. Optimization relies on the standard optim function, the built-in rprop function, or, optionally, the psoptim function from the pso package.
The hidden layer transfer function hidden.fcn should be set to tanh for a nonlinear model and to identity for a linear model. In the nonlinear case, the number of hidden nodes n.hidden controls the overall complexity of the model. The predictand distribution is set by the distribution argument. Parameters of the specified distribution can be held constant via the parameters.fixed element of distribution. Weight penalty regularization for the magnitude of the input-hidden layer weights can be applied by setting sd.norm to a value less than Inf.
The distribution argument in cadence.fit is the most important part of the CaDENCE modelling framework and has been designed to be as flexible as possible. To this end, distribution is a list with three mandatory elements: density.fcn, which specifies the R density function for the predictand distribution; parameters, which specifies the names of the parameters used as arguments in density.fcn; and output.fcns, which specifies the functions used to constrain the density function parameters to their allowable ranges (i.e., inverse link functions). If not specified, distribution defaults to a normal distribution. Note: the order of parameters and output.fcns must match the order of arguments in the specified density.fcn.
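To make the role of output.fcns more concrete, the sketch below passes predictors through a single hidden layer and then applies one inverse link function per output column. The function `cden.forward.sketch`, the weight layout (bias terms appended as a final row of each weight matrix), and the variable names are assumptions made for illustration; the package's internal representation may be organized differently.

```r
# Hypothetical forward pass of a CDEN-like network (not the package internals).
# W1: (n.predictors + 1) x n.hidden weights, last row holds hidden biases
# W2: (n.hidden + 1) x n.parameters weights, last row holds output biases
cden.forward.sketch <- function(x, W1, W2, hidden.fcn, output.fcns) {
  h <- hidden.fcn(cbind(x, 1) %*% W1)   # hidden layer activations
  out <- cbind(h, 1) %*% W2             # unconstrained network outputs
  # Apply the j-th inverse link function to the j-th output column
  params <- sapply(seq_along(output.fcns),
                   function(j) output.fcns[[j]](out[, j]))
  matrix(params, nrow = nrow(x))
}

# Two parameters (mean, sd) of a normal distribution: identity and exp links
x <- matrix(c(0, 1), ncol = 1)
W1 <- matrix(0, nrow = 2, ncol = 2)
W2 <- matrix(0, nrow = 3, ncol = 2)
params <- cden.forward.sketch(x, W1, W2, hidden.fcn = tanh,
                              output.fcns = c(identity, exp))
```

With all weights set to zero, the unconstrained outputs are zero, so the identity link returns a mean of 0 and the exp link returns a standard deviation of 1, which stays strictly positive as a standard deviation must.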
A fourth element of distribution, parameters.fixed, is optional. Setting parameters.fixed="sd" for the normal distribution would, for example, force the sd parameter to take a constant value.
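Following the description above, a distribution list for a normal predictand with a constant sd could be written as shown here. The object name `norm.distribution.fixed` is chosen only for illustration.

```r
# Normal distribution with the sd parameter held constant across all cases
norm.distribution.fixed <- list(density.fcn = dnorm,
                                parameters = c("mean", "sd"),
                                parameters.fixed = "sd",
                                output.fcns = c(identity, exp))
```

Only the mean then varies with the predictors, which corresponds to a homoscedastic regression model.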
Samples of distribution lists for a variety of probability distributions are given below for reference:
# normal distribution
norm.distribution <- list(density.fcn = dnorm,
                          parameters = c("mean", "sd"),
                          parameters.fixed = NULL,
                          output.fcns = c(identity, exp))

# lognormal distribution
lnorm.distribution <- list(density.fcn = dlnorm,
                           parameters = c("meanlog", "sdlog"),
                           parameters.fixed = NULL,
                           output.fcns = c(identity, exp))

# exponential distribution
exp.distribution <- list(density.fcn = dexp,
                         parameters = c("rate"),
                         parameters.fixed = NULL,
                         output.fcns = c(exp))

# Poisson distribution
poisson.distribution <- list(density.fcn = dpois,
                             parameters = c("lambda"),
                             parameters.fixed = NULL,
                             output.fcns = c(exp))

# Bernoulli-gamma distribution
bgamma.distribution <- list(density.fcn = dbgamma,
                            parameters = c("prob", "scale", "shape"),
                            parameters.fixed = NULL,
                            output.fcns = c(logistic, exp, exp))

# Bernoulli-Weibull distribution
bweibull.distribution <- list(density.fcn = dbweibull,
                              parameters = c("prob", "scale", "shape"),
                              parameters.fixed = NULL,
                              output.fcns = c(logistic, exp, exp))

# Bernoulli-lognormal distribution
blnorm.distribution <- list(density.fcn = dblnorm,
                            parameters = c("prob", "meanlog", "sdlog"),
                            parameters.fixed = NULL,
                            output.fcns = c(logistic, identity, exp))

# Bernoulli-Pareto 2 distribution
bpareto2.distribution <- list(density.fcn = dbpareto2,
                              parameters = c("prob", "scale", "shape"),
                              parameters.fixed = NULL,
                              output.fcns = c(logistic, exp, exp))

# beta distribution
beta.distribution <- list(density.fcn = dbeta,
                          parameters = c("shape1", "shape2"),
                          parameters.fixed = NULL,
                          output.fcns = c(exp, exp))

# truncated normal distribution with lower = 0
library(msm)
dtnormal <- function(x, mean, sd) dtnorm(x, mean, sd, lower = 0)
dtnorm.distribution <- list(density.fcn = dtnormal,
                            parameters = c("mean", "sd"),
                            parameters.fixed = NULL,
                            output.fcns = c(identity, exp))

# mixture of two normal distributions (mixture density network)
library(nor1mix)
dnormix <- function(x, mu1, mu2, sig1, sig2, w1){
    if(length(x) > 1){
        dens <- mapply(dnormix, x, mu1 = mu1, mu2 = mu2,
                       sig1 = sig1, sig2 = sig2, w1 = w1)
    } else{
        mix <- norMix(mu = c(mu1, mu2), sigma = c(sig1, sig2),
                      w = c(w1, 1-w1))
        dens <- dnorMix(x, mix)
    }
    dens
}
normix.distribution <- list(density.fcn = dnormix,
                            parameters = c("mu1", "mu2", "sig1", "sig2", "w1"),
                            parameters.fixed = NULL,
                            output.fcns = c(identity, identity, exp, exp,
                                            logistic))
Values of the Akaike information criterion with small sample size correction (AICc), and Bayesian information criterion (BIC) are calculated to assist in model selection. It is possible for such criteria to fail in the face of overfitting, for example with a nonlinear model and n.hidden set too high, as the distribution may converge on one or more samples. This can usually be diagnosed by inspecting the scale parameter of the distribution for near zero values. In this case, one can apply a weight penalty (via sd.norm), although this rules out the straightforward use of AICc/BIC for model selection as the effective number of model parameters will no longer equal the number of weights in the CDEN model.
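For reference, the standard textbook definitions of these criteria can be sketched as follows. The function `info.criteria.sketch` is hypothetical and is not claimed to reproduce the package's internal calculation; it takes the negative log-likelihood nll, the number of model parameters k (here, weights), and the number of samples n.

```r
# Hypothetical helper using the standard AICc and BIC definitions.
# nll: negative log-likelihood; k: number of parameters; n: number of samples
info.criteria.sketch <- function(nll, k, n) {
  aic <- 2 * nll + 2 * k
  # Small sample size correction; requires n > k + 1
  aicc <- aic + (2 * k * (k + 1)) / (n - k - 1)
  bic <- 2 * nll + k * log(n)
  c(AICc = aicc, BIC = bic)
}

info.criteria.sketch(nll = 100, k = 5, n = 50)
```

Both criteria add a complexity term to twice the negative log-likelihood, so a lower value indicates a better trade-off between fit and number of weights.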
Note: values of x need not be standardized or rescaled by the user. Predictors are automatically scaled to zero mean and unit standard deviation and are rescaled by cadence.predict.
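The kind of scaling described in this note can be illustrated in base R. The data and variable names below are made up for the example, and the snippet is only a sketch of the general technique (store the training means and standard deviations, then reuse them for new data), not the package's internal code.

```r
# Toy training predictors (two columns) and one new case; values are made up
x.train <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
x.new <- matrix(c(2.5, 25), ncol = 2)

# Column means and standard deviations estimated from the training data only
x.center <- colMeans(x.train)
x.scale <- apply(x.train, 2, sd)

# Scale training data to zero mean and unit standard deviation
x.train.std <- scale(x.train, center = x.center, scale = x.scale)

# New data are scaled with the stored training statistics
x.new.std <- scale(x.new, center = x.center, scale = x.scale)
```

Reusing the training statistics for new cases keeps the predictors on the scale the network weights were fitted to.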
a list with number of elements equal to the length of n.hidden; each element is a list consisting of:
W1 |
input-hidden layer weights |
W2 |
hidden-output layer weights. Attributes indicating the
mean and standard deviation of columns of |
Cannon, A.J., 2012. Neural networks for probabilistic environmental prediction: Conditional Density Estimation Network Creation & Evaluation (CaDENCE) in R. Computers & Geosciences 41: 126-135. doi:10.1016/j.cageo.2011.08.023
Neuneier, R., F. Hergert, W. Finnoff, and D. Ormoneit, 1994. Estimation of conditional densities: a comparison of neural network approaches. In: M. Marinaro and P. Morasso (eds.), Proceedings of ICANN 94, Berlin, Springer, p. 689-692.
cadence.predict, optim, rprop, xval.buffer, logistic
data(FraserSediment)
set.seed(1)
lnorm.distribution <- list(density.fcn = dlnorm,
                           parameters = c("meanlog", "sdlog"),
                           parameters.fixed = NULL,
                           output.fcns = c(identity, exp))
fit <- cadence.fit(x = FraserSediment$x.1970.1976[c(TRUE, rep(FALSE, 19)),],
                   y = FraserSediment$y.1970.1976[c(TRUE, rep(FALSE, 19)),,
                                                  drop=FALSE],
                   n.hidden = 3, n.trials = 1, maxit.Nelder = 100,
                   trace.Nelder = 1, hidden.fcn = tanh,
                   distribution = lnorm.distribution, trace = 1)
pred <- cadence.predict(x = FraserSediment$x.1977.1979, fit = fit)
matplot(pred, type = "l")
Random initialization of the weight vector used during fitting of the CDEN model.
cadence.initialize(x, n.hidden, init.range, distribution)
x |
matrix with number of rows equal to the number of samples and number of columns equal to the number of predictors. |
n.hidden |
number of hidden nodes in the CDEN model. |
init.range |
range for random weights on [ |
distribution |
list used to specify the predictand distribution |
Predict conditional distribution parameters from a fitted CDEN model. The returned value is a matrix with columns corresponding to the parameters of the probability distribution specified in the distribution argument passed to cadence.fit.
cadence.predict(x, fit)
x |
matrix with number of rows equal to the number of samples and number of columns equal to the number of predictor variables. |
fit |
list returned by |
a matrix with number of rows equal to that of x and columns corresponding to the parameters of the distribution argument passed to cadence.fit.
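Once the distribution parameters are available, quantities such as the conditional median, mean, and prediction intervals follow from the usual base R distribution functions. The matrix `pred.sketch` below is a made-up stand-in for predicted lognormal parameters; its values and column names are assumptions for illustration and are not output from cadence.predict.

```r
# Hypothetical matrix of predicted lognormal parameters (one row per case)
pred.sketch <- cbind(meanlog = c(1.0, 1.5, 2.0),
                     sdlog = c(0.5, 0.4, 0.3))

# Conditional median and mean of the lognormal distribution
cond.median <- qlnorm(0.5, pred.sketch[, "meanlog"], pred.sketch[, "sdlog"])
cond.mean <- exp(pred.sketch[, "meanlog"] + pred.sketch[, "sdlog"]^2 / 2)

# Central 90 percent prediction interval from the 5th and 95th percentiles
interval <- cbind(lower = qlnorm(0.05, pred.sketch[, "meanlog"],
                                 pred.sketch[, "sdlog"]),
                  upper = qlnorm(0.95, pred.sketch[, "meanlog"],
                                 pred.sketch[, "sdlog"]))
```

The same pattern applies to other distributions by substituting the matching quantile function, for example qbgamma for a Bernoulli-gamma model.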
data(FraserSediment)
lnorm.distribution.fixed <- list(density.fcn = dlnorm,
                                 parameters = c("meanlog", "sdlog"),
                                 parameters.fixed = "sdlog",
                                 output.fcns = c(identity, exp))
fit <- cadence.fit(x = FraserSediment$x.1970.1976,
                   y = FraserSediment$y.1970.1976,
                   hidden.fcn = identity, maxit.Nelder = 100,
                   trace.Nelder = 1, trace = 1,
                   distribution = lnorm.distribution.fixed)
pred <- cadence.predict(x = FraserSediment$x.1977.1979, fit = fit)
matplot(pred, type = "l")
Converts a factor (categorical) variable to a matrix of dummy codes using a 1 of C-1 binary coding scheme.
dummy.code(x)
x |
a factor variable. |
a matrix with the number of rows equal to the number of cases in x and the number of columns equal to one minus the number of factor levels in x. The last level serves as the reference group.
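A 1 of C-1 coding of this kind can be sketched in base R as follows. The function `dummy.code.sketch` is hypothetical, and it assumes that the reference group is coded as a row of zeros; the package's own function may use a different convention for the reference rows.

```r
# Hypothetical 1 of C-1 dummy coding (not the package source).
# Assumes cases in the reference (last) level are coded as all zeros.
dummy.code.sketch <- function(x) {
  stopifnot(is.factor(x))
  lev <- levels(x)
  keep <- lev[-length(lev)]   # drop the last level: it is the reference group
  codes <- sapply(keep, function(l) as.integer(x == l))
  matrix(codes, nrow = length(x), dimnames = list(NULL, keep))
}

d <- dummy.code.sketch(iris$Species)
```

For iris$Species, which has three levels, the result has two columns (setosa and versicolor), and the virginica cases form the reference group.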
print(dummy.code(iris$Species))
A dataset consisting of daily observations of suspended sediment concentration (SSC) (mg/L) and stream discharge (Q) (cu. m/s) for the years 1970-1979 at the Fraser River at Hope station in British Columbia, Canada (Water Survey of Canada station 08MF005). Samples are split into a seven year training period (1970-1976) and a three year testing period (1977-1979).
In terms of structure, FraserSediment is a list with four elements: x.1970.1976, y.1970.1976, x.1977.1979, and y.1977.1979. x.1970.1976 and x.1977.1979 are matrices with predictor variables: logQ, log-transformed Q; and dQ5, dQ30, and dQ90, 5-, 30-, and 90-day moving averages of daily changes in Q. y.1970.1976 and y.1977.1979 are matrices with the predictand variable SSC.
data(FraserSediment)
pairs(cbind(FraserSediment$x.1970.1976, FraserSediment$y.1970.1976))
GAM-style effects plots provide a graphical means of interpreting relationships between predictors and conditional pdf parameter values predicted by a CDEN. From Plate et al. (2000): The effect of the ith input variable at a particular input point Delta.i.x is the change in f resulting from changing X1 to x1 from b1 (the baseline value [...]) while keeping the other inputs constant. The effects are plotted as short line segments, centered at (x.i, Delta.i.x), where the slope of the segment is given by the partial derivative. Variables that strongly influence the function value have a large total vertical range of effects. Functions without interactions appear as possibly broken straight lines (linear functions) or curves (nonlinear functions). Interactions show up as vertical spread at a particular horizontal location, that is, a vertical scattering of segments. Interactions are present when the effect of a variable depends on the values of other variables.
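The effect and partial derivative described in this passage can be sketched for an arbitrary prediction function. The names `effect.sketch` and `f.toy` are hypothetical, and the use of a central finite difference is an assumption made for the example; gam.style may compute the derivative differently.

```r
# Hypothetical sketch of GAM-style effects for a generic function f.
# f: function mapping a predictor matrix to one prediction per row
effect.sketch <- function(f, x, column, baseline, epsilon = 1e-5) {
  # Effect: change in f when the chosen column moves from baseline to x,
  # with all other predictors held at their observed values
  x.base <- x
  x.base[, column] <- baseline
  effects <- f(x) - f(x.base)

  # Partial derivative by central finite difference with step epsilon
  x.plus <- x
  x.plus[, column] <- x.plus[, column] + epsilon
  x.minus <- x
  x.minus[, column] <- x.minus[, column] - epsilon
  partials <- (f(x.plus) - f(x.minus)) / (2 * epsilon)

  list(effects = effects, partials = partials)
}

# Toy function with an interaction between the two predictors
f.toy <- function(x) 2 * x[, 1] + x[, 1] * x[, 2]
x <- cbind(c(0, 1, 2), c(1, 1, 3))
res <- effect.sketch(f.toy, x, column = 1, baseline = mean(x[, 1]))
```

Because f.toy contains the product of the two predictors, the partial derivative with respect to the first column depends on the second, which is the kind of interaction that appears as vertical scatter in an effects plot.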
gam.style(x, fit, column, baseline = mean(x[,column]), additive.scale = FALSE, epsilon = 1e-5, seg.len = 0.02, seg.cols = "black", plot = TRUE, return.results = FALSE, ...)
x |
matrix with number of rows equal to the number of samples and number of columns equal to the number of predictor variables. |
fit |
element from list returned by |
column |
column of |
baseline |
value of |
additive.scale |
if |
epsilon |
step-size used in the finite difference calculation of the partial derivatives. |
seg.len |
length of effects line segments expressed as a fraction of the range of |
seg.cols |
colors of effects line segments. |
plot |
if |
return.results |
if |
... |
further arguments to be passed to |
A list with elements:
effects |
a matrix of predictor effects. |
partials |
a matrix of predictor partial derivatives. |
Cannon, A.J. and I.G. McKendry, 2002. A graphical sensitivity analysis for interpreting statistical climate models: Application to Indian monsoon rainfall prediction by artificial neural networks and multiple linear regression models. International Journal of Climatology, 22:1687-1708.
Plate, T., J. Bert, J. Grace, and P. Band, 2000. Visualizing the function computed by a feedforward neural network. Neural Computation, 12(6): 1337-1354.
data(FraserSediment)
set.seed(1)
lnorm.distribution <- list(density.fcn = dlnorm,
                           parameters = c("meanlog", "sdlog"),
                           parameters.fixed = NULL,
                           output.fcns = c(identity, exp))
x <- FraserSediment$x.1970.1976[c(TRUE, rep(FALSE, 24)),]
y <- FraserSediment$y.1970.1976[c(TRUE, rep(FALSE, 24)),,drop=FALSE]
fit.nlin <- cadence.fit(x, y, n.hidden = 2, n.trials = 1,
                        hidden.fcn = tanh,
                        distribution = lnorm.distribution,
                        maxit.Nelder = 100, trace.Nelder = 1, trace = 1)
fit.lin <- cadence.fit(x, y, hidden.fcn = identity, n.trials = 1,
                       distribution = lnorm.distribution,
                       maxit.Nelder = 100, trace.Nelder = 1, trace = 1)
gam.style(x, fit = fit.nlin[[1]], column = 1, main = "Nonlinear")
gam.style(x, fit = fit.lin[[1]], column = 1, additive.scale = TRUE,
          main = "Linear (additive.scale = TRUE)")
logistic computes a logistic sigmoid (S-shaped) function bounded between 0 and 1.
logistic(x)
x |
a numeric vector |
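For orientation, the standard logistic sigmoid can be written in one line of base R. The name `logistic.sketch` is hypothetical and the formula is the usual textbook definition, which is assumed (not confirmed from the package source) to match what logistic computes.

```r
# Standard logistic sigmoid: maps any real number into the interval (0, 1)
logistic.sketch <- function(x) 1 / (1 + exp(-x))

logistic.sketch(c(-2, 0, 3))
```

This is why the sample distribution lists use it as the output function for probability parameters such as prob.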
Functions implementing the Pareto 2 (Lomax) and Bernoulli-Pareto 2 distributions. In the latter case, zero values occur with probability 1-prob and non-zero values follow the Pareto 2 distribution with scale and shape parameters. dpareto2 and dbpareto2 give the probability density functions (pdf); ppareto2 and pbpareto2 give the cumulative distribution functions (cdf); qpareto2 and qbpareto2 give the quantile functions (inverse cdfs); and rpareto2 and rbpareto2 are used for generating random variates.
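As a point of reference, the commonly used Lomax parameterization can be sketched in base R. The functions with the `.sketch` suffix are hypothetical, and it is an assumption that the package uses this same parameterization of scale and shape.

```r
# Hypothetical Pareto 2 (Lomax) functions in a common parameterization.
# pdf: (shape / scale) * (1 + x / scale)^(-(shape + 1)) for x >= 0
dpareto2.sketch <- function(x, scale, shape) {
  ifelse(x < 0, 0,
         (shape / scale) * (1 + pmax(x, 0) / scale)^(-(shape + 1)))
}

# cdf: 1 - (1 + q / scale)^(-shape) for q >= 0
ppareto2.sketch <- function(q, scale, shape) {
  ifelse(q < 0, 0, 1 - (1 + pmax(q, 0) / scale)^(-shape))
}

# Quantile function obtained by inverting the cdf
qpareto2.sketch <- function(p, scale, shape) {
  scale * ((1 - p)^(-1 / shape) - 1)
}

# Random variates by inverse transform sampling
rpareto2.sketch <- function(n, scale, shape) {
  qpareto2.sketch(runif(n), scale, shape)
}
```

A Bernoulli-Pareto 2 version could then be built in the same way as the other zero-inflated distributions, by adding a point mass of 1-prob at zero.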
dpareto2(x, scale, shape)
ppareto2(q, scale, shape)
qpareto2(p, scale, shape)
rpareto2(n, scale, shape)
dbpareto2(x, prob, scale, shape)
pbpareto2(q, prob, scale, shape)
qbpareto2(p, prob, scale, shape)
rbpareto2(n, prob, scale, shape)
x , q
|
vector of quantiles. |
p |
vector of cumulative probabilities. |
n |
number of random samples. |
prob |
probability of a non-zero value. |
scale |
scale parameter of the Pareto 2 distribution. |
shape |
shape parameter of the Pareto 2 distribution. |
dpareto2 and dbpareto2 give the pdfs; ppareto2 and pbpareto2 give the cdfs; qpareto2 and qbpareto2 give the inverse cdfs (or quantile functions); and rpareto2 and rbpareto2 generate random variates.
Arnold, B.C., 1983. The Pareto Distributions, International Co-operative Publishing House, Fairland, MD.
Lomax, K.S., 1954. Business failures: another example of the analysis of failure data. Journal of the American Statistical Association, 49(268): 847-852.
plot(rbpareto2(365, prob = 0.2, scale = 1, shape = 1), type = "h")
Evaluate a kernel matrix based on the radial basis function kernel. Can be used in conjunction with cadence.fit with hidden.fcn set to identity and sd.norm set to a value less than infinity to implement a kernel CDEN model.
rbf(x, x.basis, sigma)
x |
matrix with number of rows equal to the number of samples and number of columns equal to the number of predictors. |
x.basis |
matrix with number of rows equal to the number of basis functions and number of columns equal to the number of predictors. |
sigma |
kernel width |
kernel matrix with number of rows equal to the number of samples and number of columns equal to the number of basis functions.
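A kernel matrix with these dimensions can be sketched in base R using the common Gaussian form exp(-d^2 / (2 * sigma^2)), where d is the Euclidean distance between a sample and a basis point. The function `rbf.sketch` is hypothetical, and the exact scaling of sigma inside the package's rbf function is an assumption here.

```r
# Hypothetical Gaussian RBF kernel matrix (scaling of sigma is assumed).
# x: n.samples x n.predictors; x.basis: n.basis x n.predictors
rbf.sketch <- function(x, x.basis, sigma) {
  # Squared Euclidean distances between rows of x and rows of x.basis
  d2 <- outer(rowSums(x^2), rowSums(x.basis^2), "+") - 2 * x %*% t(x.basis)
  # Guard against tiny negative values caused by rounding
  exp(-pmax(d2, 0) / (2 * sigma^2))
}

x <- matrix(c(0, 0,
              1, 0), ncol = 2, byrow = TRUE)
K <- rbf.sketch(x, x.basis = x, sigma = 1)
```

The resulting matrix could then take the place of the predictor matrix in a linear model (hidden.fcn set to identity), with sd.norm providing the regularization mentioned above.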
From Riedmiller (1994): Rprop stands for 'Resilient backpropagation' and is a local adaptive learning scheme. The basic principle of Rprop is to eliminate the harmful influence of the size of the partial derivative on the weight step. As a consequence, only the sign of the derivative is considered to indicate the direction of the weight update. The size of the weight change is exclusively determined by a weight-specific, so called 'update-value'.
This function implements the iRprop+ algorithm from Igel and Huesken (2003).
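The sign-based update principle can be illustrated with a much simplified variant. The sketch below is not iRprop+: it omits weight backtracking and follows the simpler scheme in which the gradient is set to zero after a sign change. The function `rprop.sketch`, the increase and decrease factors of 1.2 and 0.5, and the forward finite difference gradient are assumptions chosen for the example.

```r
# Simplified sign-based Rprop sketch (no weight backtracking; not iRprop+).
rprop.sketch <- function(w, f, iterlim = 100, delta.0 = 0.1,
                         delta.min = 1e-6, delta.max = 50,
                         epsilon = 1e-8, ...) {
  # Forward finite difference estimate of the gradient
  grad.fd <- function(w) {
    f0 <- f(w, ...)
    sapply(seq_along(w), function(i) {
      w.i <- w
      w.i[i] <- w.i[i] + epsilon
      (f(w.i, ...) - f0) / epsilon
    })
  }
  delta <- rep(delta.0, length(w))   # weight-specific update-values
  g.old <- rep(0, length(w))
  for (iter in seq_len(iterlim)) {
    g <- grad.fd(w)
    same <- g * g.old
    # Same sign as before: grow the update-value; sign change: shrink it
    delta[same > 0] <- pmin(delta[same > 0] * 1.2, delta.max)
    delta[same < 0] <- pmax(delta[same < 0] * 0.5, delta.min)
    g[same < 0] <- 0                 # skip the step after a sign change
    w <- w - sign(g) * delta         # only the sign of the gradient is used
    g.old <- g
  }
  list(par = w, value = f(w, ...))
}

res <- rprop.sketch(c(0, 0), function(w) sum((w - c(1, -2))^2),
                    iterlim = 200)
```

In this toy problem the step sizes adapt separately for each weight, so the two coordinates approach their targets of 1 and -2 at their own pace, regardless of the magnitude of the partial derivatives.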
rprop(w, f, iterlim = 100, print.level = 1, delta.0 = 0.1, delta.min = 1e-06, delta.max = 50, epsilon = 1e-08, step.tol = 1e-06, f.target = -Inf, ...)
w |
the starting parameters for the minimization. |
f |
the function to be minimized. If the function value has an attribute called |
iterlim |
the maximum number of iterations before the optimization is stopped. |
print.level |
the level of printing which is done during optimization. A value of |
delta.0 |
size of the initial Rprop update-value. |
delta.min |
minimum value for the adaptive Rprop update-value. |
delta.max |
maximum value for the adaptive Rprop update-value. |
epsilon |
step-size used in the finite difference calculation of the gradient. |
step.tol |
convergence criterion. Optimization will stop if the change in |
f.target |
target value of |
... |
further arguments to be passed to |
A list with elements:
par |
The best set of parameters found. |
value |
The value of |
gradient |
An estimate of the gradient at the solution found. |
Igel, C. and M. Huesken, 2003. Empirical evaluation of the improved Rprop learning algorithms. Neurocomputing 50: 105-123.
Riedmiller, M., 1994. Advanced supervised learning in multilayer perceptrons - from backpropagation to adaptive learning techniques. Computer Standards and Interfaces 16(3): 265-278.
Calculates training/validation indices for N-fold cross-validation of a dataset. Cross-validation folds are taken as contiguous blocks of cases with an optional buffer to prevent leakage of information between training/validation subsets due to the presence of autocorrelation.
xval.buffer(n.cases, n.xval=5, buffer.length=0)
n.cases |
an integer specifying the length of the dataset. |
n.xval |
an integer specifying the desired number of cross-validation folds. |
buffer.length |
an integer specifying the number of cases to be left out as a buffer between the training/validation subsets. |
a list with n.xval elements, each containing:
train |
indices of training cases |
valid |
indices of validation cases |
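The blocked scheme with a buffer can be sketched in base R as follows. The function `xval.buffer.sketch` is hypothetical; in particular, the way fold boundaries are chosen here is an assumption and may not match the package's function exactly.

```r
# Hypothetical blocked cross-validation with a buffer (not the package source).
xval.buffer.sketch <- function(n.cases, n.xval = 5, buffer.length = 0) {
  # Assign each case to one of n.xval contiguous blocks
  fold <- ceiling(seq_len(n.cases) * n.xval / n.cases)
  lapply(seq_len(n.xval), function(k) {
    valid <- which(fold == k)
    # Exclude the validation block plus buffer.length cases on either side
    excluded <- seq(min(valid) - buffer.length, max(valid) + buffer.length)
    train <- setdiff(seq_len(n.cases), excluded)
    list(train = train, valid = valid)
  })
}

folds <- xval.buffer.sketch(100, n.xval = 3, buffer.length = 10)
```

Training cases closer than buffer.length positions to a validation block are dropped, so autocorrelated neighbours of the validation cases do not influence the fitted model.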
Shabbar, A. and V. Kharin. 2007. An assessment of cross-validation for estimating skill of empirical seasonal forecasts using a global coupled model simulation. CLIVAR Exchanges. 12(4): 10-12.
Zeng, Z., W.W. Hsieh, A. Shabbar, and W.W. Burrows, 2011. Seasonal prediction of winter extreme precipitation over Canada by support vector regression, Hydrology and Earth System Sciences, 15: 65-74.
print(xval.buffer(100, n.xval = 3, buffer.length = 10))