Title: | GEV Conditional Density Estimation Network |
---|---|
Description: | Implements a flexible nonlinear modelling framework for nonstationary generalized extreme value analysis in hydroclimatology following Cannon (2010) <doi:10.1002/hyp.7506>. |
Authors: | Alex J. Cannon [aut, cre]
|
Maintainer: | Alex J. Cannon <[email protected]> |
License: | GPL-3 |
Version: | 1.1.6-2 |
Built: | 2025-02-15 03:08:30 UTC |
Source: | https://github.com/cran/GEVcdn |
Parameters in a Generalized Extreme Value (GEV) distribution are specified as a function of covariates using a conditional density estimation network (CDN), which is a probabilistic variant of the multilayer perceptron neural network. If the covariate is time or is dependent on time, then the GEV CDN model can be used to perform nonlinear, nonstationary GEV analysis of hydrological or climatological time series. Owing to the flexibility of the neural network architecture, the model is capable of representing a wide range of nonstationary relationships, including those involving interactions between covariates. Model parameters are estimated by generalized maximum likelihood, an approach that is tailored to the estimation of GEV parameters from geophysical time series.
Procedures for fitting GEV CDN models are provided by the functions
gevcdn.fit
and gevcdn.bag
. Once a model has been
developed, gevcdn.evaluate
is used to evaluate the GEV
distribution parameters as a function of covariates. Confidence intervals for
GEV parameters and specified quantiles can be estimated using
gevcdn.bootstrap
. All other functions are used internally and
should not normally need to be called directly by the user.
Note: the GEV distribution functions dgev
, pgev
,
qgev
, and rgev
are from the archived
VGAM 1.0-6
package. The convention for the sign of the shape parameter
is opposite to that used in hydrology and thus differs from Cannon (2010).
Cannon, A.J., 2010. A flexible nonlinear modelling framework for nonstationary generalized extreme value analysis in hydroclimatology. Hydrological Processes, 24: 673-685. DOI: 10.1002/hyp.7506
Cannon, A.J., 2011. GEVcdn: an R package for nonstationary extreme value analysis by generalized extreme value conditional density estimation network, Computers & Geosciences, 37: 1532-1533. DOI: 10.1016/j.cageo.2011.03.005
Yee, T.W., 2018. VGAM: Vector Generalized Linear and Additive Models. R package version 1.0-6.
Generalized extreme value density, distribution, quantile, and random variate functions.
dgev(x, location = 0, scale = 1, shape = 0, log = FALSE, tolshape0 = sqrt(.Machine$double.eps)) pgev(q, location = 0, scale = 1, shape = 0, lower.tail = TRUE, log.p = FALSE) qgev(p, location = 0, scale = 1, shape = 0, lower.tail = TRUE, log.p = FALSE) rgev(n, location = 0, scale = 1, shape = 0) dgumbel(x, location = 0, scale = 1, log = FALSE) pgumbel(q, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE) qgumbel(p, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE) rgumbel(n, location = 0, scale = 1) is.Numeric(x, length.arg = Inf, integer.valued = FALSE, positive = FALSE)
dgev(x, location = 0, scale = 1, shape = 0, log = FALSE, tolshape0 = sqrt(.Machine$double.eps)) pgev(q, location = 0, scale = 1, shape = 0, lower.tail = TRUE, log.p = FALSE) qgev(p, location = 0, scale = 1, shape = 0, lower.tail = TRUE, log.p = FALSE) rgev(n, location = 0, scale = 1, shape = 0) dgumbel(x, location = 0, scale = 1, log = FALSE) pgumbel(q, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE) qgumbel(p, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE) rgumbel(n, location = 0, scale = 1) is.Numeric(x, length.arg = Inf, integer.valued = FALSE, positive = FALSE)
x , q
|
vector of quantiles. |
p |
vector of probabilities. |
n |
number of observations. |
location |
the location parameter mu. |
scale |
the (positive) scale parameter sigma. Must consist of positive values. |
shape |
the shape parameter xi. |
log |
Logical. |
lower.tail , log.p
|
Logical. |
tolshape0 |
Positive numeric. Threshold/tolerance value for resting whether xi is zero. If the absolute value of the estimate of xi is less than this value then it will be assumed zero and a Gumbel distribution will be used. |
length.arg , integer.valued , positive
|
Arguments of internally called is.Numeric function. |
Used to fit an ensemble of GEV CDN models using bootstrap aggregation (bagging) and, optionally, early stopping.
gevcdn.bag(x, y, iter.max = 1000, iter.step = 10, n.bootstrap = 30, n.hidden = 3, Th = gevcdn.logistic, fixed = NULL, init.range = c(-0.25, 0.25), scale.min = .Machine$double.eps, beta.p = 3.3, beta.q = 2, sd.norm = Inf, method = c("BFGS", "Nelder-Mead"), max.fails = 100, silent = TRUE, ...)
gevcdn.bag(x, y, iter.max = 1000, iter.step = 10, n.bootstrap = 30, n.hidden = 3, Th = gevcdn.logistic, fixed = NULL, init.range = c(-0.25, 0.25), scale.min = .Machine$double.eps, beta.p = 3.3, beta.q = 2, sd.norm = Inf, method = c("BFGS", "Nelder-Mead"), max.fails = 100, silent = TRUE, ...)
x |
covariate matrix with number of rows equal to the number of samples and number of columns equal to the number of variables. |
y |
column matrix of target values with number of rows equal to the number of samples. |
iter.max |
maximum number of iterations of optimization algorithm. |
iter.step |
number of iterations between which the value of the cost function is calculated on the out-of-bootstrap samples; used during stopped training. |
n.bootstrap |
number of ensemble members trained during bootstrap aggregation (bagging). |
number of hidden nodes in each GEV CDN ensemble member. |
|
Th |
hidden layer transfer function; defaults to |
fixed |
vector indicating GEV parameters to be held constant; elements chosen from |
init.range |
range for random weights on [ |
scale.min |
minimum allowable value for the GEV scale parameter. |
beta.p |
|
beta.q |
|
sd.norm |
|
method |
optimization method for |
max.fails |
maximum number of repeated exceptions allowed during optimization. |
silent |
logical value indicating whether or not cost function information should be reported every |
... |
additional arguments passed to the |
Bootstrap aggregation (bagging) (Breiman, 1996) is used as an alternative to
gevcdn.fit
for fitting an ensemble of nonstationary GEV CDN models
(Cannon, 2010). Each ensemble member is trained on bootstrapped x
and
y
sample pairs. As an added check on overfitting, early stopping,
whereby training is stopped prior to convergence of the optimization algorithm,
can be turned on. In this case, the "best" iteration for stopping optimization
is chosen based on model performance on out-of-bag samples. Additional details on
the bagging/early stopping procedure can be found in Cannon and Whitfield
(2001, 2002). Unlike gevcdn.fit
, where models of increasing complexity
are fitted and the one that satisfies some model selection criterion is chosen for
final use, model selection in gevcdn.bag
is implicit. Ensemble averaging and,
optionally, early stopping are used to limit model complexity. One need only
set n.hidden
to avoid underfitting.
Note: values of x
and y
need not be standardized or rescaled by
the user. All variables are automatically scaled to range between 0 and 1 prior
to fitting and parameters are automatically rescaled by
gevcdn.evaluate
.
a list of length n.bootstrap
with elements consisting of
W1 |
input-hidden layer weights |
W2 |
hidden-output layer weights |
Attributes indicating the minimum/maximum values of x
and y
; the
values of Th
, fixed
, scale.min
; a logical value
stopped.training
indicating whether or not early stopping was used;
and the minimum value of the cost function on the out-of-bag samples
cost.valid
are also returned.
Breiman, L., 1996. Bagging predictors. Machine Learning, 24(2): 123-140. DOI: 10.1007/BF00058655
Cannon, A.J., 2010. A flexible nonlinear modelling framework for nonstationary generalized extreme value analysis in hydroclimatology. Hydrological Processes, 24: 673-685. DOI: 10.1002/hyp.7506
Cannon, A.J. and P.H. Whitfield, 2001. Modeling transient pH depressions in coastal streams of British Columbia using neural networks, 37(1): 73-89. DOI: 10.1111/j.1752-1688.2001.tb05476.x
Cannon, A.J. and P.H. Whitfield, 2002. Downscaling recent streamflow conditions in British Columbia, Canada using ensemble neural network models. Journal of Hydrology, 259: 136-151. DOI: 10.1016/S0022-1694(01)00581-9
gevcdn.cost
, gevcdn.evaluate
,
gevcdn.fit
, gevcdn.bootstrap
,
dgev
, optim
## Generate synthetic data, quantiles x <- as.matrix(seq(0.1, 1, length = 50)) loc <- x^2 scl <- x/2 shp <- seq(-0.1, 0.3, length = length(x)) set.seed(100) y <- as.matrix(rgev(length(x), location = loc, scale = scl, shape = shp)) q <- sapply(c(0.1, 0.5, 0.9), qgev, location = loc, scale = scl, shape = shp) ## Not run: ## Fit ensemble of models with early stopping turned on weights.on <- gevcdn.bag(x = x, y = y, iter.max = 100, iter.step = 10, n.bootstrap = 10, n.hidden = 2) parms.on <- lapply(weights.on, gevcdn.evaluate, x = x) ## 10th, 50th, and 90th percentiles q.10.on <- q.50.on <- q.90.on <- matrix(NA, ncol=length(parms.on), nrow=nrow(x)) for(i in seq_along(parms.on)){ q.10.on[,i] <- qgev(p = 0.1, location = parms.on[[i]][,"location"], scale = parms.on[[i]][,"scale"], shape = parms.on[[i]][,"shape"]) q.50.on[,i] <- qgev(p = 0.5, location = parms.on[[i]][,"location"], scale = parms.on[[i]][,"scale"], shape = parms.on[[i]][,"shape"]) q.90.on[,i] <- qgev(p = 0.9, location = parms.on[[i]][,"location"], scale = parms.on[[i]][,"scale"], shape = parms.on[[i]][,"shape"]) } ## Plot data and quantiles matplot(cbind(y, q, rowMeans(q.10.on), rowMeans(q.50.on), rowMeans(q.90.on)), type = c("b", rep("l", 6)), lty = c(1, rep(c(1, 2, 1), 2)), lwd = c(1, rep(c(3, 2, 3), 2)), col = c("red", rep("orange", 3), rep("blue", 3)), pch = 19, xlab = "x", ylab = "y", main = "gevcdn.bag (early stopping on)") ## End(Not run)
## Generate synthetic data, quantiles x <- as.matrix(seq(0.1, 1, length = 50)) loc <- x^2 scl <- x/2 shp <- seq(-0.1, 0.3, length = length(x)) set.seed(100) y <- as.matrix(rgev(length(x), location = loc, scale = scl, shape = shp)) q <- sapply(c(0.1, 0.5, 0.9), qgev, location = loc, scale = scl, shape = shp) ## Not run: ## Fit ensemble of models with early stopping turned on weights.on <- gevcdn.bag(x = x, y = y, iter.max = 100, iter.step = 10, n.bootstrap = 10, n.hidden = 2) parms.on <- lapply(weights.on, gevcdn.evaluate, x = x) ## 10th, 50th, and 90th percentiles q.10.on <- q.50.on <- q.90.on <- matrix(NA, ncol=length(parms.on), nrow=nrow(x)) for(i in seq_along(parms.on)){ q.10.on[,i] <- qgev(p = 0.1, location = parms.on[[i]][,"location"], scale = parms.on[[i]][,"scale"], shape = parms.on[[i]][,"shape"]) q.50.on[,i] <- qgev(p = 0.5, location = parms.on[[i]][,"location"], scale = parms.on[[i]][,"scale"], shape = parms.on[[i]][,"shape"]) q.90.on[,i] <- qgev(p = 0.9, location = parms.on[[i]][,"location"], scale = parms.on[[i]][,"scale"], shape = parms.on[[i]][,"shape"]) } ## Plot data and quantiles matplot(cbind(y, q, rowMeans(q.10.on), rowMeans(q.50.on), rowMeans(q.90.on)), type = c("b", rep("l", 6)), lty = c(1, rep(c(1, 2, 1), 2)), lwd = c(1, rep(c(3, 2, 3), 2)), col = c("red", rep("orange", 3), rep("blue", 3)), pch = 19, xlab = "x", ylab = "y", main = "gevcdn.bag (early stopping on)") ## End(Not run)
Used to assist in the calculation of bootstrapped confidence intervals for GEV location, scale, and shape parameters, as well as for specified quantiles. Residual and parametric bootstrap estimates are supported.
gevcdn.bootstrap(n.bootstrap, x, y, iter.max = 1000, n.hidden = 2, Th = gevcdn.logistic, fixed = NULL, init.range = c(-0.25, 0.25), scale.min = .Machine$double.eps, beta.p = 3.3, beta.q = 2, sd.norm = Inf, n.trials = 5, method = c("BFGS", "Nelder-Mead"), boot.method = c("residual", "parametric"), init.from.prev = TRUE, max.fails = 100, probs = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99), ...)
gevcdn.bootstrap(n.bootstrap, x, y, iter.max = 1000, n.hidden = 2, Th = gevcdn.logistic, fixed = NULL, init.range = c(-0.25, 0.25), scale.min = .Machine$double.eps, beta.p = 3.3, beta.q = 2, sd.norm = Inf, n.trials = 5, method = c("BFGS", "Nelder-Mead"), boot.method = c("residual", "parametric"), init.from.prev = TRUE, max.fails = 100, probs = c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99), ...)
n.bootstrap |
number of bootstrap trials used to calculate confidence intervals. |
x |
covariate matrix with number of rows equal to the number of samples and number of columns equal to the number of variables. |
y |
column matrix of target values with number of rows equal to the number of samples. |
iter.max |
maximum number of iterations of optimization algorithm. |
number of hidden nodes in each GEV CDN ensemble member. |
|
Th |
hidden layer transfer function; defaults to |
fixed |
vector indicating GEV parameters to be held constant; elements chosen from |
init.range |
range for random weights on [ |
scale.min |
minimum allowable value for the GEV scale parameter. |
beta.p |
|
beta.q |
|
sd.norm |
|
n.trials |
number of repeated trials used to avoid shallow local minima during optimization |
method |
optimization method for |
boot.method |
bootstrap method; must be chosen from |
init.from.prev |
logical value indicating whether or not optimization runs should be initialized from the final weights of the previous run. |
max.fails |
maximum number of repeated exceptions allowed during optimization |
probs |
vector of cumulative probabilities for which quantile confidence intervals are desired. |
... |
additional arguments passed to the |
Note: the boot
package provides a more
comprehensive suit of methods for estimating bootstrap-based confidence
intervals.
a list consisting of
weights.bootstrap |
a list of length |
parms.bootstrap |
a list of length |
location.bootstrap |
a matrix of GEV location parameters with number of rows equal to that of |
scale.bootstrap |
a matrix of GEV scale parameters with number of rows equal to that of |
shape.bootstrap |
a matrix of GEV shape parameters with number of rows equal to that of |
quantiles.bootstrap |
a list of length |
Cannon, A.J., 2010. A flexible nonlinear modelling framework for nonstationary generalized extreme value analysis in hydroclimatology. Hydrological Processes, 24: 673-685. DOI: 10.1002/hyp.7506
gevcdn.cost
, gevcdn.evaluate
,
gevcdn.fit
, gevcdn.bag
,
dgev
, optim
## Generate synthetic data x <- as.matrix(seq(0.1, 1, length = 50)) loc <- x^2 scl <- x/2 shp <- seq(-0.1, 0.3, length = length(x)) set.seed(100) y <- as.matrix(rgev(length(x), location = loc, scale = scl, shape = shp)) ## Not run: ## Fit 30 bootstrapped models CI <- gevcdn.bootstrap(n.bootstrap = 30, x = x, y = y, iter.max = 100, n.hidden = 2, Th = gevcdn.logistic, n.trials = 1, boot.method = "residual", probs = c(0.1, 0.5, 0.9)) ## Plot data and percentile confidence intervals for GEV parameters par(mfrow = c(2, 2)) matplot(x, y, type = "b", pch = 19, col = "red", xlab = "x", ylab = "y", main = "gevcdn.bootstrap") matplot(x, cbind(loc, t(apply(CI$location.bootstrap, 1, quantile, p = c(0.025, 0.975)))), type = c("l", "b", "b"), pch = 20, lwd = 3, col = c("black", rep("green", 2)), xlab = "x", ylab = "location", main = "location CI") matplot(x, cbind(scl, t(apply(CI$scale.bootstrap, 1, quantile, p = c(0.025, 0.975)))), type = c("l", "b", "b"), pch = 20, lwd = 3, col = c("black", rep("orange", 2)), xlab = "x", ylab = "scale", main = "scale CI") matplot(x, cbind(shp, t(apply(CI$shape.bootstrap, 1, quantile, p = c(0.025, 0.975)))), type = c("l", "b", "b"), pch = 20, lwd = 3, col = c("black", rep("brown", 2)), xlab = "x", ylab = "shape", main = "shape CI") ## End(Not run)
## Generate synthetic data x <- as.matrix(seq(0.1, 1, length = 50)) loc <- x^2 scl <- x/2 shp <- seq(-0.1, 0.3, length = length(x)) set.seed(100) y <- as.matrix(rgev(length(x), location = loc, scale = scl, shape = shp)) ## Not run: ## Fit 30 bootstrapped models CI <- gevcdn.bootstrap(n.bootstrap = 30, x = x, y = y, iter.max = 100, n.hidden = 2, Th = gevcdn.logistic, n.trials = 1, boot.method = "residual", probs = c(0.1, 0.5, 0.9)) ## Plot data and percentile confidence intervals for GEV parameters par(mfrow = c(2, 2)) matplot(x, y, type = "b", pch = 19, col = "red", xlab = "x", ylab = "y", main = "gevcdn.bootstrap") matplot(x, cbind(loc, t(apply(CI$location.bootstrap, 1, quantile, p = c(0.025, 0.975)))), type = c("l", "b", "b"), pch = 20, lwd = 3, col = c("black", rep("green", 2)), xlab = "x", ylab = "location", main = "location CI") matplot(x, cbind(scl, t(apply(CI$scale.bootstrap, 1, quantile, p = c(0.025, 0.975)))), type = c("l", "b", "b"), pch = 20, lwd = 3, col = c("black", rep("orange", 2)), xlab = "x", ylab = "scale", main = "scale CI") matplot(x, cbind(shp, t(apply(CI$shape.bootstrap, 1, quantile, p = c(0.025, 0.975)))), type = c("l", "b", "b"), pch = 20, lwd = 3, col = c("black", rep("brown", 2)), xlab = "x", ylab = "shape", main = "shape CI") ## End(Not run)
The generalized maximum likelihood (GML) cost function used for GEV CDN model fitting (Martins and Stedinger, 2000). Calculates the negative of the logarithm of the GML, which includes a shifted beta distribution prior for the GEV shape parameter. A normal distribution prior can also be set for the magnitude of the input-hidden layer weights, thus leading to weight penalty regularization.
gevcdn.cost(weights, x, y, n.hidden, Th, fixed, scale.min, beta.p, beta.q, sd.norm)
gevcdn.cost(weights, x, y, n.hidden, Th, fixed, scale.min, beta.p, beta.q, sd.norm)
weights |
weight vector of length returned by |
x |
covariate matrix with number of rows equal to the number of samples and number of columns equal to the number of variables. |
y |
column matrix of target values with number of rows equal to the number of samples. |
number of hidden nodes in the GEV CDN model. |
|
Th |
hidden layer transfer function; defaults to |
fixed |
vector indicating GEV parameters to be held constant; elements chosen from |
scale.min |
minimum allowable value for the GEV scale parameter. |
beta.p |
|
beta.q |
|
sd.norm |
|
Martins, E.S. and J.R. Stedinger, 2000. Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resources Research, 36: 737-744. DOI: 10.1029/1999WR900330
gevcdn.fit
, gevcdn.bag
, dgev
,
optim
Evaluate a trained GEV CDN model, resulting in a matrix with columns corresponding to the location, scale, and shape parameters of the GEV distribution.
gevcdn.evaluate(x, weights)
gevcdn.evaluate(x, weights)
x |
covariate matrix with number of rows equal to the number of samples and number of columns equal to the number of variables. |
weights |
list containing GEV CDN input-hidden and hidden-output layer weight matrices from |
a matrix with number of rows equal to that of x
and columns
corresponding to the GEV location, scale, and shape parameters.
Cannon, A.J., 2010. A flexible nonlinear modelling framework for nonstationary generalized extreme value analysis in hydroclimatology. Hydrological Processes, 24: 673-685. DOI: 10.1002/hyp.7506
# Generate synthetic data x <- as.matrix(1:50) y <- as.matrix(rgev(length(x), location = 0, scale = 1, shape = 0.2)) ## Fit stationary model weights <- gevcdn.fit(x = x, y = y, Th = gevcdn.identity, fixed = c("location", "scale", "shape")) ## Evaluate GEV parameters parms <- gevcdn.evaluate(x, weights) cat("GEV parameters", parms[1,], "\n")
# Generate synthetic data x <- as.matrix(1:50) y <- as.matrix(rgev(length(x), location = 0, scale = 1, shape = 0.2)) ## Fit stationary model weights <- gevcdn.fit(x = x, y = y, Th = gevcdn.identity, fixed = c("location", "scale", "shape")) ## Evaluate GEV parameters parms <- gevcdn.evaluate(x, weights) cat("GEV parameters", parms[1,], "\n")
Fit a GEV CDN model via nonlinear optimization of the generalized maximum likelihood cost function.
gevcdn.fit(x, y, iter.max = 1000, n.hidden = 2, Th = gevcdn.logistic, fixed = NULL, init.range = c(-0.25, 0.25), scale.min = .Machine$double.eps, beta.p = 3.3, beta.q = 2, sd.norm = Inf, n.trials = 5, method = c("BFGS", "Nelder-Mead"), max.fails = 100, ...)
gevcdn.fit(x, y, iter.max = 1000, n.hidden = 2, Th = gevcdn.logistic, fixed = NULL, init.range = c(-0.25, 0.25), scale.min = .Machine$double.eps, beta.p = 3.3, beta.q = 2, sd.norm = Inf, n.trials = 5, method = c("BFGS", "Nelder-Mead"), max.fails = 100, ...)
x |
covariate matrix with number of rows equal to the number of samples and number of columns equal to the number of variables. |
y |
column matrix of target values with number of rows equal to the number of samples. |
iter.max |
maximum number of iterations of optimization algorithm. |
number of hidden nodes in the GEV CDN model. |
|
Th |
hidden layer transfer function; defaults to |
fixed |
vector indicating GEV parameters to be held constant; elements chosen from |
init.range |
range for random weights on [ |
scale.min |
minimum allowable value for the GEV scale parameter. |
beta.p |
|
beta.q |
|
sd.norm |
|
n.trials |
number of repeated trials used to avoid shallow local minima during optimization |
method |
optimization method for |
max.fails |
maximum number of repeated exceptions allowed during optimization |
... |
additional arguments passed to the |
Fit a nonstationary GEV CDN model (Cannon, 2010) by minimizing a cost function
based on the generalized maximum likelihood of Martins and Stedinger (2000).
The hidden layer transfer function Th
should be set to
gevcdn.logistic
for a nonlinear model and to
gevcdn.identity
for a linear model. In the nonlinear case, the
number of hidden nodes n.hidden
controls the overall complexity of the
model. GEV parameters can be held constant (i.e., stationary) via the
fixed
argument. The form of the shifted beta distribution prior for the
GEV shape parameter is controlled by the beta.p
and beta.q
arguments. By default, these are set to values used in Cannon (2010).
Other alternatives include values recommended by Martins and Stedinger (2000)
(beta.p = 9
and beta.q = 6
) or values following from
the normal distribution reported by Papalexiou and Koutsoyiannis (2013)
(beta.p = 71.1
and beta.q = 44.7
). Weight penalty
regularization for the magnitude of the input-hidden layer weights can
be applied by setting sd.norm
to a value less than Inf
.
Values of the Akaike information criterion (AIC), Akaike information criterion
with small sample size correction (AICc), and Bayesian information criterion
(BIC) are calculated to assist in model selection. It is possible for such
criteria to fail in the face of overfitting, for example with a nonlinear model
and n.hidden
set too high, as the GEV distribution may converge on one or
more samples. This can usually be diagnosed by inspecting the scale parameter
for near zero values. In this case, one can apply a weight penalty
(via sd.norm
), although this rules out the use of AIC/AICc/BIC for model
selection as the effective number of model parameters will no longer equal the
number of weights in the GEV CDN model. Alternatively, a lower threshold (as a
proportion of the range of y
) for the scale parameter can be set
via scale.min
. Finally, bootstrap aggregation is available via
gevcdn.bag
as a second method for fitting GEV CDN models.
Note: values of x
and y
need not be standardized or rescaled
by the user. All variables are automatically scaled to range between 0 and 1
prior to fitting and parameters are automatically rescaled by
gevcdn.evaluate
.
a list consisting of
W1 |
input-hidden layer weights |
W2 |
hidden-output layer weights |
Attributes indicating the minimum/maximum values of x
and y
; the
values of Th
, fixed
, scale.min
; the negative of the
logarithm of the generalized maximum likelihood GML
, the negative of
the logarithm of the likelihood NLL
, the value of the penalty term
penalty
, the Bayesian information criterion BIC
, the Akaike
information criterion with (AICc
) and without (AIC
) small sample
size correction; and the number of model parameters k
are also
returned.
Cannon, A.J., 2010. A flexible nonlinear modelling framework for nonstationary generalized extreme value analysis in hydroclimatology. Hydrological Processes, 24: 673-685. DOI: 10.1002/hyp.7506
Martins, E.S. and J.R. Stedinger, 2000. Generalized maximum-likelihood generalized extreme-value quantile estimators for hydrologic data. Water Resources Research, 36:737-744. DOI: 10.1029/1999WR900330
Papalexiou, S.M. and Koutsoyiannis, D., 2013. Battle of extreme value distributions: A global survey on extreme daily rainfall. Water Resources Research, 49(1), 187-201. DOI: 10.1029/2012WR012557
gevcdn.cost
, gevcdn.evaluate
,
gevcdn.bag
, gevcdn.bootstrap
,
dgev
, optim
## Generate synthetic data, quantiles x <- as.matrix(seq(0.1, 1, length = 50)) loc <- x^2 scl <- x/2 shp <- seq(-0.1, 0.3, length = length(x)) set.seed(100) y <- as.matrix(rgev(length(x), location = loc, scale = scl, shape = shp)) q <- sapply(c(0.1, 0.5, 0.9), qgev, location = loc, scale = scl, shape = shp) ## Define a hierarchy of models of increasing complexity models <- vector("list", 4) # Stationary model models[[1]] <- list(Th = gevcdn.identity, fixed = c("location", "scale", "shape")) # Linear model models[[2]] <- list(Th = gevcdn.identity) # Nonlinear, 1 hidden node models[[3]] <- list(n.hidden = 1, Th = gevcdn.logistic) # Nonlinear, 2 hidden nodes models[[4]] <- list(n.hidden = 2, Th = gevcdn.logistic) ## Fit models weights.models <- vector("list", length(models)) for(i in seq_along(models)){ weights.models[[i]] <- gevcdn.fit(x = x, y = y, n.trials = 1, n.hidden = models[[i]]$n.hidden, Th = models[[i]]$Th, fixed = models[[i]]$fixed) } ## Select model with minimum AICc models.AICc <- sapply(weights.models, attr, which = "AICc") weights.best <- weights.models[[which.min(models.AICc)]] parms.best <- gevcdn.evaluate(x, weights.best) ## 10th, 50th, and 90th percentiles q.best <- sapply(c(0.1, 0.5, 0.9), qgev, location = parms.best[,"location"], scale = parms.best[,"scale"], shape = parms.best[,"shape"]) ## Plot data and quantiles matplot(x, cbind(y, q, q.best), type = c("b", rep("l", 6)), lty = c(1, rep(c(1, 2, 1), 2)), lwd = c(1, rep(c(3, 2, 3), 2)), col = c("red", rep("orange", 3), rep("blue", 3)), pch = 19, xlab = "x", ylab = "y", main = "gevcdn.fit")
## Generate synthetic data, quantiles x <- as.matrix(seq(0.1, 1, length = 50)) loc <- x^2 scl <- x/2 shp <- seq(-0.1, 0.3, length = length(x)) set.seed(100) y <- as.matrix(rgev(length(x), location = loc, scale = scl, shape = shp)) q <- sapply(c(0.1, 0.5, 0.9), qgev, location = loc, scale = scl, shape = shp) ## Define a hierarchy of models of increasing complexity models <- vector("list", 4) # Stationary model models[[1]] <- list(Th = gevcdn.identity, fixed = c("location", "scale", "shape")) # Linear model models[[2]] <- list(Th = gevcdn.identity) # Nonlinear, 1 hidden node models[[3]] <- list(n.hidden = 1, Th = gevcdn.logistic) # Nonlinear, 2 hidden nodes models[[4]] <- list(n.hidden = 2, Th = gevcdn.logistic) ## Fit models weights.models <- vector("list", length(models)) for(i in seq_along(models)){ weights.models[[i]] <- gevcdn.fit(x = x, y = y, n.trials = 1, n.hidden = models[[i]]$n.hidden, Th = models[[i]]$Th, fixed = models[[i]]$fixed) } ## Select model with minimum AICc models.AICc <- sapply(weights.models, attr, which = "AICc") weights.best <- weights.models[[which.min(models.AICc)]] parms.best <- gevcdn.evaluate(x, weights.best) ## 10th, 50th, and 90th percentiles q.best <- sapply(c(0.1, 0.5, 0.9), qgev, location = parms.best[,"location"], scale = parms.best[,"scale"], shape = parms.best[,"shape"]) ## Plot data and quantiles matplot(x, cbind(y, q, q.best), type = c("b", rep("l", 6)), lty = c(1, rep(c(1, 2, 1), 2)), lwd = c(1, rep(c(3, 2, 3), 2)), col = c("red", rep("orange", 3), rep("blue", 3)), pch = 19, xlab = "x", ylab = "y", main = "gevcdn.fit")
gevcdn.identity
computes a trivial identity function. Used as the hidden layer transfer function for linear GEV CDN models.
gevcdn.identity(x)
gevcdn.identity(x)
x |
a numeric vector |
Random initialization of the weight vector used during fitting of the GEV CDN model.
gevcdn.initialize(x, n.hidden, init.range)
gevcdn.initialize(x, n.hidden, init.range)
x |
covariate matrix with number of rows equal to the number of samples and number of columns equal to the number of variables. |
number of hidden nodes in the GEV CDN model. |
|
init.range |
range for random weights on [ |
gevcdn.logistic
computes the logistic sigmoid (S-shaped) function. Used as the hidden layer transfer function for nonlinear GEV CDN models.
gevcdn.logistic(x)
gevcdn.logistic(x)
x |
a numeric vector |
Reshapes a weight vector used during fitting of the GEV CDN model into input-hidden and hidden-output layer weight matrices.
gevcdn.reshape(x, weights, n.hidden)
gevcdn.reshape(x, weights, n.hidden)
x |
covariate matrix with number of rows equal to the number of samples and number of columns equal to the number of variables. |
weights |
weight vector of length returned by |
number of hidden nodes in the GEV CDN model. |