Title: | 'Multiple Systems Estimation for Sparse Capture Data' |
---|---|
Description: | Implements the routines and algorithms developed and analysed in "Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists" Chan, L, Silverman, B. W., Vincent, K (2019) <https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748> and in "Bootstrapping multiple systems estimates to account for model selection" Silverman, B. W., Chan, L, Vincent, K (2023)<https://doi.org/10.1007/s11222-023-10346-9>. This package explicitly handles situations where there are pairs of lists which have no observed individuals in common. It deals correctly with parameters whose estimated values can be considered as being negative infinity. It also addresses other possible issues of non-existence and non-identifiability of maximum likelihood estimates. |
Authors: | Lax Chan [aut, cre], Bernard Silverman [aut], Kyle Vincent [aut] |
Maintainer: | Lax Chan <[email protected]> |
License: | GPL (>= 2) |
Version: | 3.0.1 |
Built: | 2024-11-03 03:49:55 UTC |
Source: | https://github.com/laxchan/sparsemse |
Given any encoded capture history and the number of lists, find all the encoded capture histories that are included in the original capture history
ancestors(k, nlists = 10)
ancestors(k, nlists = 10)
k |
An encoded capture history |
nlists |
The total number of lists |
a vector giving the encoded versions of the ancestors
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
ancestors(2,10) ancestors(1,5)
ancestors(2,10) ancestors(1,5)
This is a simple data set based on three lists, which gives examples of models that
fail on one or the other of the criteria tested by checkident
. This is Table 2 in Chan, Silverman and Vincent (2021).
Artificial_3
Artificial_3
An object of class data.frame
with 4 rows and 4 columns.
If all three two-list effects are included in the fitted model then the linear program
in checkident
yields a strictly positive value but the matrix A is not of full column rank, so the parameters are not identifiable.
If the model contains AB either alone or in conjunction with one of AC and BC, then the linear program result is zero, so the MLE does not exist.
If only main effects are considered, or if either or both of AC and BC, but not AB are included,
then the model passes both tests.
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
This routine sorts the models in increasing order according to their BICs, returns the sorted models with their corresponding BICs and abundance. The original data as well as the maxorder of the models considered are returned as well.
assemble_bic( xdata, maxorder = dim(xdata)[2] - 2, checkexist = T, removeFRfail = T, ... )
assemble_bic( xdata, maxorder = dim(xdata)[2] - 2, checkexist = T, removeFRfail = T, ... )
xdata |
The original data matrix with capture histories and counts. |
maxorder |
Maximum order of models to be included |
checkexist |
If T then the Fienberg-Rinaldo condition is checked for each model |
removeFRfail |
If checkexist is T then models which fail the FR condition are removed from the results |
... |
Parameters to be fed to |
A list with the following components
A matrix with the models considered, their abundance, BIC and their order, sorted into increasing order of BIC
The original data matrix with capture histories and counts.
Order parameter that was feed into gethiermodels
data(hiermodels) data(Korea) assemble_bic(Korea, checkexist=T)
data(hiermodels) data(Korea) assemble_bic(Korea, checkexist=T)
The BCa confidence intervals use percentiles of the bootstrap distribution of the population size
, but adjust the percentile actually used. The adjusted percentiles depend on an
estimated bias parameter, and the quantile function of the estimated bias parameter is the proportion
of the bootstrap estimates that fall below the estimate from the original data, and an
estimated acceleration factor, which derivation depends on a jackknife approach. This routine is called internally
by estimatepopulation
.
bcaconfvalues( bootreps, popest, ahat, alpha = c(0.025, 0.05, 0.1, 0.16, 0.84, 0.9, 0.95, 0.975) )
bcaconfvalues( bootreps, popest, ahat, alpha = c(0.025, 0.05, 0.1, 0.16, 0.84, 0.9, 0.95, 0.975) )
bootreps |
Point estimates of total population sizes from each bootstrap sample. |
popest |
A point estimate of the total population of the original data set. |
ahat |
the estimated acceleration factor |
alpha |
Bootstrap quantiles of interests |
BCa confidence intervals
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
DiCiccio, T. J. and Efron, B. (1996). Bootstrap Confidence Intervals. Statistical Science, 40(3), 189-228.
Efron, B. (1987). Better Bootstrap Confidence Intervals. Journal of the American Statistical Association, 82(397), 171-185.
This routine carries out the simulation study as detailed in Section 3.4 of Chan, Silverman and Vincent (2021).
If the original data set has low counts, so that there is a possibility of a simulated data set containing empty lists, then it
may be advisable to use the option noninformativelist=TRUE
.
BICandbootstrapsim( zdat, nsims = 100, nboot = 100, pthresh = 0.02, iseed = 1234, alpha = c(0.025, 0.05, 0.1, 0.16, 0.5, 0.84, 0.9, 0.95, 0.975), noninformativelist = F, verbose = F, ... )
BICandbootstrapsim( zdat, nsims = 100, nboot = 100, pthresh = 0.02, iseed = 1234, alpha = c(0.025, 0.05, 0.1, 0.16, 0.5, 0.84, 0.9, 0.95, 0.975), noninformativelist = F, verbose = F, ... )
zdat |
Data matrix with |
nsims |
Number of simulations to be carried out. |
nboot |
Number of bootstrap replications for each simulation |
pthresh |
p-value threshold used in |
iseed |
seed for initialization. |
alpha |
bootstrap quantiles of interests. |
noninformativelist |
if |
verbose |
If |
... |
other arguments. |
A list with components as below
popest
Total population point estimate from the original data using
estimatepopulation.0
with default threshold.
BICmodels
The best model chosen by the BIC at each simulation.
BICvals
Point estimates of the total population and standard error of the best model chosen by the BIC at each simulation.
simreps
Counts associated to each capture history at each simulation.
modelmat
A full capture history matrix excluding the row corresponding to the dark figure.
popestsim
Total population estimate given by the BCa method in each simulation.
BCaquantiles
bootstrap confidence intervals given by the BCa method.
BICconf
confidence interval given by the BIC method.
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
DiCiccio, T. J. and Efron, B. (1996). Bootstrap Confidence Intervals. Statistical Science, 40(3), 189-228.
Rivest, L-P. and Baillargeon, S. (2014) Rcapture. CRAN package. Available from Available from https://CRAN.R-project.org/package=Rcapture.
zdat=UKdat_5 BICandbootstrapsim(zdat,nsims=1000, nboot=100, pthresh=0.02, iseed=1234, noninformativelist=T)
zdat=UKdat_5 BICandbootstrapsim(zdat,nsims=1000, nboot=100, pthresh=0.02, iseed=1234, noninformativelist=T)
Returns acceleration factors for each value of ntop for given modelorder
bicktopahatcal(z, modelorder)
bicktopahatcal(z, modelorder)
z |
The output of |
modelorder |
the order of the considered models |
the estimated acceleration factor
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) z=assemble_bic(Korea, checkexist=T) z=jackknifecal(z, checkexist=T) nmods= dim(z$jackabund)[1] modscores = (1:nmods) modelorder= order(modscores) bicktopahatcal(z,modelorder)
data(Korea) z=assemble_bic(Korea, checkexist=T) z=jackknifecal(z, checkexist=T) nmods= dim(z$jackabund)[1] modscores = (1:nmods) modelorder= order(modscores) bicktopahatcal(z,modelorder)
This routine takes the ouput of find_bic_rank_matrix
and produce a BIC rank matrix with a specified order of rank breaking
ties with each previous column.
BICrank_tiebreak(BICmatrix_prop, k)
BICrank_tiebreak(BICmatrix_prop, k)
BICmatrix_prop |
The output from |
k |
The order of BIC rank |
A BIC rank matrix with the last column breaking ties with the previous ones.
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) z=assemble_bic(Korea) z=jackknifecal(z) BICmatrix_prop= find_bic_rank_matrix(z) BICmatrix_break = BICrank_tiebreak(BICmatrix_prop, 2)
data(Korea) z=assemble_bic(Korea) z=jackknifecal(z) BICmatrix_prop= find_bic_rank_matrix(z) BICmatrix_break = BICrank_tiebreak(BICmatrix_prop, 2)
A routine for naive user
bootstrap_mse( zdat, maxorder = dim(zdat)[2] - 2, nboot = 10000, iseed = 1234, alpha = c(0.025, 0.1, 0.9, 0.975) )
bootstrap_mse( zdat, maxorder = dim(zdat)[2] - 2, nboot = 10000, iseed = 1234, alpha = c(0.025, 0.1, 0.9, 0.975) )
zdat |
Data matrix with |
maxorder |
Maximum order of models to be included. |
nboot |
Number of bootstrap replications. |
iseed |
seed for initialisation |
alpha |
Levels of confidence intervals to be constructed and assessed |
A list with the following components
BCa confidence intervals for each considered model
Corresponding probabilities of the estimates
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) bootstrap_mse(Korea)
data(Korea) bootstrap_mse(Korea)
This routine takes the output from assemble_bic
or subsetmat
and returns bootstrap
abundance matrix and BIC matrix. This version makes use of the checkident.2
.
bootstrapcal( z, nboot = 1000, iseed = 1234, checkexist = T, saveinterval = Inf, savefile = "bootout.Rdata" )
bootstrapcal( z, nboot = 1000, iseed = 1234, checkexist = T, saveinterval = Inf, savefile = "bootout.Rdata" )
z |
Results from |
nboot |
The number of bootstrap replications. |
iseed |
Integer seed to allow for replicability. |
checkexist |
If |
saveinterval |
If this is set to a finite value, the output list |
savefile |
The file to which the output will be saved if |
The original input list z
with the additional components
Bootstrap abundance matrix
Bootstrap BIC matrix
If there are already components with these names they will be overwritten.
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
z=assemble_bic(Korea) bootstrapcal(z, checkexist=T, saveinterval=50, savefile="Koreabootresults.Rdata")
z=assemble_bic(Korea) bootstrapcal(z, checkexist=T, saveinterval=50, savefile="Koreabootresults.Rdata")
Call the resulting set the "boundary". Supposing that the current set of captures is a hierarchical model, that property
will be preserved if a capture in the boundary is added to it. The routine is called internally by find_neighbour_hierarchies
.
boundary_captures(kcap, nlists)
boundary_captures(kcap, nlists)
kcap |
An encoded capture history that corresponds to the row number of the capture history data set |
nlists |
The total number of lists |
a vector giving the encoded versions of the descendants
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
modelstr = "[12,23]" nlists=4 zhierroots = convert_from_hierarchy(modelstr, F) zhier = unique(ancestors(zhierroots, nlists)) boundary_captures(zhier,nlists)
modelstr = "[12,23]" nlists=4 zhierroots = convert_from_hierarchy(modelstr, F) zhier = unique(ancestors(zhierroots, nlists)) boundary_captures(zhier,nlists)
For multiple systems estimation model corresponding to a specified set of two-list effects, set up the GLM model formula and data matrix.
buildmodel(zdat, mX)
buildmodel(zdat, mX)
zdat |
Data matrix with |
mX |
A |
A list with components as below.
datamatrix
A matrix with all possible capture histories, other than those equal to or containing non-overlapping pairs
indexed by parameters
that are within the model specified by mX
. A non-overlapping pair is a pair of lists such that
no case is observed in both lists,
regardless of whether it is present on any other lists. If
is within the model specified by
mX
,
all capture histories containing both and
are
then excluded.
modelform
The model formula suitable to be called by the Generalized Linear Model function glm
. Model terms corresponding to
non-overlapping pairs are not included, because they are handled by removing appropriate rows from the data matrix supplied to glm. The list
of non-overlapping pairs are provided in emptyoverlaps
. See Chan, Silverman and Vincent (2021) for details.
emptyoverlaps
A matrix with two rows, whose columns give the indices of non-overlapping pairs of lists where the parameter indexed by
the pair is within the specified model. The column names give the names of the lists
corresponding to each pair.
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
data(NewOrl) buildmodel(NewOrl, mX=NULL) #Build a matrix that contains all two-list effects m=dim(NewOrl)[2]-1 mX = t(expand.grid(1:m, 1:m)); mX = mX[ , mX[1,]<mX[2,]] # With one two-list effect buildmodel(NewOrl, mX=mX[,1]) #With three two-list effects buildmodel(NewOrl, mX=mX[,1:3])
data(NewOrl) buildmodel(NewOrl, mX=NULL) #Build a matrix that contains all two-list effects m=dim(NewOrl)[2]-1 mX = t(expand.grid(1:m, 1:m)); mX = mX[ , mX[1,]<mX[2,]] # With one two-list effect buildmodel(NewOrl, mX=mX[,1]) #With three two-list effects buildmodel(NewOrl, mX=mX[,1:3])
This routine builds a model matrix as required by the linear program check checkident
and checks if the matrix is of full rank.
In addition, for each individual list, and for each pair of lists included in the model,
it returns the total count of individuals appearing on the specific list or lists whether or not in combination with other lists.
buildmodelmatrix(zdat, mX = NULL)
buildmodelmatrix(zdat, mX = NULL)
zdat |
Data matrix with |
mX |
A |
A list with components as below
modmat
The matrix that maps the parameters in the model (excluding any corresponding to non-overlapping lists) to the log expected value
of the counts of capture histories that do not contain non-overlapping pairs in the data.
tvec
A vector indexed by the parameters in the model, excluding those corresponding to non-overlapping pairs of lists. For each parameter
the vector contains the total count of individuals in all the capture histories that include both the lists indexed by that parameter.
rankdef
The column rank deficiency of the matrix modmat
. If rankdef = 0
, the matrix has full column rank.
data(NewOrl) buildmodelmatrix(NewOrl, mX=NULL)
data(NewOrl) buildmodelmatrix(NewOrl, mX=NULL)
This routine efficiently checks whether every possible model satisfies the conditions for the existence and identifiability of an extended maximum likelihood estimate. For identifiability it is only necessary to check the model containing all two-list effects. For existence, the approach set out in Chan, Silverman and Vincent (2021) is used. This uses the linear programming approach described in a more general and abstract context by Fienberg and Rinaldo (2012).
checkallmodels(zdat, nreport = 1024)
checkallmodels(zdat, nreport = 1024)
zdat |
Data matrix with |
nreport |
A message is printed for every |
If the extended maximum likelihood estimator exists for a particular model, then it will still exist if one or more overlapping pairs are removed from the model. This allows the search to be carried out efficiently, as follows:
1. Search over ‘top-level’ models which contain all overlapping pairs. The number of such models is , where
is the number of non-overlapping pairs, because there are
possible choices of the set of non-overlapping pairs to include in the model.
2. For each top-level model, if the estimate exists there is no need to consider that model further. If the estimate does not exist, then a branch search is carried out over the tree structure obtained by successively leaving out overlapping pairs.
The routine prints a message if the model with all two-list parameters is not identifiable. As set out above, it gives regular progress reports in the case where there are a large number of models to be checked.
If all models give estimates which exist, then a message is printed to that effect.
If there are models for which the estimate does not exist, the routine reports the number of such models and returns a matrix whose rows give the parameters included in them.
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
Fienberg, S. E. and Rinaldo, A. (2012). Maximum likelihood estimation in log-linear models. Ann. Statist. 40, 996-1023. Supplementary material: Technical report, Carnegie Mellon University. Available from http://www.stat.cmu.edu/~arinaldo/Fienberg_Rinaldo_Supplementary_Material.pdf.
data(Artificial_3) data(Western) checkallmodels(Artificial_3) checkallmodels(Western)
data(Artificial_3) data(Western) checkallmodels(Artificial_3) checkallmodels(Western)
Apply the linear programming test as derived by Fienberg and Rinaldo (2012), and a calculation of the rank of the design matrix, to check whether a particular model yields an identifiable maximum likelihood estimate based on the given data. The linear programming problem is as described on page 3 of Fienberg and Rinaldo (2012), with a typographical error corrected. Further details are given by Chan, Silverman and Vincent (2021).
checkident(zdat, mX = 0, verbose = FALSE)
checkident(zdat, mX = 0, verbose = FALSE)
zdat |
Data matrix with |
mX |
A |
verbose |
Specifies the output. If |
If verbose=FALSE
, then return the error code ierr
which is 0 if there are no errors, 1 if the linear program test shows that the maximum likelihood
estimate does not exist, 2 if it is not identifiable, and 3 if both tests are failed.
If verbose=TRUE
, then return a list with components as below
ierr
As described above.
zlp
Linear programming object, in particular giving the value of the objective function at optimum.
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
Fienberg, S. E. and Rinaldo, A. (2012). Maximum likelihood estimation in log-linear models. Ann. Statist. 40, 996-1023. Supplementary material: Technical report, Carnegie Mellon University. Available from http://www.stat.cmu.edu/~arinaldo/Fienberg_Rinaldo_Supplementary_Material.pdf.
data(Artificial_3) #Build a matrix that contains all two-list effects m=dim(Artificial_3)[2]-1 mX = t(expand.grid(1:m, 1:m)); mX = mX[ , mX[1,]<mX[2,]] # When the model is not identifiable checkident(Artificial_3,mX=mX, verbose=TRUE) # When the maximum likelihood estimate does not exist checkident(Artificial_3, mX=mX[,1],verbose=TRUE) #Model passes both tests checkident(Artificial_3, mX=mX[,2:3],verbose=TRUE)
data(Artificial_3) #Build a matrix that contains all two-list effects m=dim(Artificial_3)[2]-1 mX = t(expand.grid(1:m, 1:m)); mX = mX[ , mX[1,]<mX[2,]] # When the model is not identifiable checkident(Artificial_3,mX=mX, verbose=TRUE) # When the maximum likelihood estimate does not exist checkident(Artificial_3, mX=mX[,1],verbose=TRUE) #Model passes both tests checkident(Artificial_3, mX=mX[,2:3],verbose=TRUE)
This routine performs the Fienberg-Rinaldo linear program check in the framework of extended maximum likelihood estimates, the parameters estimates exist if and only if the return value of the check is nonzero
checkident.1(parset, datlist)
checkident.1(parset, datlist)
parset |
Either the hierarchical representation of the model, or the vector of the corresponding capture histories to the model. |
datlist |
The output of |
The value of the linear program. The parameter estimates within the extended ML framework exist if and only if this value is nonzero.
Silverman, B. W., Chan, L. and Vincent, K., (2022). Bootstrapping Multiple Systems Estimates to Account for Model Selection
Fienberg, S. E. and Rinaldo, A. (2012). Maximum likelihood estimation in log-linear models. Ann. Statist. 40, 996-1023. Supplementary material: Technical report, Carnegie Mellon University. Available from http://www.stat.cmu.edu/~arinaldo/Fienberg_Rinaldo_Supplementary_Material.pdf.
data(Korea) datlist = ingest_data(Korea) parset ="[0,0,1]" checkident.1(parset,datlist)
data(Korea) datlist = ingest_data(Korea) parset ="[0,0,1]" checkident.1(parset,datlist)
Suppose we have a collection of different data outcomes on the same set of capture histories and a vector of models. Typically the data outcomes will be bootstrap replications. This routine finds the unique support patterns among the data and hence economises the task of finding which model/data combinations satisfy the Fienberg-Rinaldo condition
checkident.2(x, xcap, zmods)
checkident.2(x, xcap, zmods)
x |
a matrix of data observations for a common capture matrix |
xcap |
the incidence matrix of the capture histories corresponding to the rows of x |
zmods |
a vector of models |
a matrix with rows corresponding to the models and columns to the columns of x, with elements taking the value T if the FR linear program for a vector of 0s and 1s with the same zero pattern as the x data yields a strictly positive value
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) z=assemble_bic(Korea) z=bootstrapcal(z) n2=dim(z$xdata)[2] checkident.2(z$bootreplications, z$xdata[,-n2], dimnames(z$res)[[1]])
data(Korea) z=assemble_bic(Korea) z=bootstrapcal(z) n2=dim(z$xdata)[2] checkident.2(z$bootreplications, z$xdata[,-n2], dimnames(z$res)[[1]])
This routine leaves out a particular set of parameters corresponding to the two-list effects from the parameter set theta.
For the resulting model, it constructs the linear programming
problem to check whether the extended maximum likelihood estimates of the parameters exists. It is called internally by checkallmodels
.
checkthetasubset(zset, amat, tvec, nlists)
checkthetasubset(zset, amat, tvec, nlists)
zset |
set of indices that is not included, numbered among the two-list effects only |
amat |
a design matrix |
tvec |
vector of sufficient statistics |
nlists |
number of lists in the original capture-recapture matrix |
If the return result is TRUE
, the linear program shows that the extended maximum likelihood estimate does not exist.
If the return result is FALSE
, the estimate exists.
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
Given any encoded capture history that corresponds to the row number of the capture history data set and the number of lists, find the encoded capture histories which are obtained by adding one more list in turn
child_captures(k, nlists)
child_captures(k, nlists)
k |
An encoded capture history that corresponds to the row number of the capture history data set |
nlists |
The total number of lists |
a vector giving the encoded versions of the children
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
child_captures(2,5) child_captures(1,4)
child_captures(2,5) child_captures(1,4)
Given a hierarchical model, find the vector of all the corresponding encoded captures
convert_from_hierarchy(modelstr, findancestors = T)
convert_from_hierarchy(modelstr, findancestors = T)
modelstr |
A given hierarchical model |
findancestors |
If T then find all the captures. If F then just return the encoded defining histories of the hierarchy |
The encoded capture histories that corresponds to the row number of the capture history data set
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
modelstr = "[12,23]" convert_from_hierarchy(modelstr) modelstr = "[12,3]" convert_from_hierarchy(modelstr, findancestors=F)
modelstr = "[12,23]" convert_from_hierarchy(modelstr) modelstr = "[12,3]" convert_from_hierarchy(modelstr, findancestors=F)
Given a vector of encoded captures defining a hierarchical model, re-express it in hierarchical model form
convert_to_hierarchy(kcap, nlists)
convert_to_hierarchy(kcap, nlists)
kcap |
A vector of captures |
nlists |
The number of lists |
A hierarchical representation of the vector of encoded captures.
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
kcap=c(1,2,3,5,4) nlists=3 convert_to_hierarchy(kcap, nlists)
kcap=c(1,2,3,5,4) nlists=3 convert_to_hierarchy(kcap, nlists)
The routine counts the number of subsets of size three of lists such that every pair of lists in the triple overlaps. If the number is zero, then the model with all two-list effects is unidentifiable.
count_triples(zdat)
count_triples(zdat)
zdat |
Data matrix with |
a count of subsets of size three of lists such that every pair of lists in the triple overlaps.
data(Western) data(Artificial_3) count_triples(Western) count_triples(Artificial_3)
data(Western) data(Artificial_3) count_triples(Western) count_triples(Artificial_3)
Given a capture history as a number and the number of lists, decode it into a logical vector giving presence or absence in the capture history.
decode_capture(k, nlists)
decode_capture(k, nlists)
k |
The capture history to be decoded |
nlists |
The number of lists |
A logical vector of length nlists
giving presence or absence in the capture history
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
decode_capture(2,5) decode_capture(1,4)
decode_capture(2,5) decode_capture(1,4)
Given any encoded capture history, find all the encoded capture histories that include the original capture history and any other lists
descendants(k, nlists, omitk = FALSE)
descendants(k, nlists, omitk = FALSE)
k |
An encoded capture history |
nlists |
The total number of lists |
omitk |
Determine whether the original capture history is included as a descendant of itself. If |
a vector giving the encoded versions of the descendants
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
descendants(2,5) descendants(5,10)
descendants(2,5) descendants(5,10)
Construct bootstrap replications and use the downhill fit method to obtain point estimates of total population sizes from each bootstrap sample.
downhill_bootstrapcal( xdata, nboot = 1000, iseed = 1234, checkid = T, verbose = F, maxorder = dim(xdata)[2] - 2 )
downhill_bootstrapcal( xdata, nboot = 1000, iseed = 1234, checkid = T, verbose = F, maxorder = dim(xdata)[2] - 2 )
xdata |
original data matrix |
nboot |
number of bootstrap replicates |
iseed |
random seed |
checkid |
If it is T, then |
verbose |
If TRUE, return the list of extra output from |
maxorder |
Maximum order of models to be included |
Point estimates of total population sizes from each bootstrap sample.
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) downhill_bootstrapcal(Korea)
data(Korea) downhill_bootstrapcal(Korea)
Find a local optimum by downhill search among hierarchical models
downhill_fit( counts, desmat, maxorder = dim(desmat)[2] - 1, checkid = T, niter = 20, verbose = F )
downhill_fit( counts, desmat, maxorder = dim(desmat)[2] - 1, checkid = T, niter = 20, verbose = F )
counts |
Observed counts for the capture histories defined by desmat |
desmat |
Incidence matrix defining the capture histories observed with counts given by counts |
maxorder |
Maximum order of models to be included |
checkid |
If it is T, then |
niter |
Number of iterations |
verbose |
Specifies the output, if F then only returns the best value, if T, returns a more detailed list of objects |
A list with the following components
Optimal hierarchical model
hierarchical model with the minimum value
hierarhical models considered
Values of function
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) xdata=Korea counts = xdata[,dim(xdata)[2]] desmat = xdata[,1:(dim(xdata)[2]-1)] downhill_fit(counts, desmat)
data(Korea) xdata=Korea counts = xdata[,dim(xdata)[2]] desmat = xdata[,1:(dim(xdata)[2]-1)] downhill_fit(counts, desmat)
The routine carries out pointwise, bootstrap and jackknife BCa confidence interval calculation and it calls
the following routines downhill_fit
, downhill_bootstrapcal
, downhill_jackknifecal
and bcaconfvalues
.
downhill_funs( xdata, maxorder = dim(xdata)[2] - 2, checkid = T, verbose = F, nboot = 1000, alpha = c(0.025, 0.05, 0.95, 0.975) )
downhill_funs( xdata, maxorder = dim(xdata)[2] - 2, checkid = T, verbose = F, nboot = 1000, alpha = c(0.025, 0.05, 0.95, 0.975) )
xdata |
The data matrix |
maxorder |
Maximum order of models to be included |
checkid |
if TRUE, do the Fienburg-Renaldo test |
verbose |
if TRUE, return the list of extra output from |
nboot |
The number of bootstrap replications |
alpha |
Bootstrap quantiles of interests. |
A list with the following components
A point estimate using the downhill fit approach
Point estimates of total population sizes from each bootstrap sample
The estimated acceleration factor
BCa confidence intervals
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) xdata=Korea downhill_funs(xdata)
data(Korea) xdata=Korea downhill_funs(xdata)
It uses the downhill approach to calculate the jackknife abundance and returns the estimated acceleration factor
downhill_jackknifecal(xdata, checkid = T, maxorder = dim(xdata)[2] - 2)
downhill_jackknifecal(xdata, checkid = T, maxorder = dim(xdata)[2] - 2)
xdata |
original data matrix |
checkid |
If it is T, then |
maxorder |
Maximum order of models to be included |
the estimated acceleration factor
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) downhill_jackknifecal(Korea)
data(Korea) downhill_jackknifecal(Korea)
Given a 0/1 capture history, encode it as number that corresponds to the row number of the capture history data set
encode_capture(z)
encode_capture(z)
z |
The capture history to be encoded, as a logical vector or a vector of 0s and 1s |
The capture history encoded as a number that corresponds to the row number of the capture history data set
encode_capture(c(1,0,0,0,0)) encode_capture(c(1,1,1,1,0)) encode_capture(c(T,F,T,F))
encode_capture(c(1,0,0,0,0)) encode_capture(c(1,1,1,1,0)) encode_capture(c(T,F,T,F))
This routine implements the bootstrapping and jacknife approach as detailed in Section 3.3 of Chan, Silverman and Vincent (2021).
It calls the routine estimatepopulation.0
and so is the preferred routine to be called if a user wishes to estimate the
population and obtain BCa confidence intervals.
estimatepopulation( zdat, nboot = 1000, pthresh = 0.02, iseed = 1234, alpha = c(0.025, 0.05, 0.1, 0.16, 0.84, 0.9, 0.95, 0.975), ... )
estimatepopulation( zdat, nboot = 1000, pthresh = 0.02, iseed = 1234, alpha = c(0.025, 0.05, 0.1, 0.16, 0.84, 0.9, 0.95, 0.975), ... )
zdat |
Data matrix with |
nboot |
Number of bootstrap replications. |
pthresh |
p-value threshold used in |
iseed |
seed for initialisation. |
alpha |
Bootstrap quantiles of interests. |
... |
other arguments which will be passed to |
A list with components as below:
popest
point estimate of the total population of the original data set
MSEfit
model fitted to the data, in the format described in modelfit
bootreps
point estimates of total population sizes from each bootstrap sample
ahat
the estimated acceleration factor
BCaquantiles
BCa confidence intervals
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
DiCiccio, T. J. and Efron, B. (1996). Bootstrap Confidence Intervals. Statistical Science, 40(3), 189-228.
data(Korea) zdat=Korea estimatepopulation(zdat,nboot=10)
data(Korea) zdat=Korea estimatepopulation(zdat,nboot=10)
estimatepopulation
should be used instead.This routine estimates the total population size, which includes the dark figure, together with confidence intervals as specified. It also returns the details of the fitted model. The user can choose whether to fit main effects only, to fit a particular model containing specified two-list parameters, or to choose the model using the stepwise approach described by Chan, Silverman and Vincent (2021).
estimatepopulation.0( zdat, method = "stepwise", quantiles = c(0.025, 0.975), mX = NULL, pthresh = 0.02 )
estimatepopulation.0( zdat, method = "stepwise", quantiles = c(0.025, 0.975), mX = NULL, pthresh = 0.02 )
zdat |
Data matrix with |
method |
If |
quantiles |
Quantiles of interest for confidence intervals. |
mX |
A |
pthresh |
Threshold p-value used if |
A list with components as below
estimate
Point estimate and confidence interval estimates corresponding to specified quantiles.
MSEfit
The model fitted to the data in the format described in modelfit
.
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
data(NewOrl) data(NewOrl_5) estimatepopulation.0(NewOrl, method="stepwise", quantiles=c(0.025,0.975)) estimatepopulation.0(NewOrl_5, method="main", quantiles=c(0.01, 0.05,0.95, 0.99))
data(NewOrl) data(NewOrl_5) estimatepopulation.0(NewOrl, method="stepwise", quantiles=c(0.025,0.975)) estimatepopulation.0(NewOrl_5, method="main", quantiles=c(0.01, 0.05,0.95, 0.99))
Find BIC rank matrix up to a specified number of bic ranks for a given data set
find_bic_rank_matrix(zsortbic, nbicranks = 5)
find_bic_rank_matrix(zsortbic, nbicranks = 5)
zsortbic |
Output from applying |
nbicranks |
The number of bic ranks to be propagated |
A bic rank matrix encoding how many steps the models considered need to get to the optimal one.
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) zsortbic=assemble_bic(Korea, checkexist=T) find_bic_rank_matrix(zsortbic, nbicranks=5)
data(Korea) zsortbic=assemble_bic(Korea, checkexist=T) find_bic_rank_matrix(zsortbic, nbicranks=5)
Find all neighbouring hierarchical model to a given one
find_neighbour_hierarchies( modelstr, nlists = NA, keepmaineffects = T, maxorder = nlists - 1 )
find_neighbour_hierarchies( modelstr, nlists = NA, keepmaineffects = T, maxorder = nlists - 1 )
modelstr |
Model string written in hierarchical form |
nlists |
Number of lists. |
keepmaineffects |
If T, keep the main effects. If F remove. |
maxorder |
Maximum order of models to be included |
neighbour hierarchical models
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
modelstr = "[12,23]" find_neighbour_hierarchies(modelstr)
modelstr = "[12,23]" find_neighbour_hierarchies(modelstr)
Given a matrix (for example of bootstrap replications) construct the matrix of unique patterns of non-zeroes, together with a vector of pointers back to that matrix.
find_unique_patterns(x)
find_unique_patterns(x)
x |
a matrix |
The original data x
with the additional components
matrix of unique patterns of non-zeroes/zeroes in the columns of x
vector of length dim(x)[2] giving the column of yuniq corresponding to each column of x
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) z=assemble_bic(Korea) z=bootstrapcal(z) find_unique_patterns(z$bootreplications)
data(Korea) z=assemble_bic(Korea) z=bootstrapcal(z) find_unique_patterns(z$bootreplications)
Fit a hierarchical model taking account of possible sparsity
fit_hier_model(xdatin, hiermod, bicRcap = T, checkid = F)
fit_hier_model(xdatin, hiermod, bicRcap = T, checkid = F)
xdatin |
data obtained using |
hiermod |
hierarchical model to fit |
bicRcap |
if T then use the Rcapture convention that the BIC sample size is the number of cases observed. Otherwise use the number of cells in the Poisson log linear model. |
checkid |
if T then |
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) xdatin = ingest_data(Korea) fit_hier_model(xdatin,"[12,23]") fit_hier_model(xdatin,"[12,3]")
data(Korea) xdatin = ingest_data(Korea) fit_hier_model(xdatin,"[12,23]") fit_hier_model(xdatin,"[12,3]")
Extracts from a larger vector of hierarchical models the ones satisfying the given criterion
gethiermodels(nlists, maxorder = nlists - 1, modelvec = hiermodels)
gethiermodels(nlists, maxorder = nlists - 1, modelvec = hiermodels)
nlists |
Number of lists |
maxorder |
Maximum order of models to be returned (defaults to nlists-1) |
modelvec |
vector of hierarchical models (defaults to hiermodels) |
A list of models satisfying the given criteria
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(hiermodels) # Five lists with maximum order of 4 gethiermodels(nlists=5,maxorder=4) # Five lists with maximum order of 2 gethiermodels(nlists=5, maxorder=2)
data(hiermodels) # Five lists with maximum order of 4 gethiermodels(nlists=5,maxorder=4) # Five lists with maximum order of 2 gethiermodels(nlists=5, maxorder=2)
Hierarchical models from two lists to six lists
hiermodels
hiermodels
An object of class character
of length 39783.
These data provide hierarchical models constructed from two lists to six lists of maximal order 2.
Perform various preprocessing tasks on the data
ingest_data(xdat)
ingest_data(xdat)
xdat |
Data matrix of the usual kind |
A list with the following elements
Numbers of observations indexed by encoded histories
For each capture history, total number of observations for that capture history and all its descendants
Total number of lists
Names of the lists, constructed to be A, B, ... if necessary
The input data matrix
A vector indicating which parameters are not estimable, because they are strict descendants of parameters
which would be already estimated to be if they are included in the model
The inclusion matrix as constructed by make_master_design
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
#Korea data data(Korea) ingest_data(Korea) #Kosovo data data(Kosovo) ingest_data(Kosovo)
#Korea data data(Korea) ingest_data(Korea) #Kosovo data data(Kosovo) ingest_data(Kosovo)
This routine reproduces Figure 1 of Chan, Silverman and Vincent (2021).
investigateAIC(nsim = 10000, Nsamp = 1000, seed = 1001)
investigateAIC(nsim = 10000, Nsamp = 1000, seed = 1001)
nsim |
The number of simulation replications |
Nsamp |
The expected value of the total population size within each simulation |
seed |
The random number seed |
Simulations are carried out for two different three-list models. In one model, the probabilities of capture are 0.01, 0.04 and 0.2 for the three lists respectively, while in the other the probability is 0.3 on all three lists. In both cases, captures on the lists occur independently of each other, so there are no two-list effects. The first model is chosen to be somewhat more typical of the sparse capture case, of the kind which often occurs in the human trafficking context, while the second, more reminiscent of a classical mark-recapture study.
The probability of an individual having each possible capture history is first evaluated.
Then these probabilities are multiplied by Nsamp = 1000
and, for each simulation
replicate, Poisson random values with expectations equal to these values are generated
to give a full set of observed capture histories;
together with the null capture history the expected number of counts
(population size) is equal to Nsamp
.
Inference was carried out both for the model with main effects only, and for the model with the addition of
a correlation effect between the first two lists.
The reduction in deviance between the two models was determined. Ten thousand simulations were carried out.
Checking for compliance with the conditions for existence and identifiability of the estimates shows that a very small number of the simulations for the sparse model (two out of ten thousand) fail the checks for existence even within the extended maximum likelihood context. Detailed investigation shows that in neither of these cases is the dark figure itself not estimable; although the parameters themselves cannot all be estimated, there is a maximum likelihood estimate of the expected capture frequencies, and hence the deviance can still be calculated.
The routine produces QQ-plots
of the resulting deviance reductions against quantiles of the distribution,
for
nsim
simulation replications.
An nsim
2 matrix giving the changes in deviance for each replication for each of the two models.
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
This routine takes the output from subsetmat
or from assemble_bic
and returns the jackknife
abundance matrix and jackknife BIC matrix.
jackknifecal(z, checkexist = T)
jackknifecal(z, checkexist = T)
z |
Results from |
checkexist |
If |
A list with the following components
Jackknife abundance matrix
Jackknife BIC matrix
Capture counts in the same order as the columns of jackabund
and jackbic
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) z=assemble_bic(Korea) jackknifecal(z,checkexist=T)
data(Korea) z=assemble_bic(Korea) jackknifecal(z,checkexist=T)
Korean woman held in sexual slavery by the Japanese military
Korea
Korea
An object of class matrix
(inherits from array
) with 7 rows and 4 columns.
These data are collected into three lists. Full details are given in Figure 1 and the Data section of Ball et al. (2018)
Ball, P., Shin, E. H-S. and Yang, H. (2018). There may have been 14 undocumented Korean "comfort women" in Palembang, Indonesia. Technical Report, Humans Rights Data Analysis Group. Available from https://hrdag.org/wp-content/uploads/2018/12/KP-Palemban-ests.pdf
Data on 4400 observed killings in the Kosovo war between 20 March and 22 June 1999
Kosovo
Kosovo
An object of class data.frame
with 15 rows and 5 columns.
These data give the numbers of cases on each possible combination of four lists. The lists are labelled as follows: EXH = exhumations; ABA = American Bar Association Central and East European Law Initiative; OSCE = Organization for Security and Cooperation in Europe; HRW = Human Rights Watch. All 15 combinations have a nonzero count.
Ball, P., W. Betts, F. Scheuren, J. Dudukovich, and J. Asher (2002). Killings and Refugee Flow in Kosovo March-June 1999. American Association for the Advancement of Science. A Report to the International Criminal Tribunal for the Former Yugoslavia.
Find BCa confidence intervals using ktop idea.
ktopBCa( z, BICmatrix_break, alpha = c(0.025, 0.05, 0.1, 0.16, 0.2, 0.5, 0.8, 0.84, 0.9, 0.95, 0.975), maxorder = Inf, BCaFD = FALSE )
ktopBCa( z, BICmatrix_break, alpha = c(0.025, 0.05, 0.1, 0.16, 0.2, 0.5, 0.8, 0.84, 0.9, 0.95, 0.975), maxorder = Inf, BCaFD = FALSE )
z |
The output of |
BICmatrix_break |
output from |
alpha |
Levels of confidence intervals to be constructed and assessed |
maxorder |
Maximum order of models to be included |
BCaFD |
If T, return also the probabilities of the estimates |
A list with the following components
BCa confidence intervals for each considered model
Corresponding probabilities of the estimates
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) z=assemble_bic(Korea) z=bootstrapcal(z) z=jackknifecal(z) BICmatrix_prop= find_bic_rank_matrix(z) BICmatrix_break = BICrank_tiebreak(BICmatrix_prop, 2) ktopBCa(z,BICmatrix_break)
data(Korea) z=assemble_bic(Korea) z=bootstrapcal(z) z=jackknifecal(z) BICmatrix_prop= find_bic_rank_matrix(z) BICmatrix_break = BICrank_tiebreak(BICmatrix_prop, 2) ktopBCa(z,BICmatrix_break)
This is the master design matrix which maps parameters to observations. Rows correspond to observations and columns to parameters.
make_master_design(nlists)
make_master_design(nlists)
nlists |
The number of lists |
A matrix whose element is 1 if the expected log of observation
depends on parameter
,
in other words if
is an ancestor of
.
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
#Create master design matrix with 5 lists make_master_design(5) #Create master design matrix with 3 lists make_master_design(3)
#Create master design matrix with 5 lists make_master_design(5) #Create master design matrix with 3 lists make_master_design(3)
This routine fits a specified model to multiple systems estimation data, taking account of the possibility of empty overlaps between pairs of observed lists.
modelfit(zdat, mX = NULL, check = TRUE)
modelfit(zdat, mX = NULL, check = TRUE)
zdat |
Data matrix with |
mX |
A |
check |
If |
A list with components as below
fit
Details of the fit of the specified model as output by glm
. The Akaike information criterion is adjusted to take account
of the number of parameters corresponding to non-overlapping pairs.
emptyoverlaps
Matrix with two rows, giving the list pairs within the model for which no cases are observed in common.
Each column gives the indices of a pair of lists, with the names of the lists in the column name.
poisspempty
the Poisson p-values of the parameters corresponding to non-overlapping pairs.
data(NewOrl) modelfit(NewOrl,mX= c(1,3), check=TRUE)
data(NewOrl) modelfit(NewOrl,mX= c(1,3), check=TRUE)
This routine returns the order of the model given the model character string
modelorder(x)
modelorder(x)
x |
a model character string |
the order of the model character string
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
x="[1,2,3,4,5]" modelorder(x) y="[12,24,25,35,45]" modelorder(y)
x="[1,2,3,4,5]" modelorder(x) y="[12,24,25,35,45]" modelorder(y)
Victims related to human trafficking in the Netherlands
Ned
Ned
An object of class data.frame
with 24 rows and 7 columns.
These data are collected into six lists. Full details are given in Table 2 of Silverman (2020).
Silverman, B. W. (2020). Model fitting in Multiple Systems Analysis for the quantification of Modern Slavery: Classical and Bayesian approaches Journal of Royal Statistical Society: Series A, 183(3), 691-736, Available from https://rss.onlinelibrary.wiley.com/doi/full/10.1111/rssa.12505
Netherlands data consolidated into five lists
Ned_5
Ned_5
An object of class data.frame
with 17 rows and 6 columns.
This reduces the Netherlands data Ned
into five lists, constructed by combining
the two smallest lists I and O into a single list.
Victims related to human trafficking in Greater New Orleans
NewOrl
NewOrl
An object of class data.frame
with 19 rows and 9 columns.
These data are collected into 8 lists. For reasons of confidentiality the lists are only labelled as A, B, ..., H. Full details are given in Bales, Murphy and Silverman (2020).
K. Bales, L. Murphy and B. W. Silverman (2020). How many trafficked people are there in Greater New Orleans? Journal of Human Trafficking, 6(4), 375-384, available from https://www.tandfonline.com/doi/full/10.1080/23322705.2019.1634936
New Orleans data consolidated into five lists
NewOrl_5
NewOrl_5
An object of class data.frame
with 14 rows and 6 columns.
This reduces the New Orleans data NewOrl
into five lists, constructed by combining
the four smallest lists B, E, F and G into a single list.
Find BCa confidence intervals for all possible ntop
ntopBCa( z, alpha = c(0.025, 0.05, 0.1, 0.16, 0.2, 0.5, 0.8, 0.84, 0.9, 0.95, 0.975), maxorder = Inf, BCaFD = FALSE )
ntopBCa( z, alpha = c(0.025, 0.05, 0.1, 0.16, 0.2, 0.5, 0.8, 0.84, 0.9, 0.95, 0.975), maxorder = Inf, BCaFD = FALSE )
z |
The output of |
alpha |
Levels of confidence intervals to be constructed and assessed |
maxorder |
Maximum order of models to be included |
BCaFD |
If T, return also the probabilities of the estimates |
A list with the following components
BCa confidence intervals for each considered model
Corresponding probabilities of the estimates
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
data(Korea) z=assemble_bic(Korea, checkexist=T) z=bootstrapcal(z, checkexist=T) z=jackknifecal(z,checkexist=T) ntopBCa(z)
data(Korea) z=assemble_bic(Korea, checkexist=T) z=bootstrapcal(z, checkexist=T) z=jackknifecal(z,checkexist=T) ntopBCa(z)
Given a matrix with capture histories only, the routine orders the capture histories first by the number of 1s in the capture history and then lexicographically by columns.
ordercaptures(zmat)
ordercaptures(zmat)
zmat |
Data matrix with |
A data matrix that is ordered first by the number of 1s in the capture history and then lexicographically by columns.
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
#UK 5 list data data(UKdat_5) # Obtain matrix with capture histories only UKdat_5_cap <- UKdat_5[, 1:dim(UKdat_5)[2]-1] ordercaptures(UKdat_5_cap)
#UK 5 list data data(UKdat_5) # Obtain matrix with capture histories only UKdat_5_cap <- UKdat_5[, 1:dim(UKdat_5)[2]-1] ordercaptures(UKdat_5_cap)
Given any encoded capture history and the number of lists, find the encoded capture histories which are obtained by leaving out just one list in turn
parent_captures(k, nlists = 10)
parent_captures(k, nlists = 10)
k |
An encoded capture history that corresponds to the row number of the capture history data set |
nlists |
The total number of lists |
a vector giving the encoded versions of the parents
Silverman, B. W., Chan, L. and Vincent, K., (2024). Bootstrapping Multiple Systems Estimates to Account for Model Selection Statistics and Computing, 34(44), Available from https://doi.org/10.1007/s11222-023-10346-9.
parent_captures(2,10) parent_captures(1,4)
parent_captures(2,10) parent_captures(1,4)
The routine cleans up the data set by removing any lists that contain no data, any lists which contain all the observed data, and any list whose results duplicate those of another list. If as a result the original data set has no list left, it returns a matrix with the value of the total count.
removenoninformativelists(zdat)
removenoninformativelists(zdat)
zdat |
Data matrix with |
data matrix that contains no duplicate lists, no lists with no data, and no lists that contain all the observed data. If all lists are removed, the total count is returned.
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
This routine sorts the models in increasing order according to their BICs, returns the sorted models with their corresponding BICs and abundance. The original data as well as the maxorder of the models considered are returned as well.
sortmodelsbic( xdata, maxorder = dim(xdata)[2] - 2, checkexist = T, removeFRfail = T, ... )
sortmodelsbic( xdata, maxorder = dim(xdata)[2] - 2, checkexist = T, removeFRfail = T, ... )
xdata |
The original data matrix with capture histories and counts. |
maxorder |
Maximum order of models to be included |
checkexist |
If T then the Fienberg-Rinaldo condition is checked for each model |
removeFRfail |
If checkexist is T then models which fail the FR condition are removed from the results |
... |
Parameters to be fed to |
A list with the following components
A matrix with the models considered, their abundance, BIC and their order, sorted into increasing order of BIC
The original data matrix with capture histories and counts.
Order parameter that was feed into gethiermodels
data(hiermodels) data(Korea) sortmodelsbic(Korea, checkexist=T)
data(hiermodels) data(Korea) sortmodelsbic(Korea, checkexist=T)
Starting with a model with main effects only, two-list parameters are added one by one.
At each stage the parameter with the lowest p-value is added, provided that p-value is lower than pthresh
, and provided that the resulting
model does not fail either of the tests in checkident
.
stepwisefit(zdat, pthresh = 0.02)
stepwisefit(zdat, pthresh = 0.02)
zdat |
Data matrix with |
pthresh |
this is the threshold below which the p-value of the newly added parameter needs to be in order to be included in the model.
If |
For each candidate two-list parameter for possible addition to the model, the p-value is calculated as follows. The total of cases occurring on both lists indexed by the parameter (regardless of whether or not they are on any other lists) is calculated. On the null hypothesis that the effect is not included in the model, this statistic has a Poisson distribution whose mean depends on the parameters within the model. The one-sided Poisson p-value of the observed statistic is calculated.
A list with components as below
fit
Details of the fit of the specified model as output by glm
. The Akaike information criterion is adjusted to take account
of the number of parameters corresponding to non-overlapping pairs.
emptyoverlaps
Matrix with two rows, giving the list pairs within the model for which no cases are observed in common.
Each column gives the indices of a pair of lists, with the names of the lists in the column name.
poisspempty
the Poisson p-values of the parameters corresponding to non-overlapping pairs.
data(NewOrl) stepwisefit(NewOrl, pthresh=0.02)
data(NewOrl) stepwisefit(NewOrl, pthresh=0.02)
The idea of this routine is to reduce either or both of ntopmodels
and maxorder
without the need to recalculate
any actual model fits.
subsetmat(z, ntopmodels = Inf, maxorder = Inf)
subsetmat(z, ntopmodels = Inf, maxorder = Inf)
z |
output from |
ntopmodels |
number of top models. If (taking into account any change in the maximum order of models) there are fewer than
|
maxorder |
the maximum order of the models to be considered. If not specified, it will be set to the corresponding value in the input data. |
The routine subsets the results matrix as part of the output given by assemble_bic
based on specified parameters ntopmodels
and maxorder
. It returns the subsetted matrix,
original data matrix with capture histories and counts and the new actual value of maxorder
(reducing it
from the input value if necessary or if the default input value of is used).
A list with the following components
a matrix containing models being considered, abundance, BIC and their ordered after being subsetted by maxorder and ntopmodels
Original data matrix with counts and capture histories
The maximum order of models considered after subsetting
Jackknife abundance matrix, subsetted by maxorder and ntopmodels
Jackknife BIC matrix, subsetted by maxorder and ntopmodels
Capture counts in the same order as the columns of jackabund
and jackbic
Bootstrap abundance matrix, subsetted by maxorder and ntopmodels
Bootstrap BIC matrix, subsetted by maxorder and ntopmodels
If the input only has the output from assemble_bic
, the last five items of the list
do not appear.
data(UKdat_5) #Example 1 z=assemble_bic(UKdat_5) subsetmat(z,ntopmodels=100,maxorder=2) subsetmat(z) #Example 2 z=assemble_bic(UKdat_5) z=jackknifecal(z) z=bootstrapcal(z) subsetmat(z,ntopmodels=100,maxorder=2)
data(UKdat_5) #Example 1 z=assemble_bic(UKdat_5) subsetmat(z,ntopmodels=100,maxorder=2) subsetmat(z) #Example 2 z=assemble_bic(UKdat_5) z=jackknifecal(z) z=bootstrapcal(z) subsetmat(z,ntopmodels=100,maxorder=2)
The following routine returns a list of all subsets of a given set for which a specified property is TRUE
.
It is assumed that if the property is FALSE
for any particular subset, it is also FALSE
for all supersets of that subset.
This enables a branch search strategy to be used to obviate the need to search over supersets of subsets already eliminated from consideration.
It is used within the hierarchical search step of checkallmodels
.
subsetsearch(n, checkfun, testnull = TRUE, ...)
subsetsearch(n, checkfun, testnull = TRUE, ...)
n |
an integer such that the search is over all subsets of |
checkfun |
a function which takes arguments |
testnull |
If |
... |
other arguments |
A list of all subsets for which the value is TRUE
Chan, L., Silverman, B. W., and Vincent, K. (2021). Multiple Systems Estimation for Sparse Capture Data: Inferential Challenges when there are Non-Overlapping Lists. Journal of American Statistcal Association, 116(535), 1297-1306, Available from https://www.tandfonline.com/doi/full/10.1080/01621459.2019.1708748.
This routine finds rows with the same capture history and consolidates them into a single row whose count is the sum of counts of
the relevant rows. If includezerocounts = TRUE
then it also includes rows for all the capture histories with zero count; otherwise
these are all removed.
tidylists(zdat, includezerocounts = FALSE)
tidylists(zdat, includezerocounts = FALSE)
zdat |
Data matrix with |
includezerocounts |
If |
A data matrix in the form specified above, including all capture histories with zero counts if includezerocounts=TRUE
.
data(NewOrl) tidylists(NewOrl,includezerocounts=TRUE)
data(NewOrl) tidylists(NewOrl,includezerocounts=TRUE)
Data from the UK 2013 strategic assessment
UKdat
UKdat
An object of class data.frame
with 25 rows and 7 columns.
This is a table of six lists used in the research published by the Home Office as part of the strategy leading to the Modern Slavery Act 2015. The data are considered in six lists, labelled as follows: LA–Local authorities; NG–Non-government organisations; PF–Police forces; GO–Government organisations; GP–General public; NCA–National Crime Agency. Each of the first six columns in the data frame corresponds to one of these lists. Each of the rows of the data frame corresponds to a possible combination of lists, with value 1 in the relevant column if the list is in that particular combination. The last column of the data frame states the number of cases observed in that particular combination of lists. Combinations of lists for which zero cases are observed are omitted.
UK data consolidated into five lists
UKdat_5
UKdat_5
An object of class data.frame
with 18 rows and 6 columns.
This reduces the UK data UKdat
into five lists, constructed by combining
the PF and NCA lists into a single PFNCA list
These data are collected into 5 lists. For reasons of confidentiality the lists are only labelled as A, B, C, D and E. Full details are given in Farrell, Dank, Kfafian, Lockwood, Pfeffer, Hughes and Vincent (2019).
Western
Western
An object of class data.frame
with 13 rows and 6 columns.
Farrell, A., Dank, M., Kfafian, M., Lockwood, S., Pfeffer, R., Hughes, A., and Vincent, K. (2019). Capturing human trafficking victimization through crime reporting. Technical Report 2015-VF-GX-0105, National Institute of Justice. Available from https://www.ncjrs.gov/pdffiles1/nij/grants/252520.pdf.