Recent zbMATH articles in MSC 62F
https://www.zbmath.org/atom/cc/62F (updated 2021-11-25)

Books review of: P. Müller et al., Bayesian nonparametric data analysis
https://www.zbmath.org/1472.00009
Bouza, C. N.
Review of [Zbl 1333.62003].

The exponentiated discrete inverse Rayleigh distribution
https://www.zbmath.org/1472.60024
Hamed Mashhadzadeh, Zahra; Mirmostafaee, S. M. T. K.
Summary: In this paper, a new distribution called the exponentiated discrete inverse Rayleigh distribution is introduced as an extension of the discrete inverse Rayleigh distribution. It is the discrete analogue of the continuous exponentiated inverse Rayleigh distribution. We discuss the shapes of the probability mass and hazard rate functions, the moments of the new distribution, and data generation. Maximum likelihood estimation of the parameters is also studied. Finally, an example demonstrates an application of the new distribution.

Use of the Lévy distribution to adjust data with asymmetry and extreme values
https://www.zbmath.org/1472.60027
Martínez Naranjo, Jessica Lizeth; Alvear Rodríguez, Carlos Armando; Tovar Cueva, José Rafael
Summary: In order to propose a statistical methodology that allows one to model asymmetric data using the Lévy distribution, a simulation study is presented under nine different scenarios to evaluate the estimation of the parameters of the distribution under the two approaches to statistics (classical and Bayesian).
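The classical estimation step described in this summary can be sketched as follows. This is an illustration, not the paper's code: the data are simulated (using the fact that \(c/Z^2\) is Lévy\((0,c)\) for standard normal \(Z\)), and the closed-form maximum likelihood estimator of the scale with known location zero is used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated positively skewed data with extreme values: Levy(loc=0, scale=2).
# If Z ~ N(0, 1), then c / Z**2 has the Levy distribution with scale c.
c_true = 2.0
data = c_true / rng.standard_normal(5000) ** 2

# Classical (maximum likelihood) estimate of the scale with known location 0:
# maximising the Levy log-likelihood gives c_hat = n / sum(1 / x_i).
c_hat = data.size / np.sum(1.0 / data)
print(c_hat)
```

A Bayesian treatment would instead place a prior on the scale and summarise the posterior; `scipy.stats.levy` offers the same model with a generic numerical `fit` method.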
The log-normal, Lévy and standard Lévy distributions were considered to model the behavior of two real data sets with positive asymmetry. The Lévy distribution fitted the proposed data sets well, so it can be considered a candidate for adjusting asymmetric data in the presence of extreme values.

Heavy-tailed distributions, correlations, kurtosis and Taylor's law of fluctuation scaling
https://www.zbmath.org/1472.60030
Cohen, Joel E.; Davis, Richard A.; Samorodnitsky, Gennady
Summary: \textit{N. S. Pillai} and \textit{X.-L. Meng} [Ann. Stat. 44, No. 5, 2089--2097 (2016; Zbl 1349.62036), p. 2091] speculated that `the dependence among [random variables, rvs] can be overwhelmed by the heaviness of their marginal tails\dots'. We give examples of statistical models that support this speculation. While under natural conditions the sample correlation of regularly varying (RV) rvs converges to a generally random limit, this limit is zero when the rvs are the reciprocals of powers greater than one of arbitrarily (but imperfectly) positively or negatively correlated normals. Surprisingly, the sample correlation of these RV rvs multiplied by the sample size has a limiting distribution on the negative half-line. We show that the asymptotic scaling of Taylor's law (a power-law variance function) for RV rvs is, up to a constant, the same for independent and identically distributed observations as for reciprocals of powers greater than one of arbitrarily (but imperfectly) positively correlated normals, whether those powers are the same or different. The correlations and heterogeneity do not affect the asymptotic scaling. We analyse the sample kurtosis of heavy-tailed data similarly.
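A quick simulation conveys why sample kurtosis behaves so differently for heavy-tailed data. The sketch below (sample sizes and the power are illustrative choices, not the paper's) compares the excess kurtosis of normal data with that of reciprocals of squared normals, the regularly varying variables discussed above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000

normal_sample = rng.standard_normal(n)
# Reciprocal of a power (> 1) of a standard normal: a regularly varying,
# heavy-tailed variable of the kind studied in the summary above.
heavy_sample = 1.0 / np.abs(rng.standard_normal(n)) ** 2

print(stats.kurtosis(normal_sample))  # excess kurtosis near 0
print(stats.kurtosis(heavy_sample))   # dominated by the largest observations
```

For the heavy-tailed sample the statistic is driven almost entirely by a few extreme values, which is the phenomenon the paper analyses.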
We show that the least-squares estimator of the slope in a linear model with heavy-tailed predictor and noise unexpectedly converges much faster than when they have finite variances.

Convergence in mean and central limit theorems for weighted sums of martingale difference random vectors with infinite \(r\)th moments
https://www.zbmath.org/1472.60042
Dung, L. V.; Son, T. C.; Tu, T. T.
Summary: Let \((X_{nj};1\leq j\leq m_n,n\geq 1)\) be an array of rowwise \(\mathbb{R}^d\)-valued martingale differences \((d\geq 1)\) with respect to \(\sigma\)-fields \((\mathcal{F}_{nj};0\leq j\leq m_n,n\geq 1)\), and let \((C_{nj};1\leq j\leq m_n,n\geq 1)\) be an array of \(m\times d\) real matrices, where \((m_n;n\geq 1)\) is a sequence of positive integers such that \(m_n\rightarrow\infty\) as \(n\rightarrow\infty\). The aim of this paper is to establish convergence in mean and central limit theorems for weighted sums of the type \(S_n=\sum_{j=1}^{m_n}C_{nj}X_{nj}\) under some conditions of slow variation at infinity. We also apply the obtained results to study the asymptotic properties of estimators in some statistical models. In addition, two illustrative examples and their simulations are given.
This study is motivated by models arising in economics, telecommunications, hydrology and physics, where the innovations are often dependent on each other and have infinite variances.

Consistency of empirical Bayes and kernel flow for hierarchical parameter estimation
https://www.zbmath.org/1472.62012
Chen, Yifan; Owhadi, Houman; Stuart, Andrew M.
Summary: Gaussian process regression has proven very powerful in statistics, machine learning and inverse problems. A crucial aspect of the success of this methodology, in a wide range of applications to complex and real-world problems, is the hierarchical modeling and learning of hyperparameters. The purpose of this paper is to study two paradigms for learning hierarchical parameters: one from the probabilistic Bayesian perspective, in particular the empirical Bayes approach widely used in Bayesian statistics; the other from the deterministic, approximation-theoretic view, in particular the kernel flow algorithm recently proposed in the machine learning literature. Analysis of their consistency in the large-data limit, as well as explicit identification of their implicit bias in parameter learning, is established in this paper for a Matérn-like model on the torus. A particular technical challenge we overcome is the learning of the regularity parameter in the Matérn-like field, for which consistency results have been very scarce in the spatial statistics literature. Moreover, we conduct extensive numerical experiments beyond the Matérn-like model, comparing the two algorithms further.
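The empirical Bayes paradigm discussed above selects hyperparameters by maximising the Gaussian process marginal likelihood. A minimal sketch of that idea (the data, kernel, noise level and grid are illustrative assumptions, not the paper's Matérn-on-torus setting):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: a smooth function observed with noise.
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

def log_marginal_likelihood(lengthscale, noise_var=0.01):
    """GP log marginal likelihood with a squared-exponential kernel."""
    d = x[:, None] - x[None, :]
    K = np.exp(-0.5 * (d / lengthscale) ** 2) + noise_var * np.eye(x.size)
    _, logdet = np.linalg.slogdet(K)
    alpha = np.linalg.solve(K, y)
    return -0.5 * (y @ alpha + logdet + x.size * np.log(2 * np.pi))

# Empirical Bayes: pick the hyperparameter maximising the marginal likelihood.
grid = np.linspace(0.02, 0.5, 50)
best = grid[np.argmax([log_marginal_likelihood(ell) for ell in grid])]
print(best)
```

The kernel flow alternative would instead score each lengthscale by how much predictions degrade when half the data are removed; the paper compares the consistency and implicit bias of the two criteria.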
These experiments demonstrate the learning of other hierarchical parameters, such as amplitude and lengthscale; they also illustrate a model-misspecification setting in which the kernel flow approach can outperform the more traditional empirical Bayes approach.

A weighted composite likelihood approach to inference from clustered survey data under a two-level model
https://www.zbmath.org/1472.62017
Dumitrescu, Laura; Qian, Wei; Rao, J. N. K.
Summary: Two-level models are widely used for analysing clustered survey data with a design structure matching the model hierarchy. Hypothesis testing on model parameters is studied using a weighted composite likelihood approach that takes account of the survey design features. In particular, the asymptotic normality of the weighted composite likelihood estimators is established. Using this result, the asymptotic distributions of a generalised score test statistic and a likelihood-ratio-type test statistic, under a null composite hypothesis on the model parameters, are established. Results of a limited simulation study on the finite-sample performance of the proposed tests are reported.

Discrete generalized half-normal distribution and its applications in quantile regression
https://www.zbmath.org/1472.62026
Gallardo, Diego I.; Gómez-Déniz, Emilio; Gómez, Héctor W.
Summary: A new discrete two-parameter distribution is introduced by discretizing a generalized half-normal distribution. The model is useful for fitting overdispersed as well as underdispersed data. The failure function can be decreasing, bathtub shaped or increasing.
A reparameterization of the distribution is introduced for use in a regression model based on the median. The behaviour of the maximum likelihood estimates is studied numerically, showing good performance in finite samples. Three applications to real data sets reveal that the new model can provide a better explanation than some of its competitors.

A Bayesian solution to the Behrens-Fisher problem
https://www.zbmath.org/1472.62029
Girón, Fco. Javier; del Castillo, Carmen
Summary: A simple solution to the Behrens-Fisher problem based on Bayes factors is presented, and its relation to the Behrens-Fisher distribution is explored. The construction of the Bayes factor is based on a simple hierarchical model and has a closed form involving the densities of general Behrens-Fisher distributions. Simple asymptotic approximations of the Bayes factor, which are functions of the Kullback-Leibler divergence between normal distributions, are given, and the Bayes factor is also proved to be consistent. Some examples and comparisons are presented.

Analysis of ``learn-as-you-go'' (LAGO) studies
https://www.zbmath.org/1472.62033
Nevo, Daniel; Lok, Judith J.; Spiegelman, Donna
This paper studies adaptive designs for complex multicomponent intervention packages, which it calls learn-as-you-go (LAGO) studies. The authors describe the LAGO design, propose a relevant estimator, and study its asymptotic properties, hypothesis tests and confidence intervals. They present a simulation study and an illustrative analysis of the BetterBirth Study, a public-health application.
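For orientation, the classical frequentist counterpart of the Behrens-Fisher problem discussed above is Welch's unequal-variances \(t\)-test. A minimal sketch (the two simulated samples are illustrative, not from the paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Two normal samples with unequal variances and unequal sizes:
# exactly the Behrens-Fisher setting.
x = rng.normal(loc=0.0, scale=1.0, size=30)
y = rng.normal(loc=0.5, scale=3.0, size=50)

# Welch's t-test does not assume equal variances (equal_var=False).
t_stat, p_value = stats.ttest_ind(x, y, equal_var=False)
print(t_stat, p_value)
```

The paper replaces this classical treatment with a closed-form Bayes factor built from Behrens-Fisher densities.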
The paper closes with a discussion of the results and of future research.

A revisit to Le Cam's first lemma
https://www.zbmath.org/1472.62034
Babu, G. Jogesh; Li, Bing
Summary: Le Cam's first lemma [\textit{L. Le Cam}, Univ. California Publ. Stat. 3, 37--98 (1960; Zbl 0104.12701)] is of fundamental importance to the modern theory of statistical inference: it is a key result in the foundation of the convolution theorem, which implies a very general form of the optimality of the maximum likelihood estimate and of any statistic asymptotically equivalent to it. The lemma is also important for developing asymptotically efficient tests. In this note we give a relatively simple but detailed proof of Le Cam's first lemma. Our proof allows us to grasp the central idea by drawing analogies between contiguity and absolute continuity, and is particularly attractive when teaching this lemma in a classroom setting.

New efficient spline estimation for varying-coefficient models with two-step knot number selection
https://www.zbmath.org/1472.62035
Jin, Jun; Ma, Tiefeng; Dai, Jiajia
Summary: One advantage of the varying-coefficient model is that it allows the coefficients to vary as smooth functions of other variables, and the coefficient functions can be estimated easily through a simple B-spline approximation. This leads to a simple one-step estimation procedure. We show that such a one-step method cannot be optimal when some coefficient functions possess different degrees of smoothness. Under regularity conditions, the consistency and asymptotic normality of the two-step B-spline estimators are also derived.
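The one-step estimator referred to above expands each coefficient function in a basis and solves a single least-squares problem. A minimal sketch (the model, data and basis are illustrative; a low-order polynomial basis stands in for the B-spline basis of the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

# Varying-coefficient model y = a(u) * x + noise, with a(u) = sin(pi * u).
u = rng.uniform(0.0, 1.0, n)
x = rng.standard_normal(n)
y = np.sin(np.pi * u) * x + 0.1 * rng.standard_normal(n)

# One-step estimator: expand a(u) in a basis (a cubic polynomial here,
# standing in for B-splines) and solve one least-squares problem.
basis = np.vander(u, 4, increasing=True)   # columns 1, u, u^2, u^3
design = basis * x[:, None]                # each basis column times x
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def a_hat(t):
    return np.vander(np.atleast_1d(t), 4, increasing=True) @ coef

print(a_hat(0.5))  # the true coefficient value here is sin(pi/2) = 1
```

The paper's point is that a single knot number (here, a single basis) cannot be optimal when different coefficient functions have different smoothness, motivating its two-step knot selection.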
A few simulation studies show that the gain from the two-step procedure can be quite substantial. The methodology is illustrated on an AIDS data set.

Stress-strength parameter estimation based on type-II progressive censored samples for a Weibull-half-logistic distribution
https://www.zbmath.org/1472.62036
Kazemi, Ramin; Kohansal, Akram
Summary: To produce a more flexible model, the Bayesian and classical inference of the stress-strength parameter \(R\) is studied under type-II progressively censored samples, when stress and strength are two independent Weibull-half-logistic variables. In classical inference, the maximum likelihood estimate, the approximate maximum likelihood estimate, the uniformly minimum variance unbiased estimate and asymptotic confidence intervals for \(R\) are considered. In Bayesian inference, two approximate Bayes estimates, the exact Bayes estimate and highest posterior density intervals for \(R\) are derived. These estimators are considered in different cases. Furthermore, Monte Carlo simulations are used to compare the performance of the different methods. Two data sets are analyzed for illustrative purposes.

Parameter estimation for long-memory stochastic volatility at discrete observation
https://www.zbmath.org/1472.62037
Wang, Xiaohui; Zhang, Weiguo
Summary: Ordinary least squares estimators of variogram parameters in long-memory stochastic volatility are studied in this paper. We use discrete observations for practical purposes, under the assumption that the Hurst parameter \(H \in(1/2,1)\) is known.
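The stress-strength parameter above is \(R = P(X < Y)\) for independent stress \(X\) and strength \(Y\). A minimal Monte Carlo sketch, using exponential variables as a simple stand-in for the Weibull-half-logistic model (for exponentials with rates \(a\) and \(b\), \(R = a/(a+b)\) in closed form, which lets us check the estimate):

```python
import numpy as np

rng = np.random.default_rng(5)

a, b = 2.0, 1.0                       # rates of stress X and strength Y
x = rng.exponential(1 / a, 200_000)   # stress
y = rng.exponential(1 / b, 200_000)   # strength

r_mc = np.mean(x < y)                 # Monte Carlo estimate of R = P(X < Y)
r_exact = a / (a + b)                 # closed form for exponential variables
print(r_mc, r_exact)
```

The paper's contribution lies in doing this kind of inference under type-II progressive censoring, where such simple plug-in estimates are no longer available.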
Based on the ordinary least squares method, we obtain explicit estimators of both the drift and the diffusion coefficient by minimizing a distance function between the variogram and the data periodogram. The resulting estimators are shown to be consistent and asymptotically normal. Numerical examples illustrate the performance of our method.

Contaminant transport forecasting in the subsurface using a Bayesian framework
https://www.zbmath.org/1472.62038
Al-Mamun, A.; Barber, J.; Ginting, V.; Pereira, F.; Rahunanthan, A.
Summary: In monitoring subsurface aquifer contamination, we want to predict quantities (fractional flow curves of pollutant concentration) using subsurface fluid flow models with expertise and limited data. A Bayesian approach is considered here, and the complexity associated with the simulation study presents an ongoing practical challenge. We use a Karhunen-Loève expansion for the permeability field in conjunction with GPU computing within a two-stage Markov chain Monte Carlo (MCMC) method. A further reduction in computing costs is addressed by running several MCMC chains. We compare convergence criteria to quantify the uncertainty of predictions.
Our contributions are twofold. We first propose a fitting procedure for the multivariate potential scale reduction factor (MPSRF) data that allows us to estimate the number of iterations needed for convergence. We then present a careful analysis of ensembles of fractional flow curves, suggesting that, for the problem at hand, the number of iterations required for convergence according to the MPSRF analysis is excessive. Thus, for practical applications, our results indicate that an analysis of the posterior distributions of the quantities of interest provides a reliable criterion for terminating MCMC simulations when quantifying uncertainty.

Confidence in confidence distributions!
https://www.zbmath.org/1472.62039
Cunen, Céline; Hjort, Nils Lid; Schweder, Tore
Summary: The recent article [\textit{M. S. Balch} et al., Proc. R. Soc. Lond., A, Math. Phys. Eng. Sci. 475, No. 2227, Article ID 20180565, 20 p. (2019; Zbl 1472.62173)] points to certain difficulties with Bayesian analysis when used in models for satellite conjunction and the ensuing operative decisions. Here, we supplement these previous analyses and findings with further insights, uncovering what we perceive to be the crucial points, explained in a prototype set-up where exact analysis is attainable. We also show that a different, frequentist method involving confidence distributions is free of the false confidence syndrome.

Semi-parametric adjustment to computer models
https://www.zbmath.org/1472.62040
Wang, Yan; Tuo, Rui
Summary: Computer simulations are widely used in scientific exploration and engineering design.
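The MPSRF mentioned in the contaminant-transport summary is the multivariate generalisation of the familiar Gelman-Rubin diagnostic. A minimal sketch of the univariate version (the simulated chains are illustrative):

```python
import numpy as np

def psrf(chains):
    """Univariate potential scale reduction factor (Gelman-Rubin R-hat).

    `chains` is an (m, n) array: m chains of n draws each. The MPSRF
    discussed above generalises this to vector-valued parameters.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    v_hat = (n - 1) / n * W + B / n           # pooled variance estimate
    return np.sqrt(v_hat / W)

rng = np.random.default_rng(6)
# Four well-mixed chains targeting the same distribution: R-hat close to 1.
mixed = rng.standard_normal((4, 5000))
# Chains stuck around different locations: R-hat well above 1.
stuck = mixed + np.arange(4)[:, None]
print(psrf(mixed), psrf(stuck))
```

Values near 1 indicate the chains agree; the paper argues that waiting for the multivariate version to reach its usual cutoff can be unnecessarily conservative for the quantities of interest.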
However, computer outputs usually do not match reality perfectly, because computer models are built under certain simplifications and approximations. When physical observations are also available, statistical methods can be applied to estimate the discrepancy between the computer output and the physical response. In this article, we propose a semi-parametric method for the statistical adjustment of computer models. The proposed method is proven to enjoy nice theoretical properties. We use three numerical studies and a real example to examine the predictive performance of the proposed method. The results show that it outperforms existing methods.

Matrix optimization based Euclidean embedding with outliers
https://www.zbmath.org/1472.62041
Zhang, Qian; Zhao, Xinyuan; Ding, Chao
Summary: Euclidean embedding from noisy observations containing outlier errors is an important and challenging problem in statistics and machine learning. Many existing methods struggle with outliers because they lack the ability to detect them. In this paper, we propose a matrix-optimization-based embedding model that can produce reliable embeddings and identify the outliers jointly. We show that the estimators obtained by the proposed method satisfy a non-asymptotic risk bound, implying that the model provides a high-accuracy estimator with high probability when the order of the sample size is roughly the degrees of freedom up to a logarithmic factor. Moreover, we show that under some mild conditions the proposed model can also identify the outliers without any prior information with high probability.
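For context, the outlier-free baseline for the Euclidean embedding problem above is classical multidimensional scaling: double-centre the squared-distance matrix and take the top eigenvectors. A minimal sketch (the random point set is illustrative; the paper's model additionally detects outliers via matrix optimization):

```python
import numpy as np

rng = np.random.default_rng(7)

# Ground-truth 2-D configuration and its squared-distance matrix.
points = rng.standard_normal((20, 2))
diff = points[:, None, :] - points[None, :, :]
D2 = (diff ** 2).sum(-1)

n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
G = -0.5 * J @ D2 @ J                    # Gram matrix of centred points
eigval, eigvec = np.linalg.eigh(G)
top = eigval.argsort()[::-1][:2]         # two largest eigenvalues
embedding = eigvec[:, top] * np.sqrt(eigval[top])

# The embedding reproduces the distances (up to rotation and reflection).
rec = ((embedding[:, None, :] - embedding[None, :, :]) ** 2).sum(-1)
print(np.abs(rec - D2).max())
```

With noiseless Euclidean distances the reconstruction is exact; contaminating a few entries of `D2` with large errors breaks this baseline, which is the situation the paper's joint embedding-and-outlier model addresses.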
Finally, numerical experiments demonstrate that the matrix-optimization-based model can produce configurations of high quality and successfully identify outliers, even for large networks.

Quantification of model uncertainty on path-space via goal-oriented relative entropy
https://www.zbmath.org/1472.62042
Birrell, Jeremiah; Katsoulakis, Markos A.; Rey-Bellet, Luc
Summary: Quantifying the impact of parametric and model-form uncertainty on the predictions of stochastic models is a key challenge in many applications. Previous work has shown that the relative entropy rate is an effective tool for deriving path-space uncertainty quantification (UQ) bounds on ergodic averages. In this work we identify appropriate information-theoretic objects for a wider range of quantities of interest on path-space, such as hitting times and exponentially discounted observables, and develop the corresponding UQ bounds. In addition, our method yields tighter UQ bounds, even in cases where previous relative-entropy-based methods also apply, \textit{e.g.} for ergodic averages.
We illustrate these results with examples from option pricing, non-reversible diffusion processes, stochastic control, semi-Markov queueing models, and expectations and distributions of hitting times.

A variance shift model for detection of outliers in the linear measurement error model
https://www.zbmath.org/1472.62070
Babadi, Babak; Rasekh, Abdolrahman; Rasekhi, Ali Akbar; Zare, Karim; Zadkarami, Mohammad Reza
Summary: We present a variance shift model for a linear measurement error model using the corrected likelihood of \textit{T. Nakamura} [Biometrika 77, No. 1, 127--137 (1990; Zbl 0691.62066)]. This model assumes that a single outlier arises from an observation with inflated variance. The corrected likelihood ratio and score test statistics are proposed to determine whether the \(i\)th observation has an inflated variance. A parametric bootstrap procedure is used to obtain the empirical distributions of the test statistics, and a simulation study shows the performance of the proposed tests. Finally, a real data example is given for illustration.

Bayesian feature selection with strongly regularizing priors maps to the Ising model
https://www.zbmath.org/1472.62094
Fisher, Charles K.; Mehta, Pankaj
Summary: Identifying small subsets of features that are relevant for prediction and classification tasks is a central problem in machine learning and statistics.
The feature selection task is especially important, and computationally difficult, for modern data sets where the number of features can be comparable to or even exceed the number of samples. Here, we show that feature selection with Bayesian inference takes a universal form and reduces to calculating the magnetizations of an Ising model under some mild conditions. Our results exploit the observation that the evidence takes a universal form for strongly regularizing priors, i.e. priors that have a large effect on the posterior probability even in the infinite-data limit. We derive explicit expressions for feature selection in generalized linear models, a large class of statistical techniques that includes linear and logistic regression. We illustrate the power of our approach by analyzing feature selection in a logistic-regression-based classifier trained to distinguish between the letters B and D in the notMNIST data set.

Affine-transformation invariant clustering models
https://www.zbmath.org/1472.62096
Huang, Hsin-Hsiung; Yang, Jie
Summary: We develop a cluster process which is invariant with respect to unknown affine transformations of the feature space, without knowing the number of clusters in advance. Specifically, the proposed method can identify clusters invariant under (I) orthogonal transformations, (II) scaling-coordinate orthogonal transformations, and (III) arbitrary nonsingular linear transformations, corresponding to models I, II and III respectively, and represents clusters with the proposed heatmap of the similarity matrix. The proposed Metropolis-Hastings algorithm leads to an irreducible and aperiodic Markov chain, and is also efficient at identifying clusters reasonably well in various applications.
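To make the Ising-model connection above concrete, here is a toy computation of a magnetization: the mean-field self-consistency equation solved by fixed-point iteration. This is a generic textbook sketch, not the paper's mapping (which produces one magnetization per candidate feature):

```python
import numpy as np

def mean_field_magnetization(beta, h=0.0, coupling=1.0, iters=200):
    """Solve the mean-field Ising self-consistency m = tanh(beta*(J*m + h))
    by fixed-point iteration, starting from a symmetry-broken guess."""
    m = 0.5
    for _ in range(iters):
        m = np.tanh(beta * (coupling * m + h))
    return m

# Above the critical coupling (beta * J > 1) a nonzero magnetization appears.
print(mean_field_magnetization(beta=0.5))  # high temperature: m collapses to 0
print(mean_field_magnetization(beta=2.0))  # low temperature: m stays far from 0
```

In the paper's setting, a feature's posterior inclusion indicator plays the role of a spin, and its magnetization indicates whether the feature is selected.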
Synthetic and real data examples show that the proposed method can be applied widely in many fields, especially for finding the number of clusters and identifying clusters of samples of interest in aerial photography and genomic data.

Convergence rates for Bayesian estimation and testing in monotone regression
https://www.zbmath.org/1472.62106
Chakraborty, Moumita; Ghosal, Subhashis
Summary: Shape restrictions, such as monotonicity, on functions often arise naturally in statistical modeling. We consider a Bayesian approach to the estimation of a monotone regression function and to testing for monotonicity. We construct a prior distribution using piecewise constant functions. For estimation, a prior imposing monotonicity on the heights of these steps is sensible, but the resulting posterior is harder to analyze theoretically. We therefore consider a ``projection-posterior'' approach, in which a conjugate normal prior is used but the monotonicity constraint is imposed on posterior samples by a projection map onto the space of monotone functions. We show that the resulting posterior contracts at the optimal rate \(n^{-1/3}\) under the \(\mathbb{L}_1\)-metric and at a nearly optimal rate under the empirical \(\mathbb{L}_p\)-metrics for \(0< p\le 2\). The projection-posterior approach is also computationally more convenient. We also construct a Bayesian test for the hypothesis of monotonicity using the posterior probability of a shrinking neighborhood of the set of monotone functions.
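The projection map onto monotone (step) functions used above is, for vectors of step heights, the least-squares isotonic projection, computable by the pool-adjacent-violators algorithm. A minimal sketch (the input vector is illustrative):

```python
import numpy as np

def project_monotone(y):
    """Least-squares projection of a vector onto the nondecreasing vectors,
    via the pool-adjacent-violators algorithm. This is the projection step
    applied to posterior samples in the projection-posterior approach."""
    out = []  # stack of [block mean, block size]
    for v in y:
        out.append([float(v), 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(out) > 1 and out[-2][0] > out[-1][0]:
            m2, s2 = out.pop()
            m1, s1 = out.pop()
            out.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return np.concatenate([np.full(s, m) for m, s in out])

y = np.array([1.0, 3.0, 2.0, 2.0, 5.0, 4.0])
proj = project_monotone(y)
print(proj)
```

Each unconstrained (conjugate normal) posterior draw of step heights is passed through this projection, yielding a monotone sample without any constrained sampling.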
We show that the resulting test has a universal consistency property, and we obtain a separation rate that ensures the power function approaches one.

A pseudo knockoff filter for correlated features
https://www.zbmath.org/1472.62108
Chen, Jiajie; Hou, Anthony; Hou, Thomas Y.
Summary: In [\textit{R. F. Barber} and \textit{E. J. Candès}, Ann. Stat. 43, No. 5, 2055--2085 (2015; Zbl 1327.62082)], the authors introduced a new variable selection procedure, called the knockoff filter, to control the false discovery rate (FDR), and proved that this method achieves exact FDR control. Inspired by that work, we propose a pseudo knockoff filter that inherits some advantages of the original knockoff filter while allowing more flexibility in the construction of the knockoff matrix. Moreover, we perform a number of numerical experiments which suggest that the pseudo knockoff filter with the half Lasso statistic has FDR control and offers more power than the original knockoff filter with the Lasso path or the half Lasso statistic, for the numerical examples considered in this paper.
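The selection step shared by the original knockoff+ filter of Barber and Candès (and reused by variants such as the pseudo knockoff filter) can be sketched as follows; the statistics `W` below are made-up numbers, with large positive values indicating evidence for a feature:

```python
import numpy as np

def knockoff_plus_select(W, q=0.2):
    """Knockoff+ selection: choose the data-dependent threshold t such that
    the estimated false discovery proportion
        (1 + #{W_j <= -t}) / max(1, #{W_j >= t})
    is at most q, then select features with W_j >= t."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)  # no threshold meets the target level

# Illustrative statistics: strong positive W for the first five features.
W = np.array([8.0, 6.5, 5.0, 4.2, 3.9, 0.8, -0.5, 0.3, -1.1, 0.2])
print(knockoff_plus_select(W, q=0.2))
```

The filters differ in how `W` is computed (Lasso path, half Lasso, etc.) and in how the knockoff variables are constructed; this thresholding rule is what delivers the FDR guarantee.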
Although we cannot establish rigorous FDR control for the pseudo knockoff filter, we provide a partial analysis of the pseudo knockoff filter with the half Lasso statistic and establish a uniform false discovery proportion bound and an expectation inequality.

Erratum to: ``Asymptotic normality of total least squares estimator in a multivariate errors-in-variables model \(AX = B\)''
https://www.zbmath.org/1472.62111
Kukush, Alexander; Tsaregorodtsev, Yaroslav
From the text: The following list describes the changes made to the publication since the original version [the authors, ibid. 3, No. 1, 47--57 (2016; Zbl 1419.62167)] was printed.
\begin{itemize}
\item[1.] On the left-hand side of (3.3), it should be
\[
\sqrt{m}(\hat{X}_{tls}-X_0).
\]
\item[2.] On the left-hand side of the formula below Remark 11, it should be
\[
\sqrt{m}(\hat{X}_{tls}-X_0).
\]
\item[3.] In the text below (4.11), instead of ``zero expression'' it should be ``zero expectation''.
\end{itemize}

Inference in high dimensional linear measurement error models
https://www.zbmath.org/1472.62112
Li, Mengyan; Li, Runze; Ma, Yanyuan
This article develops statistical inference on the parameters associated with error-prone covariates in a high-dimensional linear model in which a finite number of covariates are measured with error. A new corrected decorrelated score test and a corresponding score-type estimator are proposed. It is shown that the limiting distribution of the new corrected decorrelated score test statistic is standard normal under the null hypothesis, and that the test retains power under local alternatives around zero. Asymptotic normality of the proposed estimator is also proved, so that asymptotic confidence intervals can be constructed. The finite-sample performance of the proposed inference procedure is examined through simulation studies. An application is presented to real data collected in a clinical trial designed to determine the long-term effects of different inhaled treatments for mild to moderate childhood asthma, where phenotypic information and genome-wide SNP data are available.

Significance-based community detection in weighted networks
https://www.zbmath.org/1472.62123
Palowitch, John; Bhamidi, Shankar; Nobel, Andrew B.
Summary: Community detection is the process of grouping strongly connected nodes in a network. Many community detection methods for unweighted networks have a theoretical basis in a null model. Communities discovered by these methods therefore have interpretations in terms of statistical significance.
In this paper, we introduce a null for weighted networks called the continuous configuration model. First, we propose a community extraction algorithm for weighted networks which incorporates iterative hypothesis testing under the null. We prove a central limit theorem for edge-weight sums and asymptotic consistency of the algorithm under a weighted stochastic block model. We then incorporate the algorithm in a community detection method called CCME. To benchmark the method, we provide a simulation framework involving the null to plant ``background'' nodes in weighted networks with communities. We show that the empirical performance of CCME on these simulations is competitive with existing methods,
particularly
when overlapping communities and background nodes are present. To further validate the method, we present two real-world networks with potential background nodes and analyze them with CCME, yielding results that reveal macro-features of the corresponding systems.General-order observation-driven models: ergodicity and consistency of the maximum likelihood estimatorhttps://www.zbmath.org/1472.621362021-11-25T18:46:10.358925Z"Sim, Tepmony"https://www.zbmath.org/authors/?q=ai:sim.tepmony"Douc, Randal"https://www.zbmath.org/authors/?q=ai:douc.randal"Roueff, François"https://www.zbmath.org/authors/?q=ai:roueff.francoisSummary: The class of observation-driven models (ODMs) includes many models of non-linear time series which, in a fashion similar to, yet different from, hidden Markov models (HMMs), involve hidden variables. Interestingly, in contrast to most HMMs, ODMs enjoy likelihoods that can be computed exactly with computational complexity of the same order as the number of observations, making maximum likelihood estimation the privileged approach for statistical inference for these models. A celebrated example of general-order ODMs is the GARCH \((p,q)\) model, for which ergodicity and inference have been studied extensively. However, little is known about more general models, in particular integer-valued ones, such as the log-linear Poisson GARCH or the NBIN-GARCH of order \((p,q)\), about which most of the existing results seem restricted to the case \(p=q=1\). Here we fill this gap and derive ergodicity conditions for general ODMs. 
The consistency and the asymptotic normality of the maximum likelihood estimator (MLE) can then be derived using the method already developed for first order ODMs.Asymptotic analysis of model selection criteria for general hidden Markov modelshttps://www.zbmath.org/1472.621382021-11-25T18:46:10.358925Z"Yonekura, Shouto"https://www.zbmath.org/authors/?q=ai:yonekura.shouto"Beskos, Alexandros"https://www.zbmath.org/authors/?q=ai:beskos.alexandros"Singh, Sumeetpal S."https://www.zbmath.org/authors/?q=ai:singh.sumeetpal-sSummary: The paper obtains analytical results for the asymptotic properties of Model Selection Criteria -- widely used in practice -- for a general family of hidden Markov models (HMMs), thereby substantially extending the related theory beyond typical `i.i.d.-like' model structures and filling in an important gap in the relevant literature. In particular, we look at the Bayesian and Akaike Information Criteria (BIC and AIC) and the model evidence. In the setting of nested classes of models, we prove that BIC and the evidence are strongly consistent for HMMs (under regularity conditions), whereas AIC is not weakly consistent. Numerical experiments support our theoretical results.Nonrigid registration using Gaussian processes and local likelihood estimationhttps://www.zbmath.org/1472.621412021-11-25T18:46:10.358925Z"Wiens, Ashton"https://www.zbmath.org/authors/?q=ai:wiens.ashton"Kleiber, William"https://www.zbmath.org/authors/?q=ai:kleiber.william"Nychka, Douglas"https://www.zbmath.org/authors/?q=ai:nychka.douglas-w"Barnhart, Katherine R."https://www.zbmath.org/authors/?q=ai:barnhart.katherine-rSummary: Surface registration, the task of aligning several multidimensional point sets, is a necessary task in many scientific fields. In this work, a novel statistical approach is developed to solve the problem of nonrigid registration. 
While the application of an affine transformation results in rigid registration, using a general nonlinear function to achieve nonrigid registration is necessary when the point sets require deformations that change over space. The use of a local likelihood-based approach using windowed Gaussian processes provides a flexible way to accurately estimate the nonrigid deformation. This strategy also makes registration of massive data sets feasible by splitting the data into many subsets. The estimation results yield spatially-varying local rigid registration parameters. Gaussian process surface models are then fit to the parameter fields, allowing prediction of the transformation parameters at unestimated locations, specifically at observation locations in the unregistered data set. Applying these transformations results in a global, nonrigid registration. A penalty on the transformation parameters is included in the likelihood objective function. Combined with smoothing of the local estimates from the surface models, the nonrigid registration model can prevent the problem of overfitting. The efficacy of the nonrigid registration method is tested in two simulation studies, varying the number of windows and number of points, as well as the type of deformation. The nonrigid method is applied to a pair of massive remote sensing elevation data sets exhibiting complex geological terrain, with improved accuracy and uncertainty quantification in a cross validation study versus two rigid registration methods.Distributions associated with simultaneous multiple hypothesis testinghttps://www.zbmath.org/1472.621562021-11-25T18:46:10.358925Z"Yu, Chang"https://www.zbmath.org/authors/?q=ai:yu.chang"Zelterman, Daniel"https://www.zbmath.org/authors/?q=ai:zelterman.danielSummary: We develop the distribution for the number of hypotheses found to be statistically significant using the rule from \textit{R. J. 
Simes} [Biometrika 73, 751--754 (1986; Zbl 0613.62067)] for controlling the family-wise error rate (FWER). We find the distribution of the number of statistically significant \(p\)-values under the null hypothesis and show this follows a normal distribution under the alternative. We propose a parametric distribution \(\Psi_I( \cdot )\) to model the marginal distribution of \(p\)-values sampled from a mixture of null uniform and non-uniform distributions under different alternative hypotheses. The \(\Psi_I\) distribution is useful when there are many different alternative hypotheses and these are not individually well understood. We fit \(\Psi_I\) to data from three cancer studies and use it to illustrate the distribution of the number of notable hypotheses observed in these examples. We model dependence in sampled \(p\)-values using a latent variable. These methods can be combined to illustrate a power analysis in planning a larger study on the basis of a smaller pilot experiment.Verifying compliance with ballast water standards: a decision-theoretic approachhttps://www.zbmath.org/1472.621572021-11-25T18:46:10.358925Z"Costa, Eliardo G."https://www.zbmath.org/authors/?q=ai:costa.eliardo-g"Paulino, Carlos Daniel"https://www.zbmath.org/authors/?q=ai:paulino.carlos-daniel-mimoso"Singer, Julio M."https://www.zbmath.org/authors/?q=ai:da-motta-singer.julioSummary: We construct credible intervals to estimate the mean organism (zooplankton and phytoplankton) concentration in ballast water via a decision-theoretic approach. To obtain the required optimal sample size, we use a total cost minimization criterion defined as the sum of the sampling cost and the Bayes risk either under a Poisson or a negative binomial model for organism counts, both with a gamma prior distribution. Such credible intervals may be employed to verify whether the ballast water discharged from a ship is in compliance with international standards. 
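The Simes rule from the multiple-testing entry above is simple enough to sketch directly. The following is an illustrative implementation of the classical procedure, not code from the paper; the function name and the `alpha` default are ours:

```python
import numpy as np

def simes_global_test(pvals, alpha=0.05):
    """Simes (1986) test of the global null hypothesis: reject if the
    ordered p-values satisfy p_(i) <= i * alpha / m for some i."""
    p = np.sort(np.asarray(pvals, dtype=float))
    m = p.size
    return bool(np.any(p <= alpha * np.arange(1, m + 1) / m))
```

For example, `simes_global_test([0.004, 0.2, 0.9])` rejects at level 0.05 because the smallest p-value falls below 0.05/3.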
We also conduct a simulation study to evaluate the credible interval lengths associated with the proposed optimal sample sizes.Areawise significance tests for windowed recurrence network analysishttps://www.zbmath.org/1472.621582021-11-25T18:46:10.358925Z"Lekscha, Jaqueline"https://www.zbmath.org/authors/?q=ai:lekscha.jaqueline"Donner, Reik V."https://www.zbmath.org/authors/?q=ai:donner.reik-vSummary: Many time-series analysis techniques use sliding window approaches or are repeatedly applied over a continuous range of parameters. When combined with a significance test, intrinsic correlations among the pointwise analysis results can make falsely positive significant points appear as continuous patches rather than as isolated points. To account for this effect, we present an areawise significance test that identifies such false-positive patches. For this purpose, we numerically estimate the decorrelation length of the statistic of interest by calculating correlation functions between the analysis results and require an areawise significant point to belong to a patch of pointwise significant points that is larger than this decorrelation length. We apply our areawise test to results from windowed traditional and scale-specific recurrence network analysis in order to identify dynamical anomalies in time series of a non-stationary Rössler system and tree ring width index values from Eastern Canada. Especially, in the palaeoclimate context, the areawise testing approach markedly reduces the number of points that are identified as significant and therefore highlights only the most relevant features in the data. 
This provides a crucial step towards further establishing recurrence networks as a tool for palaeoclimate data analysis.Gas source parameters estimation and localization with Gaussian mixture filtering method in sensor networkshttps://www.zbmath.org/1472.621712021-11-25T18:46:10.358925Z"Zhang, Yong"https://www.zbmath.org/authors/?q=ai:zhang.yong|zhang.yong.4|zhang.yong.13|zhang.yong.9|zhang.yong.8|zhang.yong.14|zhang.yong.5|zhang.yong.12|zhang.yong.1|zhang.yong.7|zhang.yong.10|zhang.yong.11|zhang.yong.2"Zhang, Liyi"https://www.zbmath.org/authors/?q=ai:zhang.liyi"Han, Jianfeng"https://www.zbmath.org/authors/?q=ai:han.jianfeng"Geng, Yanxiang"https://www.zbmath.org/authors/?q=ai:geng.yanxiang"Li, Jinzhao"https://www.zbmath.org/authors/?q=ai:li.jinzhaoSummary: To handle the highly non-linear diffusion arising in gas leakage accidents, this paper implements a gas leakage source parameter estimation and localization algorithm based on Gaussian mixture models. First, the state-space model of the gas leakage diffusion was formulated via state vector augmentation, together with the measurements observed by the sensor nodes. Second, an improved EM-PM algorithm was proposed to estimate the unknown gas source parameters and to predict the state of the gas leakage diffusion simultaneously. Then, a sensor node selection utility function was designed for the real-time scheduling and motion control of the sensor nodes; it was based on the conditional information entropy of the posterior probability distribution and realized by the gradient operation of the utility function. 
Finally, simulation results confirm that the proposed algorithm can effectively estimate and localize the gas leakage source parameters, achieving higher estimation accuracy in less time than the traditional EKF and PF methods.Bayesian classification for dating archaeological sites via projectile pointshttps://www.zbmath.org/1472.621742021-11-25T18:46:10.358925Z"Armero, Carmen"https://www.zbmath.org/authors/?q=ai:armero.carmen"García-Donato, Gonzalo"https://www.zbmath.org/authors/?q=ai:garcia-donato.gonzalo"Jimenez-Puerto, Joaquín"https://www.zbmath.org/authors/?q=ai:jimenez-puerto.joaquin"Pardo-Gordó, Salvador"https://www.zbmath.org/authors/?q=ai:pardo-gordo.salvador"Bernabeu, Joan"https://www.zbmath.org/authors/?q=ai:bernabeu.joanSummary: Dating is a key element for archaeologists. We propose a Bayesian approach to provide chronology to sites that have neither radiocarbon dating nor clear stratigraphy and whose only information comes from lithic arrowheads. This classifier is based on the Dirichlet-multinomial inferential process and posterior predictive distributions. The procedure is applied to predict the period of a set of undated sites located in the east of the Iberian Peninsula during the 4th and 3rd millennium cal BC.Balanced data assimilation for highly oscillatory mechanical systemshttps://www.zbmath.org/1472.650072021-11-25T18:46:10.358925Z"Hastermann, Gottfried"https://www.zbmath.org/authors/?q=ai:hastermann.gottfried"Reinhardt, Maria"https://www.zbmath.org/authors/?q=ai:reinhardt.maria"Klein, Rupert"https://www.zbmath.org/authors/?q=ai:klein.rupert"Reich, Sebastian"https://www.zbmath.org/authors/?q=ai:reich.sebastianSummary: Data assimilation algorithms are used to estimate the states of a dynamical system using partial and noisy observations. The ensemble Kalman filter has become a popular data assimilation scheme due to its simplicity and robustness for a wide range of application areas. 
Nevertheless, this filter also has limitations due to its inherent assumptions of Gaussianity and linearity, which can manifest themselves in the form of dynamically inconsistent state estimates. This issue is investigated here for balanced, slowly evolving solutions to highly oscillatory Hamiltonian systems which are prototypical for applications in numerical weather prediction. It is demonstrated that the standard ensemble Kalman filter can lead to state estimates that do not satisfy the pertinent balance relations and ultimately lead to filter divergence. Two remedies are proposed, one in terms of blended asymptotically consistent time-stepping schemes, and one in terms of minimization-based postprocessing methods. The effects of these modifications to the standard ensemble Kalman filter are discussed and demonstrated numerically for balanced motions of two prototypical Hamiltonian reference systems.Affine invariant interacting Langevin dynamics for Bayesian inferencehttps://www.zbmath.org/1472.651182021-11-25T18:46:10.358925Z"Garbuno-Inigo, Alfredo"https://www.zbmath.org/authors/?q=ai:garbuno-inigo.alfredo"Nüsken, Nikolas"https://www.zbmath.org/authors/?q=ai:nusken.nikolas"Reich, Sebastian"https://www.zbmath.org/authors/?q=ai:reich.sebastianThis paper proposes a computational method for generating samples from a given high-dimensional target distribution of the form \[ \pi(u) = \frac1Z \exp(-\Phi(u)) \] where \(\Phi\) is a suitable potential and \(Z\) is a normalization constant; this is a fundamental task in, e.g., Bayesian inverse problems. As an alternative to the widely known (Markov chain) Monte Carlo methods, the proposed method is based on Langevin dynamics under which the target distribution is invariant. 
Specifically, they propose a stochastic process of \(N\) interacting particles given by \[ d u^{(i)}_t = -C(U_t) \nabla_{u^{(i)}}\Phi(u^{(i)}_t)dt + \frac{D+1}{N}(u^{(i)}_t - m(U_t))dt + \sqrt2 C^{1/2}(U_t)dW_t^{(i)}, \] where \(u^{(i)}_t\in\mathbb{R}^D\) denotes the position of the \(i\)th particle at time \(t\) (which are all collected into the vector \(U_t\)), \(C(U_t)\) is the empirical covariance matrix, \(m(U_t)\) is the empirical mean, and \(C^{1/2}(U_t)\) is a generalized square root that can be directly computed using the deviations of the particles from the empirical mean. The second term here is a correction term that guarantees for \(N>D+1\) (under suitable assumptions on the potential and the initial ensemble) that these dynamics are invariant under affine transformations, which prevents inefficient sampling if the empirical covariance matrix is a poor approximation of the covariance of the target measure, e.g. in Bayesian inverse problems with Gaussian posteriors.
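A minimal Euler-Maruyama discretization of these dynamics can be sketched as follows. This is an illustrative sketch rather than the authors' implementation; it assumes the generalized square root acts on noise as \(C^{1/2}(U)\xi = N^{-1/2}\sum_j (u^{(j)} - m(U))\,\xi_j\), and the function name is ours:

```python
import numpy as np

def affine_invariant_step(U, grad_phi, dt, rng):
    """One Euler-Maruyama step of the interacting Langevin dynamics
    for an ensemble U of shape (N, D)."""
    N, D = U.shape
    m = U.mean(axis=0)                     # empirical mean m(U)
    dev = U - m                            # deviations u^(i) - m(U)
    C = dev.T @ dev / N                    # empirical covariance C(U)
    grads = np.stack([grad_phi(u) for u in U])
    drift = -grads @ C                     # rows are -C(U) grad Phi(u^(i)); C is symmetric
    correction = (D + 1) / N * dev         # affine-invariance correction term
    # noise through the generalized square root: each particle mixes the deviations
    noise = np.sqrt(2.0 * dt / N) * rng.standard_normal((N, N)) @ dev
    return U + dt * (drift + correction) + noise
```

With \(\Phi(u) = \|u\|^2/2\) (a standard Gaussian target), iterating this step drives the ensemble mean toward zero and the ensemble covariance toward the identity.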
They also provide a gradient-free variant that replaces \(\nabla_{u^{(i)}}\Phi(u^{(i)}_t)\) by an approximation that is exact for Bayesian inverse problems with affine forward operators and is related to the classical ensemble Kalman-Bucy filter as well as to the more recent ensemble Kalman inversion.
The performance of this method is illustrated for the typical model problem of Darcy flow inversion.A block successive lower-bound maximization algorithm for the maximum pseudo-likelihood estimation of fully visible Boltzmann machineshttps://www.zbmath.org/1472.681512021-11-25T18:46:10.358925Z"Nguyen, Hien D."https://www.zbmath.org/authors/?q=ai:nguyen.hien-d-t"Wood, Ian A."https://www.zbmath.org/authors/?q=ai:wood.ian-aSummary: Maximum pseudo-likelihood estimation (MPLE) is an attractive method for training fully visible Boltzmann machines (FVBMs) due to its computational scalability and the desirable statistical properties of the MPLE. No published algorithms for MPLE have been proven to be convergent or monotonic. In this note, we present an algorithm for the MPLE of FVBMs based on the block successive lower-bound maximization (BSLM) principle. We show that the BSLM algorithm monotonically increases the pseudo-likelihood values and that the sequence of BSLM estimates converges to the unique global maximizer of the pseudo-likelihood function. The relationship between the BSLM algorithm and the gradient ascent (GA) algorithm for MPLE of FVBMs is also discussed, and a convergence criterion for the GA algorithm is given.Feature ranking for multi-target regressionhttps://www.zbmath.org/1472.681542021-11-25T18:46:10.358925Z"Petković, Matej"https://www.zbmath.org/authors/?q=ai:petkovic.matej"Kocev, Dragi"https://www.zbmath.org/authors/?q=ai:kocev.dragi"Džeroski, Sašo"https://www.zbmath.org/authors/?q=ai:dzeroski.sasoThis paper considers multi-target regression (MTR), where the goal is to learn a model that predicts several target variables simultaneously. In particular, the authors address the task of feature ranking to score the importance of descriptive attributes. While there are several works on feature ranking in single-target regression, this paper presents one of the first feature ranking methods for MTR. 
It introduces two methods for feature ranking: one based on an ensemble of predictive clustering trees and one as an extension of RReliefF. Extensive experimental results are reported to justify the effectiveness of the proposed methods.A novel parameter estimation method for Boltzmann machineshttps://www.zbmath.org/1472.681602021-11-25T18:46:10.358925Z"Takenouchi, Takashi"https://www.zbmath.org/authors/?q=ai:takenouchi.takashiSummary: We propose a novel estimator for a specific class of probabilistic models on discrete spaces such as the Boltzmann machine. The proposed estimator is derived from minimization of a convex risk function and can be constructed without calculating the normalization constant, whose computational cost is of exponential order. We investigate statistical properties of the proposed estimator such as consistency and asymptotic normality in the framework of the estimating function. Small experiments show that the proposed estimator can attain comparable performance to the maximum likelihood estimator at a much lower computational cost and is applicable to high-dimensional data.Linearized Bayesian inference for Young's modulus parameter field in an elastic model of slender structureshttps://www.zbmath.org/1472.740182021-11-25T18:46:10.358925Z"Fatehiboroujeni, Soheil"https://www.zbmath.org/authors/?q=ai:fatehiboroujeni.soheil"Petra, Noemi"https://www.zbmath.org/authors/?q=ai:petra.noemi"Goyal, Sachin"https://www.zbmath.org/authors/?q=ai:goyal.sachinSummary: The deformations of several slender structures at nano-scale are conceivably sensitive to their non-homogeneous elasticity. Owing to their small scale, it is not feasible to discern their elasticity parameter fields accurately using observations from physical experiments. Molecular dynamics simulations can provide an alternative or additional source of data. 
However, the challenges still lie in developing computationally efficient and robust methods to solve inverse problems to infer the elasticity parameter field from the deformations. In this paper, we formulate an inverse problem governed by a linear elastic model in a Bayesian inference framework. To make the problem tractable, we use a Gaussian approximation of the posterior probability distribution that results from the Bayesian solution of the inverse problem of inferring Young's modulus parameter fields from available data. The performance of the computational framework is demonstrated using two representative loading scenarios, one involving cantilever bending and the other involving stretching of a helical rod (an intrinsically curved structure). The results show that smoothly varying parameter fields can be reconstructed satisfactorily from noisy data. We also quantify the uncertainty in the inferred parameters and discuss the effect of the quality of the data on the reconstructions.Statistical interpolation of spatially varying but sparsely measured 3D geo-data using compressive sensing and variational Bayesian inferencehttps://www.zbmath.org/1472.860352021-11-25T18:46:10.358925Z"Zhao, Tengyuan"https://www.zbmath.org/authors/?q=ai:zhao.tengyuan"Wang, Yu"https://www.zbmath.org/authors/?q=ai:wang.yu.9|wang.yu.5|wang.yu|wang.yu.8|wang.yu.1|wang.yu.2|wang.yu.4|wang.yu.3Summary: Real geo-data are three-dimensional (3D) and spatially varied, but measurements are often sparse due to time, resource, and/or technical constraints. In these cases, the quantities of interest at locations where measurements are missing must be interpolated from the available data. Several powerful methods have been developed to address this problem in real-world applications over the past several decades, such as two-point geo-statistical methods (e.g., kriging or Gaussian process regression, GPR) and multiple-point statistics (MPS). 
However, spatial interpolation remains challenging when the number of measurements is small because a suitable covariance function is difficult to select and the parameters are challenging to estimate from a small number of measurements. Note that a covariance function form and its parameters are key inputs for some methods (e.g., kriging or GPR). MPS is a non-parametric simulation method that combines training images as prior knowledge for sparse measurements. However, the selection of a suitable training image for continuous geo-quantities (e.g., soil or rock properties) faces certain difficulties and may become increasingly complicated when the geo-data to be interpolated are high-dimensional (e.g., 3D) and exhibit non-stationary (e.g., with unknown trends or non-stationary covariance structure) and/or anisotropic characteristics. This paper proposes a non-parametric approach that systematically combines compressive sensing and variational Bayesian inference for statistical interpolation of 3D geo-data. The method uses sparse measurements and their locations as the input and provides interpolated values at unsampled locations with quantified interpolation uncertainty as the output. The proposed method is illustrated using a series of numerical 3D examples, and the results indicate a reasonably good performance.Sample size for estimating organism concentration in ballast water: a Bayesian approachhttps://www.zbmath.org/1472.922502021-11-25T18:46:10.358925Z"Costa, Eliardo G."https://www.zbmath.org/authors/?q=ai:costa.eliardo-g"Paulino, Carlos Daniel"https://www.zbmath.org/authors/?q=ai:paulino.carlos-daniel-mimoso"Singer, Julio M."https://www.zbmath.org/authors/?q=ai:da-motta-singer.julioSummary: Estimation of microorganism concentration in ballast water tanks is important to evaluate and possibly to prevent the introduction of invasive species in stable ecosystems. 
For this purpose, the number of organisms in ballast water aliquots must be counted and used to estimate their concentration with some precision requirement. Poisson and negative binomial models have been employed to describe the organism distribution in the tank, but determination of sample sizes required to generate estimates with pre-specified precision is still not well established. A Bayesian approach is a flexible alternative to accommodate adequate models that account for the heterogeneous distribution of the organisms and may provide a sequential way of enhancing the estimation procedure by updating the prior distribution along the ballast water discharging process. We adopt such an approach to compute sample sizes required to construct credible intervals obtained via two optimality criteria that have not been employed in this context. Such intervals may be used to decide on compliance with the D-2 standard of the Ballast Water Management Convention. We also conduct a simulation study to verify whether the credible intervals obtained with the proposed sample sizes satisfy the precision criteria.
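The conjugate Poisson-gamma machinery underlying both ballast-water entries can be illustrated in a few lines. This is a sketch under a Gamma(a, b) prior in the shape-rate parametrization, with hypothetical function and parameter names, not the authors' code:

```python
from scipy import stats

def concentration_credible_interval(counts, aliquot_volume,
                                    a=1.0, b=1.0, level=0.95):
    """Equal-tailed credible interval for the mean organism concentration
    lam, assuming counts[i] ~ Poisson(lam * aliquot_volume) and a
    Gamma(a, b) prior; the conjugate posterior is
    Gamma(a + sum(counts), b + n * aliquot_volume)."""
    n = len(counts)
    shape = a + sum(counts)
    rate = b + n * aliquot_volume
    tail = (1.0 - level) / 2.0
    return (stats.gamma.ppf(tail, shape, scale=1.0 / rate),
            stats.gamma.ppf(1.0 - tail, shape, scale=1.0 / rate))
```

The interval shrinks as more aliquots are counted, which is exactly the quantity that drives the sample-size criteria discussed above.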