Estimating population size using the network scale up method. (English) Zbl 1454.62048

Summary: We develop methods for estimating the size of hard-to-reach populations from data collected using network-based questions on standard surveys. Such data arise by asking respondents how many people they know in a specific group (e.g., people named Michael, intravenous drug users). The Network Scale up Method (NSUM) is a tool for producing population size estimates using these indirect measures of respondents’ networks. P. D. Killworth et al. [“A social network approach to estimating seroprevalence in the United States”, Soc. Netw. 20, No. 1, 23–50 (1998; doi:10.1016/S0378-8733(96)00305-X); “Estimation of seroprevalence, rape, and homelessness in the United States using a social network approach”, Eval. Rev. 22, 289–308 (1998; doi:10.1177/0193841X9802200205)] proposed maximum likelihood estimators of population size for a fixed effects model in which respondents’ degrees or personal network sizes are treated as fixed. We extend this by treating personal network sizes as random effects, yielding principled statements of uncertainty. This allows us to generalize the model to account for variation in people’s propensity to know people in particular subgroups (barrier effects), such as their tendency to know people like themselves, as well as their lack of awareness of or reluctance to acknowledge their contacts’ group memberships (transmission bias). NSUM estimates also suffer from recall bias, in which respondents tend to underestimate the number of members of larger groups that they know, and conversely for smaller groups. We propose a data-driven adjustment method to deal with this. Our methods perform well in simulation studies, generating improved estimates and calibrated uncertainty intervals, as well as in back estimates of real sample data. We apply them to data from a study of HIV/AIDS prevalence in Curitiba, Brazil. Our results show that when transmission bias is present, external information about its likely extent can greatly improve the estimates. 
The methods are implemented in the NSUM R package.
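
For orientation, the fixed-effects baseline that the summary attributes to Killworth et al. can be sketched as follows. This is a minimal illustration, assuming the standard binomial formulation in which respondent i's count for group k is Binomial(d_i, N_k/N); the function names and toy numbers are hypothetical and are not taken from the NSUM R package or from the paper's data.

```python
def killworth_degrees(known_counts, known_sizes, total_population):
    """Estimate each respondent's degree from groups of known size.

    Under the assumed binomial model the estimate is
    d_i = N * sum_k y_ik / sum_k N_k.
    """
    total_known = sum(known_sizes)
    return [total_population * sum(row) / total_known for row in known_counts]


def killworth_size(unknown_counts, degrees, total_population):
    """Scale up the reported contacts in the hidden group:
    N_u = N * sum_i y_iu / sum_i d_i.
    """
    return total_population * sum(unknown_counts) / sum(degrees)


# Toy inputs (illustrative only).
N = 1_000_000                              # total population size
known_sizes = [10_000, 20_000]             # sizes of two groups of known size
known_counts = [[3, 5], [1, 3], [4, 8]]    # rows: respondents, columns: known groups
unknown_counts = [1, 0, 2]                 # contacts reported in the hidden group

degrees = killworth_degrees(known_counts, known_sizes, N)
estimate = killworth_size(unknown_counts, degrees, N)
print(degrees, estimate)
```

This sketch treats the degrees as fixed plug-in quantities and carries no uncertainty statement; the random-effects extension, barrier effects, transmission bias and recall adjustment described in the summary are not represented here.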

MSC:

62D05 Sampling theory, sample surveys
62P25 Applications of statistics to social sciences
91D30 Social networks; opinion dynamics

Software:

NSUM; Gibbsit; R

References:

[1] Bernard, H. R., Johnsen, E., Killworth, P. and Robinson, S. (1989). Estimating the size of an average personal network and of an event subpopulation. In The Small World (M. Kochen, ed.) 159-175. Ablex Press, New Jersey.
[2] Bernard, H. R., Johnsen, E., Killworth, P. and Robinson, S. (1991). Estimating the size of an average personal network and of an event subpopulation: Some empirical results. Soc. Sci. Res. 20 109-121.
[3] De Valpine, P. (2003). Better inferences from population-dynamics experiments using Monte Carlo state-space likelihood methods. Ecology 84 3064-3077.
[4] Diggle, P. J., Heagerty, P. J., Liang, K.-Y. and Zeger, S. L. (2002). Analysis of Longitudinal Data, 2nd ed. Oxford Statistical Science Series 25. Oxford Univ. Press, Oxford. · Zbl 1031.62002
[5] Ezoe, S., Morooka, T., Noda, T., Sabin, M. L. and Koike, S. (2012). Population size estimation of men who have sex with men through the network scale-up method in Japan. PLoS ONE 7 e31184.
[6] Gelman, A. and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statist. Sci. 7 457-472. · Zbl 1386.65060
[7] Jeffreys, H. (1961). Theory of Probability, 3rd ed. Clarendon Press, Oxford. · Zbl 0116.34904
[8] Kadushin, C., Killworth, P., Bernard, H. and Beveridge, A. (2006). Scale-up methods as applied to estimates of heroin use. J. Drug Issues 36 417.
[9] Killworth, P., Johnsen, E., McCarty, C., Shelley, G. and Bernard, H. (1998a). A social network approach to estimating seroprevalence in the United States. Soc. Netw. 20 23-50.
[10] Killworth, P., McCarty, C., Bernard, H., Shelley, G. and Johnsen, E. (1998b). Estimation of seroprevalence, rape, and homelessness in the United States using a social network approach. Evaluation Review 22 289-308.
[11] Killworth, P. D., McCarty, C., Bernard, H. R., Johnsen, E. C., Domini, J. and Shelley, G. A. (2003). Two interpretations of reports of knowledge of subpopulation sizes. Soc. Netw. 25 141-160.
[12] Killworth, P. D., McCarty, C., Johnsen, E. C., Bernard, H. R. and Shelley, G. A. (2006). Investigating the variation of personal network size under unknown error conditions. Sociol. Methods Res. 35 84-112.
[13] McCarty, C., Killworth, P. D., Bernard, H. R., Johnsen, E. C. and Shelley, G. A. (2001). Comparing two methods for estimating network size. Human Organ. 60 28-39.
[14] McCormick, T. H., Salganik, M. J. and Zheng, T. (2010). How many people do you know? Efficiently estimating personal network size. J. Amer. Statist. Assoc. 105 59-70. · Zbl 1397.62051 · doi:10.1198/jasa.2009.ap08518
[15] McCormick, T. H. and Zheng, T. (2007). Adjusting for recall bias in “How many X’s do you know?” surveys. In Proceedings of the Joint Statistical Meetings American Statistical Association, Washington, DC.
[16] McCormick, T. H. and Zheng, T. (2012). Latent demographic profile estimation in hard-to-reach groups. Ann. Appl. Stat. 6 1795-1813. · Zbl 1257.62122 · doi:10.1214/12-AOAS569
[17] Mielke, P. Jr. (1975). Convenient beta distribution likelihood techniques for describing and comparing meteorological data. J. Appl. Meteorol. 14 985-990.
[18] Paniotto, V., Petrenko, T., Kupriyanov, V. and Pakhok, O. (2009). Estimating the size of populations with high risk for HIV using the network scale-up method. Analytical report, Kiev International Institute of Sociology.
[19] Raftery, A. E. (1988). Inference and prediction for the binomial N parameter: A hierarchical Bayes approach. Biometrika 75 223-228. · Zbl 0638.62034 · doi:10.1093/biomet/75.2.223
[20] Raftery, A. E. and Lewis, S. M. (1996). Implementing MCMC. In Markov Chain Monte Carlo in Practice (W. R. Gilks, S. Richardson and D. J. Spiegelhalter, eds.) 115-130. Chapman & Hall, London. · Zbl 0844.62101
[21] Ripley, B. D. and Thompson, M. (1987). Regression techniques for the detection of analytical bias. Analyst 112 377-383.
[22] Salganik, M., Fazito, D., Bertoni, N., Abdo, A., Mello, M. and Bastos, F. (2011a). Assessing network scale-up estimates for groups most at risk of HIV/AIDS: Evidence from a multiple-method study of heavy drug users in Curitiba, Brazil. Am. J. Epidemiol. 174 1190-1196.
[23] Salganik, M. J., Mello, M. B., Abdo, A. H., Bertoni, N., Fazito, D. and Bastos, F. I. (2011b). The game of contacts: Estimating the social visibility of groups. Soc. Netw. 33 70-78.
[24] Skellam, J. G. (1948). A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. J. Roy. Statist. Soc. Ser. B 10 257-261. · Zbl 0032.41903
[25] Zheng, T., Salganik, M. J. and Gelman, A. (2006). How many people do you know in prison?: Using overdispersion in count data to estimate social structure in networks. J. Amer. Statist. Assoc. 101 409-423. · Zbl 1119.62388 · doi:10.1198/016214505000001168