×

Sampling on bipartite networks: a comparative analysis of eight crawling methods. (English) Zbl 1457.90024

Summary: Sampling networks via crawling has become a feasible and widely used approach when the global network information is difficult to obtain. But there is little focus on two-mode networks, i.e. bipartite networks in which nodes can be divided into two disjoint partitions. In this paper, we adopt eight popular crawling methods (BFS, DFS, FFS, RW, SNS, MHRW, MDRW and RDS) from studies of one-mode networks and evaluate their applicability and performance on bipartite networks. Simulation results show that Metropolis-Hastings random walk (MHRW), maximum-degree random walk (MDRW) and respondent-driven sampling (RDS) perform better than the other methods, and population estimates from them are minimally affected by the structures of degree distribution, number of nodes in two node layers, degree correlation and communities. In addition, we find that strategies used in the sampling design – selection approaches for seed nodes, the number of seed nodes, and the number of branches – have very little influence on the estimation bias. Finally, we list suggestions for the selection of crawling methods on bipartite networks under different situations.

MSC:

90B10 Deterministic network models in operations research
91D30 Social networks; opinion dynamics
65C05 Monte Carlo methods
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Chiang, W. C.; Lin, H. H.; Huang, C. S.; Lo, L. J.; Wan, S. Y., Proc. Natl Acad. Sci. USA, 102, 4221, (2005) · doi:10.1073/pnas.0501179102
[2] Leskovec, J.; Faloutsos, C., Sampling from large graphs, 631-636, (2006)
[3] Rejaie, R.; Torkjazi, M.; Valafar, M.; Willinger, W., Netw. IEEE, 24, 32-37, (2010) · doi:10.1109/MNET.2010.5578916
[4] Sang, H. L.; Kim, P. J.; Jeong, H., Phys. Rev. E, 73, (2006) · doi:10.1103/PhysRevE.73.016102
[5] Krishnamurthy, V.; Faloutsos, M.; Chrobak, M.; Lao, L.; Cui, J. H.; Percus, A. G., Reducing large internet topologies for faster simulations, 328-341, (2005)
[6] Salehi, M.; Rabiee, H. R.; Rajabi, A., Chaos, 22, 2202-2229, (2012) · Zbl 1332.91096 · doi:10.1063/1.4712602
[7] Lu, X., Respondent-Driven Sampling Theory, Limitations & Improvements, (2013)
[8] Ribeiro, B.; Wang, P.; Murail, F.; Towsleyl, D., Proc. IEEE INFOCOM, vol 131, 1692-1700, (2012)
[9] Ribeiro, B.; Gauvin, W.; Liu, B.; Towsley, D., On MySpace account spans and double pareto-like distribution of friends, 1-6, (2010)
[10] Maiya, A. S.; Berger-Wolf, T. Y., Benefits of bias:towards better characterization of network sampling, 105-113, (2011), San Diego, CA: ACM, San Diego, CA
[11] Gjoka, M.; Kurant, M.; Butts, C. T.; Markopoulou, A., 1-9, (2010), San Diego, CA
[12] Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C., 1297-1305, (2012)
[13] Lovasz, L., Lecture Notes Math., 8, 285-303, (1993)
[14] Biernacki, P.; Dan, W., Sociol. Methods Res., 10, 141-163, (1981) · doi:10.1177/004912418101000205
[15] Heckathorn, D. D., Soc. Problems, 44, 174-199, (1997) · doi:10.2307/3096941
[16] Hastings, W. K., Biometrika, 57, 97-109, (1970) · Zbl 0219.65008 · doi:10.1093/biomet/57.1.97
[17] Bar-Yossef, Z.; Berg, A.; Chien, S.; Fakcharoenphol, J.; Weitz, D., Approximating aggregate queries about web pages via random walks, 535-544, (2000)
[18] Kurant, M.; Markopoulou, A.; Thiran, P., IEEE J. Sel. Areas Commun., 29, 1799-1809, (2011) · doi:10.1109/JSAC.2011.111005
[19] Avrachenkov, K.; Borkar, V. S.; Kadavankandy, A.; Sreedharan, J. K., Comparison of Random Walk Based Techniques for Estimating Network Averages, (2016), Berlin: Springer, Berlin
[20] Ribeiro, B.; Towsley, D., Estimating and sampling graphs with multidimensional random walks, 390-403, (2010)
[21] Lu, X., Soc. Netw., 35, 669-685, (2013) · doi:10.1016/j.socnet.2013.10.001
[22] Rezvanian, A.; Rahmati, M.; Meybodi, M. R., Physica A, 396, 224-234, (2014) · doi:10.1016/j.physa.2013.11.015
[23] Rezvanian, A.; Meybodi, M. R., Physica A, 424, 254-268, (2015) · Zbl 1400.91488 · doi:10.1016/j.physa.2015.01.030
[24] Jalali, Z. S.; Rezvanian, A.; Meybodi, M. R., Int. J. Mod. Phys. C, 27, (2016) · doi:10.1142/S0129183116500522
[25] Salehi, M.; Rabiee, H. R.; Nabavi, N.; Pooya, S., Characterizing Twitter with respondent-driven sampling, 1211-1217, (2012)
[26] Stutzbach, D.; Rejaie, R.; Duffield, N.; Sen, S.; Willinger, W., IEEE/ACM Trans. Netw., 17, 377-390, (2009) · doi:10.1109/TNET.2008.2001730
[27] Stutzbach, D.; Rejaie, R.; Duffield, N.; Sen, S., Sampling techniques for large, dynamic graphs, IEEE INFOCOM IEEE International Conference on Computer Communications, 1-6, (2006)
[28] Bell, D. C.; Erbaugh, E. B.; Serrano, T.; Daytonshotts, C. A.; Montoya, I. D., Soc. Sci. Res., 62, 350-361, (2017) · doi:10.1016/j.ssresearch.2016.08.016
[29] Salganik, M. J.; Heckathorn, D. D., Sociol. Methodol., 34, 193-239, (2017) · doi:10.1111/j.0081-1750.2004.00152.x
[30] Chen, S.; Lu, X., Sci. Rep., 7, 3268, (2017) · doi:10.1038/s41598-017-03379-4
[31] Guimera, R.; Uzzi, B.; Spiro, J.; Amaral, L. A N., Science, 308, 697-702, (2005) · doi:10.1126/science.1106340
[32] Schilling, M. A.; Phelps, C. C., Manage. Sci., 53, 1113-1126, (2007) · Zbl 1232.91395 · doi:10.1287/mnsc.1060.0624
[33] Barabâsi, A. L.; Jeong, H.; Néda, Z.; Ravasz, E.; Schubert, A.; Vicsek, T., Physica A, 311, 590-614, (2002) · Zbl 0996.91086 · doi:10.1016/S0378-4371(02)00736-7
[34] Börner, K.; Maru, J. T.; Goldstone, R. L., Proc. Natl Acad. Sci., 101, 5266-5273, (2004)
[35] Uetz, P., Nature, 403, 623-627, (2000) · doi:10.1038/35001009
[36] Li, S., Science, 303, 540-543, (2004) · doi:10.1126/science.1091403
[37] Medo, M.; Mariani, M. S.; Zeng, A.; Zhang, Y. C., Sci. Rep., 6, (2016) · doi:10.1038/srep34218
[38] Ren, Z. M.; Kong, Y.; Shang, M. S.; Zhang, Y. C., Phys. Lett. A, 380, 2608-2614, (2016) · doi:10.1016/j.physleta.2016.06.009
[39] Zhang, P.; Wang, D.; Xiao, J., Physica A, 471, 147-153, (2017) · Zbl 1400.68216 · doi:10.1016/j.physa.2016.11.076
[40] Hu, X.; Mai, Z.; Zhang, H.; Xue, Y.; Zhou, W.; Chen, X., A hybrid recommendation model based on weighted bipartite graph and collaborative filtering, 119-122, (2016), Piscataway, NJ: IEEE, Piscataway, NJ
[41] Zhou, T., Internet Res., 21, 67-81, (2012) · doi:10.1108/10662241111104884
[42] Wu, L.; Zhang, J.; Zhao, M., PLoS One, 9, (2014)
[43] Estrada, E.; Rodríguez-Velázquez, J. A., Phys. Rev. E, 72, (2005) · doi:10.1103/PhysRevE.72.046105
[44] Peltomäki, M.; Alava, M., J. Stat. Mech., (2006)
[45] Zhou, T.; Ren, J.; Medo, M.; Zhang, Y. C., Phys. Rev. E, 76, (2007) · doi:10.1103/PhysRevE.76.046115
[46] Qiao, J.; Meng, Y. Y.; Chen, H.; Huang, H. Q.; Li, G. Y., Physica A, 457, 270-279, (2016) · Zbl 1400.91486 · doi:10.1016/j.physa.2016.03.106
[47] Ohkubo, J.; Tanaka, K.; Horiguchi, T., Phys. Rev. E, 72, (2005) · doi:10.1103/PhysRevE.72.036120
[48] Saavedra, S.; Reed-Tsochas, F.; Uzzi, B., Nature, 457, 463-466, (2009) · doi:10.1038/nature07532
[49] Zhang, P.; Wang, J.; Li, X.; Li, M.; Di, Z.; Fan, Y., Physica A, 387, 6869-6875, (2008) · doi:10.1016/j.physa.2008.09.006
[50] Kheirkhahzadeh, M.; Lancichinetti, A.; Rosvall, M., Phys. Rev. E, 93, (2016) · doi:10.1103/PhysRevE.93.032309
[51] Miklós, I.; Erdös, P. L.; Soukup, L., Electron. J. Comb., 20, 229-240, (2010)
[52] Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C., Introduction to Algorithms, (2009), Cambridge, MA: The MIT Press, Cambridge, MA · Zbl 1187.68679
[53] Goel, S.; Salganik, M. J., Proc. Natl Acad. Sci., 107, 6743-6747, (2010) · doi:10.1073/pnas.1000261107
[54] Lu, X.; Bengtsson, L.; Britton, T.; Camitz, M.; Kim, B. J.; Thorson, A.; Liljeros, F., J. R. Stat. Soc., 175, 191-216, (2012) · doi:10.1111/j.1467-985X.2011.00711.x
[55] Molloy, M.; Reed, B., Comb. Probab. Comput., 7, 295-305, (1998) · Zbl 0916.05064 · doi:10.1017/S0963548398003526
[56] Newman, M. E.; Watts, D. J.; Strogatz, S. H., Proc. Natl Acad. Sci., 99, 2566-2572, (2002) · Zbl 1114.91362 · doi:10.1073/pnas.012582999
[57] Newman, M. E., Proc. Natl Acad. Sci. USA, 98, 404-409, (2000) · Zbl 1065.00518 · doi:10.1073/pnas.98.2.404
[58] Latapy, M.; Magnien, C.; Vecchio, N. D., Soc. Netw., 30, 31-48, (2008) · doi:10.1016/j.socnet.2007.04.006
[59] Albert, R.; Barabási, A. L., Rev. Mod. Phys., 74, 47, (2002) · Zbl 1205.82086 · doi:10.1103/RevModPhys.74.47
[60] Erdős, P.; Rényi, A., Publ. Math., 6, 290-297, (1959) · Zbl 0092.15705
[61] Bonsón, E.; Bednárová, M., Online Inf. Rev., 37, 969-984, (2013) · doi:10.1108/OIR-09-2012-0159
[62] Golbeck, J.; Hendler, J., Filmtrust: movie recommendations using trust in web-based social networks, 282-286, (2006)
[63] Newman, M. E., Phys. Rev. Lett., 89, (2002) · doi:10.1103/PhysRevLett.89.208701
[64] Ramasco, J. J.; Dorogovtsev, S. N.; Pastorsatorras, R., Phys. Rev. E, 70, (2004) · doi:10.1103/PhysRevE.70.036106
[65] Girvan, M.; Newman, M. E., Proc. Natl Acad. Sci., 99, 7821-7826, (2002) · Zbl 1032.91716 · doi:10.1073/pnas.122653799
[66] Lancichinetti, A.; Fortunato, S.; Radicchi, F., Phys. Rev. E, 78, (2008) · doi:10.1103/PhysRevE.78.046110
[67] Illenberger, J.; Flötteröd, G., Soc. Netw., 34, 701-711, (2012) · doi:10.1016/j.socnet.2012.09.001
[68] González-Bailón, S.; Wang, N.; Rivero, A.; Borge-Holthoefer, J.; Moreno, Y., Soc. Netw., 38, 16-27, (2014) · doi:10.1016/j.socnet.2014.01.004
[69] Li, R. H.; Yu, J. X.; Qin, L.; Mao, R.; Jin, T., On random walk based graph sampling, 927-938, (2015), Piscataway, NJ: IEEE, Piscataway, NJ
[70] Lee, C. H.; Xu, X.; Eun, D. Y., Beyond random walk and Metropolis-Hastings samplers: why you should not backtrack for unbiased graph sampling, ACM SIGMETRICS Performance Evaluation Review, vol 40, 319-330, (2012), New York: ACM, New York · doi:10.1145/2318857.2254795
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.