×

Detecting a botnet in a network. (English) Zbl 1493.62036

Summary: We formalize the problem of detecting the presence of a botnet in a network as a hypothesis testing problem where we observe a single instance of a graph. The null hypothesis, corresponding to the absence of a botnet, is modeled as a random geometric graph where every vertex is assigned a location on a \(d\)-dimensional torus and two vertices are connected when their distance is smaller than a certain threshold. The alternative hypothesis is similar, except that there is a small number of vertices, called the botnet, that ignore this geometric structure and simply connect randomly to every other vertex with a prescribed probability.
We present two tests that are able to detect the presence of such a botnet. The first test is based on the idea that botnet vertices tend to form large isolated stars that are not present under the null hypothesis. The second test uses the average graph distance, which becomes significantly shorter under the alternative hypothesis. We show that both these tests are asymptotically optimal. However, numerical simulations show that the isolated star test performs significantly better than the average distance test on networks of moderate size. Finally, we construct a robust scheme based on the isolated star test that is also able to identify the vertices in the botnet.

MSC:

62C20 Minimax procedures in statistical decision theory
05C12 Distance in graphs
05C69 Vertex subsets with special properties (dominating sets, independent sets, cliques, etc.)
05C80 Random graphs (graph-theoretic aspects)
60C05 Combinatorial probability
60D05 Geometric probability and stochastic geometry
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] E. Arias-Castro and N. Verzelen, Community detection in dense random networks. Ann. Statist. 42 (2014), no. 3, 940-969 Zbl 1305.62035 MR 3210992 · Zbl 1305.62035
[2] M. Barthélemy, Spatial networks. Phys. Rep. 499 (2011), no. 1-3, 1-101 MR 2770962
[3] S. Bhamidi, J. M. Steele, and T. Zaman, Twitter event networks and the superstar model. Ann. Appl. Probab. 25 (2015), no. 5, 2462-2502 Zbl 1334.60204 MR 3375881 · Zbl 1334.60204
[4] K. Bogerd, R. M. Castro, R. van der Hofstad, and N. Verzelen, Detecting a planted community in an inhomogeneous random graph. Bernoulli 27 (2021), no. 2, 1159-1188 Zbl 1468.91113 MR 4255230 · Zbl 1468.91113
[5] M. Boguñá, F. Papadopoulos, and D. Krioukov, Sustaining the internet with hyperbolic mapping. Nature Communications 1 (2010), Paper No. 62, 8pp
[6] R. Boppana and M. M. Halldórsson, Approximating maximum independent sets by excluding subgraphs. BIT 32 (1992), no. 2, 180-196 Zbl 0761.68044 MR 1172185 · Zbl 0761.68044
[7] M. Bradonjić, R. Elsässer, T. Friedrich, T. Sauerwald, and A. Stauffer, Efficient broadcast on random geometric graphs. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1412-1421, SIAM, Philadelphia, PA, 2010 Zbl 1288.05246 MR 2809754
[8] G. Bresler and D. Nagaraj, Optimal single sample tests for structured versus unstructured network data. In Proceedings of the 31st Conference on Learning Theory, pp. 1657-1690, Proceedings of Machine Learning Research, PMLR, 2018.
[9] K. Bringmann, R. Keusch, and J. Lengler, Geometric inhomogeneous random graphs. Theoret. Comput. Sci. 760 (2019), 35-54 Zbl 1414.05264 MR 3913223 · Zbl 1414.05264
[10] S. Bubeck, J. Ding, R. Eldan, and M. Z. Rácz, Testing for high-dimensional geometry in random graphs. Random Structures Algorithms 49 (2016), no. 3, 503-532 Zbl 1349.05315 MR 3545825 · Zbl 1349.05315
[11] L. H. Y. Chen, Poisson approximation for dependent trials. Ann. Probability 3 (1975), no. 3, 534-545 Zbl 0335.60016 MR 428387 · Zbl 0335.60016
[12] R. M. Corless, G. H. Gonnet, D. E. G. Hare, D. J. Jeffrey, and D. E. Knuth, On the Lambert W function. Adv. Comput. Math. 5 (1996), no. 4, 329-359 Zbl 0863.65008 MR 1414285 · Zbl 0863.65008
[13] H. Crane and M. Xu, Inference on the history of a randomly growing tree. J. R. Stat. Soc. Ser. B. Stat. Methodol. 83 (2021), no. 4, 639-668 MR 4319997 · Zbl 07555273
[14] J. Dall and M. Christensen, Random geometric graphs. Phys. Rev. E (3) 66 (2002), no. 1, Paper No. 016121, 9pp MR 1919736
[15] M. Deijfen, R. van der Hofstad, and G. Hooghiemstra, Scale-free percolation. Ann. Inst. Henri Poincaré Probab. Stat. 49 (2013), no. 3, 817-838 Zbl 1274.60290 MR 3112435 · Zbl 1274.60290
[16] N. G. des Mesnards, D. S. Hunter, Z. el Hjouji, and T. Zaman, Detecting bots and assessing their impact in social networks, 2018 arXiv:1810.12398
[17] J. Díaz, D. Mitsche, G. Perarnau, and X. Pérez-Giménez, On the relation between graph distance and Euclidean distance in random geometric graphs. Adv. in Appl. Probab. 48 (2016), no. 3, 848-864 Zbl 1348.05188 MR 3568895 · Zbl 1348.05188
[18] R. B. Ellis, J. L. Martin, and C. Yan, Random geometric graph diameter in the unit ball. Algorithmica 47 (2007), no. 4, 421-438 Zbl 1117.05099 MR 2304988 · Zbl 1117.05099
[19] M. Feily, A. Shahrestani, and S. Ramadass, A survey of botnet and botnet detection. In 2009 Third International Conference on Emerging Security Information, Systems and Technologies (SECURWARE), pp. 268-273, IEEE Computer Society, 2009
[20] T. Friedrich, T. Sauerwald, and A. Stauffer, Diameter and broadcast time of random geometric graphs in arbitrary dimensions. Algorithmica 67 (2013), no. 1, 65-88 Zbl 1275.05050 MR 3072817 · Zbl 1275.05050
[21] C. Gao and J. Lafferty. Testing network structure using relations between small subgraph probabilities, 2017 arXiv:1704.06742
[22] S. García, M. Grill, J. Stiborek, and A. Zunino, An empirical comparison of botnet detection methods. Computers and Security 45 (2014), 100-123
[23] G. Bet, K. Bogerd, R. M. Castro and R. van der Hofstad
[24] S. García, A. Zunino, and M. Campo, Survey on network-based botnet detection methods. Security and Communication Networks 7 (2014), no. 5, 878-903
[25] E. N. Gilbert, Random plane networks. J. Soc. Indust. Appl. Math. 9 (1961), 533-543 Zbl 0112.09403 MR 132566 · Zbl 0112.09403
[26] T. Hagerup and C. Rüb, A guided tour of Chernoff bounds. Inform. Process. Lett. 33 (1990), no. 6, 305-308 Zbl 0702.60021 MR 1045520 · Zbl 0702.60021
[27] J. M. Hammersley, The distribution of distance in a hypersphere. Ann. Math. Statistics 21 (1950), 447-452 Zbl 0039.39301 MR 37481 · Zbl 0039.39301
[28] N. A. Heard, D. J. Weston, K. Platanioti, and D. J. Hand, Bayesian anomaly detection methods for social networks. Ann. Appl. Stat. 4 (2010), no. 2, 645-662 Zbl 1194.62021 MR 2758643 · Zbl 1194.62021
[29] W. Hoeffding, A class of statistics with asymptotically normal distribution. Ann. Math. Statistics 19 (1948), 293-325 Zbl 0032.04101 MR 26294 · Zbl 0032.04101
[30] F. den Hollander, Probability theory: The coupling method. Lectures Notes, Leiden Uni-versity, 2012
[31] G. A. Kabatjanskiȋ and V. I. Levenšteȋn, On bounds for packings on a sphere and in space. Problems of Information Transmission 14 (1978), no. 1, 1-17 Zbl 0407.52005 MR 0514023
[32] D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá, Hyperbolic geometry of complex networks. Phys. Rev. E (3) 82 (2010), no. 3, Paper No. 036106, 18pp MR 2787998
[33] H. D. Mittelmann and F. Vallentin, High-accuracy semidefinite programming bounds for kissing numbers. Experiment. Math. 19 (2010), no. 2, 175-179 Zbl 1279.11070 MR 2676746 · Zbl 1279.11070
[34] M. Mitzenmacher and E. Upfal, Probability and computing. Second edn., Cambridge Univ. Press, Cambridge, 2017 Zbl 1368.60002 MR 3674428
[35] M. Mongiovì, P. Bogdanov, R. Ranca, E. E. Papalexakisy, C. Faloutsos, and A. K. Singh, NetSpot: Spotting significant anomalous regions on dynamic networks. In Proceedings of the 2013 SIAM International Conference on Data Mining (SDM), pp. 28-36, SIAM, 2013
[36] O. R. Musin, The problem of the twenty-five spheres. Russ. Math. Surv. 58 (2003), no. 4(352), 794-795 Zbl 1059.52023 MR 2042912 · Zbl 1059.52023
[37] S. Muthukrishnan and G. Pandurangan, The bin-covering technique for thresholding random geometric graph properties. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 989-998, ACM, New York, 2005 Zbl 1297.05221 MR 2298358 · Zbl 1297.05221
[38] Y. Park, C. E. Priebe, and A. Youssef, Anomaly detection in time series of graphs using fusion of graph invariants. IEEE Journal on Selected Topics in Signal Processing 7 (2013), no. 1, 67-75 MR 2873481
[39] M. Penrose, Random geometric graphs. Oxford Studies in Probability 5, Oxford Univ. Press, Oxford, 2003 Zbl 1029.60007 MR 1986198
[40] D. Shah and T. Zaman, Rumors in a network: who’s the culprit? IEEE Trans. Inform. Theory 57 (2011), no. 8, 5163-5181 Zbl 1366.91126 MR 2849111 · Zbl 1366.91126
[41] A. B. Tsybakov, Introduction to nonparametric estimation. Springer Ser. Statist., Springer, New York, 2009 Zbl 1176.62032 MR 2724359 · Zbl 1176.62032
[42] N. Verzelen and E. Arias-Castro, Community detection in sparse random networks. Ann. Appl. Probab. 25 (2015), no. 6, 3465-3510 Zbl 1326.05145 MR 3404642 · Zbl 1326.05145
[43] D. J. Watts, Small worlds. Princeton Studies in Complexity, Princeton Univ. Press, Princeton, NJ, 1999 Zbl 1368.05139 MR 1716136
[44] H. R. Zeidanloo, M. J. Zadeh, Shooshtari, P. V. Amoli, M. Safari, and M. Zamani, A taxonomy of botnet detection techniques. In Proceedings of the 3rd IEEE International Conference on Computer Science and Information Technology. Vol. 2, pp. 158-162, IEEE, 2010. Received 21 May, 2020; revised 11 January, 2021
[45] G. Bet, Dipartimento di Matematica e Informatica, Università degli Studi di Firenze, Viale Morgagni 65, 50134 Florence, Italy E-mail: gianmarco.bet@unifi.it
[46] K. Bogerd, Department of Mathematics and Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, Netherlands E-mail: k.m.bogerd@tue.nl
[47] R. M. Castro, Department of Mathematics and Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, Netherlands E-mail: rmcastro@tue.nl
[48] R. van der Hofstad, Department of Mathematics and Computer Science, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, Netherlands E-mail: r.w.v.d.hofstad@tue.nl
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.