
Inferring urban land use using large-scale social media check-in data. (English) Zbl 1338.91118

Summary: Emerging location-based services in social media tools such as Foursquare and Twitter provide an unprecedented amount of publicly generated data on human movements and activities. This novel data source contains valuable information on human activities (e.g., geo-location, time and date, and type of place). Besides being highly beneficial for modeling human activity patterns, the data are also useful for inferring planning-related variables such as a city’s land use characteristics. This paper provides a comprehensive investigation of the possibility and validity of using large-scale social media check-in data to infer land use types by applying state-of-the-art data mining techniques. Two inference approaches are proposed and tested: an unsupervised clustering method and a supervised learning method. The land use inference is conducted on a uniform grid of 200 m by 200 m cells. The methods are applied to a case study of New York City. The validation results confirm that both approaches effectively infer different land use types given sufficient check-in data. These encouraging results demonstrate the potential of social media check-in data for urban land use inference and reveal the hidden linkage between human activity patterns and the underlying urban land use pattern.
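
The grid-level setup described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not name a clustering algorithm, so plain k-means is used here as an assumed stand-in, and the coordinates, venue categories, feature definition (per-cell category shares), and function names are all hypothetical. Only the 200 m cell size comes from the summary.

```python
from collections import Counter, defaultdict

CELL_SIZE_M = 200.0  # grid resolution stated in the summary
CATEGORIES = ["office", "home", "food"]  # hypothetical venue types


def cell_of(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map projected coordinates (in metres) to an integer grid cell index."""
    return (int(x_m // cell_size), int(y_m // cell_size))


def cell_features(checkins, categories):
    """For each grid cell, compute the share of check-ins per venue category."""
    counts = defaultdict(Counter)
    for x_m, y_m, category in checkins:
        counts[cell_of(x_m, y_m)][category] += 1
    features = {}
    for cell, counter in counts.items():
        total = sum(counter.values())
        features[cell] = [counter[c] / total for c in categories]
    return features


def kmeans(points, k, iterations=20):
    """Plain k-means (assumed stand-in), seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up check-ins: (x in metres, y in metres, venue category).
checkins = [
    (50, 50, "office"), (120, 30, "office"), (180, 199, "food"),
    (250, 40, "home"), (390, 10, "home"),
    (30, 260, "office"),
    (410, 410, "home"), (450, 590, "food"), (599, 401, "home"),
]

features = cell_features(checkins, CATEGORIES)
cells = sorted(features)
labels = kmeans([features[c] for c in cells], k=2)
for cell, label in zip(cells, labels):
    print(cell, label)
```

In this toy input the office-dominated cells and the home-dominated cells should end up in separate clusters. A supervised variant would instead train a classifier on the same per-cell feature vectors, using known land use labels for a subset of cells.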

MSC:

91D30 Social networks; opinion dynamics
86A32 Geostatistics
