Efficient secure data publishing algorithms for supporting information sharing. (English) Zbl 1181.94117

Summary: Many data-sharing applications require that published data protect sensitive information pertaining to individuals, such as a patient's disease, a customer's credit rating, or an employee's salary. Meanwhile, certain information is required to be published. In this paper, we consider data-publishing applications where the publisher specifies both sensitive information and shared information. An adversary can infer the real value of a sensitive entry with high confidence by using the published data. The goal is to protect sensitive information in the presence of data inference using association rules derived from the published data. We formulate the inference attack framework and develop complexity results. We show that computing a safe partial table is an NP-hard problem. We classify the general problem into subcases based on the requirements of publishing information, and propose algorithms for finding a safe partial table to publish. We have conducted an empirical study to evaluate these algorithms on real data. The test results show that the proposed algorithms can produce approximately maximal published data and improve on the performance of existing algorithms.

MSC:

94A62 Authentication, digital signatures and secret sharing
68P25 Data encryption (aspects in computer science)
