Privacy-preserving clustering algorithm based on random mapping

Keywords: high-dimensional data; privacy protection; clustering; random mapping; K-means
CLC number: TP309  Document code: A  Article ID: 1001-3695(2025)08-035-2511-07
doi: 10.19734/j.issn.1001-3695.2024.10.0503
Privacy-preserving algorithm for clustering high-dimensional data based on random mapping
He Lili a,b,c, Zhang Chenglin a,b,c, Cao Mingzeng a,b,c, Zhang Lei a,b (a. …; b. …; c. …, Heilongjiang 154007, China)
Abstract: To address the challenge of increasing privacy costs with rising data dimensions in clustering privacy protection algorithms, this paper proposed a random projection-based privacy-preserving algorithm (RPPP). RPPP selected relevant features using the symmetrical uncertainty method and generated random matrices through independently and identically distributed Gaussian sequences. To strengthen distance-preserving properties, it applied Gram-Schmidt orthogonalization to ensure the orthogonality of the random matrices. These matrices were decomposed into multiple independent sub-matrices to map the reduced-dimensional features, creating a feature-matching domain and a noise-perturbed domain. To further enhance privacy protection, the algorithm injected random noise into the noise-perturbed domain. Experimental results demonstrate that RPPP effectively defends against privacy attacks. Tests conducted on the Cancer and Diabetes datasets show that RPPP outperforms traditional algorithms in both privacy protection and clustering efficiency. Specifically, RPPP improves clustering efficiency by approximately 16.34%, 23.44%, and 32.94% compared with the UPA, GCCG, and AKA algorithms, respectively. Overall, RPPP significantly enhances privacy protection while boosting clustering efficiency, confirming its effectiveness and practical applicability.
Key words: high-dimensional data; privacy protection; clustering; random projection; K-means
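
The core steps named in the abstract (i.i.d. Gaussian matrix, Gram-Schmidt orthogonalization, splitting into sub-matrices, noise injection) can be illustrated with a minimal sketch. This is not the authors' RPPP implementation: the function names, the row-wise split into two blocks, the Gaussian noise model, and the parameters `n_match` and `noise_std` are all assumptions made for illustration, and the symmetrical-uncertainty feature selection step is omitted.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gaussian_matrix(n_rows, n_cols, rng):
    """Matrix (list of rows) with i.i.d. standard Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_cols)] for _ in range(n_rows)]


def gram_schmidt(rows):
    """Orthonormalize the rows using modified Gram-Schmidt."""
    basis = []
    for row in rows:
        v = list(row)
        for b in basis:
            coeff = dot(v, b)
            v = [vi - coeff * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        if norm < 1e-12:
            raise ValueError("rows are linearly dependent")
        basis.append([vi / norm for vi in v])
    return basis


def build_projection(dim, rng):
    """Square orthogonal matrix obtained from a Gaussian random matrix."""
    return gram_schmidt(gaussian_matrix(dim, dim, rng))


def map_record(q, x, n_match, noise_std, rng):
    """Map one record with two sub-matrices of q (hypothetical split).

    The first n_match rows form the assumed feature-matching block;
    the remaining rows form the assumed noise-perturbed block, whose
    outputs receive additive Gaussian noise.
    """
    match_block, noise_block = q[:n_match], q[n_match:]
    matched = [dot(row, x) for row in match_block]
    noisy = [dot(row, x) + rng.gauss(0.0, noise_std) for row in noise_block]
    return matched + noisy


if __name__ == "__main__":
    rng = random.Random(42)
    q = build_projection(4, rng)
    record = [5.1, 3.5, 1.4, 0.2]
    print(map_record(q, record, n_match=2, noise_std=0.1, rng=rng))
```

Because the rows of `q` are orthonormal and the matrix is square, pairwise Euclidean distances are preserved exactly when `noise_std` is zero; any distortion seen by a downstream K-means step in this sketch comes only from the injected noise.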
0 Introduction
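In recent years, with the rapid rise of big data technology and the fast development of information technology [1], organizations such as medical and educational institutions generate large volumes of data every day. These data span a wide range of fields, and analysis and processing with data mining techniques can convert them into information of practical value.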