Efficient GraphX-Based Distributed Structural Graph Clustering Algorithm*

SHI Shengle, ZHAO Yuhai+, LI Yuan, YIN Ying, WANG Guoren

School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China

+Corresponding author: ZHAO Yuhai

SHI Shengle, ZHAO Yuhai, LI Yuan, et al. Efficient GraphX-based distributed structural graph clustering algorithm. Journal of Frontiers of Computer Science and Technology, 2018, 12(10): 1571-1582.

Abstract: Structural graph clustering is a fundamental algorithm in large-graph analysis and is valuable in many real-world applications, such as component detection, biological function discovery, and graph visualization. At present, most distributed structural graph clustering algorithms are based on the MapReduce framework in Hadoop. This framework incurs substantial disk I/O overhead, and these algorithms calculate the exact similarity between every pair of adjacent vertices in the graph, which increases the computation cost. To solve these two problems, this paper proposes two pruning rules: the first reduces the number of similarity calculations between adjacent vertices, and the second reduces computation time by calculating the similarity between vertices imprecisely. The paper then proposes a structural graph clustering algorithm based on GraphX in Spark, called GXDSGC, which saves substantial disk I/O overhead during operation. Finally, extensive experiments on many real and synthetic datasets show the efficiency and effectiveness of the proposed GXDSGC algorithm. Notably, it runs more than 30 times faster than the compared Hadoop MapReduce-based algorithm, which improves the efficiency of structural graph clustering in graph data analysis.
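The abstract does not spell out the similarity measure or the two pruning rules, so the following is only a minimal single-machine sketch of the general idea, not the GXDSGC algorithm itself. It assumes the standard SCAN-style structural similarity, sigma(u, v) = |Γ(u) ∩ Γ(v)| / sqrt(|Γ(u)| · |Γ(v)|), where Γ(v) is the neighborhood of v including v, and it uses a simple degree-based upper bound as a stand-in for a pruning rule. The function names, the threshold `eps`, and the toy graph are all hypothetical, and plain Python is used here instead of GraphX.

```python
from math import sqrt


def closed_neighborhood(adj, v):
    """Return the neighbors of v together with v itself."""
    return adj[v] | {v}


def structural_similarity(adj, u, v):
    """Exact SCAN-style similarity (assumed definition, see lead-in)."""
    gu = closed_neighborhood(adj, u)
    gv = closed_neighborhood(adj, v)
    return len(gu & gv) / sqrt(len(gu) * len(gv))


def similarity_upper_bound(adj, u, v):
    """Cheap bound: the intersection cannot exceed the smaller neighborhood.

    Assumes the graph has no self-loops, so |closed neighborhood| = degree + 1.
    """
    du = len(adj[u]) + 1
    dv = len(adj[v]) + 1
    return min(du, dv) / sqrt(du * dv)


def similar_edges(adj, eps):
    """Return edges with similarity >= eps and the number of pruned edges."""
    result = []
    skipped = 0
    for u in adj:
        for v in adj[u]:
            if u >= v:
                continue  # visit each undirected edge once
            if similarity_upper_bound(adj, u, v) < eps:
                skipped += 1  # illustrative pruning: no set intersection needed
                continue
            if structural_similarity(adj, u, v) >= eps:
                result.append((u, v))
    return result, skipped


# Toy undirected graph: a triangle (0, 1, 2) attached to a hub (3) with leaves.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5, 6, 7},
    4: {3},
    5: {3},
    6: {3},
    7: {3},
}
edges, skipped = similar_edges(adj, eps=0.7)
```

In this toy graph the hub-to-leaf edges are rejected by the bound alone, so the exact set intersection is computed only for the remaining edges. A distributed version would have to exchange neighborhood information between partitions, which this sketch does not attempt to model.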

Key words: Spark; GraphX; distributed computing; graph clustering; community structures

*The National Natural Science Foundation of China under Grant Nos. 61272182, 61332014; the Fundamental Research Funds for the Central Universities of China under Grant Nos. N150402002, N150404008.

Received 2017-09, Accepted 2017-11.

CNKI online publication: 2017-10-31, …/kcms/detail/11.5602.TP.20171031.1634.002.html

ISSN 1673-9418 CODEN JKYTA8

Journal of Frontiers of Computer Science and Technology

1673-9418/2018/12(10)-1571-12

doi: 10.3778/j.issn.1673-9418.1709050

Tel: +86-10-89056056
