Clustering analysis is critical to understanding the hidden patterns in data. However, evaluating the quality of different clustering methods remains a challenging, unsolved problem. The most widely used metric, normalized mutual information (NMI), has been proved to have a finite-size effect, and its improved form, relative normalized mutual information (rNMI), has a reverse finite-size effect. Corrected normalized mutual information (cNMI) was therefore proposed; it has neither the finite-size effect nor the reverse finite-size effect. However, cNMI violates the so-called proportionality assumption. In addition, NMI-type metrics ignore the importance of small communities. Finally, they cannot be used to evaluate a single community of interest. To address these problems, we map the computed community labels to the ground-truth labels through integer linear programming and then use the kappa index and the F-score to evaluate the clustering methods. Experimental results demonstrate the advantages of the proposed method.
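The abstract does not spell out the integer linear program. A common formulation, assumed here, is a one-to-one assignment of computed communities to ground-truth communities that maximizes the number of nodes whose labels agree; once the labels are aligned, Cohen's kappa and a per-community F-score follow directly. The Python sketch below solves that assignment by exhaustive search, which reaches the same optimum as the assignment ILP but only scales to a handful of communities. For larger instances an assignment solver such as `scipy.optimize.linear_sum_assignment` or a general ILP solver would be used instead. The function names `best_label_mapping`, `cohen_kappa` and `f_score` are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from itertools import permutations


def best_label_mapping(pred, truth):
    """One-to-one map from predicted to ground-truth labels that maximises
    the number of nodes whose labels agree (an assignment problem).

    Assumption: this stands in for the paper's ILP. Exhaustive search is
    only practical for a few communities. Predicted labels left without a
    ground-truth partner are mapped to None.
    """
    overlap = Counter(zip(pred, truth))  # contingency counts n_ij
    p = sorted(set(pred))
    t = sorted(set(truth))
    k = max(len(p), len(t))
    p += [None] * (k - len(p))  # pad the shorter side with dummy labels
    t += [None] * (k - len(t))
    best, best_score = None, -1
    for perm in permutations(t):
        score = sum(overlap[(a, b)] for a, b in zip(p, perm))
        if score > best_score:
            best, best_score = perm, score
    return {a: b for a, b in zip(p, best) if a is not None}


def cohen_kappa(a, b):
    """Cohen's kappa between two aligned label sequences."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)  # chance agreement
    if p_e == 1:  # both sequences use one identical label
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def f_score(mapped, truth, c):
    """F1 score of a single community c after label alignment."""
    tp = sum(m == c and t == c for m, t in zip(mapped, truth))
    fp = sum(m == c and t != c for m, t in zip(mapped, truth))
    fn = sum(m != c and t == c for m, t in zip(mapped, truth))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


truth = [0, 0, 0, 1, 1, 1, 2, 2]
pred = ["b", "b", "b", "a", "a", "c", "c", "c"]
mapping = best_label_mapping(pred, truth)
print(mapping)  # {'a': 1, 'b': 0, 'c': 2}
mapped = [mapping[x] for x in pred]
print(cohen_kappa(mapped, truth))  # 35/43, about 0.814
print(f_score(mapped, truth, 2))   # about 0.8
```

The per-community F-score is what lets a single community of interest be scored on its own, which the abstract notes NMI-type metrics cannot do.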
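Zhang Zhongyuan (张忠元), Ph.D. in Science, is a professor and doctoral supervisor at the Central University of Finance and Economics, a senior member of the China Computer Federation (CCF), and a science adviser to Guokr (果壳网). His main research interests are complex network analysis and smart cities. He has published more than ten papers in well-known Chinese and international journals, including Data Mining and Knowledge Discovery, Physical Review E, EPL, Knowledge and Information Systems, Scientific Reports, PLOS ONE, and Science China (中国科学). He is an Elsevier Outstanding Reviewer and serves as an anonymous reviewer for well-known journals such as Data Mining and Knowledge Discovery, Physica A, and Management Science. The courses he teaches include mathematical analysis, operations research, data mining, and statistics.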