For some specific scenarios we can compute the metrics this way. Take, for example, a dataset that contains multiple classes such as A and B:
Archive 2: B01, B02
……
Archive 2: A03, A04
Archive 3: B01, B02
……
……
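To make the pair-counting idea concrete, here is a minimal from-scratch sketch. It assumes a hypothetical toy clustering (Archive 1: A01, A02, B01; Archive 2: A03, A04; Archive 3: B02). The function name `pair_counts` and the data are illustrative only. Every unordered pair of samples lands in one of four cells: same class and same archive (TP), different classes in the same archive (FP, the "multiple classes in one archive" error), the same class in different archives (FN, the "one class in multiple archives" error), or neither (TN).

```python
from itertools import combinations


def pair_counts(pred, true):
    """Hypothetical helper: classify every unordered sample pair, return (TN, FP, FN, TP)."""
    tn = fp = fn = tp = 0
    for i, j in combinations(range(len(pred)), 2):
        same_archive = pred[i] == pred[j]
        same_class = true[i] == true[j]
        if same_archive and same_class:
            tp += 1
        elif same_archive:
            fp += 1  # different classes merged into one archive
        elif same_class:
            fn += 1  # one class split across several archives
        else:
            tn += 1
    return tn, fp, fn, tp


# Hypothetical toy data; samples in the order A01, A02, A03, A04, B01, B02.
true = [0, 0, 0, 0, 1, 1]  # class A -> 0, class B -> 1
pred = [0, 0, 1, 1, 0, 2]  # Archive 1 -> 0, Archive 2 -> 1, Archive 3 -> 2

tn, fp, fn, tp = pair_counts(pred, true)
print("TN", tn, "FP", fp, "FN", fn, "TP", tp)  # TN 6 FP 2 FN 5 TP 2
print("precision", tp / (tp + fp), "recall", tp / (tp + fn))
```

Here precision is TP / (TP + FP) = 2 / 4, dragged down by B01 sitting in an A archive, and recall is TP / (TP + FN) = 2 / 7, dragged down by class A being split across two archives and class B across two more.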
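We can then design metrics tailored to each scenario. Below is an implementation based on sklearn's pair confusion matrix that covers both cases: one class spread across multiple archives (reflected in recall) and multiple classes mixed into one archive (reflected in precision).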
```python
from sklearn.metrics.cluster import pair_confusion_matrix


def CalcPrecisionRecall(allClusters, groundTruth, thresh=float("inf")):
    '''
    Args
        allClusters: list of list, each inner list holds the feature IDs of one archive (cluster)
        groundTruth: dict mapping each feature ID to its ground-truth class label
        thresh: black-hole archive threshold; any archive containing thresh or more
                features is ignored (no archive is filtered by default)
    '''
    # Drop black-hole archives that absorbed too many features
    filteredClusters = [arc for arc in allClusters if len(arc) < thresh]

    # Map each feature to the index of the archive it was assigned to
    tmpDict = {}
    for label, cluster in enumerate(filteredClusters):
        for feat in cluster:
            tmpDict[feat] = label

    # Build aligned predicted / ground-truth label lists (sorted for determinism)
    featLis = sorted(tmpDict)
    pred = [tmpDict[feat] for feat in featLis]
    gt = [int(groundTruth[feat]) for feat in featLis]

    print("Start to calculate Precision and Recall")
    # Ground truth must be the FIRST argument: with (gt, pred) the matrix is
    # [[TN, FP], [FN, TP]]. Passing (pred, gt) would swap FP and FN.
    # sklearn counts ordered pairs (each unordered pair twice); the ratios are unaffected.
    (TN, FP), (FN, TP) = pair_confusion_matrix(gt, pred)
    precision = TP / (TP + FP) if TP + FP else 0.0
    recall = TP / (TP + FN) if TP + FN else 0.0

    print("TN {} FP {} FN {} TP {}".format(TN, FP, FN, TP))
    print("Precision is TP / (TP + FP) = {} / ({} + {}) = {}%".format(TP, TP, FP, 100 * precision))
    print("Error rate of different people in one cluster is FP / (TP + FP), which is 1 - precision:")
    print("{} / ({} + {}) = {}%".format(FP, TP, FP, 100 * (1 - precision)))
    print()
    print("Recall is TP / (TP + FN) = {} / ({} + {}) = {}%".format(TP, TP, FN, 100 * recall))
    print("Error rate of one person in different clusters is FN / (TP + FN), which is 1 - recall:")
    print("{} / ({} + {}) = {}%".format(FN, TP, FN, 100 * (1 - recall)))
    return precision, recall
```
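As a sketch of how the inputs might be prepared and of why the argument order of `pair_confusion_matrix` matters, the snippet below uses a hypothetical toy clustering (the archive contents and the names `featToArchive`, `pred`, `true` are illustrative assumptions). `pair_confusion_matrix(labels_true, labels_pred)` counts ordered pairs, so each cell is twice the number of unordered sample pairs; precision and recall are unaffected by that factor.

```python
from sklearn.metrics.cluster import pair_confusion_matrix

# Hypothetical toy input in the shape CalcPrecisionRecall expects:
# allClusters is a list of archives (lists of feature IDs),
# groundTruth maps each feature ID to its class label.
allClusters = [["A01", "A02", "B01"], ["A03", "A04"], ["B02"]]
groundTruth = {"A01": 0, "A02": 0, "A03": 0, "A04": 0, "B01": 1, "B02": 1}

# Flatten archives into per-feature predicted labels aligned with the ground truth.
featToArchive = {feat: idx for idx, arc in enumerate(allClusters) for feat in arc}
feats = sorted(featToArchive)
pred = [featToArchive[f] for f in feats]
true = [groundTruth[f] for f in feats]

# Ground truth goes first, so the matrix reads [[TN, FP], [FN, TP]].
(tn, fp), (fn, tp) = pair_confusion_matrix(true, pred)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(f"TN={tn} FP={fp} FN={fn} TP={tp} precision={precision:.3f} recall={recall:.3f}")

# Swapping the arguments exchanges the off-diagonal cells (FP <-> FN).
swapped = pair_confusion_matrix(pred, true)
assert swapped[0, 1] == fn and swapped[1, 0] == fp
```

If the two arguments were passed in the wrong order, the reported "precision" would silently be the recall and vice versa, which is easy to miss because both numbers still look plausible.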