One-Class-Multiple-Archives Rate and Multiple-Classes-One-Archive Rate
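In some scenarios, clustering quality can be measured by how often one class is split across several archives and how often several classes are merged into one archive. Consider, for example, a dataset that contains multiple classes such as A and B.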
- Ideal case (each class maps to exactly one archive):
Archive 2: B01, B02
……
- One class, multiple archives (samples of one class are split across several archives):
Archive 2: A03, A04
Archive 3: B01, B02
……
- Multiple classes, one archive (samples of different classes are merged into the same archive):
……
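We can design metrics that evaluate clustering results in each of these scenarios. The following implementation computes the one-class-multiple-archives rate and the multiple-classes-one-archive rate from the sklearn pair confusion matrix.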
```
from sklearn.metrics.cluster import pair_confusion_matrix


def CalcPrecisionRecall(allClusters, groundTruth, thresh=99999999):
    '''
    Args
        allClusters: list of lists; each inner list holds the features of one archive (cluster)
        groundTruth: dict mapping each feature to its ground-truth class label
        thresh: black-hole archive threshold; an archive that holds this many
            features or more is ignored
    '''
    # Drop black-hole archives.
    filteredClusters = [arc for arc in allClusters if len(arc) < thresh]

    # Map each feature to the index of the archive that contains it.
    featToCluster = {}
    for label, cluster in enumerate(filteredClusters):
        for feat in cluster:
            featToCluster[feat] = label

    # Build aligned lists of predicted and ground-truth labels.
    featLis = sorted(featToCluster)
    pred = [featToCluster[feat] for feat in featLis]
    gt = [int(groundTruth[feat]) for feat in featLis]

    print("Start to calculate Precision and Recall")
    # pair_confusion_matrix takes (labels_true, labels_pred) and returns [[TN, FP], [FN, TP]].
    (TN, FP), (FN, TP) = pair_confusion_matrix(gt, pred)
    print("TN {} FP {} FN {} TP {}".format(TN, FP, FN, TP))
    print("Precision is TP / (TP + FP) = {} / ({} + {}) = {}%".format(TP, TP, FP, 100 * TP / (TP + FP)))
    # Multiple-classes-one-archive rate: pairs placed together that belong to different classes.
    print("Error rate of different people in one cluster is FP / (TP + FP), which is 1 - precision: ")
    print("{} / ({} + {}) = {} %".format(FP, TP, FP, 100 * FP / (TP + FP)))
    print()
    print("Recall is TP / (TP + FN) = {} / ({} + {}) = {}%".format(TP, TP, FN, 100 * TP / (TP + FN)))
    # One-class-multiple-archives rate: same-class pairs that ended up in different archives.
    print("Error rate of one person in different clusters is FN / (TP + FN), which is 1 - recall: ")
    print("{} / ({} + {}) = {} %".format(FN, TP, FN, 100 * FN / (FN + TP)))
```
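To make the pair-counting idea behind the two rates concrete, the following is a minimal sketch that counts sample pairs directly in plain Python instead of calling sklearn. The function name `pair_error_rates` and the toy archives are hypothetical and chosen only for illustration; they are not the elided values from the scenarios above. The sketch counts each unordered pair once, which does not change the resulting ratios.

```python
from itertools import combinations


def pair_error_rates(all_clusters, ground_truth):
    """Return (multi_class_one_archive_rate, one_class_multi_archive_rate)."""
    # Map every feature to the index of the archive it was placed in.
    pred = {feat: idx for idx, cluster in enumerate(all_clusters) for feat in cluster}
    tp = fp = fn = 0
    for a, b in combinations(sorted(pred), 2):
        same_archive = pred[a] == pred[b]
        same_class = ground_truth[a] == ground_truth[b]
        if same_archive and same_class:
            tp += 1
        elif same_archive:
            fp += 1  # different classes merged into one archive
        elif same_class:
            fn += 1  # one class split across archives
    # Guard against empty denominators (an assumption of this sketch).
    multi_class_one_archive = fp / (tp + fp) if tp + fp else 0.0
    one_class_multi_archive = fn / (tp + fn) if tp + fn else 0.0
    return multi_class_one_archive, one_class_multi_archive


# Hypothetical toy data: class A is split over two archives, class B is intact.
clusters = [["A01", "A02"], ["A03", "A04"], ["B01", "B02"]]
truth = {"A01": 0, "A02": 0, "A03": 0, "A04": 0, "B01": 1, "B02": 1}
mixed_rate, split_rate = pair_error_rates(clusters, truth)
print(mixed_rate, split_rate)
```

In this toy case no archive mixes classes, so the multiple-classes-one-archive rate is 0, while 4 of the 7 same-class pairs are separated, giving a one-class-multiple-archives rate of 4/7.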