One-Class-Multiple-Archives Rate and Multiple-Classes-One-Archive Rate

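In some scenarios, clustering accuracy can be evaluated with these two rates. Consider, for example, a dataset that contains several classes such as A and B.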

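Ideal case (each class maps to exactly one archive):
Archive 1: A01, A02, A03, A04

Archive 2: B01, B02

...

One class, multiple archives (class A is split across archives 1 and 2):
Archive 1: A01, A02

Archive 2: A03, A04

Archive 3: B01, B02

...

Multiple classes, one archive (classes A and B are merged into archive 1):
Archive 1: A01, A02, A03, A04, B01, B02

...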
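To make the three cases concrete, the sketch below counts sample pairs by hand for the toy data above. It is an illustration rather than a reference implementation: the function name `pair_error_rates`, the helper `class_of_id`, and the rule that the first character of an ID is its class are all assumptions made for this example, and only the items actually listed above are used.

```python
from itertools import combinations


def pair_error_rates(archives, class_of):
    """Return (multiple_classes_one_archive_rate, one_class_multiple_archives_rate).

    archives: list of lists of item IDs (the clustering result)
    class_of: function mapping an item ID to its true class (hypothetical helper)
    """
    # Record which archive each item was assigned to.
    archive_of = {item: idx for idx, arc in enumerate(archives) for item in arc}
    tp = fp = fn = 0
    for a, b in combinations(sorted(archive_of), 2):
        same_archive = archive_of[a] == archive_of[b]
        same_class = class_of(a) == class_of(b)
        if same_archive and same_class:
            tp += 1
        elif same_archive:
            fp += 1  # different classes merged into one archive
        elif same_class:
            fn += 1  # one class split across archives
    # FP / (TP + FP) is 1 - precision; FN / (TP + FN) is 1 - recall.
    merged_rate = fp / (tp + fp) if tp + fp else 0.0
    split_rate = fn / (tp + fn) if tp + fn else 0.0
    return merged_rate, split_rate


def class_of_id(item):
    # Assumption for this toy data only: the first character is the class.
    return item[0]


ideal = [["A01", "A02", "A03", "A04"], ["B01", "B02"]]
split = [["A01", "A02"], ["A03", "A04"], ["B01", "B02"]]
merged = [["A01", "A02", "A03", "A04", "B01", "B02"]]

for name, archives in [("ideal", ideal), ("split", split), ("merged", merged)]:
    print(name, pair_error_rates(archives, class_of_id))
```

In the ideal case both rates are 0. In the split case 4 of the 7 same-class pairs land in different archives, so the one-class-multiple-archives rate is 4/7. In the merged case 8 of the 15 same-archive pairs mix classes, so the multiple-classes-one-archive rate is 8/15.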

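Metrics can be designed to quantify each of these failure modes. The following implementation computes both rates from sklearn's pair confusion matrix: the multiple-classes-one-archive rate is 1 - precision, and the one-class-multiple-archives rate is 1 - recall.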

```
from sklearn.metrics.cluster import pair_confusion_matrix


def CalcPrecisionRecall(allClusters, groundTruth, thresh=99999999):
    '''
    Print pairwise precision, recall, and the two archive error rates.

    Args
        allClusters: list of lists; each inner list holds the features of one archive
        groundTruth: dict mapping each feature to its true class label (convertible to int)
        thresh: black-hole archive threshold; an archive with thresh or more
            features is ignored
    '''
    # Drop oversized ("black-hole") archives before scoring.
    filteredClusters = [arc for arc in allClusters if len(arc) < thresh]
    # Map each feature to the index of the archive that contains it.
    predDict = {}
    for label, cluster in enumerate(filteredClusters):
        for feat in cluster:
            predDict[feat] = label
    featLis = sorted(predDict)
    predLabels = [predDict[feat] for feat in featLis]
    trueLabels = [int(groundTruth[feat]) for feat in featLis]
    print("Start to calculate Precision and Recall")
    # pair_confusion_matrix takes (labels_true, labels_pred); passing them in the
    # opposite order swaps FP and FN. Counts are over ordered pairs.
    (TN, FP), (FN, TP) = pair_confusion_matrix(trueLabels, predLabels)
    print("TN {} FP {} FN {} TP {}".format(TN, FP, FN, TP))
    if TP + FP == 0 or TP + FN == 0:
        print("Precision or recall is undefined: no same-archive or same-class pairs")
        return
    print("Precision is TP / (TP + FP) = {} / ({} + {}) = {}%".format(TP, TP, FP, 100 * TP / (TP + FP)))
    print("Error rate of different people in one cluster is FP / (TP + FP), which is 1 - precision:")
    print("{} / ({} + {}) = {}%".format(FP, TP, FP, 100 * FP / (TP + FP)))
    print()
    print("Recall is TP / (TP + FN) = {} / ({} + {}) = {}%".format(TP, TP, FN, 100 * TP / (TP + FN)))
    print("Error rate of one person in different clusters is FN / (TP + FN), which is 1 - recall:")
    print("{} / ({} + {}) = {}%".format(FN, TP, FN, 100 * FN / (TP + FN)))
```
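As a quick check of how sklearn reports these counts, the sketch below calls `pair_confusion_matrix` directly on the multiple-classes-one-archive toy case. The label lists are made up for illustration (class A encoded as 0, class B as 1). Two points are worth noting: the function takes the ground-truth labels first, and it counts ordered pairs, so every entry is twice the number of unordered pairs. The ratios are unaffected by that doubling.

```python
from sklearn.metrics.cluster import pair_confusion_matrix

# Hypothetical toy labels: A01..A04 are class 0, B01 and B02 are class 1.
true_labels = [0, 0, 0, 0, 1, 1]
# Multiple classes, one archive: every item is placed in archive 0.
pred_labels = [0, 0, 0, 0, 0, 0]

# Argument order is (labels_true, labels_pred).
(tn, fp), (fn, tp) = pair_confusion_matrix(true_labels, pred_labels)

# 7 same-class pairs and 8 mixed-class pairs, each counted twice.
print("TN", tn, "FP", fp, "FN", fn, "TP", tp)

# Multiple-classes-one-archive rate = FP / (TP + FP) = 1 - precision.
merged_rate = fp / (tp + fp)
print("multiple-classes-one-archive rate:", merged_rate)
```

Here the rate is 16 / 30, which is the same 8 / 15 obtained by counting unordered pairs by hand.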
