Rates of Different Classes in One Cluster and One Class in Different Clusters
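These two rates measure two specific kinds of clustering error: features of one class being split across several clusters, and features of different classes being merged into one cluster. The following examples use a dataset that contains classes A and B.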
- Ideal case:
  Archive 1: A01, A02, A03, A04
  Archive 2: B01, B02
  ...
- One class in different clusters:
  Archive 1: A01, A02
  Archive 2: A03, A04
  Archive 3: B01, B02
  ...
- Different classes in one cluster:
  Archive 1: A01, A02, A03, A04, B01, B02
  ...
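To make the two rates concrete, the following sketch enumerates every unordered pair of features in the three example layouts and counts TP, FP, and FN directly. It assumes, for illustration only, that the ideal case places A01 to A04 in Archive 1, that the split case places A01 and A02 in Archive 1, and that a feature's class is the first letter of its ID. The helper name `pair_rates` is hypothetical.

```python
from itertools import combinations


def pair_rates(clusters):
    """Hypothetical helper: count feature pairs for a list of clusters.

    Assumes each feature ID's class is its first character, e.g. "A01" -> "A".
    Returns (TP, FP, FN, merge_rate, split_rate) over unordered pairs.
    """
    # Map every feature to the index of the cluster (archive) holding it
    cluster_of = {feat: idx for idx, members in enumerate(clusters) for feat in members}
    tp = fp = fn = 0
    for a, b in combinations(sorted(cluster_of), 2):
        same_cluster = cluster_of[a] == cluster_of[b]
        same_class = a[0] == b[0]
        if same_cluster and same_class:
            tp += 1  # correctly grouped together
        elif same_cluster:
            fp += 1  # different classes in one cluster
        elif same_class:
            fn += 1  # one class in different clusters
    # Assumption: report 0.0 when a denominator is zero instead of raising
    merge_rate = fp / (tp + fp) if tp + fp else 0.0  # 1 - precision
    split_rate = fn / (tp + fn) if tp + fn else 0.0  # 1 - recall
    return tp, fp, fn, merge_rate, split_rate


layouts = {
    "ideal": [["A01", "A02", "A03", "A04"], ["B01", "B02"]],
    "one class in different clusters": [["A01", "A02"], ["A03", "A04"], ["B01", "B02"]],
    "different classes in one cluster": [["A01", "A02", "A03", "A04", "B01", "B02"]],
}
for name, clusters in layouts.items():
    tp, fp, fn, merge_rate, split_rate = pair_rates(clusters)
    print(f"{name}: TP={tp} FP={fp} FN={fn} "
          f"merge rate={merge_rate:.2%} split rate={split_rate:.2%}")
```

For these layouts the ideal case gives TP=7, FP=0, FN=0; splitting class A gives TP=3, FN=4 (one class in different clusters: 4/7, about 57.14%); merging everything gives TP=7, FP=8 (different classes in one cluster: 8/15, about 53.33%).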
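Dedicated metrics are needed to evaluate these scenarios. The following code computes both rates from the pair confusion matrix returned by sklearn's pair_confusion_matrix, which classifies every pair of features: TP pairs are in the same cluster and the same class, FP pairs are in the same cluster but different classes, FN pairs are in the same class but different clusters, and TN pairs differ in both. The rate of different classes in one cluster is FP / (TP + FP), that is, 1 - precision; the rate of one class in different clusters is FN / (TP + FN), that is, 1 - recall.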
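```
from sklearn.metrics.cluster import pair_confusion_matrix


def CalcPrecisionRecall(allClusters, groundTruth, thresh=99999999):
    '''
    Args:
        allClusters: list of lists; each inner list holds the feature IDs of one cluster (archive)
        groundTruth: dict mapping each feature ID to its ground-truth class label (convertible to int)
        thresh: black-hole archive threshold; an archive with thresh or more features is ignored
    Returns:
        (precision, recall) as fractions, or None if either is undefined
    '''
    # Drop black-hole archives whose size reaches the threshold
    filteredClusters = [arc for arc in allClusters if len(arc) < thresh]

    # Map each feature to the index of the cluster that contains it
    tmpDict = {}
    for label, cluster in enumerate(filteredClusters):
        for feat in cluster:
            tmpDict[feat] = label

    # Build aligned predicted and ground-truth label lists in sorted feature order
    featLis = sorted(tmpDict)
    predLabels = [tmpDict[feat] for feat in featLis]
    gtLabels = [int(groundTruth[feat]) for feat in featLis]

    print("Start to calculate Precision and Recall")
    # The signature is pair_confusion_matrix(labels_true, labels_pred); passing the
    # ground truth first keeps FP (same cluster, different class) and
    # FN (same class, different clusters) in their correct cells.
    (TN, FP), (FN, TP) = pair_confusion_matrix(gtLabels, predLabels)
    print("TN {} FP {} FN {} TP {}".format(TN, FP, FN, TP))
    if TP + FP == 0 or TP + FN == 0:
        print("No feature pairs share a cluster, or none share a class; the rates are undefined")
        return None

    print("Precision is TP / (TP + FP) = {} / ({} + {}) = {}%".format(TP, TP, FP, 100 * TP / (TP + FP)))
    print("Rate of different classes in one cluster is FP / (TP + FP), which is 1 - precision:")
    print("{} / ({} + {}) = {}%".format(FP, TP, FP, 100 * FP / (TP + FP)))
    print()
    print("Recall is TP / (TP + FN) = {} / ({} + {}) = {}%".format(TP, TP, FN, 100 * TP / (TP + FN)))
    print("Rate of one class in different clusters is FN / (TP + FN), which is 1 - recall:")
    print("{} / ({} + {}) = {}%".format(FN, TP, FN, 100 * FN / (TP + FN)))
    return TP / (TP + FP), TP / (TP + FN)
```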
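Where sklearn is not available, or to cross-check the results, the same pair counts can be derived from the class/cluster contingency table without enumerating pairs. With n_ij features of class i in cluster j, TP is the sum of C(n_ij, 2), TP + FP is the sum over clusters of C(cluster size, 2), and TP + FN is the sum over classes of C(class size, 2). This is a minimal sketch that assumes the same input conventions as CalcPrecisionRecall (a list of clusters and a dict of int-convertible labels); the function name `pair_counts` and the sample data are hypothetical. sklearn's pair_confusion_matrix counts ordered pairs, so its four values are twice the unordered counts computed here, while the two rates are the same.

```python
from collections import Counter
from math import comb


def pair_counts(allClusters, groundTruth, thresh=99999999):
    """Hypothetical helper: return (TN, FP, FN, TP) over unordered feature pairs."""
    # Same black-hole filtering and feature-to-cluster mapping as CalcPrecisionRecall
    cluster_of = {}
    for label, cluster in enumerate(arc for arc in allClusters if len(arc) < thresh):
        for feat in cluster:
            cluster_of[feat] = label
    class_of = {feat: int(groundTruth[feat]) for feat in cluster_of}

    # Contingency counts n_ij: number of features of class i placed in cluster j
    cells = Counter((class_of[f], cluster_of[f]) for f in cluster_of)
    cluster_sizes = Counter(cluster_of.values())
    class_sizes = Counter(class_of.values())

    tp = sum(comb(n, 2) for n in cells.values())
    fp = sum(comb(n, 2) for n in cluster_sizes.values()) - tp  # same cluster, different class
    fn = sum(comb(n, 2) for n in class_sizes.values()) - tp    # same class, different clusters
    tn = comb(len(cluster_of), 2) - tp - fp - fn
    return tn, fp, fn, tp


# Hypothetical input for the "one class in different clusters" layout (class A -> 0, class B -> 1)
allClusters = [["A01", "A02"], ["A03", "A04"], ["B01", "B02"]]
groundTruth = {"A01": 0, "A02": 0, "A03": 0, "A04": 0, "B01": 1, "B02": 1}
TN, FP, FN, TP = pair_counts(allClusters, groundTruth)
print(TN, FP, FN, TP)  # 8 0 4 3
print("Rate of one class in different clusters:", FN / (TP + FN))
```

The same allClusters and groundTruth can be passed to CalcPrecisionRecall unchanged; on these labels, pair_confusion_matrix(labels_true, labels_pred) is expected to report each of the four counts doubled.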