General Description

The accuracy calculation methods described in this section are for reference only. The accompanying code is intended only to help you understand the algorithms.

For a clustering model (clustering algorithm), it is difficult to evaluate accuracy and performance on an unknown dataset because no ground-truth labels exist. A common approach is to select a subset of the dataset for a specific task, label that subset manually, and use the labeled subset to estimate the overall accuracy of the algorithm on that task. The following metrics are provided for reference.

Suppose that the following information is available:

GroundTruthDic: dictionary. Each key is an original ID, and each value is the label of the cluster that the corresponding feature vector belongs to.
GroundTruthCluster: list of lists. Each inner list represents one ground-truth cluster.
PredictedCluster: list of lists. Each inner list represents one cluster produced by the clustering algorithm.
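
To make these three structures concrete, the following Python sketch shows one possible shape for them. It is for illustration only: the sample IDs and labels are made up, the helper build_clusters is hypothetical, and the sketch assumes that each inner list holds original IDs, which this section does not state explicitly.

```python
from collections import defaultdict

# Hypothetical sample data: keys are original IDs, values are cluster labels.
GroundTruthDic = {"id1": 0, "id2": 0, "id3": 1, "id4": 1, "id5": 2}


def build_clusters(label_by_id):
    """Group IDs by label into a list of lists (one inner list per cluster).

    Hypothetical helper; assumes each cluster is represented by its member IDs.
    """
    groups = defaultdict(list)
    for original_id, label in label_by_id.items():
        groups[label].append(original_id)
    return list(groups.values())


# Ground-truth clusters derived from the manually assigned labels.
GroundTruthCluster = build_clusters(GroundTruthDic)

# Hypothetical output of a clustering algorithm over the same IDs.
PredictedCluster = [["id1", "id2", "id3"], ["id4"], ["id5"]]
```

In this sketch, GroundTruthCluster and PredictedCluster cover the same set of IDs, so a metric can compare the two groupings element by element.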
