General Description

The accuracy calculation methods provided in this section are for reference only. The code is included only to help you understand the algorithms.

For a clustering model (clustering algorithm), accuracy and performance are difficult to evaluate on an unknown dataset. A common approach is to select a subset of the dataset for a specific task, manually label that subset, and use the labeled subset to estimate the overall accuracy of the algorithm on that task. The following metrics are provided for reference.

Suppose that the following information is available:

  • GroundTruthDic: dictionary. Each key is an original ID, and each value is the label of the cluster that the corresponding feature vector belongs to.
  • GroundTruthCluster: list of lists. Each inner list represents one ground-truth cluster.
  • PredictedCluster: list of lists. Each inner list represents one predicted cluster.
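
The three inputs above might look as follows in Python. This is only a minimal sketch: the sample IDs and labels are made up, the helper name clusters_from_labels is hypothetical, and it assumes that the elements of each inner list are the same original IDs used as keys in GroundTruthDic.

```python
from collections import defaultdict


def clusters_from_labels(ground_truth_dic):
    """Group original IDs by cluster label and return a list of lists.

    Hypothetical helper: assumes each inner list holds original IDs.
    """
    groups = defaultdict(list)
    for original_id, label in ground_truth_dic.items():
        groups[label].append(original_id)
    # One inner list per label, with IDs sorted for a stable order.
    return [sorted(ids) for ids in groups.values()]


# Made-up sample data, for illustration only.
GroundTruthDic = {"id1": 0, "id2": 0, "id3": 1, "id4": 1, "id5": 2}

# Ground-truth clusters derived from the ID-to-label dictionary.
GroundTruthCluster = clusters_from_labels(GroundTruthDic)

# Output of some clustering algorithm, in the same list-of-lists form.
PredictedCluster = [["id1", "id2", "id3"], ["id4"], ["id5"]]
```

In this sketch, GroundTruthCluster and GroundTruthDic carry the same information in two forms: the dictionary makes it easy to look up the label of a single ID, and the list of lists makes it easy to iterate over whole clusters.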