Clustering Parameter Configuration
Archive/Merge/KMeans Parameters
Set ArchiveMode to ARCHIVE, MERGE, or KMEANS. For the parameters, see the table "Common Archive/Merge parameters". In a single run, FeatureClustering can either archive features or merge archives, but not both.

| Parameter | Default Value | Possible Value | Description |
|---|---|---|---|
| FeatureCount | 10000 | - | Number of features to be archived. When data is generated randomly, the value of FeatureCount cannot be set to 5000000. |
| NeedNormalization | TRUE | TRUE/FALSE | Whether features need to be normalized by their norm (modulus). (When distances are calculated with the inner product measure, features must be normalized first.) Currently, INT8 data does not support quantization. |
| PointPointThreshold | 0.875 | - | Similarity threshold between points (features) to be clustered. |
| PointClusterThreshold | 0.7 | - | Similarity threshold between a point and a cluster. |
| ClusterClusterThreshold | 0.8 | - | Similarity threshold between clusters. |
| MinRankDistance | 6 | - | Minimum sorting distance. |
| MaxRankDistance | 10 | - | Maximum sorting distance. |
| MinPicNum | 2 | - | Minimum number of vectors in an archive. If an archive contains fewer features than this value after archiving, all features in that archive are marked as outliers. |
| MaxCoverNum | 1 | - | Number of vectors in the cover archive. (This parameter has no impact on MindX clustering. It is reserved for future use.) |
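
The effect of NeedNormalization and MinPicNum can be illustrated with a minimal Python sketch. It is not the MindX implementation: the helper names, the `-1` outlier label, and the reading of normalization as scaling to unit L2 norm are assumptions made for illustration only.

```python
import math
from collections import Counter

OUTLIER = -1  # hypothetical outlier label; the real output format may differ


def normalize(vec):
    """Scale a vector to unit L2 norm (assumed meaning of NeedNormalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def inner_product(a, b):
    """Inner product of two equal-length vectors.

    For unit-norm inputs this equals the cosine similarity, which is why
    normalization is needed before comparing against a similarity threshold.
    """
    return sum(x * y for x, y in zip(a, b))


def apply_min_pic_num(labels, min_pic_num=2):
    """Relabel every member of an archive smaller than min_pic_num as an outlier."""
    sizes = Counter(label for label in labels if label != OUTLIER)
    return [
        label if label != OUTLIER and sizes[label] >= min_pic_num else OUTLIER
        for label in labels
    ]
```

For example, `apply_min_pic_num([0, 0, 1, 2, 2, 2], 2)` turns the single-member archive `1` into an outlier and leaves the other two archives unchanged. After normalization, the vectors `[3, 4]` and `[4, 3]` have an inner product of about 0.96, which is above the default PointPointThreshold of 0.875.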

| Parameter | Default Value | Possible Value | Description |
|---|---|---|---|
| ArchiveResultMergeThreshold | 0.6 | - | Similarity threshold between different archiving results. This parameter is used to merge archiving results in the archive merging scenario. |
| MergeArchivesCount | 0 | - | Total number of archives to be merged. |
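
This document does not describe how archiving results are compared during a merge. The sketch below only shows the general idea of threshold-based merging under stated assumptions: each archive is summarized by one unit-norm representative vector, similarity is the inner product, and archives are grouped with a union-find structure. The function name and the representative-vector summary are hypothetical.

```python
def merge_archives(representatives, threshold=0.6):
    """Group archives whose representatives have inner product >= threshold.

    representatives: list of unit-norm vectors, one per archive (assumed summary).
    Returns one group id per archive; archives with equal ids are merged.
    Merging is transitive: if A matches B and B matches C, all three are grouped.
    """
    parent = list(range(len(representatives)))

    def find(i):
        # Follow parent links to the group root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(representatives)):
        for j in range(i + 1, len(representatives)):
            sim = sum(a * b for a, b in zip(representatives[i], representatives[j]))
            if sim >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as the root so group ids are stable.
                    parent[max(ri, rj)] = min(ri, rj)

    return [find(i) for i in range(len(representatives))]
```

A higher threshold merges fewer archives. With a threshold of 0.7, the representatives `[1.0, 0.0]`, `[0.8, 0.6]`, and `[0.0, 1.0]` yield two groups, because only the first pair reaches the threshold.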

| Parameter | Default Value | Possible Value | Description |
|---|---|---|---|
| FeatureCount | 10000 | - | Total number of feature vectors to be archived in the K-Means clustering and archiving scenarios. The K-Means clustering mode is generally recommended when the clustering scale exceeds 10 million. A single processor supports a maximum of 25 million feature vectors in a base library. |
| KMeansTimes | 6 | - | Number of K-Means clustering rounds. The minimum value is 1. The final clustering result is obtained by combining the results of all rounds. |
| ArchiveNum | 1000000 | - | Estimated number of clusters when the number of feature vectors equals FeatureCount. If no reference value is available, set this parameter to one tenth of FeatureCount and then adjust it as needed. (Because the final result combines multiple clustering rounds, the final number of clusters may differ from ArchiveNum. ArchiveNum is used only as a reference value within a single K-Means clustering round.) |
| TopK | 100 | - | Number of top-ranked (TopK) feature vectors saved when retrieval is performed for each feature vector. A larger value improves accuracy but may slightly degrade performance. Set this parameter as required. |
| MaxKMeansIterTimes | 30 | - | Maximum number of iterations of a single K-Means clustering round. If the number of iterations exceeds this maximum, K-Means clustering ends immediately. |
| MinFreqKMeans | 3 | - | When the results of multiple K-Means rounds are combined, the lowest frequency with which two points are assigned to the same class for them to belong to the same class in the final clustering result. The value cannot exceed KMeansTimes. |
| MaxFreqIso | 3 | - | When the results of multiple K-Means rounds are combined and one or both of two points are isolated, the highest frequency with which the two points belong to the same class in the final clustering result. The value cannot exceed KMeansTimes and should be greater than or equal to MinFreqKMeans. |
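
The constraints between the K-Means parameters can be checked before a run. The following Python sketch is illustrative only: the `KMeansParams` class, its field names, and the helper functions are hypothetical and do not correspond to a MindX API. Only the default values and the constraints are taken from the table.

```python
from dataclasses import dataclass


@dataclass
class KMeansParams:
    # Hypothetical container; defaults mirror the table above.
    feature_count: int = 10000
    kmeans_times: int = 6
    archive_num: int = 1000000
    top_k: int = 100
    max_kmeans_iter_times: int = 30
    min_freq_kmeans: int = 3
    max_freq_iso: int = 3


def suggest_archive_num(feature_count):
    """Starting point for ArchiveNum when no reference value exists:
    one tenth of FeatureCount, but at least 1."""
    return max(1, feature_count // 10)


def validate(params):
    """Return a list of violated constraints (empty if the settings are consistent)."""
    errors = []
    if params.kmeans_times < 1:
        errors.append("KMeansTimes must be at least 1")
    if params.min_freq_kmeans > params.kmeans_times:
        errors.append("MinFreqKMeans cannot exceed KMeansTimes")
    if params.max_freq_iso > params.kmeans_times:
        errors.append("MaxFreqIso cannot exceed KMeansTimes")
    if params.max_freq_iso < params.min_freq_kmeans:
        errors.append("MaxFreqIso should be >= MinFreqKMeans")
    return errors
```

With the default values, `validate(KMeansParams())` returns an empty list. Lowering KMeansTimes to 2 without also lowering MinFreqKMeans and MaxFreqIso reports two violations, which is a typical mistake when reducing the number of rounds to save time.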