Glossary

A-E

**Table 1**
Term/Acronym/Abbreviation	Description
A
AccumulatedRelativeError	Accumulated relative error. An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates higher similarity.
Advisor	Expert system. A tool that locates the main performance tuning issues of models and operators, identifies and analyzes bottlenecks, and outputs tuning suggestions, thereby improving development efficiency.
AI	Artificial intelligence. A new science that studies and creates machines that can mimic humans and extend the reach of human abilities.
AIPP	Artificial intelligence preprocessing. AIPP performs image preprocessing on the AI Core before model inference, including image resizing, color space conversion (CSC), and mean subtraction and factor multiplication (to change pixel values).
Ascend EP	Ascend endpoint. Ascend EPs are Ascend AI Processors that serve as secondary devices, for example, PCIe accelerator cards. They work with the primary devices (x86 or Arm servers) to perform inference, training, and image recognition efficiently.
Ascend RC	Ascend root complex. Ascend RCs are Ascend AI Processors that serve as primary devices, for example, the Atlas 200 DK. They provide the host control function and are mainly used in mobile devices.
AscendCL	Ascend computing language. AscendCL provides a collection of C language API libraries for developing deep neural network (DNN) applications such as target recognition and image classification. The APIs cover device, context, stream, and memory management, model and operator loading and execution, and media data processing.
ASHA	Asynchronous successive halving algorithm. A hyperparameter optimization algorithm based on dynamic resource allocation. It trains multiple sets of hyperparameters in parallel with a small number of training iterations per round, evaluates and ranks all of them, and stops training the lower-ranked half early. It then evaluates the remaining hyperparameters in the next round and halves them again until the optimization objective is achieved.
ATC	Ascend tensor compiler. ATC converts network models from open-source frameworks, such as Caffe and TensorFlow, into offline models supported by Ascend AI Processors. During model conversion, it performs operator scheduling tuning, weight data rearrangement, and memory usage tuning. ATC can also be used to build operators.
Accuracy comparison	Accuracy comparison compares the dump data of a model running on NPUs with the Ground Truth (the .npy files of the same model running on GPUs or CPUs). It compares the computation results of Huawei-developed operators with those of third-party equivalents.
B
BOHB	Bayesian optimization and Hyperband. BOHB combines the Hyperband algorithm with Bayesian optimization for hyperparameter optimization. It uses Hyperband to sample many configurations with a small budget, exploring the hyperparameter search space quickly and efficiently to find promising configurations. It then uses the predictive power of the Bayesian optimizer to propose good configurations close to the optimum.
BOSS	Bayesian optimization via sub-sampling. A general-purpose hyperparameter optimization algorithm based on the Bayesian optimization framework, designed to search efficiently with limited computing resources.
BP Point	Backpropagation point. The end position of the backward operators in the iteration trace of a training network.
C
CPU	Central processing unit. One of the main components of a modern computer, along with internal memory and input/output devices. It interprets computer instructions and processes the data in computer software.
CosineSimilarity	Cosine similarity. An accuracy comparison algorithm. The result ranges from -1 to 1. A value closer to 1 indicates higher similarity.
D
DDR	Double data rate. DDR performs two read/write operations per clock cycle: one on the rising edge and one on the falling edge of the clock.
DiffThd	Difference threshold
DSL	Domain-specific language. One of the operator development modes. You only need to use the DSL APIs to express the computation process; subsequent operator scheduling, optimization, and build are then handled by existing APIs.
DVPP	Digital vision preprocessing. DVPP preprocesses videos and images in specific formats, for example by decoding and scaling them, and then encodes and outputs the processed videos and images.
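
The CosineSimilarity and AccumulatedRelativeError entries above can be illustrated with a minimal pure-Python sketch that compares a flattened My Output tensor against the Ground Truth. The function names are hypothetical and are not part of the Ascend tools. The cosine formula is the standard one; for AccumulatedRelativeError, the sketch assumes the sum of element-wise relative errors |a - b| / |b| and skips elements whose Ground Truth value is 0, which the real tool may handle differently.

```python
import math

def cosine_similarity(my_output, ground_truth):
    """Cosine of the angle between two flattened tensors; ranges from -1 to 1."""
    if len(my_output) != len(ground_truth):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(a * b for a, b in zip(my_output, ground_truth))
    norm_mine = math.sqrt(sum(a * a for a in my_output))
    norm_truth = math.sqrt(sum(b * b for b in ground_truth))
    if norm_mine == 0 or norm_truth == 0:
        raise ValueError("cosine similarity is undefined for an all-zero tensor")
    return dot / (norm_mine * norm_truth)

def accumulated_relative_error(my_output, ground_truth):
    """Sum of element-wise relative errors |a - b| / |b|; ranges from 0 to infinity.

    Assumption: elements whose Ground Truth value is 0 are skipped.
    """
    if len(my_output) != len(ground_truth):
        raise ValueError("tensors must have the same number of elements")
    return sum(abs(a - b) / abs(b) for a, b in zip(my_output, ground_truth) if b != 0)

if __name__ == "__main__":
    npu_dump = [1.0, 2.0, 3.0]   # illustrative My Output values
    golden = [1.0, 2.0, 4.0]     # illustrative Ground Truth values
    print(cosine_similarity(npu_dump, golden))
    print(accumulated_relative_error(npu_dump, golden))  # 0.25
```

Only the last element differs, so the cosine similarity stays close to 1 while the accumulated relative error picks up |3 - 4| / 4 = 0.25.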

F-J

**Table 2**
Term/Acronym/Abbreviation	Description
F
FP Point	Forward propagation point. The start position of the forward operators in the iteration trace of a training network.
FpDiff	Floating-point difference
FLOPS	Floating-point operations per second. A measure of computer performance, especially in scientific computing that relies heavily on floating-point calculations. (Note: The final "S" stands for "second" and does not indicate a plural.)
G
GDB	GNU debugger. The standard debugger for the GNU operating system.
GE	Graph engine. GE provides a set of secure and easy-to-use APIs for graph construction based on the graph/operator intermediate representation (IR). These APIs can be called to build a network model and to set the graphs in the model, the operators in the graphs, and the attributes of the model and operators.
GPU	Graphics processing unit. A microprocessor that performs image and graphics computation on PCs, workstations, game consoles, and mobile devices such as tablets and smartphones.
Graph Mode	MindSpore static graph mode, in which the neural network model is compiled into an entire graph and then delivered for execution. This mode applies graph optimizations during compilation to improve runtime performance and facilitates large-scale deployment and cross-platform execution.
H
HCCL	Huawei collective communication library. HCCL implements high-performance collective communication between servers in deep learning training scenarios.
HCCS	High confidence computing systems. HCCS provides high-performance inter-processor (inter-device) data communication in multi-device scenarios.
HPO	Hyperparameter optimization. HPO uses automated algorithms to optimize hyperparameters, such as the learning rate, activation function, and optimizer, that cannot be learned through training in the original machine learning or deep learning algorithms.
HWTS	Hardware task scheduler. HWTS schedules AI Core tasks in hardware and reduces scheduling latency.
I
IR	Intermediate representation. The data structure or code used internally by a compiler or virtual machine to represent source code. It is designed to facilitate further processing, such as optimization and translation.
J
JDK	Java development kit. A collection of Java-based software development tools.
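
As a worked illustration of the FLOPS entry, the following sketch counts the floating-point operations (FLOPs, a count) of a matrix multiplication and divides by an elapsed time to obtain FLOPS (a rate). The helper names and the 0.25 s timing are hypothetical; the 2 x M x N x K count follows the usual convention of one multiplication plus one addition per multiply-accumulate.

```python
def matmul_flops(m, n, k):
    """FLOPs of an (m x k) @ (k x n) matrix multiplication.

    Each of the m * n output elements needs k multiply-accumulates, counted as
    one multiplication plus one addition each, so the total is 2 * m * n * k.
    """
    return 2 * m * n * k

def achieved_flops(flop_count, elapsed_seconds):
    """FLOPS = floating-point operations / elapsed time in seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return flop_count / elapsed_seconds

if __name__ == "__main__":
    ops = matmul_flops(2048, 2048, 2048)   # 2 ** 34 operations
    rate = achieved_flops(ops, 0.25)       # hypothetical 0.25 s run
    print(f"{ops} FLOPs in 0.25 s -> {rate / 1e9:.1f} GFLOPS")
```

Comparing such an achieved rate with the hardware's peak FLOPS is one common way to judge how well an operator uses the compute units.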

K-O

**Table 3**
Term/Acronym/Abbreviation	Description
K
KullbackLeiblerDivergence	Kullback-Leibler divergence. An accuracy comparison algorithm. The result ranges from 0 to infinity. The smaller the Kullback-Leibler divergence, the closer the approximate distribution is to the true distribution.
L
L2 Cache	Second-level cache. The shared second-level cache, which is accessed before main memory.
LLC	Last level cache. The shared highest-level cache, which is accessed before main memory.
M
MaxAbsoluteError	Maximum absolute error. An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates higher similarity.
MaxRelativeError	Maximum relative error. An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates higher similarity.
MeanAbsoluteError	Mean absolute error. An accuracy comparison algorithm. The result ranges from 0 to infinity. MeanAbsoluteError and RootMeanSquareError values closer to 0 indicate that the measured values are more accurate, that is, closer to the actual values. If MeanAbsoluteError is close to 0 but RootMeanSquareError is relatively large, a few elements have excessively large errors. If MeanAbsoluteError is large and RootMeanSquareError is equal or close to it, the deviation is overall and concentrated (the element-wise errors have similar magnitudes). If MeanAbsoluteError is large and RootMeanSquareError is clearly greater than it, the deviation is overall and scattered. No other combinations occur because RMSE ≥ MAE always holds.
MeanRelativeError	Mean relative error. An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates higher similarity.
msproftx	msprof tool extension. An extension of the MindStudio system tuning tool.
MTE1	Memory transfer engine 1. MTE1 copies data from the L1 buffer.
MTE2	Memory transfer engine 2. MTE2 copies data from DDR or the L2 buffer.
MTE3	Memory transfer engine 3. MTE3 copies data from the Unified Buffer (UB).
N
NAS	Neural architecture search. A technology that uses algorithms to automatically design high-performance neural network architectures from sample sets, effectively reducing the cost of using and implementing neural networks.
NIC	Network interface controller. Also known as a network interface card, network adapter, or LAN adapter. A hardware component that connects a computer to a computer network.
NPU	Neural-network processing unit. An NPU uses a data-driven parallel computing architecture to efficiently process massive amounts of video and image multimedia data. It is dedicated to the large volume of computing tasks in AI applications.
Network-wide comparison	A tensor comparison method in Model Accuracy Analyzer. It compares the accuracy of all operators involved in computation in a network model.
O
OP	Operator. An operator implements an operation, such as ReLU, Conv, Pooling, Scale, or Softmax.
OPP	Operator package
OS	Operating system
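
The error metrics in this table, and the MeanAbsoluteError/RootMeanSquareError interpretation rule, can be checked with the following sketch. The function names are hypothetical. It assumes flattened tensors given as Python lists, skips elements whose Ground Truth value is 0 when computing relative errors, and treats the KL inputs as already-normalized probability distributions with the Ground Truth as the true distribution; these are assumptions, not the documented behavior of Model Accuracy Analyzer.

```python
import math

def _pairs(my_output, ground_truth):
    """Pair up elements of two equally sized, non-empty flattened tensors."""
    if not my_output or len(my_output) != len(ground_truth):
        raise ValueError("tensors must be non-empty and have the same number of elements")
    return list(zip(my_output, ground_truth))

def max_absolute_error(my_output, ground_truth):
    return max(abs(a - b) for a, b in _pairs(my_output, ground_truth))

def mean_absolute_error(my_output, ground_truth):
    pairs = _pairs(my_output, ground_truth)
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def root_mean_square_error(my_output, ground_truth):
    pairs = _pairs(my_output, ground_truth)
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))

def _relative_errors(my_output, ground_truth):
    # Assumption: elements whose Ground Truth value is 0 are skipped.
    return [abs(a - b) / abs(b) for a, b in _pairs(my_output, ground_truth) if b != 0]

def max_relative_error(my_output, ground_truth):
    errors = _relative_errors(my_output, ground_truth)
    return max(errors) if errors else 0.0

def mean_relative_error(my_output, ground_truth):
    errors = _relative_errors(my_output, ground_truth)
    return sum(errors) / len(errors) if errors else 0.0

def kl_divergence(true_dist, approx_dist):
    """KL(P || Q) = sum(p * log(p / q)); inputs must already be probability distributions."""
    total = 0.0
    for p, q in _pairs(true_dist, approx_dist):
        if p == 0:
            continue            # 0 * log(0 / q) contributes 0 by convention
        if q == 0:
            return math.inf     # Q assigns zero probability where P does not
        total += p * math.log(p / q)
    return total

if __name__ == "__main__":
    my_output = [1.0, 2.0, 3.0, 4.0]
    uniform_gt = [2.0, 3.0, 4.0, 5.0]   # every element off by 1
    spiky_gt = [1.0, 2.0, 3.0, 8.0]     # one element off by 4
    print(mean_absolute_error(my_output, uniform_gt), root_mean_square_error(my_output, uniform_gt))  # 1.0 1.0
    print(mean_absolute_error(my_output, spiky_gt), root_mean_square_error(my_output, spiky_gt))      # 1.0 2.0
```

Both cases share a MeanAbsoluteError of 1.0, but when the whole deviation sits in one element, RootMeanSquareError doubles, which matches the rule that RMSE well above MAE signals a few excessively large errors.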

P-T

**Table 4**
Term/Acronym/Abbreviation	Description
P
PCIe	Peripheral component interconnect express. PCIe provides high-speed, high-bandwidth, serial, point-to-point, dual-channel transmission. Each connected device is allocated dedicated channel bandwidth instead of sharing bus bandwidth. PCIe supports active power management, error reporting, end-to-end reliable transmission, hot swap, and quality of service (QoS).
PctRlt	Percent result: actual percentage.
PctThd	Percent threshold: percentage threshold.
PyNative mode	MindSpore dynamic graph mode. In this mode, the operators in a neural network are delivered and executed one by one, which makes neural network models easy to write and debug.
PTQ	Post-training quantization (PTQ) quantizes a pre-trained floating-point model, optionally using a small amount of data to calibrate it. PTQ provides two algorithms, data-free and label-free, both of which run on the Ascend inference platform and cover PTQ scenarios without and with calibration datasets, respectively. Both algorithms convert a floating-point model into a fixed-point INT8 model to compress the model, reduce the computing workload, and shorten inference latency. Data-free quantization: quantizes a model without any input dataset. It achieves effective quantization by iteratively flipping and optimizing weights at multiple scales. Label-free quantization: requires users to provide a small calibration dataset. Because this input data follows the original data distribution, label-free quantization achieves higher quantization accuracy than data-free quantization.
Q
QAT	Quantization aware training (QAT) inserts fake quantizers into a model to simulate the rounding and clamping operations that the quantized model performs during inference. This helps the model adapt to quantization effects during training and yields higher quantization accuracy. Throughout the process, all computations (forward and backward propagation as well as fake-quantization node computations) are performed in floating point. A real INT8 model is obtained only after training is complete.
R
RateDiff	Rate difference
RelativeEuclideanDistance	Relative Euclidean distance. An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates higher similarity.
RoCE	RDMA over converged Ethernet. A network protocol deployed on Ethernet that provides remote direct memory access, allowing applications on different servers to move data directly between their memories without CPU intervention. In addition, RoCE provides communication interface bandwidth data.
RootMeanSquareError	Root mean square error. An accuracy comparison algorithm. The result ranges from 0 to infinity. MeanAbsoluteError and RootMeanSquareError values closer to 0 indicate that the measured values are more accurate, that is, closer to the actual values. If MeanAbsoluteError is close to 0 but RootMeanSquareError is relatively large, a few elements have excessively large errors. If MeanAbsoluteError is large and RootMeanSquareError is equal or close to it, the deviation is overall and concentrated (the element-wise errors have similar magnitudes). If MeanAbsoluteError is large and RootMeanSquareError is clearly greater than it, the deviation is overall and scattered. No other combinations occur because RMSE ≥ MAE always holds.
RUNTIME	Runtime. It runs in the application process space and provides applications with functions specific to Ascend AI Processors for managing memory, devices, streams, and events, and for executing kernels.
RDMA	Remote direct memory access. A technology that enables a computer to transfer data directly into the memory of another computer over a network.
S
Sample-based	Sample-based profiling collects AI Core profile data at fixed sampling intervals.
SDK	Software development kit. A set of software development tools for creating applications for a specific software package, software framework, hardware platform, operating system, or similar development platform.
StandardDeviation	Standard deviation. An accuracy comparison algorithm. The result ranges from 0 to infinity. The smaller the standard deviation, the smaller the dispersion, and the closer the values are to the mean. The mean and standard deviation of the dump data are displayed in the format (mean value;standard deviation). The first set is the result of My Output, and the second set is the result of Ground Truth.
Step Trace	Iteration trace. The trace records the start and end times of forward propagation and backpropagation in each iteration, the gradient update, and the data augmentation bound duration.
ST	System test. Testing of the entire product system to verify whether it meets the defined requirement specifications.
SDMA	System direct memory access. A peripheral that transfers data directly between memory and other peripherals without involving the system processor, which improves data transfer efficiency and frees up CPU resources. Each SDMA task runs on a single SDMA channel. HWTS manages eight SDMA channels.
Single-operator comparison	A tensor comparison method in Model Accuracy Analyzer. It compares the accuracy of one or more operators involved in computation in a network model.
T
Task-based	Task-based profiling collects AI Core profile data on a per-task basis.
TBE	Tensor boost engine. TBE provides Python APIs for implementing operators, which are then built into CCE operators.
Tensor	Tensor. The primary data structure in TensorFlow. A tensor is N-dimensional (where N can be very large) and commonly takes the form of a scalar, vector, or matrix. Its elements can be integers, floating-point values, or strings.
Tensor comparison	Tensor comparison compares the data of two tensors using different algorithm evaluation metrics and supports network-wide comparison and single-operator comparison.
TIK	Tensor iterator kernel. One of the operator development modes. You can call TIK APIs to write a custom operator in Python. The TIK compiler then compiles the operator into a binary file that runs on Ascend AI Processors.
TransData	A format conversion operator.
TS	Task scheduler. TS schedules kernels to the AI CPU or AI Core for execution.
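
For the PTQ and QAT entries, the following sketch shows the quantize-dequantize ("fake quantization") step: values are scaled, rounded, and clamped to the INT8 range, then mapped back to floating point. It assumes symmetric per-tensor quantization with scale = max|x| / 127, a common scheme but not necessarily the one used by the Ascend quantization tools. The quantization error is then measured with RelativeEuclideanDistance, and the (mean value;standard deviation) pair is formatted as described in the StandardDeviation entry; using the population standard deviation is also an assumption. All function names are hypothetical.

```python
import math

INT8_MIN, INT8_MAX = -128, 127

def fake_quantize_int8(values):
    """Quantize to INT8 and immediately dequantize (symmetric, per-tensor scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0:
        return [0.0 for _ in values]
    scale = max_abs / INT8_MAX
    result = []
    for v in values:
        q = round(v / scale)                 # rounding (Python rounds halves to even)
        q = max(INT8_MIN, min(INT8_MAX, q))  # clamping to the INT8 range
        result.append(q * scale)             # dequantize back to floating point
    return result

def relative_euclidean_distance(my_output, ground_truth):
    """||my_output - ground_truth|| / ||ground_truth||; ranges from 0 to infinity."""
    if len(my_output) != len(ground_truth):
        raise ValueError("tensors must have the same number of elements")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(my_output, ground_truth)))
    norm = math.sqrt(sum(b * b for b in ground_truth))
    if norm == 0:
        raise ValueError("ground truth must not be all zeros")
    return diff / norm

def mean_and_std(values):
    """Mean and population standard deviation of a non-empty list."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, std

def format_mean_std(mean, std):
    """Render the pair in the '(mean value;standard deviation)' display format."""
    return f"({mean};{std})"

if __name__ == "__main__":
    weights = [0.8, -0.31, 0.05, -1.2, 0.6]  # illustrative values
    dequantized = fake_quantize_int8(weights)
    print("RelativeEuclideanDistance:", relative_euclidean_distance(dequantized, weights))
    # My Output first, then Ground Truth
    print(format_mean_std(*mean_and_std(dequantized)), format_mean_std(*mean_and_std(weights)))
```

Because every computation stays in floating point, the same routine mirrors what a fake-quantization node does during QAT; rounding bounds each element's error by half a quantization step.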

U-Z

**Table 5**
Term/Acronym/Abbreviation	Description
U
UT	Unit test. A fundamental testing activity in software development in which individual software units are tested in isolation from other program components.
V
VECTOR	Vector operation