# Glossary

## A-E
| Term/Acronym/Abbreviation | Description |
|---|---|
| A | |
| AccumulatedRelativeError | Accumulated relative error. An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates a higher similarity. |
| Advisor | An expert system tool used to locate top performance tuning issues of models and operators, identify and analyze bottlenecks, and output tuning suggestions, thereby improving development efficiency. |
| AI | Artificial intelligence. A science that studies and creates machines capable of mimicking human intelligence and extending the reach of human ability. |
| AIPP | Artificial intelligence preprocessing. AIPP performs AI Core-based image preprocessing, including image resizing, color space conversion (CSC), and mean subtraction and factor multiplication (for pixel changing), prior to model inference. |
| Ascend EP | Ascend endpoint. Ascend EPs are Ascend AI Processors that serve as secondary devices, for example, PCIe accelerator cards. They work with the primary devices (x86 or Arm servers) for efficient inference, training, and image recognition. |
| Ascend RC | Ascend root complex. Ascend RCs are Ascend AI Processors that serve as the primary devices, for example, the Atlas 200 DK. They provide the host control function and are mainly applicable to mobile devices. |
| AscendCL | Ascend computing language. AscendCL provides a collection of C language API libraries for developing deep neural network (DNN) applications for target recognition and image classification, ranging from device, context, stream, and memory management to model and operator loading and execution, as well as media data processing. |
| ASHA | Asynchronous successive halving algorithm. A hyperparameter optimization algorithm based on dynamic resource allocation. It trains multiple sets of hyperparameters in parallel, with a small number of training iterations per round. It evaluates and ranks all hyperparameter sets, stops the training of those ranked in the lower half, evaluates the remaining sets in the next round, and halves them again until the optimization objective is achieved. |
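The halving step that ASHA is built on can be sketched as follows. This is a synchronous simplification (real ASHA promotes configurations asynchronously across workers, which is the point of the "A"), and the objective function here is purely hypothetical:

```python
def successive_halving(configs, evaluate, rounds=3):
    """Keep the top half of hyperparameter configurations each round,
    training survivors for a doubled budget (number of iterations)."""
    budget = 1
    survivors = list(configs)
    for _ in range(rounds):
        if len(survivors) <= 1:
            break
        # Score every surviving configuration at the current budget,
        # then stop the lower-ranked half early.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = scored[: max(1, len(scored) // 2)]
        budget *= 2
    return survivors[0]

# Hypothetical objective: score peaks at learning rate 0.1 and improves with budget.
def toy_score(lr, budget):
    return -abs(lr - 0.1) * (1.0 + 1.0 / budget)

best = successive_halving([0.001, 0.01, 0.1, 0.5], toy_score)
print(best)  # 0.1
```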
| Term/Acronym/Abbreviation | Description |
|---|---|
| ATC | Ascend tensor compiler. |
| Accuracy comparison | Accuracy comparison compares the dump data of a model running on NPUs against the ground truth (.npy files of the model running on GPUs or CPUs). It compares the computation results of Huawei-developed operators with their third-party equivalents. |
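The accuracy comparison algorithms listed in this glossary (CosineSimilarity, MaxAbsoluteError, MeanRelativeError, RelativeEuclideanDistance, and so on) each reduce two tensors (My Output and Ground Truth) to a single score. As an illustration only, they can be sketched in NumPy; the tool's exact formulas and edge-case handling may differ, and the `eps` guard against division by zero is an assumption:

```python
import numpy as np

def cosine_similarity(a, b):
    # Range -1 to 1; closer to 1 means higher similarity.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def max_absolute_error(a, b):
    # Range 0 to infinity; closer to 0 means higher similarity.
    return float(np.max(np.abs(a - b)))

def mean_relative_error(a, b, eps=1e-9):
    # Element-wise |a - b| / |b|, averaged; eps avoids division by zero.
    return float(np.mean(np.abs(a - b) / (np.abs(b) + eps)))

def relative_euclidean_distance(a, b, eps=1e-9):
    # ||a - b|| / ||b||; closer to 0 means higher similarity.
    return float(np.linalg.norm(a - b) / (np.linalg.norm(b) + eps))

my_output = np.array([1.0, 2.0, 3.0])
ground_truth = np.array([1.0, 2.0, 3.001])
print(cosine_similarity(my_output, ground_truth))   # close to 1.0
print(max_absolute_error(my_output, ground_truth))  # close to 0.001
```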
| Term/Acronym/Abbreviation | Description |
|---|---|
| B | |
| BOHB | Bayesian optimization and Hyperband. BOHB combines the Hyperband algorithm with Bayesian optimization for hyperparameter optimization. It uses Hyperband to sample many configurations with a small budget, exploring the hyperparameter search space quickly and efficiently to find promising configurations, and then uses the predictive power of the Bayesian optimizer to propose good configurations close to the optimum. |
| BOSS | Bayesian optimization via sub-sampling. A universal hyperparameter optimization algorithm based on the Bayesian optimization framework, designed for efficient search under limited computing resources. |
| BP Point | Backpropagation point. The end position of a backward operator in the iterative trajectory of a training network. |
| C | |
| CPU | Central processing unit. One of the main parts of a modern computer, alongside internal memory and input/output devices. It interprets computer instructions and processes data in computer software. |
| CosineSimilarity | Cosine similarity algorithm. An accuracy comparison algorithm. The result ranges from –1 to 1. A value closer to 1 indicates a higher similarity. |
| D | |
| DDR | Double data rate. DDR performs two read/write operations per clock cycle: one on the rising edge and one on the falling edge of the clock. |
| DiffThd | Difference threshold. |
| DSL | Domain-specific language. One of the operator development modes. You simply express the computation process using the DSL APIs; subsequent operator scheduling, optimization, and build can then be performed with existing APIs. |
| DVPP | Digital vision preprocessing. DVPP preprocesses videos and images in specific formats using methods such as decoding and scaling, and encodes and outputs the processed videos and images. |
## F-J
| Term/Acronym/Abbreviation | Description |
|---|---|
| F | |
| FP Point | Forward propagation point. The start position of a forward operator in the iterative trajectory of a training network. |
| FpDiff | Floating-point difference. |
| FLOPS | Floating-point operations per second. In computing, FLOPS is a measure of computer performance, especially in fields of scientific computing that make heavy use of floating-point calculations. (Note: The last letter "S" stands for "second", not a plural form.) |
| G | |
| GDB | GNU debugger. The standard debugger for the GNU OS. |
| GE | Graph engine. GE provides a set of secure and easy-to-use APIs for graph/operator intermediate representation (IR) composition. These APIs can be called to build a network model and to set graphs in the model, operators in the graphs, and attributes of the model and operators. |
| GPU | Graphics processing unit. A microprocessor that performs image and graphics computing on PCs, workstations, game consoles, and mobile devices such as tablets and smartphones. |
| Graph Mode | MindSpore static graph mode, in which the neural network model is compiled into an entire graph and then delivered for execution. This mode applies graph optimizations during compilation to improve runtime performance and facilitate large-scale deployment and cross-platform execution. |
| H | |
| HCCL | Huawei collective communication library. HCCL implements high-performance collective communications between servers in deep learning training scenarios. |
| HCCS | High confidence computing systems. HCCS provides high-performance inter-processor (inter-device) data communication capabilities in multi-device scenarios. |
| HPO | Hyperparameter optimization. HPO uses automated algorithms to optimize hyperparameters, such as the learning rate, activation function, and optimizer, that cannot be optimized through training in the original machine learning/deep learning algorithms. |
| HWTS | Hardware task scheduler. HWTS provides hardware scheduling for AI Core tasks and reduces scheduling latency. |
| I | |
| IR | Intermediate representation. The data structure or code used internally by a compiler or virtual machine to represent source code. It is designed to be conducive to further processing, such as optimization and translation. |
| J | |
| JDK | Java development kit. A collection of Java-based software development tools. |
## K-O
| Term/Acronym/Abbreviation | Description |
|---|---|
| K | |
| KullbackLeiblerDivergence | Kullback-Leibler divergence. An accuracy comparison algorithm. The result ranges from 0 to infinity. The smaller the Kullback-Leibler divergence, the closer the approximate distribution is to the true distribution. |
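As a minimal illustration (not the tool's implementation), the divergence between two distributions can be computed as follows; the `eps` smoothing term is an assumption here, added to avoid taking the logarithm of zero:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i) for probability distributions.
    # eps smoothing (an assumption) avoids log(0) and division by zero.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0: identical distributions
print(kl_divergence([0.9, 0.1], [0.5, 0.5]))  # positive: distributions differ
```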
| Term/Acronym/Abbreviation | Description |
|---|---|
| L | |
| L2 Cache | Level 2 cache. The shared second-level cache, which is accessed before main memory. |
| LLC | Last level cache. The shared highest-level cache, which is accessed before main memory. |
| M | |
| MaxAbsoluteError | Maximum absolute error. An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates a higher similarity. |
| MaxRelativeError | Maximum relative error. An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates a higher similarity. |
| MeanAbsoluteError | Mean absolute error. An accuracy comparison algorithm. The result ranges from 0 to infinity. |
| MeanRelativeError | Mean relative error. An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates a higher similarity. |
| msproftx | msprof tool extension, an extension of the MindStudio system tuning tool. |
| MTE1 | Memory transfer engine 1. MTE1 copies memory from an L1 buffer. |
| MTE2 | Memory transfer engine 2. MTE2 copies memory from a DDR or an L2 buffer. |
| MTE3 | Memory transfer engine 3. MTE3 copies memory from the UB. |
| N | |
| NAS | Neural architecture search. A technology that automates the design of high-performance neural network structures based on sample sets by using certain algorithms, effectively reducing the use and implementation costs of neural networks. |
| NIC | Network interface controller. Also known as a network interface card, network adapter, or LAN adapter. A hardware component that connects a computer to a computer network. |
| NPU | Neural-network processing unit. An NPU uses a data-driven parallel computing architecture and efficiently processes massive video and image multimedia data. It is dedicated to the large volume of computing tasks in AI applications. |
| Network-wide comparison | A tensor comparison method in Model Accuracy Analyzer. It compares the accuracy of all operators involved in computing in a network model. |
| O | |
| OP | Operator. An operator implements an operation, such as ReLU, Conv, Pooling, Scale, or Softmax. |
| OPP | Operator package. |
| OS | Operating system. |
## P-T
| Term/Acronym/Abbreviation | Description |
|---|---|
| P | |
| PCIe | Peripheral component interconnect express. PCIe provides high-speed serial point-to-point dual-channel high-bandwidth transmission. Connected devices are allocated exclusive channel bandwidth and do not share the bus bandwidth. PCIe supports active power management, error reporting, end-to-end reliable transmission, hot swap, and quality of service (QoS). |
| PctRlt | Percent result: the actual percentage. |
| PctThd | Percent threshold: the percentage threshold. |
| PyNative mode | MindSpore dynamic graph mode. In this mode, operators on a neural network are delivered and executed one by one, making it easy to write and debug neural network models. |
| PTQ | Post-training quantization. PTQ quantizes a pre-trained floating-point model and uses some training data to calibrate it. PTQ provides data-free and label-free algorithms, both of which can run on the Ascend inference platform and support quantization with or without calibration datasets. Both algorithms convert a floating-point model into a fixed-point INT8 model to compress the model, reduce the computing workload, and shorten the inference delay. Data-free quantization: quantizes models without any input datasets, performing effective quantization by iteratively flipping and optimizing weights at multiple scales. Label-free quantization: requires the user to provide a small number of datasets for calibration; because its input data matches the original data distribution, it achieves higher quantization accuracy than data-free quantization. |
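The core of any INT8 post-training scheme is mapping float values onto an 8-bit grid via a scale. A minimal symmetric, per-tensor sketch follows; Ascend's actual calibration (including the data-free and label-free algorithms above) chooses scales far more carefully, so treat this as an assumption-laden illustration rather than the platform's method:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w is approximated by scale * q."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover float values; the gap to the original is the quantization error.
    return q.astype(np.float32) * scale

weights = np.array([-0.9, 0.0, 0.45, 0.9], dtype=np.float32)
q, scale = quantize_int8(weights)
print(q.dtype)  # int8
print(float(np.max(np.abs(weights - dequantize(q, scale)))))  # roughly scale / 2 at worst
```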
| Term/Acronym/Abbreviation | Description |
|---|---|
| Q | |
| QAT | Quantization-aware training. QAT inserts fake quantizers into a model to simulate the rounding and clamping operations that the quantized model performs during inference. This improves the model's adaptability to quantization effects during training and provides higher quantization accuracy. In this process, all calculations (including forward and backward propagation and fake-quantization node calculations) are performed in floating point; a real INT8 model is obtained only after training completes. |
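A fake quantizer as described above can be sketched as a float-in, float-out rounding step. This minimal version omits the straight-through gradient estimator and the learned or observed ranges used in practice:

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Simulate INT8 rounding and clamping while staying in float32, so the
    surrounding training graph keeps computing in floating point. Real QAT
    additionally uses a straight-through estimator for the gradient."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return (np.clip(np.round(x / scale), -qmax, qmax) * scale).astype(np.float32)

activations = np.array([-1.0, 0.3, 0.301, 1.0], dtype=np.float32)
print(fake_quantize(activations))  # values snapped to the INT8 grid, dtype float32
```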
| Term/Acronym/Abbreviation | Description |
|---|---|
| R | |
| RateDiff | Rate difference. |
| RelativeEuclideanDistance | Relative Euclidean distance. An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates a higher similarity. |
| RoCE | RDMA over converged Ethernet. A network protocol deployed on Ethernet that provides remote memory access, allowing applications on different servers to move data directly between their memories without CPU intervention. RoCE delivers RDMA bandwidth over standard Ethernet network interfaces. |
| RootMeanSquareError | Root mean square error. An accuracy comparison algorithm. The result ranges from 0 to infinity. |
| RUNTIME | Runtime runs in the application process space and provides applications with functions (specific to Ascend AI Processors) for managing memory, devices, streams, and events, and for executing kernels. |
| RDMA | Remote direct memory access. A function that enables a computer to transmit data directly to the memory of another computer over a network. |
| S | |
| Sample-based | Sample-based profiling collects AI Core profile data at fixed sampling intervals. |
| SDK | Software development kit. A set of software development tools that allow the creation of applications for a certain software package, software framework, hardware platform, operating system, or similar development platform. |
| StandardDeviation | Standard deviation. An accuracy comparison algorithm. The result ranges from 0 to infinity. The smaller the standard deviation, the smaller the dispersion, and the closer the values are to the mean. The mean and standard deviation of the dump data are displayed in the format (mean value;standard deviation). The first set of data is the result of My Output, and the second set is the result of Ground Truth. |
| Step Trace | Iteration trace. The traces include the start time and end time of the forward propagation and backpropagation in each iteration, the gradient update, and the data augmentation bound duration. |
| ST | System test. A testing process conducted on the entire product system to verify whether the system meets the defined requirement specifications. |
| SDMA | System direct memory access. A peripheral that enables direct data transmission between memory and other peripherals without relying on the system processor, improving data transmission efficiency and freeing up CPU resources. An SDMA task is executed on only one SDMA channel. HWTS manages eight SDMA channels. |
| Single-operator comparison | A tensor comparison method in Model Accuracy Analyzer. It compares the accuracy of one or more operators involved in computing in a network model. |
| T | |
| Task-based | Task-based profiling collects AI Core profile data on a per-task basis. |
| TBE | Tensor boost engine. TBE provides APIs for implementing operators in Python to build and generate CCE operators. |
| Tensor | A major data structure in TensorFlow. A tensor is N-dimensional (where N may be very large) and often takes the form of a scalar, vector, or matrix. The elements of a tensor can be integer, floating-point, or string values. |
| Tensor comparison | Tensor comparison compares the data of two tensors using different algorithm evaluation indicators, and supports network-wide comparison and single-operator comparison. |
| TIK | Tensor iterator kernel. One of the operator development modes. You can use TIK API calls to write a custom operator in Python; the TIK compiler then compiles the operator into a binary that adapts to Ascend AI Processors. |
| TransData | A format conversion operator. |
| TS | Task scheduler. TS schedules different kernels to the AI CPU or AI Core for execution. |
## U-Z
| Term/Acronym/Abbreviation | Description |
|---|---|
| U | |
| UT | Unit test. A fundamental testing activity in software development, where individual software units are tested in isolation from other program components. |
| V | |
| VECTOR | Vector operation. |