Glossary

A-E

Table 1

Term/Acronym/Abbreviation

Description

A

AccumulatedRelativeError

Accumulated relative error

An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates a higher similarity.
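As an illustration only, the following NumPy sketch computes an accumulated relative error. It assumes the metric is the sum of element-wise relative errors |x - y| / |y|, where x is My Output and y is Ground Truth. The function name and the eps guard for zero Ground Truth values are hypothetical; the comparison tool's exact formula and zero handling may differ.

```python
import numpy as np

def accumulated_relative_error(my_output, ground_truth, eps=1e-12):
    """Sum of element-wise relative errors |x - y| / |y| (assumed definition)."""
    x = np.asarray(my_output, dtype=np.float64).ravel()
    y = np.asarray(ground_truth, dtype=np.float64).ravel()
    # eps keeps the division finite where the Ground Truth value is 0 (assumption)
    return float(np.sum(np.abs(x - y) / np.maximum(np.abs(y), eps)))

print(accumulated_relative_error([1.0, 2.0, 4.0], [1.0, 2.0, 4.0]))  # 0.0
print(accumulated_relative_error([1.1, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

Identical tensors give 0. Because the per-element errors are summed, the value also grows with tensor size.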

Advisor

An expert system

A tool that locates the main performance issues of models and operators, identifies and analyzes bottlenecks, and outputs tuning suggestions, improving development efficiency.

AI

Artificial intelligence

A field of science that studies and builds machines capable of mimicking human intelligence and extending human capabilities.

AIPP

Artificial intelligence preprocessing

AIPP performs image preprocessing on the AI Core before model inference. This preprocessing includes image resizing, color space conversion (CSC), and mean subtraction and factor multiplication (to normalize pixel values).
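AIPP is configured rather than programmed, and it runs on the AI Core. The NumPy sketch here only reproduces the arithmetic of per-channel mean subtraction followed by factor multiplication on an HWC image. The function name `normalize_pixels` and the mean and factor values are hypothetical. The actual AIPP configuration may include additional terms.

```python
import numpy as np

def normalize_pixels(image_hwc, mean, factor):
    """Per-channel (pixel - mean) * factor on an HWC image (simplified sketch)."""
    img = np.asarray(image_hwc, dtype=np.float32)
    mean = np.asarray(mean, dtype=np.float32)      # one value per channel
    factor = np.asarray(factor, dtype=np.float32)  # e.g. a reciprocal scale per channel
    return (img - mean) * factor  # broadcasts over the last (channel) axis

image = np.full((2, 2, 3), 128, dtype=np.uint8)
out = normalize_pixels(image, mean=[104.0, 117.0, 123.0], factor=[1 / 128] * 3)
print(out.shape)  # (2, 2, 3)
print(out[0, 0])
```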

Ascend EP

Ascend endpoint

Ascend EPs are Ascend AI Processors that serve as secondary devices, for example, PCIe accelerator cards. They work with the primary devices (x86 or Arm servers) for efficient inference, training, and image recognition.

Ascend RC

Ascend root complex

Ascend RCs are Ascend AI Processors that serve as the primary devices, for example, the Atlas 200 DK. They provide the host control function and are mainly applicable to mobile devices.

AscendCL

Ascend computing language

AscendCL provides a collection of C APIs for developing deep neural network (DNN) applications such as object recognition and image classification. The APIs cover device, context, stream, and memory management; model and operator loading and execution; and media data processing.

ASHA

Asynchronous successive halving algorithm

A hyperparameter optimization algorithm based on dynamic resource allocation. The algorithm trains multiple hyperparameter configurations in parallel, giving each a small number of training iterations per round. After each round, it evaluates and ranks all configurations and stops training the lower-ranked half early. It evaluates the remaining configurations in the next round and halves them again, repeating until the optimization objective is achieved.
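The halving loop described above can be sketched as follows. This sketch is simplified and synchronous: each round waits for all configurations to finish. Real ASHA promotes configurations asynchronously and does not wait for a full round. The scoring function `toy_score` and the learning-rate candidates are hypothetical stand-ins for real training runs.

```python
def successive_halving(configs, train_and_score, iters_per_round=1, min_survivors=1):
    """Keep the better-scoring half of the configurations each round (higher score = better)."""
    survivors = list(configs)
    budget = iters_per_round
    while len(survivors) > min_survivors:
        # Train every surviving configuration for the current budget and score it
        scored = [(train_and_score(cfg, budget), cfg) for cfg in survivors]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Stop the lower-ranked half early; survivors get a larger budget next round
        survivors = [cfg for _, cfg in scored[: max(min_survivors, len(scored) // 2)]]
        budget *= 2
    return survivors

def toy_score(learning_rate, budget):
    # Hypothetical objective that peaks at 0.01; a real one would train for `budget` iterations
    return -(learning_rate - 0.01) ** 2

learning_rates = [0.1, 0.03, 0.01, 0.003, 0.001, 0.3, 0.05, 0.02]
best = successive_halving(learning_rates, toy_score)
print(best)  # [0.01]
```

Eight candidates shrink to four, then two, then one. Meanwhile, the per-configuration budget doubles each round.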

ATC

Ascend tensor compiler

  • ATC converts network models under open-source frameworks, such as Caffe and TensorFlow, into offline models supported by Ascend AI Processors. It implements operator scheduling tuning, weight data rearrangement, and memory usage tuning during model conversion.
  • ATC can be used to build operators.

Accuracy comparison

Accuracy comparison

Accuracy comparison compares the dump data of a model running on an NPU with Ground Truth data (.npy files of the same model running on a GPU or CPU). It compares the computation results of Huawei-developed operators with those of their third-party equivalents.

B

BOHB

Bayesian optimization and Hyperband

BOHB combines the Hyperband algorithm and Bayesian optimization for hyperparameter optimization. It uses Hyperband to sample many configurations on a small budget, exploring the hyperparameter search space quickly and efficiently to find promising configurations. It then uses the predictive power of Bayesian optimization to propose good configurations close to the optimum.

BOSS

Bayesian optimization via sub-sampling

A general-purpose hyperparameter optimization algorithm built on the Bayesian optimization framework. It is designed to search efficiently under limited computing resources.

BP Point

Backpropagation point: the end position of the last backward operator in the iteration trace of a training network.

C

CPU

Central processing unit

One of the main components of a modern computer, alongside memory and input/output devices. It interprets computer instructions and processes the data in computer software.

CosineSimilarity

Cosine similarity algorithm

An accuracy comparison algorithm. The result ranges from –1 to 1. A value closer to 1 indicates a higher similarity.
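A minimal NumPy sketch of cosine similarity between My Output and Ground Truth. The sketch flattens both dump tensors to 1-D. The function name is hypothetical. Returning NaN when either tensor is all zeros is this sketch's own choice, not documented tool behavior.

```python
import numpy as np

def cosine_similarity(my_output, ground_truth):
    """dot(x, y) / (|x| * |y|) over the flattened tensors."""
    x = np.asarray(my_output, dtype=np.float64).ravel()
    y = np.asarray(ground_truth, dtype=np.float64).ravel()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0.0:
        return float("nan")  # undefined when either tensor is all zeros (assumption)
    return float(np.dot(x, y) / denom)

print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # close to 1: same direction
print(cosine_similarity([1, 0], [-1, 0]))       # -1.0: opposite direction
```

Cosine similarity ignores magnitude. In the first example, one tensor is exactly twice the other and the similarity is still 1. For this reason, it is usually read together with an error metric.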

D

DDR

Double data rate

DDR transfers data twice per clock cycle: once on the rising edge and once on the falling edge of the clock.
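Because DDR transfers data on both clock edges, its transfer rate is twice its I/O clock frequency. The worked arithmetic uses DDR4-3200 as the example: a 1600 MHz I/O clock gives 3200 MT/s, and a 64-bit channel then gives a peak of 25.6 GB/s. The helper name is hypothetical, and the result is a theoretical peak, not sustained bandwidth.

```python
def ddr_peak_bandwidth_gbps(io_clock_mhz, bus_width_bits):
    """Theoretical peak bandwidth in GB/s for a DDR interface (two transfers per clock)."""
    transfers_per_second = io_clock_mhz * 1e6 * 2  # rising edge + falling edge
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9

print(ddr_peak_bandwidth_gbps(1600, 64))  # 25.6
```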

DiffThd

Difference threshold

DSL

Domain-specific language

An operator development mode in which you only need to express the computation with DSL APIs. Operator scheduling, optimization, and build are then performed with existing APIs.

DVPP

Digital vision preprocessing

DVPP preprocesses videos and images in specific formats using methods such as decoding and scaling, and encodes and outputs the processed videos and images.

F-J

Table 2

Term/Acronym/Abbreviation

Description

F

FP Point

Forward propagation point: the start position of the first forward operator in the iteration trace of a training network.

FpDiff

Floating-point difference

FLOPS

Floating-point operations per second

FLOPS is a measure of computer performance, used especially in scientific computing, which relies heavily on floating-point calculations. (Note: The final letter "S" stands for "second", not a plural.)
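To relate an operation count to FLOPS, one can count the floating-point operations of a known workload and divide by its run time. The sketch uses a matrix multiply and the common convention that each multiply-accumulate counts as two operations. The helper names and the 1 ms timing are hypothetical.

```python
def matmul_flop_count(m, k, n):
    """Floating-point operations for an (m x k) @ (k x n) matrix multiply.

    Each of the m * n outputs needs k multiply-accumulates, counted as
    2 operations each (a common convention).
    """
    return 2 * m * k * n

def flops(flop_count, seconds):
    """Achieved floating-point operations per second."""
    return flop_count / seconds

count = matmul_flop_count(1024, 1024, 1024)
print(count)                # 2147483648
print(flops(count, 0.001))  # roughly 2.1e12, i.e. about 2.1 TFLOPS
```

A total operation count is often written "FLOPs" (lowercase s). That count is distinct from the rate FLOPS.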

G

GDB

GNU debugger

Standard debugger for the GNU OS.

GE

Graph engine

GE provides a set of secure, easy-to-use APIs for building graphs from operator intermediate representations (IRs). You can call these APIs to build a network model and to set the graphs in the model, the operators in the graphs, and the attributes of the model and operators.

GPU

Graphics processing unit

GPU is a microprocessor that performs image and graphics computing on PCs, workstations, game consoles, and mobile devices such as tablets and smartphones.

Graph Mode

MindSpore static graph mode, in which the neural network model is compiled into a complete graph and then delivered for execution. This mode applies graph optimizations during compilation to improve runtime performance and to facilitate large-scale deployment and cross-platform execution.

H

HCCL

Huawei collective communication library

HCCL implements high-performance collective communication between servers in deep learning training scenarios.

HCCS

Huawei cache coherence system

HCCS provides high-performance inter-processor (inter-device) data communication capabilities in multi-device scenarios.

HPO

Hyperparameter optimization

HPO uses automated algorithms to tune hyperparameters, such as the learning rate, activation function, and optimizer. These hyperparameters cannot be learned through the training process of the machine learning or deep learning algorithm itself.

HWTS

Hardware task scheduler

HWTS provides hardware scheduling for AI Core tasks and reduces scheduling latency.

I

IR

Intermediate representation

An IR is the data structure or code that a compiler or virtual machine uses internally to represent source code. It is designed to facilitate further processing, such as optimization and translation.

J

JDK

Java development kit

A collection of Java-based software development tools.

K-O

Table 3

Term/Acronym/Abbreviation

Description

K

KullbackLeiblerDivergence

Kullback-Leibler divergence

An accuracy comparison algorithm. The result ranges from 0 to infinity. The smaller the Kullback-Leibler divergence, the closer the approximate distribution is to the true distribution.
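A minimal NumPy sketch of KL(P || Q), taking Ground Truth as the true distribution P and My Output as the approximation Q. The sketch assumes that both tensors are non-negative and can be normalized to sum to 1. How the comparison tool turns arbitrary dump data into distributions is not stated here, so treat that preprocessing, the eps guard, and the function name as hypothetical.

```python
import numpy as np

def kl_divergence(ground_truth, my_output, eps=1e-12):
    """KL(P || Q) with P = Ground Truth (true) and Q = My Output (approximation)."""
    p = np.asarray(ground_truth, dtype=np.float64).ravel()
    q = np.asarray(my_output, dtype=np.float64).ravel()
    # Normalize both non-negative tensors so each sums to 1 (assumed preprocessing)
    p = p / p.sum()
    q = q / q.sum()
    # Terms with p == 0 contribute 0; eps guards log(0) when q == 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / np.maximum(q[mask], eps))))

print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # ln(5/3), about 0.511
```

KL divergence is not symmetric, so swapping the arguments generally gives a different value.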

L

L2 Cache

Level 2 cache

L2 Cache is the shared level-2 cache, which is accessed before main memory.

LLC

Last level cache

LLC is the shared highest-level cache, which is the last cache accessed before main memory.

M

MaxAbsoluteError

Maximum absolute error

An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates a higher similarity.

MaxRelativeError

Maximum relative error

An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates a higher similarity.
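Both maximum-error metrics can be sketched in NumPy as follows. Each reports the single worst element. The sketch assumes relative error is measured against Ground Truth as |x - y| / |y|. The function names and the eps guard for zero Ground Truth values are hypothetical.

```python
import numpy as np

def max_absolute_error(my_output, ground_truth):
    """Largest element-wise |x - y|."""
    x = np.asarray(my_output, dtype=np.float64).ravel()
    y = np.asarray(ground_truth, dtype=np.float64).ravel()
    return float(np.max(np.abs(x - y)))

def max_relative_error(my_output, ground_truth, eps=1e-12):
    """Largest element-wise |x - y| / |y| (assumed definition)."""
    x = np.asarray(my_output, dtype=np.float64).ravel()
    y = np.asarray(ground_truth, dtype=np.float64).ravel()
    return float(np.max(np.abs(x - y) / np.maximum(np.abs(y), eps)))

mine = [1.0, 2.5, 3.0]
truth = [1.0, 2.0, 4.0]
print(max_absolute_error(mine, truth))  # 1.0
print(max_relative_error(mine, truth))  # 0.25
```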

MeanAbsoluteError

Mean absolute error

An accuracy comparison algorithm. The result ranges from 0 to infinity.

  • The closer MeanAbsoluteError and RootMeanSquareError are to 0, the closer the measured values are to the actual values.
  • If MeanAbsoluteError is close to 0 but RootMeanSquareError is large, a few individual values have excessively large errors.
  • If MeanAbsoluteError is large and RootMeanSquareError is equal or close to it, the deviation is concentrated (similar in size across values).
  • If MeanAbsoluteError is large and RootMeanSquareError is clearly greater than it, there is an overall deviation and the deviation is widely scattered.
  • No other combinations are possible, because RMSE ≥ MAE always holds.
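The cases above can be checked with a small NumPy sketch that computes both metrics. The sketch contrasts a uniform error with a single outlier that has the same mean absolute error. The helper name `mae_rmse` and the test tensors are hypothetical.

```python
import numpy as np

def mae_rmse(my_output, ground_truth):
    """Return (MeanAbsoluteError, RootMeanSquareError) over the flattened tensors."""
    x = np.asarray(my_output, dtype=np.float64).ravel()
    y = np.asarray(ground_truth, dtype=np.float64).ravel()
    err = x - y
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    return mae, rmse

truth = np.zeros(100)
uniform = np.full(100, 0.1)  # every element off by 0.1
outlier = np.zeros(100)
outlier[0] = 10.0            # a single element off by 10
print(mae_rmse(uniform, truth))
print(mae_rmse(outlier, truth))
```

Both cases give a MeanAbsoluteError of about 0.1. However, RootMeanSquareError stays near 0.1 for the uniform error and rises to 1.0 for the single outlier, because squaring weights large errors more heavily.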

MeanRelativeError

Mean relative error

An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates a higher similarity.
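A minimal NumPy sketch of mean relative error. It assumes the metric averages the element-wise relative error |x - y| / |y| against Ground Truth. The function name and the eps guard for zero Ground Truth values are hypothetical.

```python
import numpy as np

def mean_relative_error(my_output, ground_truth, eps=1e-12):
    """Average of element-wise |x - y| / |y| (assumed definition)."""
    x = np.asarray(my_output, dtype=np.float64).ravel()
    y = np.asarray(ground_truth, dtype=np.float64).ravel()
    return float(np.mean(np.abs(x - y) / np.maximum(np.abs(y), eps)))

print(mean_relative_error([1.0, 2.5, 3.0], [1.0, 2.0, 4.0]))  # about 0.1667
```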

msproftx

msprof tool extension, an extension of the MindStudio system tuning tool.

MTE1

Memory transfer engine 1

MTE1 copies data from the L1 buffer.

MTE2

Memory transfer engine 2

MTE2 copies data from DDR or the L2 buffer.

MTE3

Memory transfer engine 3

MTE3 copies data from the Unified Buffer (UB).

N

NAS

Neural architecture search

A technology that uses search algorithms to automatically design high-performance neural network architectures for a given sample set. It reduces the cost of designing and implementing neural networks.

NIC

Network interface controller

NIC is also known as network interface card, network adapter, LAN adapter, or other similar terms. It refers to a hardware component that connects a computer to a computer network.

NPU

Neural-network processing unit

NPU uses the data-driven parallel computing architecture and is capable of efficiently processing massive video and image multimedia data. It is dedicated to processing a large number of computing tasks in AI applications.

Network-wide comparison

A tensor comparison method in Model Accuracy Analyzer.

It compares the accuracy of all operators that participate in computation in a network model.

O

OP

Operator

An operator implements an operation, such as ReLU, Conv, Pooling, Scale, or Softmax.

OPP

Operator package

OS

Operating system

P-T

Table 4

Term/Acronym/Abbreviation

Description

P

PCIe

Peripheral component interconnect express

PCIe provides high-speed, high-bandwidth, serial, point-to-point, dual-simplex transmission. Each connected device is allocated dedicated link bandwidth instead of sharing bus bandwidth. PCIe supports active power management, error reporting, end-to-end reliable transmission, hot swap, and quality of service (QoS).

PctRlt

Percent result: actual percentage.

PctThd

Percent threshold: percentage threshold.

PyNative mode

MindSpore dynamic graph mode. In this mode, operators on a neural network are delivered and executed one by one, making it easy to write and debug neural network models.

PTQ

Post-training quantization (PTQ) quantizes a pre-trained floating-point model, optionally using a small amount of data to calibrate it. PTQ provides two algorithms, data-free and label-free. Both run on the Ascend inference platform and cover PTQ scenarios without and with calibration datasets, respectively. Both convert a floating-point model into a fixed-point INT8 model to compress the model, reduce the computing workload, and shorten inference latency.

Data-free quantization: Models are quantized without any input dataset. This algorithm achieves effective quantization in data-free scenarios by iteratively flipping and optimizing weights at multiple scales.

Label-free quantization: This algorithm requires a small calibration dataset. Because the calibration data follows the original data distribution, label-free quantization achieves higher quantization accuracy than data-free quantization.
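To show what calibrating a floating-point tensor into INT8 involves, here is a generic NumPy sketch of symmetric per-tensor quantization. The scale comes from the largest absolute value observed in the calibration batches. This is a swapped-in textbook technique (max-abs calibration). It is not the Ascend data-free or label-free algorithm. The function names and random calibration data are hypothetical.

```python
import numpy as np

def calibrate_scale(calibration_batches):
    """Symmetric per-tensor scale from the max |value| seen during calibration."""
    max_abs = max(float(np.max(np.abs(b))) for b in calibration_batches)
    return max_abs / 127.0 if max_abs > 0 else 1.0

def quantize_int8(x, scale):
    """Map floats to INT8 codes: round(x / scale), clamped to the INT8 range."""
    q = np.round(np.asarray(x, dtype=np.float32) / scale)
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale):
    """Map INT8 codes back to approximate float values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
batches = [rng.normal(size=(4, 16)).astype(np.float32) for _ in range(8)]  # stand-in calibration data
scale = calibrate_scale(batches)
x = batches[0]
q = quantize_int8(x, scale)
print(q.dtype, float(np.max(np.abs(dequantize(q, scale) - x))))
```

For values inside the calibrated range, the round-trip error is at most half a quantization step (scale / 2). Values outside the range are clamped, which is why representative calibration data matters.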

Q

QAT

Quantization Aware Training (QAT) inserts fake quantizers into a model to simulate the rounding and clamping that the quantized model performs during inference. This helps the model adapt to quantization effects during training and yields higher quantization accuracy. Throughout the process, all calculations (including forward and backward propagation and the fake-quantization node calculations) are performed in floating point. A real INT8 model is obtained only after training is complete.
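A fake quantizer can be sketched as a quantize-dequantize step that stays in floating point. It rounds and clamps to the INT8 grid, then scales back, so later layers see quantization error without any integer arithmetic. This is a generic illustration. The function name, the fixed scale, and the sample values are hypothetical, and the Ascend QAT implementation may differ.

```python
import numpy as np

def fake_quantize(x, scale, qmin=-128, qmax=127):
    """Quantize-dequantize in floating point: simulates INT8 rounding and clamping."""
    x = np.asarray(x, dtype=np.float32)
    q = np.clip(np.round(x / scale), qmin, qmax)  # rounding + clamping, values still float
    return q * scale                               # back to the float domain

out = fake_quantize([0.0, 0.013, 1.0, -2.0], scale=0.01)
print(out)  # approximately [0.0, 0.01, 1.0, -1.28]
```

In this example, 0.013 is rounded to the nearest grid step (0.01), and -2.0 is clamped to -128 × 0.01 = -1.28. These are the rounding and clamping effects that QAT lets the model learn to tolerate.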

R

RateDiff

Rate difference

RelativeEuclideanDistance

Relative Euclidean distance

An accuracy comparison algorithm. The result ranges from 0 to infinity. A value closer to 0 indicates a higher similarity.
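A minimal NumPy sketch of relative Euclidean distance. It assumes the distance is normalized by the L2 norm of Ground Truth, ||x - y|| / ||y||. The function name and the eps guard are hypothetical, and the tool may normalize differently.

```python
import numpy as np

def relative_euclidean_distance(my_output, ground_truth, eps=1e-12):
    """||x - y||_2 / ||y||_2 over the flattened tensors (assumed definition)."""
    x = np.asarray(my_output, dtype=np.float64).ravel()
    y = np.asarray(ground_truth, dtype=np.float64).ravel()
    return float(np.linalg.norm(x - y) / max(np.linalg.norm(y), eps))

print(relative_euclidean_distance([3.0, 4.0], [3.0, 4.0]))  # 0.0
print(relative_euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 1.0
```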

RoCE

RDMA over converged Ethernet

A network protocol that enables remote direct memory access over Ethernet, allowing applications on different servers to move data directly between their memories without CPU intervention. RoCE data reports the bandwidth of the RoCE communication interface.

RootMeanSquareError

Root mean square error

An accuracy comparison algorithm. The result ranges from 0 to infinity.

  • The closer MeanAbsoluteError and RootMeanSquareError are to 0, the closer the measured values are to the actual values.
  • If MeanAbsoluteError is close to 0 but RootMeanSquareError is large, a few individual values have excessively large errors.
  • If MeanAbsoluteError is large and RootMeanSquareError is equal or close to it, the deviation is concentrated (similar in size across values).
  • If MeanAbsoluteError is large and RootMeanSquareError is clearly greater than it, there is an overall deviation and the deviation is widely scattered.
  • No other combinations are possible, because RMSE ≥ MAE always holds.

RUNTIME

Runtime runs in the application process space and provides applications with functions (specific to Ascend AI Processors) for managing memory, devices, streams, and events, and executing kernels.

RDMA

Remote direct memory access

A function that enables a computer to directly transmit data to the memory of another computer over a network.

S

Sample-based

Profiling mode in which AI Core profile data is sampled at fixed intervals.

SDK

Software development kit

A set of software development tools that allow the creation of applications for a certain software package, software framework, hardware platform, operating system, or similar development platform.

StandardDeviation

Standard deviation

An accuracy comparison algorithm. The result ranges from 0 to infinity. The smaller the standard deviation, the less dispersed the data and the closer the values are to the mean. The mean and standard deviation of the dump data are displayed in the format (mean;standard deviation). The first pair is the result of My Output, and the second pair is the result of Ground Truth.
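The (mean;standard deviation) display can be sketched in NumPy. The sketch assumes the population standard deviation (ddof=0) and six decimal places. Both choices, and the function name, are hypothetical rather than confirmed tool behavior.

```python
import numpy as np

def mean_std_string(data):
    """Format dump statistics as '(mean;standard deviation)' using population std (ddof=0)."""
    arr = np.asarray(data, dtype=np.float64).ravel()
    return f"({arr.mean():.6f};{arr.std():.6f})"

my_output = [1.0, 2.0, 3.0, 4.0]
ground_truth = [1.0, 2.0, 3.0, 5.0]
print(mean_std_string(my_output), mean_std_string(ground_truth))
# (2.500000;1.118034) (2.750000;1.479020)
```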

Step Trace

Iteration trace

For each iteration, the trace records the start and end times of forward propagation and backpropagation, as well as the gradient update duration and the data augmentation bound duration.

ST

System test

A testing process conducted on the entire product system to verify whether the system meets the defined requirement specifications.

SDMA

System direct memory access

A peripheral that enables direct transmission of data between memory and other peripherals, without relying on the system processor. This results in improved data transmission efficiency and frees up CPU resources. An SDMA task is executed on only one SDMA channel. HWTS manages eight SDMA channels.

Single-operator comparison

A tensor comparison method in Model Accuracy Analyzer.

It compares the accuracy of one or more operators that participate in computation in a network model.

T

Task-based

Profiling mode in which AI Core profile data is collected on a per-task basis.

TBE

Tensor boost engine

TBE provides Python APIs for implementing operators, which are then built into CCE operators.

Tensor

Tensor

A major data structure in TensorFlow. A tensor is N-dimensional (where N may be very large). It often takes the form of a scalar, vector, or matrix. The elements of a tensor can include integer values, floating point values, or string values.

Tensor comparison

Tensor comparison

Tensor comparison compares the data of two tensors using various evaluation algorithms. It supports both network-wide comparison and single-operator comparison.

TIK

Tensor iterator kernel

An operator development mode in which you write a custom operator in Python by calling TIK APIs. The TIK compiler then compiles the operator into a binary file that runs on Ascend AI Processors.

TransData

A format conversion operator.

TS

Task scheduler

TS is used to schedule different kernels to AI CPU or AI Core for execution.

U-Z

Table 5

Term/Acronym/Abbreviation

Description

U

UT

Unit test

A fundamental testing activity in software development, where individual software units are tested in isolation from other program components.

V

VECTOR

Vector operation