printf
Description
Prints the value of a Scalar, an Expr, a ScalarArray, or a Tensor. This API is supported even in the functional debugging environment.
Prototype
def printf(format_string, *arg)
Parameters
| Parameter | Input/Output | Description |
|---|---|---|
| format_string | Input | String to print, consisting of ordinary characters and format characters. Must be an immediate of type string and cannot be empty. The string can contain a maximum of 128 characters. If set_printf_params has been called before the printf call, the maximum string length is single_print_length - 32. Example: "scalar is: %d\n", where %d is a format character. Syntax of format characters: %[flag][width][.precision]type |
| *arg | Input | Values of up to six arguments. Supported data types: |
| Flag | Description |
|---|---|
| '#' | Adds the prefix 0o for %o and the prefix 0x for %x. Has no effect on other format characters. |
| '0' | Used together with the width field. The output is padded with zeros when the actual number of output characters is less than the width. For example, the following prints 010: s1 = tik_instance.Scalar("int32", init_value=10); tik_instance.printf("%03d", s1). Without this flag, the output is padded with spaces. When the actual output is longer than the width, the actual output is retained. |
| Format Character | Description |
|---|---|
| %d | Signed decimal integer |
| %o | Signed octal number. Float is unsupported. |
| %x | Signed hexadecimal number. Float is unsupported. |
| %f | Decimal float |
| %s | String |
| %c | Single character in the range of [0, 255]. Float is unsupported. |
| %% | Character % |
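The flag and width behavior described in the tables above matches what Python's built-in % string operator does on the host, so the expected text can be previewed without building an operator. The snippet below is a minimal sketch in plain Python. It does not call tik_instance.printf, and it assumes (the tables do not state this) that device-side output follows the same conventions.

```python
# Host-side preview using Python's own % operator (not the TIK printf API).
value = 10

zero_padded = "%03d" % value       # '0' flag with width 3
space_padded = "%5d" % value       # width only: padded with spaces
octal_prefixed = "%#o" % value     # '#' flag adds the 0o prefix
hex_prefixed = "%#x" % value       # '#' flag adds the 0x prefix
not_truncated = "%2d" % 12345      # output longer than the width is kept whole
percent_sign = "%d%%" % value      # %% prints a literal percent sign

for text in (zero_padded, space_padded, octal_prefixed,
             hex_prefixed, not_truncated, percent_sign):
    print(repr(text))
```

Checking a format string this way before adding it to an operator can help catch a wrong flag or width early, since each printf statement lengthens the build.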
Applicability
Returns
None
Restrictions
- The total length of each printf statement must not exceed the value of single_print_length. The default value is 512 bytes and can be modified by using the set_printf_params API. If the length of a statement exceeds this value, the statement is truncated. The total print length is calculated as follows:
  - Total print length = Print header length + Format string length + Data length
    - Print header length: fixed at 32 bytes.
    - Format string length: length of format_string, aligned to 32 bytes.
    - Data length = Header length of the first data + Length of the first data + Header length of the second data + Length of the second data + ... + Header length of the nth data + Length of the nth data
      - Header length of each data: fixed at 32 bytes.
      - Length of each data: aligned to 32 bytes.
- Each printf statement will use some extra Unified Buffer space, which is calculated as follows:
  - Tensor with scope set to the L1 Buffer, Unified Buffer, or Global Memory, and Scalar: 256 bytes + 32 bytes
  - Tensor with scope set to the L1OUT Buffer: 256 bytes + 256 * Unit data type size
- Extra synchronization is inserted for each printf statement, which may change the statement execution order.
- Extra operations are inserted for each printf statement, which may degrade performance and increase execution time.
- printf supports functional debugging.
- If the length of the printed data cannot be evaluated at the frontend, warning: discard xxx args is displayed during parsing. xxx indicates the volume of discarded data.
- The printf workspace is evenly allocated to the blocks. Its size defaults to 128 MB and can be modified by using the set_printf_params call. When the printf workspace is full, old printf content is overwritten. The size of the overwritten space is reported in the printed warning: discard xxx args. Increase the printf workspace size by using the set_printf_params call as needed.
- Each printf statement generates an IR. If build fails because the IR limit is exceeded, reduce the number of printf statements.
- Using printf statements prolongs the operator build time.
- If the operator file contains a multi-block for loop, printf cannot be called inside that loop.
- Note the following when calling printf in the functional debugging environment:
  - When single_print_length is exceeded, data can still be printed.
  - When the printf workspace exceeds print_workspace_size, old printf content is not overwritten.
- If a tensor with an offset is to be printed, the following alignment requirements must be met:
  - If the Tensor scope is the Unified Buffer or Global Memory, the offset must be a multiple of the size (in bytes) of the data type.
  - If the Tensor scope is the L1 Buffer, the offset must be 32-byte aligned.
  - If the Tensor scope is the L1OUT Buffer, the offset must be a multiple of (256 * size of the data type in bytes).
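The extra Unified Buffer space and the offset alignment rules listed above depend only on the scope and the data type, so they can be checked on the host before a build. The following is a minimal sketch in plain Python, not a TIK API. The scope labels ("ub", "gm", "l1", "l1out", "scalar"), the DTYPE_BYTES table, and both function names are hypothetical, and offsets are assumed to be expressed in bytes.

```python
# Hypothetical lookup of element sizes in bytes; extend as needed.
DTYPE_BYTES = {"int8": 1, "uint8": 1, "int16": 2, "float16": 2,
               "int32": 4, "float32": 4}


def extra_ub_bytes(scope, dtype):
    """Extra Unified Buffer space used by one printf statement."""
    if scope in ("l1", "ub", "gm", "scalar"):
        return 256 + 32
    if scope == "l1out":
        return 256 + 256 * DTYPE_BYTES[dtype]
    raise ValueError("unknown scope label: %s" % scope)


def offset_is_aligned(scope, dtype, offset_bytes):
    """Check a tensor offset (assumed to be in bytes) against the rules."""
    if scope in ("ub", "gm"):
        required = DTYPE_BYTES[dtype]
    elif scope == "l1":
        required = 32
    elif scope == "l1out":
        required = 256 * DTYPE_BYTES[dtype]
    else:
        raise ValueError("unknown scope label: %s" % scope)
    return offset_bytes % required == 0


print(extra_ub_bytes("ub", "float16"))            # 288
print(extra_ub_bytes("l1out", "float32"))         # 1280
print(offset_is_aligned("gm", "float16", 30))     # True
print(offset_is_aligned("l1out", "float16", 256)) # False
```

Keeping the rules in one table-driven helper makes it easy to confirm that a sliced tensor such as gm_tensor[0:32] starts at an acceptable offset before the printf call is added.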
Example
1. Print the value of a Scalar.
from tbe import tik
tik_instance = tik.Tik()
scalar = tik_instance.Scalar("int8", init_value=-128)
# The required space is as follows: 32 bytes header + 32 bytes string + 32 bytes <TL of each parameter> * 1 + 32 bytes <32-byte aligned>
tik_instance.printf("scalar is: %d\n", scalar)
tik_instance.BuildCCE(inputs=[], outputs=[], kernel_name="print_scalar")
tik_instance.tikdb.start_debug({})
Input
[]
Output
scalar is: -128
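The space estimate in the comment of this example follows the total print length formula from the Restrictions section. A minimal sketch of that arithmetic in plain Python is shown here. The helper names align_up and estimate_print_length are hypothetical and not part of TIK, and the sketch assumes the format string is counted by its encoded byte length with no terminator.

```python
PRINT_HEADER_BYTES = 32   # fixed print header
DATA_HEADER_BYTES = 32    # fixed header in front of each data item
DEFAULT_SINGLE_PRINT_LENGTH = 512


def align_up(size, alignment=32):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


def estimate_print_length(format_string, data_sizes):
    """Estimate the total print length of one printf statement in bytes.

    data_sizes lists the raw byte size of each printed argument.
    """
    format_bytes = align_up(len(format_string.encode("utf-8")))
    data_bytes = sum(DATA_HEADER_BYTES + align_up(size) for size in data_sizes)
    return PRINT_HEADER_BYTES + format_bytes + data_bytes


# One int8 Scalar occupies 1 byte before alignment.
total = estimate_print_length("scalar is: %d\n", [1])
print(total)                                 # 128
print(total <= DEFAULT_SINGLE_PRINT_LENGTH)  # True
```

The result, 32 + 32 + 32 + 32 = 128 bytes, is well under the default single_print_length of 512 bytes, so this statement is not truncated.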
2. Print the value of a Unified Buffer segment.
from tbe import tik
tik_instance = tik.Tik()
ub_tensor = tik_instance.Tensor("int16", [16, 16], tik.scope_ubuf, 'ub_tensor')
scalar = tik_instance.Scalar(dtype="int16", init_value=6)
ub_tensor[15].set_as(scalar)
tik_instance.printf("ubtensor[15:16] is: %d\n", ub_tensor[15:16])
tik_instance.BuildCCE(inputs=[], outputs=[], kernel_name="printf_ub")
tik_instance.tikdb.start_debug({})
Input
[]
Output
ubtensor[15:16] is: 6
3. Print the value of a Global Memory segment.
import numpy as np
from tbe import tik
tik_instance = tik.Tik()
gm_tensor = tik_instance.Tensor("float16", [256, ], tik.scope_gm, "gm_tensor")
tik_instance.printf("%f\n", gm_tensor[0:32])
tik_instance.BuildCCE(inputs=[gm_tensor], outputs=[], kernel_name="printf_gm")
data = np.random.uniform(-65504, 65504, [256, ]).astype(np.float16)
tik_instance.tikdb.start_debug(feed_dict={'gm_tensor': data})
Input
[-13288 46208 52448 -6488 -3364 55264 49376 3840 ...]
Output
-13288.000000 46208.000000 52448.000000 -6488.000000 -3364.000000 55264.000000 49376.000000 3840.000000 ...