Modifying SSD Prototxt

Do not copy the code samples in this section into your network model verbatim. Adjust the parameters to suit your use case. For example, the bottom and top names must match those in your network model, and the order of the bottom and top entries is fixed.

If your network is an SSD network, add the SSDDetectionOutput postprocessing layer to the end of the original .prototxt file, as described in List of Custom Operators.

A custom layer must be declared in the LayerParameter message of the caffe.proto file (${INSTALL_DIR}/include/proto). The following declaration is already present in caffe.proto, so you do not need to add it again.

message LayerParameter {
...
  optional SSDDetectionOutputParameter ssddetectionoutput_param = 232;
...
}

According to the caffe.proto file, the operator type and attributes are defined as follows:

message SSDDetectionOutputParameter {
    optional int32 num_classes = 1 [default = 2];
    optional bool share_location = 2 [default = true];
    optional int32 background_label_id = 3 [default = 0];
    optional float iou_threshold = 4 [default = 0.45];
    optional int32 top_k = 5 [default = 400];
    optional float eta = 6 [default = 1.0];
    optional bool variance_encoded_in_target = 7 [default = false];
    optional int32 code_type = 8 [default = 2];
    optional int32 keep_top_k = 9 [default = 200];
    optional float confidence_threshold = 10 [default = 0.01];
}

The SSDDetectionOutput operator has three inputs and two outputs, as described in Supported Caffe Operators. The following is an example of the resulting layer definition:

layer {
  name: "detection_out"
  type: "SSDDetectionOutput"
  bottom: "bbox_delta"
  bottom: "score"
  bottom: "anchors"
  top: "out_boxnum"
  top: "y"
  ssddetectionoutput_param {
    num_classes: 2
    share_location: true
    background_label_id: 0
    iou_threshold: 0.45
    top_k: 400
    eta: 1.0
    variance_encoded_in_target: false
    code_type: 2
    keep_top_k: 200
    confidence_threshold: 0.01
  }
}
  • The bottom inputs correspond to blobs in the original Caffe network: bbox_delta to mbox_loc, score to mbox_conf_flatten, and anchors to mbox_priorbox. The value of num_classes must be the same as in the original network.
  • When the batch size is greater than 1, the top outputs have the following shapes:
    • The output shape of out_boxnum is (batchnum, 8). The first element of each batch is the number of actual boxes.
    • The output shape of y is (batchnum, len, 8), where len is the value of keep_top_k rounded up to a multiple of 128. For example, if batchnum = 2 and keep_top_k = 200, the output shape is (2, 256, 8), and the first 256 x 8 data elements are the result of the first batch.
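
The shape rules above can be sketched in a few lines of Python. This is an illustration only, not part of the toolkit: the helper names align_up, ssd_output_shapes, and valid_boxes are hypothetical, and valid_boxes assumes that the actual boxes of each batch occupy the first rows of y, which should be confirmed against Supported Caffe Operators.

```python
def align_up(value, alignment=128):
    """Round value up to the nearest multiple of alignment."""
    return ((value + alignment - 1) // alignment) * alignment


def ssd_output_shapes(batch, keep_top_k):
    """Return the (out_boxnum, y) shapes for the given batch size."""
    return (batch, 8), (batch, align_up(keep_top_k), 8)


def valid_boxes(out_boxnum, y):
    """Keep only the actual boxes of each batch item.

    Assumption: out_boxnum[b][0] is the number of actual boxes for batch
    item b, and those boxes are stored in the first rows of y[b].
    """
    return [rows[:int(counts[0])] for counts, rows in zip(out_boxnum, y)]


boxnum_shape, y_shape = ssd_output_shapes(batch=2, keep_top_k=200)
print(boxnum_shape, y_shape)  # (2, 8) (2, 256, 8)
```

With keep_top_k left at its default of 200, each batch item therefore carries 56 padding rows in y that should be ignored when reading results.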

For details about the parameters, see Supported Caffe Operators.
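
If you prefer to generate the layer rather than edit the .prototxt file by hand, the following Python sketch shows one possible approach. The function names render_ssd_detection_layer and append_layer are hypothetical helpers, not toolkit APIs. The sketch keeps the fixed input order (box deltas, scores, anchors) and sets only num_classes, on the assumption that omitted attributes fall back to the defaults declared in caffe.proto.

```python
def render_ssd_detection_layer(loc, conf, priorbox, num_classes,
                               name="detection_out"):
    """Build the SSDDetectionOutput layer text for a .prototxt file.

    loc, conf, and priorbox are the blob names in your own network.
    The bottom order is fixed: box deltas, then scores, then anchors.
    """
    bottoms = [loc, conf, priorbox]
    tops = ["out_boxnum", "y"]
    lines = ["layer {", f'  name: "{name}"', '  type: "SSDDetectionOutput"']
    lines += [f'  bottom: "{b}"' for b in bottoms]
    lines += [f'  top: "{t}"' for t in tops]
    lines += [
        "  ssddetectionoutput_param {",
        f"    num_classes: {num_classes}",  # must match the original network
        "  }",
        "}",
    ]
    return "\n".join(lines) + "\n"


def append_layer(prototxt_text, layer_text):
    """Append the layer to the end of the original .prototxt content."""
    return prototxt_text.rstrip("\n") + "\n" + layer_text


# Example blob names taken from the mapping above; num_classes=21 is
# only an illustrative value and must be replaced with your own.
layer = render_ssd_detection_layer(
    "mbox_loc", "mbox_conf_flatten", "mbox_priorbox", num_classes=21)
print(layer)
```

Review the generated text before converting the model, because the blob names and num_classes depend entirely on your network.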