Create YOLOv3 using PyTorch from scratch (Part-3)

In this post we load pre-trained weights for the YOLOv3 model and run some test inferences.

1. Overview

This is Part-3 of the series on building a YOLOv3 model from scratch.

Here is an overview of the series:

  1. Understand the YOLO model.
  2. Build the model backbone.
  3. Load pre-trained weights: this post.

    To verify that the Darknet-53 model we built works as intended, we load the pre-trained YOLOv3 weights and run inference on a few images.

  4. Get the tools ready.
  5. Training data preparation.
  6. Train the model.

2. Download the pre-trained weights

The pre-trained weights of YOLOv3 can be downloaded from this link. More instructions can be found on the official YOLO site. Once the download finishes, put the weight file yolov3.weights somewhere inside our YOLOv3_pytorch project folder.

3. Write the weight-loading method

We are going to equip our Darknet53 model with a load_weights() method that reads and loads the downloaded weights into the model layers. So if you haven’t built the Darknet53 model, please go to Part-2 of the series and get the model ready.

Below is the load_weights() method. Put it inside our Darknet53 class:

def load_weights(self, weight_file):
    '''Load pretrained weights'''

    def getSlice(w, cur, length):
        return torch.from_numpy(w[cur:cur+length]), cur+length

    def loadW(data, target):
        # reshape the flat slice to the target's shape, then copy in place
        data = data.view_as(target)
        with torch.no_grad():
            target.copy_(data)

    with open(weight_file, 'rb') as fin:
        # the 1st 5 values are header info
        # 1. major version number
        # 2. minor version number
        # 3. subversion number
        # 4,5. images seen by the network during training

        self.header_info = np.fromfile(fin, dtype=np.int32, count=5)
        self.seen = self.header_info[3]

        weights = np.fromfile(fin, dtype=np.float32)

    ptr = 0
    for layer in self.layers.values():
        # only conv blocks have weights in the file; skip everything else
        if not isinstance(layer, ConvBNReLU):
            continue

        conv = layer.layers[0]

        if layer.bn:
            bn = layer.layers[1]
            # get the number of weights of bn layer
            num = bn.bias.numel()
            # load the weights
            bn_bias, ptr = getSlice(weights, ptr, num)
            bn_weight, ptr = getSlice(weights, ptr, num)
            bn_running_mean, ptr = getSlice(weights, ptr, num)
            bn_running_var, ptr = getSlice(weights, ptr, num)

            # cast the loaded weights into dims of module weights
            loadW(bn_bias, bn.bias)
            loadW(bn_weight, bn.weight)
            loadW(bn_running_mean, bn.running_mean)
            loadW(bn_running_var, bn.running_var)
        else:
            # no batch norm: the conv layer has its own bias
            # number of conv biases
            num = conv.bias.numel()
            # load the weights
            conv_bias, ptr = getSlice(weights, ptr, num)
            loadW(conv_bias, conv.bias)

        # conv weights
        num = conv.weight.numel()
        conv_weight, ptr = getSlice(weights, ptr, num)
        loadW(conv_weight, conv.weight)

    assert len(weights) == ptr, 'Not all weight values loaded.'


Some more explanations.

The pre-trained weights are saved in binary format, so we open the file in binary-reading (rb) mode:

with open(weight_file, 'rb') as fin:

The numpy.fromfile() function is used to read from the opened file object.

The first 5 numbers are header information; the model weights start from the 6th number. We read all the weights into a weights array:

self.header_info = np.fromfile(fin, dtype=np.int32, count=5)
weights = np.fromfile(fin, dtype=np.float32)
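
As a quick sanity check of this layout, the sketch below writes a tiny fake weight file and reads it back in the same way. The file name fake.weights, all the values in it, and the choice to reinterpret the last two header values as one 64-bit counter are illustrative assumptions, not the contents of the real yolov3.weights file.

```python
import os
import tempfile

import numpy as np

# Build a tiny stand-in file: 3 int32 version numbers, one int64 "images seen"
# counter (which occupies the 4th and 5th int32 slots), then float32 weights.
# All values here are placeholders for illustration.
tmp_dir = tempfile.mkdtemp()
fake_file = os.path.join(tmp_dir, 'fake.weights')
with open(fake_file, 'wb') as fout:
    np.array([0, 2, 0], dtype=np.int32).tofile(fout)
    np.array([32013312], dtype=np.int64).tofile(fout)
    np.arange(6, dtype=np.float32).tofile(fout)

# Read it back the same way load_weights() does
with open(fake_file, 'rb') as fin:
    header_info = np.fromfile(fin, dtype=np.int32, count=5)
    weights = np.fromfile(fin, dtype=np.float32)

major, minor, revision = (int(v) for v in header_info[:3])
# Reinterpret the last two int32 values as a single int64 counter
seen = int(header_info[3:].view(np.int64)[0])

print('version:', major, minor, revision)
print('images seen:', seen)
print('number of weights:', len(weights))
```

Because the second np.fromfile() call continues from where the first one stopped, the header never leaks into the weights array.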

It is important to keep track of how many numbers we have read from this big array. Each layer must receive exactly the number of values it expects, in the right order; otherwise the trained weights end up in the wrong places and the model will not behave as it was trained to.
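
To get a feel for the bookkeeping, here is a rough sketch that counts how many values a single convolutional block takes from the array. The helper name n_values_in_block() and the two example layer shapes are assumptions made for illustration; the counting rule itself follows the loading order used in load_weights().

```python
def n_values_in_block(in_ch, out_ch, kernel, has_bn):
    '''Number of float32 values one conv block takes from the weight array.'''
    n_kernel = out_ch * in_ch * kernel * kernel
    if has_bn:
        # bias, weight, running_mean, running_var: one value per channel each
        return 4 * out_ch + n_kernel
    # no batch norm: only the conv bias, one value per output channel
    return out_ch + n_kernel


# A 3x3 conv from 3 to 32 channels, followed by batch norm
first = n_values_in_block(3, 32, 3, has_bn=True)
# A 1x1 conv from 1024 to 255 channels, without batch norm
head = n_values_in_block(1024, 255, 1, has_bn=False)
print(first, head)
```

Summing this count over all convolutional blocks should give exactly len(weights), which is what the final assert in load_weights() checks.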

To help get slices of numbers from the array, we create a getSlice() helper function that cuts a slice of length length starting from the location cur. It also returns the pointer shifted by length, so that it points to the next number to be read:

def getSlice(w, cur, length):
    return torch.from_numpy(w[cur:cur+length]), cur+length
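
The pointer arithmetic is easy to try on a toy array. The sketch below uses a NumPy-only stand-in called get_slice() (a hypothetical name; it skips the torch.from_numpy() conversion) on made-up data:

```python
import numpy as np


def get_slice(w, cur, length):
    '''NumPy-only stand-in for getSlice(): return the slice and the moved pointer.'''
    return w[cur:cur+length], cur+length


weights = np.arange(10, dtype=np.float32)
ptr = 0
first, ptr = get_slice(weights, ptr, 4)   # values 0..3, ptr moves to 4
second, ptr = get_slice(weights, ptr, 3)  # values 4..6, ptr moves to 7

print(first, second, ptr)
# the array is fully consumed only when ptr reaches its length
print('all consumed:', ptr == len(weights))
```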

Then we initialize the pointer ptr to point to the beginning of the weights array, and iterate through the model layers:

ptr = 0
for layer in self.layers.values():
    if not isinstance(layer, ConvBNReLU):
        continue
    conv = layer.layers[0]

Only convolutional layers have trainable weights, so we skip all other types of layers.

Recall that if the convolutional layer is followed by a batch normalization, then the Conv2d module has no bias terms.

So we query the layer’s .bn attribute to see if that is the case. If so, we call bn.bias.numel() to get the number of weights in the BatchNorm2d module, slice out the four groups of values (bias, weight, running mean and running variance), and call the loadW() helper function to feed them into the module:

    if layer.bn:
        bn = layer.layers[1]
        # get the number of weights of bn layer
        num = bn.bias.numel()
        # load the weights
        bn_bias, ptr = getSlice(weights, ptr, num)
        bn_weight, ptr = getSlice(weights, ptr, num)
        bn_running_mean, ptr = getSlice(weights, ptr, num)
        bn_running_var, ptr = getSlice(weights, ptr, num)

        # cast the loaded weights into dims of module weights
        loadW(bn_bias, bn.bias)
        loadW(bn_weight, bn.weight)
        loadW(bn_running_mean, bn.running_mean)
        loadW(bn_running_var, bn.running_var)

If the convolutional layer has no batch normalization, then load a bias term:

    else:
        # number of conv biases
        num = conv.bias.numel()
        # load the weights
        conv_bias, ptr = getSlice(weights, ptr, num)
        loadW(conv_bias, conv.bias)

Lastly, we slice out the weights for the convolutional kernel and feed that into the Conv2d module:

    # conv weights
    num = conv.weight.numel()
    conv_weight, ptr = getSlice(weights, ptr, num)
    loadW(conv_weight, conv.weight)
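
To see the whole per-layer order in one place, the following sketch walks through a flat array for a single made-up conv block with batch normalization. It uses plain NumPy slicing and reshape() as a stand-in for getSlice() and loadW(); the channel counts and kernel size are arbitrary.

```python
import numpy as np

# Made-up sizes for one conv block with batch norm: 2 -> 3 channels, 3x3 kernel
in_ch, out_ch, k = 2, 3, 3
n_kernel = out_ch * in_ch * k * k
flat = np.arange(4 * out_ch + n_kernel, dtype=np.float32)

ptr = 0
pieces = {}
# same order as in load_weights(): the 4 batch norm vectors come first
for name in ['bn_bias', 'bn_weight', 'bn_running_mean', 'bn_running_var']:
    pieces[name] = flat[ptr:ptr+out_ch]
    ptr += out_ch

# the remaining values are the kernel, reshaped to the Conv2d weight layout
# (out_channels, in_channels, kernel_h, kernel_w)
pieces['conv_weight'] = flat[ptr:ptr+n_kernel].reshape(out_ch, in_ch, k, k)
ptr += n_kernel

for name, arr in pieces.items():
    print(name, arr.shape)
assert ptr == len(flat), 'Not all weight values loaded.'
```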

Once the iteration through the network layers is finished, we should have a properly functioning YOLOv3. Let’s test that out.

4. Do some inferences using pre-trained weights

To test out whether the loaded weights function as expected, let’s get some images and run the model on them.

Below are 4 sample images. Download them and save them into the data sub-folder of the YOLOv3_pytorch project folder.


Figure 1: Sample image.


Figure 2: Sample image.


Figure 3: Sample image.


Figure 4: Sample image.

Then, download this coco.names_.txt file and save it into the data sub-folder as well. It lists the 80 class names in the COCO detection dataset.

Now create a script in the YOLOv3_pytorch folder, with these contents:

from __future__ import print_function
import os
import torch
from PIL import Image
from torchvision import transforms

from config import load_config
from model import Darknet53
from utils import read_coco_names, draw_predictions

if __name__=='__main__':

    #--------------------Load model config--------------------
    CONFIG_FILE = './config/yolov3.cfg'
    net_config, module_list = load_config.parse_config(CONFIG_FILE)
    config = {'net': net_config}
    config['module_list'] = module_list
    config['width'] = 416
    config['height'] = 416
    config['n_classes'] = 80

    #-------------------Create model-------------------
    model = Darknet53(config)

    #-------------------Load weights-------------------
    weight_path = './yolov3.weights'
    model.load_weights(weight_path)

    #----------------Turn on eval mode----------------
    model.eval()

    #------------Transform image to tensor------------
    trans = transforms.Compose([
        # Resize expects (height, width)
        transforms.Resize([config['height'], config['width']]),
        transforms.ToTensor()])

    #--------------------Load data classes--------------------
    coco_name_file = './data/coco.names_.txt'
    class2id, id2class = read_coco_names(coco_name_file)

    #--------------Load some test images--------------
    data_folder = './data'
    img_files = os.listdir(data_folder)

    for fii in img_files:
        # skip anything that is not a jpg image
        if os.path.splitext(fii)[1] != '.jpg':
            continue

        img_fileii = os.path.join(data_folder, fii)
        print('\nReading image file:', img_fileii)
        imgii = Image.open(img_fileii)

        # transform image to tensor
        img_tensor = trans(imgii).unsqueeze(0)

        # make prediction
        with torch.no_grad():
            y = model(img_tensor)

        print('y.shape:', y.shape)
        y = y.detach().cpu().numpy().squeeze()
        print('y.shape:', y.shape)

        # filter by confidence
        y_conf = y[:, 4]
        idx = y_conf >= 0.96
        y_filtered = y[idx]

        if len(y_filtered) > 0:
            # draw predictions
            fig, ax = draw_predictions(imgii, model.width, model.height, y_filtered, id2class)

The code is fairly self-explanatory. Just note that we filter the predictions by keeping only those with confidence scores >= 0.96. This is only a temporary solution: normally the filtering is followed by non-maximum suppression, which we will cover in a later post.
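
The confidence filter is plain NumPy boolean indexing. Here is a small sketch on made-up predictions with only 2 classes, assuming the same column layout as in the script (x, y, w, h, confidence, then the class scores):

```python
import numpy as np

# Made-up predictions: 4 candidate boxes, 2 classes -> 5 + 2 columns
# columns: x, y, w, h, confidence, class scores...
y = np.array([
    [100., 120., 50., 80., 0.99, 0.9, 0.1],
    [102., 118., 52., 78., 0.97, 0.8, 0.2],
    [300., 200., 40., 40., 0.30, 0.5, 0.5],
    [ 20.,  30., 10., 10., 0.98, 0.1, 0.9],
])

keep = y[:, 4] >= 0.96          # boolean mask, one entry per box
y_filtered = y[keep]            # boolean indexing returns a copy
class_ids = np.argmax(y_filtered[:, 5:], axis=1)

print('kept boxes:', len(y_filtered))
print('class ids:', class_ids)
```

Note that the first two rows describe almost the same box and both pass the threshold. A confidence threshold alone cannot remove such near-duplicates.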

To make it work, we also need 2 utility functions:

  • read_coco_names(): read the class names in the coco.names_.txt file, and create 2 dicts: one maps the class names to integer ids, and the other does the opposite.
  • draw_predictions(): draw the image and bounding boxes of detections on top of it.

I put these 2 functions in a file in the YOLOv3_pytorch folder (the script above imports them from utils). Here is the read_coco_names() function:

def read_coco_names(file_path):

    with open(file_path, 'r') as fin:
        names = fin.readlines()

    class2id = dict([(xx.strip(), ii) for (xx, ii) in zip(names, range(len(names)))])
    id2class = dict([(vv,kk) for (kk,vv) in class2id.items()])

    return class2id, id2class
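
For a quick check of the two mappings, here is a sketch that runs the same idea on a made-up 3-class file. The read_names() function is a hypothetical variant written with enumerate(), and names.txt is a stand-in for coco.names_.txt:

```python
import os
import tempfile


def read_names(file_path):
    '''Same idea as read_coco_names(), written with enumerate().'''
    with open(file_path, 'r') as fin:
        names = [line.strip() for line in fin]
    class2id = {name: ii for ii, name in enumerate(names)}
    id2class = {ii: name for name, ii in class2id.items()}
    return class2id, id2class


# A made-up 3-class file standing in for coco.names_.txt
tmp_file = os.path.join(tempfile.mkdtemp(), 'names.txt')
with open(tmp_file, 'w') as fout:
    fout.write('person\nbicycle\ncar\n')

class2id, id2class = read_names(tmp_file)
print(class2id)
print(id2class[2])
```

The ids simply follow the line order in the file, so the file must list the classes in the same order the model was trained with.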

Fairly straightforward. Now the draw_predictions() function:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import patches

def draw_predictions(img, input_w, input_h, pred, id2class):
    '''Draw object detection predictions

    Args:
        img (PIL.Image): image to detect objects from.
        input_w (int): width of image as input to model.
        input_h (int): height of image as input to model.
        pred (ndarray): n x m ndarray, n is number of selected detections.
            m = 5 + n_classes.
        id2class (dict): dict containing id-to-class name key-value pairs.
    Returns:
        figure, ax: the matplotlib figure and axes holding the drawing.
    '''

    figure = plt.figure(figsize=(12,10), dpi=100)
    ax = figure.add_subplot(1,1,1)
    # show the original image; the boxes are drawn on top of it
    ax.imshow(img)
    img_w, img_h = img.size

    for pii in pred:

        # x, y, w, h; copy so that the caller's pred array is not modified
        boxii = pii[:4].copy()
        # scale to original image size
        boxii[[0, 2]] *= img_w / input_w
        boxii[[1, 3]] *= img_h / input_h
        # get top-left corner
        xc, yc, w, h = boxii
        x1 = xc - w/2
        y1 = yc - h/2
        # create bbox and add it to the axes
        recii = patches.Rectangle((x1, y1), w, h, facecolor='none', edgecolor='w')
        ax.add_patch(recii)
        # prepare label
        confii = pii[4]
        clsii = np.argmax(pii[5:])
        labelii = '%s %.2f' %(id2class[clsii], confii)
        ax.text(x1, y1, labelii, ha='left', va='bottom', bbox={'color': 'c', 'alpha': 0.6})

    return figure, ax

I’m using matplotlib to do the drawings. Feel free to use opencv if you like.
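
The box arithmetic inside the loop can be checked in isolation. The sketch below pulls it into a hypothetical helper, to_corner_box(), and applies it to a made-up box and image size:

```python
def to_corner_box(xc, yc, w, h, input_w, input_h, img_w, img_h):
    '''Scale a center-format box from model input size to image size,
    and return (x1, y1, w, h) with (x1, y1) the top-left corner.'''
    sx = img_w / input_w
    sy = img_h / input_h
    xc, w = xc * sx, w * sx
    yc, h = yc * sy, h * sy
    return xc - w / 2, yc - h / 2, w, h


# A box centered in a 416x416 input, mapped onto a made-up 832x624 image
x1, y1, w, h = to_corner_box(208, 208, 104, 104, 416, 416, 832, 624)
print(x1, y1, w, h)
```

The horizontal and vertical scales differ here because the image was squeezed into a square input without preserving its aspect ratio, which is what the plain resize in the test script does.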

5. Sample results

Below are the detection results corresponding to the sample images shown above:


Figure 5: Sample image detection result.


Figure 6: Sample image detection result.


Figure 7: Sample image detection result.


Figure 8: Sample image detection result.

The results are not bad. The model detected the objects with high confidence and classified them correctly.

But there are many overlapping detections. This can be solved by applying non-maximum suppression, which we will get to in the next post. So stay tuned.

