There are 3 "sizes" which are important when working with Darknet:
width=
... and height=
... in the
[net] section at the top of the config file. 1024x768
or 1920x1080
. If your image size matches approximately your network dimensions, then you don't need to do anything else. Darknet and the neural network you trained should have no problems, and you can skip the rest of this page.
If your image size is much bigger (1.5x or more) than your network size – especially if you are looking for small objects! – then image tiling can help.
The neural network is given a size when it is first trained. Both the width and the height must be divisible by 32; this is a Darknet limitation. Example sizes might be 640x480 or 608x448. The YOLOv3-tiny.cfg and YOLOv4-tiny.cfg files, for example, default to 416x416, so many people getting started with Darknet begin with that size.
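For reference, the network size is the pair of entries near the top of the .cfg file. The values below simply show the 416x416 tiny-YOLO default mentioned above; substitute whatever dimensions your network was trained with:

```
[net]
# Network dimensions; both values must be divisible by 32.
width=416
height=416
```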
Every image that is fed into Darknet will be resized to the network dimensions. This happens both during training and later during object detection, when you apply the neural network to new images.
For the sake of this example, let's say we use a network size of 416x320, which gives an aspect ratio of 1.3, very close to the 1.3333 you'd get from a typical 4:3 image captured by a phone or webcam.
That means an image measuring 1024x768 will be resized by Darknet to measure 416x320. Here is what that looks like:
The neural network in this example detects the numbers and the locks on the mailboxes. The problem is that once Darknet has scaled the image down, it fails to detect anything: the individual objects in the image are now too small to detect at the reduced resolution.
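If you want to see roughly what the network receives, you can reproduce the downscaling step yourself with OpenCV. This is only a sketch for visualization purposes; the filename is a placeholder, and Darknet's own resizing may use a different interpolation method than the OpenCV default:

```cpp
#include <opencv2/opencv.hpp>

int main()
{
    // "mailboxes.jpg" is a placeholder for the 1024x768 sample image.
    cv::Mat original = cv::imread("mailboxes.jpg");

    // Stretch the image down to the 416x320 network dimensions to
    // approximate what the neural network actually gets to see.
    cv::Mat resized;
    cv::resize(original, resized, cv::Size(416, 320));

    cv::imwrite("resized_preview.jpg", resized);
    return 0;
}
```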
Instead of resizing the 1024x768 image down to 416x320, the image can be broken up into smaller "tiles". When this option has been enabled (via DarkHelp::Config::enable_tiles), DarkHelp automatically feeds each individual tile through Darknet, and the results are combined as if a single image had been processed.
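Here is a minimal sketch of what enabling tiling looks like in code. It assumes the DarkHelp C++ API with the usual .cfg/.weights/.names triplet; the filenames are placeholders, and the exact class and constructor names may vary slightly between DarkHelp versions:

```cpp
#include <DarkHelp.hpp>
#include <iostream>

int main()
{
    // Placeholder filenames; substitute your own neural network files.
    DarkHelp::NN nn("mailboxes.cfg", "mailboxes_best.weights", "mailboxes.names");

    // Turn on automatic image tiling (DarkHelp::Config::enable_tiles).
    nn.config.enable_tiles = true;

    // The 1024x768 sample image used throughout this page.
    cv::Mat image = cv::imread("mailboxes.jpg");

    // With tiling enabled, DarkHelp splits large images into tiles, runs
    // each tile through Darknet, and returns one combined set of predictions.
    const auto results = nn.predict(image);

    std::cout << "detected " << results.size() << " objects" << std::endl;
    return 0;
}
```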
The size of the tiles is determined automatically by DarkHelp according to the network dimensions. In this example, the 1024x768 image and the 416x320 neural network result in 2 horizontal tiles and 2 vertical tiles, like this:
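The exact splitting logic lives inside DarkHelp, but the idea can be approximated by dividing the image dimensions by the network dimensions and rounding to get a tile count in each direction. The following is a rough sketch of that arithmetic, not DarkHelp's actual implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

int main()
{
    const int image_w = 1024, image_h = 768;  // image size used on this page
    const int net_w   = 416,  net_h   = 320;  // network size used on this page

    // Round the ratio of image size to network size to pick a tile count.
    const int tiles_x = std::max(1, (int)std::lround((double)image_w / net_w)); // 1024/416 ~ 2.46 -> 2
    const int tiles_y = std::max(1, (int)std::lround((double)image_h / net_h)); //  768/320 = 2.40 -> 2

    std::printf("%d x %d tiles, each roughly %d x %d pixels\n",
                tiles_x, tiles_y, image_w / tiles_x, image_h / tiles_y); // 2 x 2 tiles of 512x384
    return 0;
}
```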
The tiles are then automatically processed by DarkHelp and individually fed to Darknet:
The final result vector contains all the objects detected across all tiles for the original image:
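Those combined results can then be used like any other DarkHelp prediction results. Below is a minimal sketch of iterating over them, continuing from the earlier example; field names such as name, rect, and best_probability follow the DarkHelp::PredictionResult structure, though your version of the API may differ slightly:

```cpp
// "results" came from nn.predict(image) with tiling enabled, so it already
// contains the detections from every tile, mapped back to coordinates
// within the original 1024x768 image.
for (const auto & det : results)
{
    std::cout << det.name                                      // class name
              << " at x=" << det.rect.x << " y=" << det.rect.y
              << " ("     << det.best_probability * 100.0 << "%)"
              << std::endl;
}
```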
Running the 1024x768 sample image through the 416x320 network without the use of tiles resulted in zero objects found.
Using the same image but with tiling enabled provides much better results: