DarkHelp  v1.9.2-1
C++ API for the neural network framework Darknet
Image Tiling

There are 3 "sizes" which are important when working with Darknet:

  • The dimensions of the neural network. This is the width=... and height=... in the [net] section at the top of the config file.
  • The size of the images, which may or may not match the network dimensions. For example, 1024x768 or 1920x1080.
  • The size of the objects within the images. This becomes especially important when attempting to detect small objects.

If your image size approximately matches your network dimensions, then you don't need to do anything else. Darknet and the neural network you trained should have no problems, and you can skip the rest of this page.

If your image size is much bigger (1.5x or more) than your network size – especially if you are looking for small objects! – then image tiling can help.
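
To get a feel for this rule, the following sketch compares an image against the network dimensions and applies the 1.5x guideline described above. This is only an illustration: the network size is copied by hand from the [net] section of the .cfg file, and the image filename is a placeholder.

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    const cv::Size network_dims(416, 320);            // width=... and height=... from the .cfg file
    const cv::Mat image = cv::imread("DSCN0582.jpg"); // placeholder image

    const double horizontal_factor = static_cast<double>(image.cols) / network_dims.width;
    const double vertical_factor   = static_cast<double>(image.rows) / network_dims.height;

    // A 1024x768 image against a 416x320 network gives factors of roughly
    // 2.5 and 2.4, well past the 1.5x threshold, so tiling is worth enabling.
    const bool tiling_should_help = (horizontal_factor >= 1.5 || vertical_factor >= 1.5);

    std::cout << "image="    << image.cols << "x" << image.rows
              << " network=" << network_dims.width << "x" << network_dims.height
              << " -> tiling " << (tiling_should_help ? "recommended" : "not needed")
              << std::endl;

    return 0;
}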

The neural network is given a size when it is first trained. Both the width and the height must be divisible by 32; this is a Darknet limitation. Example sizes might be 640x480, or 608x448. The YOLOv3-tiny.cfg and YOLOv4-tiny.cfg files for example default to 416x416, so many people getting started with Darknet begin with that size.

Every image that is fed into Darknet will be resized to the network dimensions, both during training and later during object detection when you apply the neural network to new images.
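
If you want to see roughly what the network is given, you can reproduce the resize yourself with OpenCV. This is purely an illustration; Darknet performs the equivalent resize internally.

#include <opencv2/opencv.hpp>

int main()
{
    // Shrink the image to the network dimensions so you can inspect what
    // the network actually "sees".
    cv::Mat original = cv::imread("DSCN0582.jpg");      // e.g. 1024x768
    if (original.empty())
    {
        return 1;
    }

    cv::Mat resized;
    cv::resize(original, resized, cv::Size(416, 320));  // the network dimensions
    cv::imwrite("network_input_preview.jpg", resized);

    return 0;
}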

For the sake of this example, let's say we use a network size of 416x320, which gives an aspect ratio of 1.3, very close to the 1.3333 you'd get from a typical 4:3 image captured by a phone or webcam.

That means an image measuring 1024x768 will be resized by Darknet to measure 416x320. Here is what that looks like:

The neural network in this example detects the numbers and the locks on the mailboxes. The problem is that once Darknet has scaled the image down, it fails to detect anything: the individual objects are now too small to detect at the new image size. For example, an object measuring roughly 40x40 pixels in the original 1024x768 image shrinks to about 16x17 pixels at 416x320.

Instead of resizing the 1024x768 image down to 416x320, the image can be broken up into smaller "tiles". When this option has been enabled (via DarkHelp::Config::enable_tiles), DarkHelp automatically feeds each individual tile through Darknet, and the results are combined as if a single image had been processed.

See also
DarkHelp::Config::enable_tiles
DarkHelp::Config::combine_tile_predictions
DarkHelp::Config::only_combine_similar_predictions
DarkHelp::Config::tile_edge_factor
DarkHelp::Config::tile_rect_factor
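
For reference, here is a rough sketch of what enabling tiles looks like when calling the C++ API directly. It assumes the DarkHelp::NN class with its public config member; the filenames are placeholders, and the exact constructor and predict() overloads are documented in DarkHelp::NN.

#include <DarkHelp.hpp>
#include <iostream>

int main()
{
    // Load the neural network from the .cfg, .weights, and .names files.
    DarkHelp::NN nn("mailboxes.cfg", "mailboxes_best.weights", "mailboxes.names");

    // Break large images into network-sized tiles instead of resizing them down.
    nn.config.enable_tiles = true;

    // Other tiling options worth reviewing are listed above, for example:
    // nn.config.combine_tile_predictions = true;

    const auto results = nn.predict("DSCN0582.jpg");
    std::cout << "detected " << results.size() << " objects" << std::endl;

    return 0;
}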

The size of the tiles is determined automatically by DarkHelp according to the network dimensions. In this example, a 1024x768 image combined with a 416x320 network results in 2 horizontal tiles and 2 vertical tiles (each measuring 512x384), like this:
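
The exact layout is calculated inside DarkHelp, but the arithmetic for this example can be approximated as follows. This is an illustration only, not the library's actual code.

#include <algorithm>
#include <cmath>
#include <iostream>

int main()
{
    const int image_w = 1024, image_h = 768;    // original image
    const int net_w   = 416,  net_h   = 320;    // network dimensions

    // How many network-sized tiles fit across and down, rounded to the
    // nearest whole number:  1024/416 ~= 2.46 -> 2,  768/320 = 2.4 -> 2.
    const int tiles_across = std::max(1, static_cast<int>(std::round(static_cast<double>(image_w) / net_w)));
    const int tiles_down   = std::max(1, static_cast<int>(std::round(static_cast<double>(image_h) / net_h)));

    // Each tile then covers an equal share of the image:  512x384 in this case.
    std::cout << tiles_across << "x" << tiles_down << " tiles, each measuring "
              << image_w / tiles_across << "x" << image_h / tiles_down << std::endl;

    return 0;
}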

The tiles are then automatically processed by DarkHelp and individually fed to Darknet:

The final result vector contains all the objects detected across all tiles for the original image:

> DarkHelp --tiles on --autohide off --duration off --threshold 0.2 mailboxes{.names,.cfg,_best.weights} DSCN0582.jpg
...
-> neural network dimensions: 416x320
-> looking for image and video files
-> found 1 file
#1/1: loading "DSCN0582.jpg"
-> prediction took 12 milliseconds across 4 tiles (2x2) each measuring 512x384
-> prediction results: 87
-> 1/87: "lock 96%" #0 prob=0.962067 x=312 y=257 w=21 h=22 entries=1
-> 2/87: "1 96%" #1 prob=0.963509 x=143 y=254 w=33 h=29 entries=1
-> 3/87: "9 98%" #9 prob=0.978319 x=342 y=248 w=38 h=31 entries=1
-> 4/87: "lock 98%" #0 prob=0.980014 x=107 y=260 w=24 h=23 entries=1
-> 5/87: "2 35%" #2 prob=0.349146 x=632 y=292 w=33 h=8 entries=1
-> 6/87: "1 96%" #1 prob=0.959376 x=631 y=242 w=36 h=30 entries=1
...
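
The same combined results are available programmatically from DarkHelp::NN::predict(), with every coordinate expressed in the original (untiled) image. Below is a minimal sketch, assuming the rect, name, and best_probability members of DarkHelp::PredictionResult and the DarkHelp::PredictionResults vector type.

#include <DarkHelp.hpp>
#include <iostream>

// Print each detection from the combined result vector.  All coordinates
// are relative to the original (untiled) image.
void print_detections(const DarkHelp::PredictionResults & results)
{
    for (const auto & det : results)
    {
        std::cout << det.name
                  << " prob=" << det.best_probability
                  << " x=" << det.rect.x
                  << " y=" << det.rect.y
                  << " w=" << det.rect.width
                  << " h=" << det.rect.height
                  << std::endl;
    }
}

Pass this function the vector returned by nn.predict() as shown in the earlier example.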

Running the 1024x768 sample image through the 416x320 network without the use of tiles resulted in zero objects found.

Using the same image but with tiling enabled provides much better results:
