How many FPS can you expect to process when working with Darknet, YOLO, and OpenCV?
Device | FPS | Method |
---|---|---|
Beaglebone Green | 0.005 FPS | Darknet (CPU) |
Raspberry Pi 4 | 2.4 FPS | OpenCV DNN (CPU) |
Jetson Nano | 16.9 FPS | OpenCV DNN + CUDA |
Jetson Xavier NX | 46.4 FPS | OpenCV DNN + CUDA |
Jetson AGX | 71.5 FPS | Darknet + CUDA |
NVIDIA RTX 2070 | 209.7 FPS | OpenCV DNN + CUDA |
NVIDIA RTX 3090 | 413.8 FPS | Darknet + CUDA (see table below for details) |
The original soccer video is 1920x1080 @ 29.97 FPS. It has 1080 frames for a total length of 36 seconds.
For these tests, the video was pre-processed to stretch each frame to exactly 416x416 @ 29.97 FPS. This way the timing test measures the length of time it takes to apply the neural network (and save the results), not the amount of time spent stretching each video frame.
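The source does not say which tool was used for this pre-processing step. As a minimal sketch only, assuming the `opencv-python` package is installed, the stretch could be done as follows. The function name `stretch_video`, the file names, and the `mp4v` codec are hypothetical choices for illustration.

```python
def stretch_video(src_path, dst_path, size=(416, 416)):
    """Stretch every frame of src_path to `size`, ignoring aspect ratio,
    and write the result to dst_path. Returns the number of frames written."""
    import cv2  # imported lazily; assumes the opencv-python package is available

    reader = cv2.VideoCapture(src_path)
    if not reader.isOpened():
        raise IOError(f"cannot open {src_path}")

    # Keep the source frame rate (29.97 FPS for the test video).
    fps = reader.get(cv2.CAP_PROP_FPS)
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # assumed codec, not from the source
    writer = cv2.VideoWriter(dst_path, fourcc, fps, size)

    frames = 0
    try:
        while True:
            ok, frame = reader.read()
            if not ok:
                break  # end of video
            # Plain stretch to the network dimensions, no letterboxing.
            writer.write(cv2.resize(frame, size, interpolation=cv2.INTER_AREA))
            frames += 1
    finally:
        reader.release()
        writer.release()
    return frames
```

A call such as `stretch_video("input_1920x1080.mp4", "input_416x416.mp4")` (hypothetical file names) would produce a video that needs no further resizing before it is passed to a 416x416 network.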
A total of 140 video frames were annotated in DarkMark to create the YOLOv3-tiny 416x416 neural network. I saved a screenshot of the training options.
If you want to run some additional timing tests and compare the results, the files you'll need are:
All tests were run using DarkHelp version 1.3.10-1, Darknet hash #aef928c from 2021-09-28, and either OpenCV v4.1.1 or OpenCV v4.5.3.
The command used to run the timing tests is one of the following, depending on the test:
Here is an example of what that looks like:
For most devices, there are 4 test scenarios: Darknet (CPU), Darknet (CUDA), OpenCV DNN (CPU), and OpenCV DNN (CUDA).
OpenCV v4.1.1, the older version that ships on the NVIDIA Jetson images, is not compiled with CUDA support. For this reason, the Jetson devices were manually upgraded to OpenCV v4.5.3. (See these scripts to get started.)
In addition, the Jetson devices were tested with Jetson Clocks both disabled and enabled to determine the difference this makes.
The highest FPS value for each device is highlighted to make it easy to find in the table.
Device | Jetson Clocks | OS | OpenCV | CPU | Darknet (CPU) Time (ms) | Darknet (CPU) FPS | Darknet (CUDA) Time (ms) | Darknet (CUDA) FPS | OpenCV DNN (CPU) Time (ms) | OpenCV DNN (CPU) FPS | OpenCV DNN (CUDA) Time (ms) | OpenCV DNN (CUDA) FPS |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Beaglebone Green | n/a | Debian 9 Stretch | 3.2.0 | ARMv7 x 1 @ 1 GHz | 62 hours | **0.005 FPS** | CUDA not available | | DNN not available | | DNN not available | |
Raspberry Pi 4 | n/a | Ubuntu 20.04.3 | 4.2.0 | ARM x 4 @ ? GHz | 38 minutes | 0.5 FPS | CUDA not available | | 449929 | **2.4 FPS** | CUDA not available | |
Jetson Nano | disabled | Ubuntu 18.04.6 | 4.1.1 | ARMv8 x 4 @ ? GHz | ? | ? | 81655 | 13.2 FPS | ? | ? | CUDA not available | |
Jetson Nano | enabled | Ubuntu 18.04.6 | 4.1.1 | ARMv8 x 4 @ 1.5 GHz | ? | ? | 72873 | 14.8 FPS | ? | ? | CUDA not available | |
Jetson Nano | disabled | Ubuntu 18.04.6 | 4.5.3 | ARMv8 x 4 @ ? GHz | 1427781 | 0.8 FPS | 80867 | 13.4 FPS | 369439 | 2.9 FPS | 69293 | 15.6 FPS |
Jetson Nano | enabled | Ubuntu 18.04.6 | 4.5.3 | ARMv8 x 4 @ 1.5 GHz | 1448414 | 0.7 FPS | 73158 | 14.8 FPS | 367345 | 2.9 FPS | 64016 | **16.9 FPS** |
Jetson Xavier NX | disabled | Ubuntu 18.04.6 | 4.1.1 | ARMv8 x 4 @ 1.9 GHz | 379265 | 2.8 FPS | 30903 | 34.9 FPS | 156581 | 6.9 FPS | CUDA not available | |
Jetson Xavier NX | enabled | Ubuntu 18.04.6 | 4.1.1 | ARMv8 x 6 @ 1.4 GHz | 360043 | 3.0 FPS | 24440 | 44.2 FPS | 155879 | 6.9 FPS | CUDA not available | |
Jetson Xavier NX | disabled | Ubuntu 18.04.6 | 4.5.3 | ARMv8 x 4 @ 1.9 GHz | 353190 | 3.1 FPS | 30215 | 35.7 FPS | 150356 | 7.2 FPS | 29881 | 36.1 FPS |
Jetson Xavier NX | enabled | Ubuntu 18.04.6 | 4.5.3 | ARMv8 x 6 @ 1.4 GHz | 346793 | 3.1 FPS | 24528 | 44.0 FPS | 147004 | 7.3 FPS | 23272 | **46.4 FPS** |
Jetson AGX | disabled | Ubuntu 18.04.6 | 4.1.1 | ARMv8 x 8 @ ? GHz | 182048 | 5.9 FPS | 22204 | 48.6 FPS | 72043 | 15.0 FPS | CUDA not available | |
Jetson AGX | enabled | Ubuntu 18.04.6 | 4.1.1 | ARMv8 x 8 @ 2.3 GHz | 181224 | 6.0 FPS | 15151 | 71.3 FPS | 71508 | 15.1 FPS | CUDA not available | |
Jetson AGX | disabled | Ubuntu 18.04.6 | 4.5.3 | ARMv8 x 8 @ ? GHz | 181211 | 6.0 FPS | 22223 | 48.6 FPS | 70424 | 15.3 FPS | 20972 | 51.5 FPS |
Jetson AGX | enabled | Ubuntu 18.04.6 | 4.5.3 | ARMv8 x 8 @ 2.3 GHz | 177985 | 6.1 FPS | 15108 | **71.5 FPS** | 68709 | 15.7 FPS | 16883 | 64.0 FPS |
Virtualbox VM | n/a | Ubuntu 20.04.3 | 4.2.0 | Intel i7 x 16 @ 3.2 GHz | 156626 | 6.9 FPS | CUDA not available | | 27383 | **39.4 FPS** | CUDA not available | |
RTX 2070 | n/a | Ubuntu 18.04.6 | 4.5.3 | Intel i7 x 8 @ 3.4 GHz | 205703 | 5.3 FPS | 6084 | 177.5 FPS | 54791 | 19.7 FPS | 5149 | **209.7 FPS** |
RTX 3090 | n/a | Ubuntu 20.04.4 | 4.2.0 | AMD Ryzen 9 5950X x 16 @ 3.4 GHz | 37095 | 29.1 FPS | 2610 | **413.8 FPS** | 27816 | 38.8 FPS | did not have CUDA-enabled OpenCV built on this system to test | |
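The FPS columns appear to be the 1080 frames of the test video divided by the total elapsed time for the run. That reading is an inference from the numbers rather than something the text states outright. The following sketch recomputes a few FPS values from the listed times; the helper name `fps_from_elapsed_ms` is made up for this example.

```python
FRAME_COUNT = 1080  # frames in the test video, as stated earlier


def fps_from_elapsed_ms(elapsed_ms, frames=FRAME_COUNT):
    """Average frames per second for a run that took elapsed_ms milliseconds."""
    return frames / (elapsed_ms / 1000.0)


# Elapsed times in milliseconds, copied from the table.
samples = {
    "Jetson Nano, OpenCV DNN (CUDA), Jetson Clocks enabled": 64016,
    "Jetson AGX, Darknet (CUDA), Jetson Clocks enabled": 15108,
    "RTX 2070, OpenCV DNN (CUDA)": 5149,
    "RTX 3090, Darknet (CUDA)": 2610,
}

for label, elapsed_ms in samples.items():
    print(f"{label}: {fps_from_elapsed_ms(elapsed_ms):.1f} FPS")
```

The same function can be applied to timings from your own runs, provided the same 1080-frame video is used or the `frames` argument is changed to match.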
The cost of resizing the video frames or images to match the neural network dimensions was ignored in the table above. The test video was manually pre-processed so the frame dimensions would match the 416x416 neural network, allowing the test to focus on the cost of applying the neural network.
But the cost incurred to resize each video frame is non-trivial and negatively impacts the FPS. The next table highlights this cost by comparing the FPS achieved when working with 416x416 video versus the same video in the original 1920x1080 dimensions.
Device | Video Measures 416x416 | Video Measures 1920x1080 |
---|---|---|
Raspberry Pi 4 | 2.4 FPS | 1.8 FPS |
Jetson Nano | 16.9 FPS | 5.0 FPS |
Jetson Xavier NX | 46.4 FPS | 9.1 FPS |
Jetson AGX | 71.5 FPS | 14.6 FPS |
Virtualbox VM | 39.4 FPS | 21.3 FPS |
RTX 2070 | 209.7 FPS | 43.6 FPS |
RTX 3090 | 413.8 FPS | 83.2 FPS |
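To put these numbers in per-frame terms, the FPS values can be converted to milliseconds per frame and subtracted. This is a rough sketch: it assumes the entire difference between the two runs comes from handling the larger frames, which may include decoding and saving them as well as resizing. The function name `resize_overhead` is made up for this example.

```python
# FPS pairs copied from the table: (416x416 video, 1920x1080 video).
results = {
    "Raspberry Pi 4": (2.4, 1.8),
    "Jetson Nano": (16.9, 5.0),
    "Jetson Xavier NX": (46.4, 9.1),
    "Jetson AGX": (71.5, 14.6),
    "Virtualbox VM": (39.4, 21.3),
    "RTX 2070": (209.7, 43.6),
    "RTX 3090": (413.8, 83.2),
}


def resize_overhead(fps_small, fps_full):
    """Return (extra milliseconds per frame, share of the full-size frame time)."""
    ms_small = 1000.0 / fps_small
    ms_full = 1000.0 / fps_full
    extra_ms = ms_full - ms_small
    return extra_ms, extra_ms / ms_full


for device, (fps_small, fps_full) in results.items():
    extra_ms, share = resize_overhead(fps_small, fps_full)
    print(f"{device}: +{extra_ms:.1f} ms per frame "
          f"({share:.0%} of the 1920x1080 frame time)")
```

On the faster devices the extra time per frame is small in absolute terms but makes up most of the total, which is why the FPS drop is proportionally largest there.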