This sample, sampleMovieLensMPS, is an end-to-end sample that imports a trained TensorFlow model and predicts the highest-rated movie for each user using MPS (Multi-Process Service).
MPS allows multiple CUDA processes to share a single GPU context. With MPS, overlapping kernel executions and memcpy operations from different processes can be scheduled concurrently to achieve maximum GPU utilization. This can be especially effective at increasing parallelism for small networks with low resource utilization, such as those consisting primarily of a series of small MLPs.
This sample is identical to sampleMovieLens in terms of functionality, but is modified to support concurrent execution in multiple processes. Specifically, this sample demonstrates how to generate weights for a MovieLens dataset that TensorRT can then accelerate.
Note: Currently, sampleMovieLensMPS supports only Linux x86-64 (includes Ubuntu and RedHat) desktop users.
The network is trained in TensorFlow on the MovieLens dataset, which contains 6,040 users and 3,706 movies. The NCF recommender system is based on the Neural Collaborative Filtering paper.
Each query to the network consists of a `userID` and a list of `MovieID`s. The network predicts the highest-rated movie for each user. As trained parameters, the network has embeddings for users and movies, and weights for a sequence of MLPs.
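The scoring path described above (embedding lookup followed by a sequence of MLPs) can be pictured with a minimal pure-Python sketch. The embedding sizes, layer widths, and weight values below are illustrative placeholders, not the sample's trained MovieLens parameters:

```python
# Sketch of NCF-style scoring: look up user/movie embeddings, concatenate
# them, and pass them through a small MLP. All weights are made up.

def relu(xs):
    return [max(0.0, x) for x in xs]

def matvec(w, x):
    # w: list of rows; x: input vector
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def score(user_emb, movie_emb, layers):
    """Concatenate the embeddings and apply the MLP layers in sequence."""
    h = list(user_emb) + list(movie_emb)
    for w in layers[:-1]:
        h = relu(matvec(w, h))
    return matvec(layers[-1], h)[0]  # final layer emits one rating score

# Toy 2-d embeddings and a 2-layer MLP (4 -> 3 -> 1).
user = [0.1, 0.3]
movies = {101: [0.2, 0.0], 102: [0.5, 0.4]}
layers = [
    [[0.5, -0.2, 0.1, 0.3], [0.0, 0.4, 0.2, -0.1], [0.3, 0.3, -0.5, 0.2]],
    [[1.0, 0.5, -0.3]],
]
scores = {mid: score(user, emb, layers) for mid, emb in movies.items()}
best = max(scores, key=scores.get)  # highest-rated movie for this user
```

In the real sample this per-movie score is produced by the TensorRT engine; the final `max` is what the appended TopK layer computes on the GPU.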
Specifically, this sample performs the following steps:
The network is converted from TensorFlow using the UFF converter (see Converting A Frozen Graph To UFF) and imported using the UFF parser. Constant layers represent the trained parameters within the network, and the MLPs are implemented using MatrixMultiply layers. A TopK operation is added manually after parsing to find the highest-rated movie for the given user.
The sample fills the input buffer with `userID`s and their corresponding lists of `MovieID`s, which are loaded from `movielens_ratings.txt`. Then, it launches inference to predict the rating probabilities for the movies using TensorRT. The inference is launched in multiple processes. When MPS is enabled, the processes share one single CUDA context to reduce context overhead. See Multi-Process Service Introduction for more details about MPS.
Finally, the sample compares the outputs predicted by TensorRT with the expected outputs given in `movielens_ratings.txt`. For each user, the `MovieID` with the highest probability should match the expected highest-rated `MovieID`. In verbose mode, the sample also prints out the probability, which should be close to the expected probability.
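The verification step amounts to an argmax-and-compare per user. A minimal sketch, with made-up probabilities standing in for the TensorRT outputs and the expected values from `movielens_ratings.txt`:

```python
# Sketch of the verification step: for each user, the MovieID with the
# highest predicted probability must match the expected top movie.
# All probability values below are illustrative.

predicted = {
    1: {101: 0.91, 102: 0.40},  # user 1: predicted probability per MovieID
    2: {103: 0.20, 104: 0.77},  # user 2
}
expected_top = {1: 101, 2: 104}  # expected highest-rated MovieID per user

def top_movie(probs):
    return max(probs, key=probs.get)

mismatches = [u for u, probs in predicted.items()
              if top_movie(probs) != expected_top[u]]
verdict = "PASSED" if not mismatches else "FAILED"
```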
In this sample, the following layers are used. For more information about these layers, see the TensorRT Developer Guide: Layers documentation.
**Activation layer**
The Activation layer implements element-wise activation functions.

**MatrixMultiply layer**
The MatrixMultiply layer implements matrix multiplication for a collection of matrices.

**Scale layer**
The Scale layer implements a per-tensor, per-channel, or per-element affine transformation and/or exponentiation by constant values.

**Shuffle layer**
The Shuffle layer implements a reshape and transpose operator for tensors.

**TopK layer**
The TopK layer finds the top `K` maximum (or minimum) elements along a dimension, returning a reduced tensor and a tensor of index positions.
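The TopK semantics — a pair of outputs holding the reduced values and their index positions — can be mimicked in a few lines of plain Python (a sketch of the behavior, not the TensorRT kernel):

```python
def topk(values, k):
    """Return the k largest values and their original indices,
    mirroring TopK's (values, indices) pair of outputs."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    idx = order[:k]
    return [values[i] for i in idx], idx

ratings = [0.1, 0.9, 0.4, 0.7]
vals, idx = topk(ratings, 2)  # two highest ratings and their positions
```

With `k = 1`, the index output is exactly the predicted highest-rated movie, which is how the sample uses the layer.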
This sample comes with a pre-trained model. However, if you want to train your own model, you would need to also convert the model weights to UFF format before you can run the sample. For step-by-step instructions, refer to the `README.md` file in the `sampleMovieLens` directory.
Run `make` in the `<TensorRT root directory>/samples/sampleMovieLensMPS` directory. The binary named `sample_movielens_mps` will be created in the `<TensorRT root directory>/bin` directory.
```
cd <TensorRT root directory>/samples/sampleMovieLensMPS
make
```
Where `<TensorRT root directory>` is where you installed TensorRT.
Set up an MPS client. Set the following variables in the client process environment. The `CUDA_VISIBLE_DEVICES` variable should not be set in the client's environment.
```
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps # Set to the same location as the MPS control daemon
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log  # Set to the same location as the MPS control daemon
```
This output shows that the sample ran successfully; `PASSED`. The output also shows that the predicted items for each user match the expected items, along with the duration of the execution. Finally, the sample prints out the PIDs of the processes, showing that the inference is launched in multiple processes.

To restore the system to its original state, shut down the MPS control daemon:
```
echo quit | nvidia-cuda-mps-control
```
To see the full list of available options and their descriptions, use the `-h` or `--help` command line option.
The following resources provide a deeper understanding about sampleMovieLensMPS:
- MovieLensMPS
- Models
- Documentation
For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.
February 2019: This `README.md` file was recreated, updated and reviewed.