How to Convert Encrypted TLT Files to TensorRT Engine on Nvidia Jetson

This post describes the process of converting an encrypted TLT file (.etlt) to a TensorRT engine file (.engine) on a Jetson Nano device. While it is entirely possible to use an encrypted TLT file with DeepStream directly, it is recommended that you convert it to a TensorRT engine to take advantage of environment- and hardware-specific optimizations. Further, whenever you upgrade CUDA or TensorRT (minor versions included), you must rebuild the engine so that it is optimized for the updated libraries that will run it. According to Nvidia, “Running an engine that was generated with a different version of TensorRT or CUDA is not supported and will cause unknown behavior that affects inference speed, accuracy, and stability, or it may fail to run altogether.”
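One way to act on this rule, sketched here under assumptions, is to store the library versions next to the engine and rebuild whenever they change. The stamp file, the version string format, and the needs_rebuild helper are hypothetical names used for illustration; they are not part of any Nvidia tool.

```shell
# Hypothetical helper: succeeds (exit 0) when the engine should be rebuilt.
#   $1: path to the engine file
#   $2: path to a stamp file holding the versions the engine was built with
#   $3: current version string, e.g. "trt7.1.3-cuda10.2" (format is an assumption)
needs_rebuild() {
    [ -s "$1" ] || return 0                # no engine has been built yet
    [ -f "$2" ] || return 0                # engine exists, but its origin is unknown
    [ "$(cat "$2")" = "$3" ] || return 0   # library versions have changed
    return 1                               # engine matches the current versions
}

# After a successful conversion, the versions could be saved like this:
#   echo "trt7.1.3-cuda10.2" > my_model.engine.version
```

Calling needs_rebuild before starting a pipeline would then tell you whether the conversion described in this post has to be repeated.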

This guide will provide the necessary steps to make generating and updating engine files super simple. We will be using Nvidia’s PeopleNet network, which utilizes ResNet34. Instructions on downloading the pre-built pruned version of the encrypted TLT for this model are given below.

Note: This process should work on any Jetson device, but we will use the Nano because it is the most restricted in terms of on-board graphics memory. The tool we will be using, tlt-converter, uses TensorRT to convert a pre-built .etlt file and hence requires sufficient memory.

It is assumed that you are running TensorRT version 7.1.x and CUDA 10.2, which are the versions included in JetPack 4.4 and 4.5 as of this writing.

More information can be found in Nvidia’s Deploying to DeepStream guide.

Step 1: Download tlt-converter

While there are technically two methods for retrieving and using tlt-converter, this article will only focus on downloading the binary to be used directly on the Jetson Nano. The other method is to use nvidia-docker, but it requires an x86 host machine. If you prefer this method, which requires you to install TLT on your x86 host, you can find more information here.

Update (Feb 26, 2021): It appears the link to download tlt-converter is changed periodically by Nvidia. This was brought to my attention by a comment made by AnthonyJ. Unfortunately, that makes the process slightly more complicated, since you will need to navigate to Nvidia’s website to download the binary rather than using wget. You can find the download link here under the subheading “Instructions for Jetson”. Just make sure you save the zip file to your home directory (~/). You can either manually unzip it or use the unzip command below. Thanks again to AnthonyJ for pointing this out!

From Jetson Nano:

sudo apt-get update
sudo apt-get install libssl-dev

# Only if you want to use the unzip utility
sudo apt-get install unzip
unzip ~/tlt-converter.zip -d ~/tlt-converter

cd ~/tlt-converter
chmod +x tlt-converter
sudo mv tlt-converter /usr/bin

This installs (or updates) the SSL library that we will require later on when we run tlt-converter, as well as the unzip utility. It then extracts the Readme and the tlt-converter binary from the zip file you downloaded. Further, it adds execution permissions so you can run the binary, and moves it to /usr/bin, the system binary folder, which should be in your PATH. To verify this folder is in your PATH, run:

echo $PATH

You should see a colon-separated list with various paths, and /usr/bin should be one of them. At this point, you should be able to run the following command:

tlt-converter -h

If everything was successful, you will get a printout similar to the following:

usage: tlt-converter [-h] [-v] [-e ENGINE_FILE_PATH]
[-k ENCODE_KEY] [-c CACHE_FILE]
[-o OUTPUTS] [-d INPUT_DIMENSIONS]
[-b BATCH_SIZE] [-m MAX_BATCH_SIZE]
[-w MAX_WORKSPACE_SIZE] [-t DATA_TYPE]
[-i INPUT_ORDER]
input_file
...
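If the shell reports that the command cannot be found instead, a check along these lines may help narrow down the cause. It is a minimal sketch: path_contains is a hypothetical helper name, and the check only confirms that /usr/bin is in PATH and that the binary can be resolved.

```shell
# Succeeds when directory $1 appears in the colon-separated list $2
# (defaults to the current PATH).
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_contains /usr/bin; then
    echo "/usr/bin is in PATH"
else
    echo "/usr/bin is missing from PATH"
fi

# command -v prints the resolved location of the binary if the shell can find it.
command -v tlt-converter || echo "tlt-converter not found in PATH"
```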

If you get an error complaining about TensorRT, or some other error, it is quite possible that you do not have the correct version of TensorRT installed. Please ensure that you are using TensorRT 7.1.x and CUDA 10.2.
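A rough way to check the installed TensorRT version is sketched below. It assumes TensorRT 7 was installed through the libnvinfer7 Debian package, as is usual with JetPack; version_matches is a hypothetical helper name. On JetPack the package version string typically also names the CUDA version it was built for.

```shell
# Succeeds when version string $1 equals $2 or starts with "$2."
# (for example, 7.1.3 matches 7.1, but 7.10.0 does not).
version_matches() {
    case "$1" in
        "$2"|"$2".*) return 0 ;;
        *)           return 1 ;;
    esac
}

# Assumption: TensorRT 7 was installed from the libnvinfer7 Debian package.
trt_version=$(dpkg-query -W -f='${Version}' libnvinfer7 2>/dev/null || true)

if version_matches "$trt_version" "7.1"; then
    echo "TensorRT $trt_version detected"
else
    echo "TensorRT 7.1.x not detected (found: ${trt_version:-nothing})"
fi
```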

Step 2: Download the encrypted TLT file

We need a sample file to work with, so for now let’s download a pre-built and pruned version of ResNet34 PeopleNet. You do not need to have DeepStream installed to run the resulting engine, since you can run it as a standalone TensorRT engine, so we will not assume that you have DeepStream installed at this point. Download the pre-built .etlt file and store it in your home directory:

wget https://api.ngc.nvidia.com/v2/models/nvidia/tlt_peoplenet/versions/pruned_v1.0/files/resnet34_peoplenet_pruned.etlt -O ~/resnet34_peoplenet_pruned.etlt

Note: If you do have DeepStream installed with the included samples, you can find much more information on the various pre-built encrypted TLT files in the README typically located at:

/opt/nvidia/deepstream/deepstream-5.0/samples/configs/tlt_pretrained_models/README

Step 3: Environment variables

Until now, all of the steps would work on just about every Jetson device without trouble. However, there are a few key differences that set the Nano apart from devices like the TX2 or Xavier. The following environment variables reflect the Nano’s limitations in memory and processing power. Feel free to tune them to your needs, but these are the bare-bones defaults for converting on a Nano:

export TRT_LIB_PATH="/usr/lib/aarch64-linux-gnu"
export TRT_INC_PATH="/usr/include/aarch64-linux-gnu"
export INPUT_DIMENSIONS=3,544,960
export ENCODE_KEY=tlt_encode
export BATCH_SIZE=1
export ENGINE_FILE_PATH=resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine
export INPUT_ORDER=nhwc
export MAX_BATCH_SIZE=2
export OUTPUTS=output_bbox/BiasAdd,output_cov/Sigmoid
export DATA_TYPE=fp16
export MAX_WORKSPACE_SIZE=1610612736
export MODEL_IN=resnet34_peoplenet_pruned.etlt

Explanation:

TRT_LIB_PATH and TRT_INC_PATH: Point to the directories where your TensorRT shared library files and header files are located, respectively.
INPUT_DIMENSIONS: Specifies the input dimensions for ResNet34, in the order channels, height, width.
ENCODE_KEY: The key that was used to generate the .etlt file. In this case, it was provided by Nvidia.
BATCH_SIZE: The batch size used for calibration, a step performed by the builder when deciding suitable scale factors for inference. We set it to 1 because of memory considerations on the Nano. The default is 8. This is only relevant for converting the .etlt file to a TensorRT engine.
ENGINE_FILE_PATH: The output file name. Note that we did not prepend a directory to the filename, meaning the engine will be stored in the directory you run tlt-converter from.
INPUT_ORDER: Specifies the order in which INPUT_DIMENSIONS should be interpreted. I have a post on Nvidia’s developer forums because, quite frankly, the INPUT_ORDER seems counter-intuitive to me at the time of this writing. I will update the post once I learn more. Suffice it to say, nhwc works for this example.
MAX_BATCH_SIZE: Specifies the maximum TensorRT engine batch size, with a default value of 16.
OUTPUTS: Specifies the names of the model’s output nodes as a comma-separated list.
DATA_TYPE: Specifies the data type (precision) of the generated engine, fp16 in this case.
MAX_WORKSPACE_SIZE: Specifies the maximum TensorRT workspace size in bytes. This is probably the most significant setting for the Jetson Nano. I tried various values here, from the default of 1<<30 = 1073741824 (1 GB) to 1<<31 = 2147483648 (2 GB), and the value 1610612736 (1.5 GB) happens to work best. The default value does not provide enough memory for TensorRT to employ various “tactics” and results in a warning, but it still technically works. The 2 GB value results in an out-of-memory exception. I found that using 1.5 GB results in a workspace size warning about 15% of the time, but more often than not, it is sufficient to let TensorRT do its thing without complaining.
MODEL_IN: The input .etlt file.
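The workspace values above are plain byte counts, so they can be computed rather than typed by hand. The sketch below uses shell arithmetic; mib_to_bytes is a hypothetical helper name, not a tlt-converter feature.

```shell
# Convert a size in MiB to bytes, the unit MAX_WORKSPACE_SIZE expects.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 1024   # prints 1073741824, the same as 1<<30
mib_to_bytes 1536   # prints 1610612736, the value used above
mib_to_bytes 2048   # prints 2147483648, the same as 1<<31

export MAX_WORKSPACE_SIZE=$(mib_to_bytes 1536)
```

This makes it easier to try intermediate sizes, such as 1280 or 1792 MiB, when tuning for your own device.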

Step 4: Converting (and waiting)

With the environment variables set up, we should be good to go on converting the encrypted TLT file to a TensorRT engine. We will assume that you saved the resnet34_peoplenet_pruned.etlt file in your home directory, that you are currently in that directory (cd ~), and that you have exported all of the environment variables from above.
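Since a missing variable would silently drop an argument from the command, it may be worth confirming that everything is set first. This is a minimal sketch; require_vars is a hypothetical helper, and the variable names are the ones exported in Step 3.

```shell
# Hypothetical helper: reports unset or empty variables on stderr and
# fails if at least one of the named variables is missing.
require_vars() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"   # look up the variable called $name
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

require_vars INPUT_DIMENSIONS ENCODE_KEY BATCH_SIZE ENGINE_FILE_PATH \
    INPUT_ORDER MAX_BATCH_SIZE OUTPUTS DATA_TYPE MAX_WORKSPACE_SIZE MODEL_IN \
    && echo "all variables set" \
    || echo "export the missing variables before converting"
```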

tlt-converter \
    -d $INPUT_DIMENSIONS \
    -k $ENCODE_KEY \
    -b $BATCH_SIZE \
    -e $ENGINE_FILE_PATH \
    -i $INPUT_ORDER \
    -m $MAX_BATCH_SIZE \
    -o $OUTPUTS \
    -t $DATA_TYPE \
    -w $MAX_WORKSPACE_SIZE \
    $MODEL_IN

This could take anywhere from 1 to 5 minutes, depending on whether or not TensorRT complains about maximum workspace memory. I found that resetting the Nano typically resulted in faster conversion times, most likely because the GPU memory is wiped and not taken up by other processes.
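Because the Nano’s GPU shares system RAM with the CPU, a look at the available memory before converting can hint at whether a reset is worthwhile. The sketch below reads MemAvailable from /proc/meminfo and compares it with the workspace size; mem_available_bytes is a hypothetical helper, and the comparison is only a rough indicator, not a guarantee that the conversion will succeed.

```shell
# Prints the MemAvailable value from a meminfo-style file, in bytes.
#   $1: file to read (defaults to /proc/meminfo)
mem_available_bytes() {
    kb=$(awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}" 2>/dev/null || true)
    echo $(( ${kb:-0} * 1024 ))
}

workspace=${MAX_WORKSPACE_SIZE:-1610612736}
available=$(mem_available_bytes)

if [ "$available" -lt "$workspace" ]; then
    echo "Only $available bytes available; the $workspace byte workspace may not fit"
else
    echo "$available bytes available for a $workspace byte workspace"
fi
```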

The result should be a new TensorRT engine file in your /home/username directory. If you have DeepStream or some other TensorRT pipeline set up, feel free to try it out! Be sure to leave any questions or comments below and I will make my best attempt to reply. Make sure you subscribe to get additional quality content in the future! Otherwise, look for an upcoming post where I discuss configuring and running PeopleNet on the Nano!

3 responses to “How to Convert Encrypted TLT Files to TensorRT Engine on Nvidia Jetson”

AnthonyJ says:

February 26, 2021 at 1:54 pm

FYI, the link to the trt-converter.zip seems to be broken. I found a different link to it via https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/text/deploying_to_deepstream.html#generating-an-engine-using-tlt-converter


- Rob Royce says:
  
  February 26, 2021 at 6:37 pm
  
  Hey Anthony, thanks for pointing this out! It appears Nvidia changes the underlying link every week or so. Unfortunately this means the process will be slightly less convenient since users will have to navigate to the website to download, but that shouldn’t be a big deal. I’ll update the article to reflect your findings. I really appreciate the feedback!
  
  
Salim says:

July 19, 2021 at 1:15 am

Hey Rob,
Your blog was a great read. It would also be very helpful to see your viewpoint on how to deploy this .engine model file onto the jetson device using a simple python script to run the inferencing stage.
