Artificial intelligence and robotics go together like peanut butter and chocolate. To be truly useful, a service robot must be smart enough to perform its assigned tasks, stay out of harm’s way, and not run over its human colleagues in the process. Almost a decade ago, NVIDIA recognized that the massively parallel processing in its GPUs and the increasingly programmable nature of graphics architectures could be put to work on machine learning. In 2014, that vision began to materialize with the launch of the Jetson TK1, and an AI powerhouse was born.
Fast forward to 2022, and NVIDIA hasn’t let off the gas. In March, the company pulled back the curtain on the Jetson AGX Orin, its latest robotics and AI platform, based on Ampere GPU technology. While COVID-related lockdowns overseas delayed getting hardware into developers’ hands, units are now shipping, and we have one of the dev kits in-house. So, without further ado, let’s get to know the Jetson AGX Orin.
Jetson AGX Orin Developer Kit Specifications
We covered the specifications of the different Jetson AGX Orin kits in our previous coverage, but the bottom line is that there’s roughly a GeForce RTX 3050 laptop GPU’s worth of CUDA and Tensor cores to do most of the heavy lifting. The GPU is accompanied by a pair of NVIDIA Deep Learning Accelerators and a dozen Cortex-A78AE CPU cores in a package with up to a 60-watt power budget. All of this is backed by 32GB of unified LPDDR5 memory with over 200GB/s of bandwidth. If that sounds a lot like Apple’s M1 Pro or Max, let’s just say Cupertino didn’t invent shared memory architectures.
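That 200GB/s-plus figure falls straight out of the memory configuration. A quick back-of-envelope check, assuming the commonly cited setup of a 256-bit LPDDR5 interface running at 6400 MT/s (treat those two numbers as assumptions, not something we measured):

```python
# Rough LPDDR5 bandwidth arithmetic for a 256-bit bus at 6400 MT/s.
bus_width_bits = 256
transfer_rate_mts = 6400  # mega-transfers per second

bytes_per_transfer = bus_width_bits / 8                        # 32 bytes moved per transfer
bandwidth_gbs = bytes_per_transfer * transfer_rate_mts / 1000  # convert MB/s to GB/s

print(f"{bandwidth_gbs:.1f} GB/s")  # 204.8 GB/s
```

That lands right at "over 200GB/s," which is the same ballpark as Apple's M1 Pro.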
There is a hidden PCI Express slot under the hatch
The production module has plenty of I/O for cameras, microphones, and other sensors, and the development kit goes to great lengths to expose it all. We have 10Gbps USB 3.2, DisplayPort, 10 Gigabit Ethernet, a 40-pin GPIO header, and headers for automation, audio, and JTAG programmers. The microSD slot and dual M.2 slots provide room for tons of additional storage and wireless connectivity. There’s also a PCI Express Gen 4 slot with eight lanes of connectivity in an x16 physical slot. Since the Jetson AGX Orin dev kit has only 64GB of eMMC storage on board, this could be a good place to add some extra storage, for example.
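Eight lanes of Gen 4 is plenty of headroom for an NVMe storage card. A rough per-direction throughput estimate, using the PCIe Gen 4 signaling rate of 16 GT/s per lane and its 128b/130b line encoding:

```python
# Back-of-envelope throughput for a PCIe Gen 4 x8 slot.
lanes = 8
gts_per_lane = 16                # giga-transfers per second (Gen 4 signaling rate)
encoding_efficiency = 128 / 130  # 128b/130b line code overhead

gb_per_lane = gts_per_lane * encoding_efficiency / 8  # GB/s per lane, one direction
total_gbs = gb_per_lane * lanes

print(f"~{total_gbs:.1f} GB/s per direction")  # ~15.8 GB/s
```

In other words, the slot itself won't be the bottleneck for any storage you drop into it.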
The Jetson AGX Orin Developer Kit can run headless, connected to a Linux PC via one of the USB-C or micro-USB ports, or as a standalone Linux box. We chose to simply run it as its own standalone PC; it has plenty of memory and CPU resources, and it comes with a full installation of Ubuntu Linux 20.04 LTS ready to roll. We’ll talk in depth about our experience with Orin shortly. Whether you’re using the Jetson AGX Orin as a device attached to another Linux PC or as a standalone development environment, everything we needed to get started was in the box.
Working With The Jetson AGX Orin Developer Kit
All of this is fun to read and write about, but NVIDIA sent us a kit, so we had to dive in and check it out. As you can see from the photos above, the Jetson AGX Orin Developer Kit is tiny. It’s about 4.3 inches square and three inches tall, roughly the size of one of Intel’s little NUCs, though a bit larger. The compact size is helped by the external power supply, which connects to the USB-C port just above the barrel connector; the latter can be used with an AC adapter instead. Plug in a keyboard, mouse, and DisplayPort monitor (or an HDMI display with a DP-to-HDMI adapter) and turn it on to get started.
To help us test out various features, NVIDIA also included a USB headset and a 720p webcam, although these aren’t normally included in the dev kit. Those pieces of hardware were important, though, because one of the demos we’ll be showing shortly looks for those specific hardware IDs.
The Jetson AGX Orin comes with Ubuntu Linux 20.04 pre-installed right out of the box, so the initial boot sequence will be familiar to Ubuntu veterans. After choosing our language, location, username, and password, we were introduced to the GNOME 3 desktop that Ubuntu has used since retiring its Unity interface. NVIDIA has included useful desktop shortcuts that open in-browser documentation links and sample code folders in GNOME’s file browser. But before we could start, we had to upgrade to NVIDIA JetPack 5, which is about a 10GB download on its own, and add our own code editor.
Since Microsoft added Arm64 Linux support to Visual Studio Code in 2020, the editor has become a popular choice among JetPack developers, whether they prefer C++ or Python. It can be downloaded directly from Microsoft’s Visual Studio Code website on the Other Downloads page, or installed via apt on the command line. The VS Code Marketplace has all the necessary language support extensions for both supported languages.
The first time we opened a project, Code prompted us to install everything we were missing, so it was a pretty painless installation process.
After installing JetPack 5 and downloading the benchmark tools and sample code from NVIDIA, we had used only about 15GB of the 64GB of built-in eMMC storage. Developers who want to work directly on the system will want to keep that in mind, because it doesn’t leave much room for data and projects, especially the datasets used for inference with visual AI models. Remember that the Jetson AGX Orin can run in headless mode connected to another Linux PC via USB, which is one way around this limitation; another is simply to store projects on an external USB drive. The 10Gbps USB-C port should be fast enough that most developers won’t notice any slowdowns.
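With so little eMMC to spare, it's worth checking free space before pulling a large dataset onto the device. A minimal sketch using Python's standard library (the 40GB dataset size is just a made-up example):

```python
import shutil

def free_gb(path="/"):
    """Return free space at `path` in gigabytes."""
    usage = shutil.disk_usage(path)
    return usage.free / 1e9

# Before fetching a big training or inference dataset onto the
# 64GB eMMC, make sure it will actually fit.
dataset_size_gb = 40  # hypothetical dataset
if free_gb("/") < dataset_size_gb:
    print("Not enough local storage; use an external USB drive instead.")
```

The same check pointed at an external drive's mount point tells you whether the USB storage route is the better home for a project.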
Exploring The Jetson AGX Orin Demos
The other side of the sample-code coin is the demos. In addition to the sample code NVIDIA provides, there are a host of fully functional demos for developers to dig into to see how the company’s trained AI models respond to live input. We dug into each one to see how it worked and to explore practical use cases. The most important and potentially impactful demo NVIDIA gave us was NVIDIA Riva Automatic Speech Recognition (ASR). This one was best served by capturing the machine learning model at work on video, so that’s what we’ve embedded below.
The ASR demo gave us a blank terminal window, and as we talked, it detected our speech and transcribed it in real time. There are many other voice recognition tools on the market, but this is only one part of the human-robot interface. Combine ASR with conversational AI and text-to-speech and you have a robot that can not only talk to you, but hold a conversation, which NVIDIA is particularly proud of. We’ll see that on the next page, actually. Incidentally, this demo isn’t exactly hardware-agnostic; this particular ASR app was made for the specific headset NVIDIA sent us, but the source code can of course be extended for additional hardware support.
As you can see in the video, the transcription wasn’t absolutely perfect, but NVIDIA says it’s enough to let a service bot determine a user’s intent. The video is unscripted, and that’s intentional. Most conversations aren’t scripted, even if you’ve thought a lot about what you want to say. You can see that when I started and stopped, the AI started and stopped with me, and in that respect the transcript was a pretty fair assessment of what I had to say. It’s fun to watch in the video as the AI tries to figure out what I mean; words flow in and out of the terminal window as it works to pin down the meaning.
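That flowing-in-and-out behavior is characteristic of streaming ASR: the engine emits interim hypotheses that keep getting revised until it marks a segment final, at which point the text freezes. Here's a toy, stdlib-only illustration of that rendering logic (this is not the Riva API, and the event stream is invented for the example):

```python
# Toy model of how a streaming ASR client typically renders results:
# each interim hypothesis replaces the previous one, while finalized
# segments are frozen and accumulated.
def render(events):
    finals, interim = [], ""
    for text, is_final in events:
        if is_final:
            finals.append(text)  # freeze the finalized segment
            interim = ""
        else:
            interim = text       # overwrite the previous interim guess
    return " ".join(finals + ([interim] if interim else []))

# A made-up stream of (hypothesis, is_final) events:
events = [
    ("the jets", False),
    ("the jetson", False),
    ("the jetson agx orin", True),  # segment finalized
    ("is a", False),
    ("is an ai", False),
]
print(render(events))  # "the jetson agx orin is an ai"
```

Watching the demo terminal is essentially watching this loop run against live microphone audio.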
There are many solutions on the market, but this one in particular was fun to watch because it worked. That’s a big deal, because automatic speech recognition on a low power budget is a big part of what will make interacting with robots easier. Extracting meaning and context from sentences is sometimes quite difficult even for humans, so training an AI to do it is a monumental task. Plenty of companies have solutions, and NVIDIA’s isn’t the first, but as our video – which intentionally avoids technical jargon – shows, it’s not exactly a solved problem yet. It’s getting better, though.
Which brings us to NVIDIA’s TAO (“Train, Adapt, Optimize”) model adaptation toolkit, which lets developers adapt pre-trained models and get them operational quickly. As an example, NVIDIA provides the Action Recognition Net model, which was trained to identify certain actions corresponding to specific exercises, such as walking and running, as well as to detect when a person falls. The model was trained on a few hundred short video clips showing these actions from different angles. NVIDIA provided a tutorial on extending the model to identify additional actions, such as doing a pushup or a pullup.
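Conceptually, part of that extension step is just growing the model's class list before retraining on the new clips. A minimal sketch of that bookkeeping, assuming a pre-trained model that pairs its weights with a plain list of action labels (the label names and helper below are our own illustration, not TAO's actual file format):

```python
# Hypothetical label-map extension for an action-recognition model.
base_labels = ["walk", "run", "fall"]  # classes the pre-trained model knows

def extend_labels(labels, new_actions):
    """Append new action classes, skipping duplicates and preserving order."""
    out = list(labels)
    for action in new_actions:
        if action not in out:
            out.append(action)
    return out

labels = extend_labels(base_labels, ["pushup", "pullup"])
print(labels)  # ['walk', 'run', 'fall', 'pushup', 'pullup']
```

The heavy lifting, of course, is the retraining pass that teaches the network what those new labels actually look like, which is the part TAO automates.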
We followed along and were then able to deploy the model either to the Jetson AGX Orin itself or to NVIDIA’s DeepStream running on A100 instances in Azure. This is where a data center accelerator still dominates even the upgraded Jetson. The Jetson was plenty quick for running the model and verifying that the changes we made were correct, while the DeepStream instances were ridiculously fast at churning through video files and identifying the actions performed in them. This gave us a taste of the model-enhancement workflow offered by TAO and an idea of the dev-deploy-test workflow provided by the Jetson AGX Orin and DeepStream in tandem.
Playing with the tools is fun and all, but this kit is made for serious work. Next, let’s look at some practical implications and see what sorts of conclusions we can draw from our time with the kit.