Medical ultrasound is among the most attractive diagnostic imaging modalities because it is minimally invasive and free of ionizing radiation. As medical ultrasound expands into a wider range of applications thanks to its non-invasive nature and its ability to image soft tissue, there is growing demand for ultrasound beamformers that support advanced imaging techniques, multidimensional visualization, and the application of artificial intelligence to aid in the diagnosis of critical illnesses.
Today, however, medical ultrasound is still limited in real-time use by challenges such as sequential data acquisition, low frame rates (10 to 50 frames per second), and suboptimal image focus, since optimal focus can be achieved at only a single depth.
In standard sequential imaging, the complete image is acquired line by line. Each line is scanned at a set of points, say from left to right, with the transmitted beam focused on a given point. A line at a given lateral position is then produced using dynamic focusing during reception. The beam is then steered to the next lateral position and the process is repeated until the complete image is formed.
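The per-line reconstruction described above boils down to a delay-and-sum computation. The sketch below illustrates it for a single focal point; this is a minimal textbook-style illustration under simplified assumptions (normalized units, nearest-sample delays, no apodization), not the AMD-Xilinx implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One receive element: lateral position and its recorded RF samples.
struct Element {
    double x;
    std::vector<double> rf;
};

// Delay-and-sum for a single focal point (x, z): for each element, compute
// the round-trip path (transmit depth z plus the return path to the element),
// convert it to a sample index, and sum the delayed samples.
double delay_and_sum(const std::vector<Element>& elems,
                     double x, double z, double c, double fs) {
    double sum = 0.0;
    for (const Element& e : elems) {
        double path = z + std::sqrt(z * z + (x - e.x) * (x - e.x));
        std::size_t idx = static_cast<std::size_t>(path / c * fs); // nearest sample
        if (idx < e.rf.size()) sum += e.rf[idx];
    }
    return sum;
}
```

In a real scanner this runs for every depth sample along the line, with dynamic receive focusing updating the delays as the depth z increases.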
In order to improve the quality of the ultrasound images, different emission focal depths are used, and the final image is obtained from a recombination of these partial images corresponding to different depths.
Additionally, ultrasound imaging is limited by the speed of sound, roughly 1,540 meters per second in soft tissue. It therefore takes about 200 microseconds for a pulse to travel to a depth of 15 centimeters and for the echo to return, allowing about 5,000 line acquisitions per second. For an image with 100 to 200 lines of resolution, this translates to frame rates of 10 to 50 frames per second (fps), i.e., 20 to 100 milliseconds per frame.
In cardiac imaging, for example, the aortic valve opens and closes with time constants on the order of 200 ms. At 20 ms per frame, you can capture only 10 snapshots (200 ms / 20 ms = 10) of the moving valve, which is not sufficient for real-time imaging needs.
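The timing budget above can be checked with a few lines of arithmetic, assuming (as in the text) a speed of sound of 1,540 m/s and a 15 cm imaging depth:

```cpp
// Acquisition-rate arithmetic for sequential imaging.
constexpr double c = 1540.0;       // speed of sound in soft tissue, m/s
constexpr double depth = 0.15;     // imaging depth, m
constexpr double t_line = 2.0 * depth / c;            // round trip per line, ~195 us
constexpr double lines_per_s = 1.0 / t_line;          // ~5,100 lines per second
constexpr double fps_200_lines = lines_per_s / 200.0; // ~25 fps at 200 lines
```

Multiple transmit focal zones per line, as described above, divide this rate further, toward the 10 fps lower bound.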
Therefore, sequential data acquisition is not sufficient to achieve the desired frame rate and image quality in critical ultrasound diagnostics.
UltraFast imaging techniques
One way to address these challenges is UltraFast imaging. UltraFast imaging is a paradigm shift from sequential acquisition to full-plane, fully parallel acquisition using spherical or plane waves. It provides optimally focused images anywhere in the frame and can achieve thousands of frames per second, resulting in high image quality, precision, and scan depth. It also enables functional imaging with high precision at both high and low flow velocities, and the full dataset opens the door to more accurate retrospective measurements.
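The reason a single plane-wave transmission can image the whole field is that the delay geometry for every pixel is known in closed form. The sketch below shows the standard delay calculation for a flat linear array; it is a textbook-style illustration under assumed geometry, not the demonstrator's code.

```cpp
#include <cmath>

// Total delay for reconstructing pixel (x, z) from a plane wave steered at
// angle theta, received by the element at lateral position xe:
//   transmit path: z*cos(theta) + x*sin(theta)  (wavefront reaching the pixel)
//   receive path:  straight line from the pixel back to the element
double plane_wave_delay(double x, double z, double xe,
                        double theta, double c) {
    double t_tx = (z * std::cos(theta) + x * std::sin(theta)) / c;
    double t_rx = std::sqrt(z * z + (x - xe) * (x - xe)) / c;
    return t_tx + t_rx;
}
```

Because this delay is defined for every (x, z) after one transmit event, the entire frame can be beamformed in parallel, which is what pushes the compute load onto the hardware.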
But UltraFast techniques have been limited to research scanners and have proven difficult to implement on a commercially viable scanner due to the vast resource requirements in terms of computation, system size, and amount of power dissipated.
Advantages of adaptive SoCs
Let’s now explore how AMD-Xilinx’s adaptive systems-on-chips (SoCs) can further innovate UltraFast imaging.
Xilinx Versal adaptive SoCs are next-generation Adaptive Compute Acceleration Platform (ACAP) devices featuring tightly coupled multicore processors, programmable logic (FPGA fabric), and the new AI Engines (AIE) with a highly parallel, tiled SIMD-VLIW architecture (SIMD: single instruction, multiple data; VLIW: very long instruction word). The blocks are tightly coupled through a network-on-chip (NoC), allowing rapid movement of data between them.
The AIE array is the primary compute unit for UltraFast algorithms such as plane-wave and synthetic-aperture imaging. It is a large array of SIMD/VLIW processors connected in a mesh (see Figure 1 below). Each processor has its own instruction and data memory and can share memory with its neighbors. All processors are connected by an innovative interconnect with an aggregate bandwidth of several terabytes per second. This structure delivers the level of parallelism needed to implement such algorithms.
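The SIMD execution model can be pictured as one instruction operating on a whole vector of lanes at once. The sketch below models an 8-lane int16 multiply-accumulate in plain C++; the actual AI Engine kernels use dedicated vector data types and intrinsics, which are not shown here.

```cpp
#include <array>
#include <cstdint>

constexpr int LANES = 8;
using Vec16 = std::array<int16_t, LANES>; // one modeled SIMD register of samples
using Acc32 = std::array<int32_t, LANES>; // wider accumulators avoid overflow

// Vector multiply-accumulate: acc[i] += a[i] * b[i] for all lanes at once,
// the core operation of a weight-and-sum beamforming inner loop.
void vmac(Acc32& acc, const Vec16& a, const Vec16& b) {
    for (int i = 0; i < LANES; ++i)
        acc[i] += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
}
```

On a SIMD/VLIW core, the whole loop body above corresponds to a single vector instruction issued per cycle, alongside independent load and store slots in the same very long instruction word.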
Figure 1 below shows the architecture of the new AI engines in the Versal SoC.
This new adaptive computing architecture with the AIE array allows medical equipment manufacturers to implement very high-throughput algorithms on a single, fully embedded device: for example, beamforming running in parallel with real-time scanning or 3D/4D visualization, AI/ML for region-of-interest selection and inference assistance, and offloaded image reconstruction in endoscopy, robotic surgery, and radiology.
Using this new adaptive SoC, AMD-Xilinx set out to build a practical UltraFast beamformer with the help of Dr. Joergen Jensen from the Technical University of Denmark. Dr. Jensen helped develop the algorithms and AMD-Xilinx, together with our partner, implemented this beamformer in a sample design with software libraries for the Versal adaptive SoC.
The block diagram below (Figure 2) is a high-level representation of an example UltraFast beamformer on a single Versal Adaptive SoC, combining FPGAs, CPUs, and hardware accelerators for AI and digital signal processing.
The FPGA part of the device manages transducers and acquires echoes, stores data in external DDR4 memories and can communicate with the host system using PCIe or expand and synchronize with another module via Precision Time Protocol and 10/25 Gigabit Ethernet channels.
Key benefits include:
- Use a single Versal Adaptive SoC to drive an UltraFast-based beamformer and deliver extremely high performance.
- Implement the complete design in the Xilinx Vitis software development environment using a high-level programming language like C/C++.
- Enable Vitis accelerated libraries, which can be used as-is in a medical equipment design, or used as sample designs, to implement the medical equipment manufacturer’s own algorithms.
Since many scientists design their systems in MATLAB, we can provide MATLAB support via Model Composer, a model-based design tool that enables rapid design in the MATLAB and Simulink environment and speeds the path to production on Versal adaptive SoC devices through automatic code generation. This is reinforced by a set of C++ templates that encapsulate the core AI Engine APIs (shown in Figure 3 below).
Figure 4 below shows the unified software environment of Vitis with the accelerated libraries used for “UltraFast” ultrasound imaging.
Real-world technology demonstration
AMD-Xilinx presented a real-world demonstration of the full capabilities of such an ultrasound beamformer at the Radiological Society of North America (RSNA) 107th annual meeting in Chicago in December 2021. The demonstrator uses the Versal AI Core series VCK190 evaluation board connected to an AMD workstation; a Dell Alienware workstation with an AMD Radeon RX 6900 XT GPU was selected to render the image. Ultrasound data is provided in real time by a wireless probe to show actual performance, or it can be supplied by a simulator such as Field II, which most ultrasound imaging scientists use to validate algorithms against a reference image.
We measured and generated benchmarks for the UltraFast-based beamformer on the Versal adaptive SoC and on a competing GPU. We present data for two applications: abdominal imaging and small-parts imaging. The software environment used for the Versal device is Vitis 2021.2 (with support for future versions); the GPU uses CUDA.
Tables 1 and 2 below summarize the performance results in frames per second (fps) for a single beamformer with 64 active elements and 200 lines of resolution, for the Xilinx Versal adaptive SoC, the Nvidia RTX 2020 GPU, and a PC running an Intel i7 processor.
Table 1: Small-parts imaging
Table 2: Abdominal imaging
Linear and matched-filter interpolation results are shown for 32-bit floating-point (FP32) and 16-bit integer (Int16) data types. As the numbers show, the Versal platform not only implements a full beamformer using UltraFast techniques but also significantly outperforms a gaming GPU and a PC: a 44x gain over the GPU for integer linear interpolation and a 27x gain for floating point in some of the critical algorithms. For Catmull-Rom spline interpolation, one of the most difficult schemes to implement, Versal widens its advantage to 91x to 160x over the GPU.
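For reference, Catmull-Rom interpolation reconstructs the signal between two samples p1 and p2 using their outer neighbors p0 and p3; evaluating a cubic per output sample is what makes it so much heavier than linear interpolation. Below is a minimal scalar sketch of the standard formula; the benchmarked kernels are vectorized implementations, which is not shown here.

```cpp
// Catmull-Rom spline value at fractional position t in [0, 1] between
// samples p1 and p2, with p0 and p3 as the outer neighbors.
double catmull_rom(double p0, double p1, double p2, double p3, double t) {
    double t2 = t * t;
    double t3 = t2 * t;
    return 0.5 * ( 2.0 * p1
                 + (-p0 + p2) * t
                 + ( 2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t2
                 + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t3 );
}
```

The spline passes exactly through p1 at t = 0 and p2 at t = 1, and reproduces linear data exactly, which makes it attractive for the sub-sample delay accuracy beamforming needs.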
With this innovation, a single embedded SoC device can now enable a commercially viable real-time ultrasound beamformer using UltraFast algorithms. This will unlock new capabilities in the diagnosis of serious diseases by making it possible to obtain optimally focused images everywhere in the frame at thousands of frames per second, resulting in high image quality and precision.