TotalView for HPC
Faster fault isolation, improved memory optimization, and dynamic visualization for your high-scale HPC apps
Understand high-scale parallel and multicore applications, with unprecedented control over processes and thread execution and visibility into program states and data.
With TotalView for HPC, you can simultaneously debug many processes and threads in a single window and get complete control over program execution: run, step, and halt line by line through code within a single thread or across arbitrary groups of processes or threads. Work backwards from a failure with reverse debugging, isolating the root cause faster by eliminating repeated restarts of the application. Reproduce difficult problems that occur in concurrent programs that use threads, OpenMP, MPI, GPUs, or coprocessors.
Memory leaks, deadlocks, and race conditions can be things of the past. Whether you are an experienced developer or new to the challenges of multicore and parallel development, TotalView helps you find errors quickly, validate prototypes, verify calculations, and certify code correctness.
TotalView works with C, C++, and Fortran applications written for Linux (including the Cray and Blue Gene platforms), Linux PowerLE, UNIX, Mac OS X, and Xeon Phi, and supports OpenMP, MPI, OpenPOWER, OpenACC / CUDA and ARM.
TotalView for HPC Features
Selecting the right dynamic analysis tool for your team depends on your team requirements, language support, and platform availability.
TotalView supports these key technologies:
TotalView provides powerful functionality to make debugging as easy as possible
C and C++ debugging and troubleshooting
C and C++ give you control over the details of data, access patterns, memory management, and execution. But direct control over low-level machine behavior leaves little margin for error when it comes to building and maintaining scalable scientific applications. TotalView provides the ideal environment for troubleshooting complex C and C++ applications. It features detailed views of objects, data structures, and pointers, which simplifies working with complex objects.
The standard template library (STL) collection classes simplify the way you manipulate your program's data, but they complicate troubleshooting when your program hangs or crashes. The TotalView type transformation facility (TTF) gives you a flexible way to define alternate displays for data objects. STLView transformations present a logical view of STL collection class objects, so that a list, for example, is shown as its elements rather than as its internal nodes. The end result is a simplified, intuitive view into the structure and behavior of your code.
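To make the idea of a "logical view" concrete, here is a minimal C++ sketch. It does not use the TTF or STLView interfaces, and the helper name `logical_view` is hypothetical. It only illustrates the difference between a container's internal node structure and the element-by-element view a developer usually wants to see.

```cpp
#include <list>
#include <sstream>
#include <string>

// Hypothetical helper (not part of TotalView): renders a std::list<int>
// as its logical contents, e.g. "[1, 2, 3]", instead of the linked
// nodes and pointers the library uses internally.
std::string logical_view(const std::list<int>& values) {
    std::ostringstream out;
    out << "[";
    bool first = true;
    for (int v : values) {
        if (!first) {
            out << ", ";
        }
        out << v;
        first = false;
    }
    out << "]";
    return out.str();
}
```

Calling `logical_view(std::list<int>{1, 2, 3})` returns the string `[1, 2, 3]`, which is the kind of flattened presentation a transformed debugger view aims for.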
Mixed Language Debugging with Python and C/C++
Many developers use Python to develop applications and call into C/C++ code to perform compute-intensive tasks or to access existing algorithms. Debugging across the language barrier can be challenging, but TotalView makes it easier by showing a fully integrated call stack across both languages and letting you inspect all the Python and C/C++ variables and their values. This makes it far simpler to understand, diagnose, and fix your mixed-language Python and C/C++ applications.
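As an illustration of the C/C++ side of such an application, the sketch below exports a small compute kernel with C linkage. The function name `dot_product` is hypothetical, and loading it from Python (for example through a shared library and the `ctypes` module) is an assumption about how the two languages would be connected, not something this page specifies.

```cpp
#include <cstddef>

// Hypothetical compute kernel. extern "C" disables C++ name mangling so
// that a foreign caller, such as a Python process loading a shared
// library, can look the symbol up by its plain name.
extern "C" double dot_product(const double* a, const double* b, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        total += a[i] * b[i];
    }
    return total;
}
```

In a mixed-language session, a frame for a function like this would sit in the call stack below the Python frames that called it.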
Fortran debugging
While C and Fortran have some things in common, Fortran is not C. TotalView correctly represents Fortran notation, types, and concepts, such as common blocks and modules, that are not present in other languages.
Fortran is especially good at representing and manipulating numerical and mathematical data. One of its key characteristics is its facility for representing array data. Scientists and engineers working with Fortran source code are doing so in part to take advantage of language-level support for things like multidimensional arrays, array assignment, and the powerful features of Fortran pointers. Our technology can help you leverage these key attributes of Fortran to ensure working code.
Data visualization for understanding application behavior, computational data, and patterns
Most of the applications you are developing are engines for manipulating data. Whether observational or computational, it is the data that you really care about. When you are trying to develop insight into the behavior of a physical system you approach it quantitatively. The same approach is necessary when trying to understand the behavior of computational systems.
Troubleshooting involves exploring the behavior of a live application, looking for clues as to why the computation is not proceeding as expected, slicing the data presented in different ways to uncover patterns. It is critical that you have the tools that make it easy to view and manipulate that data, and TotalView helps streamline this process.
Debugging memory leaks and malloc errors
Memory is a limited resource, and that has a significant impact on the implementation of your application, especially when it contains millions of lines of code. As program complexity grows, debugging memory leaks and troubleshooting malloc errors become more difficult. Memory-related code defects can cause out-of-control resource usage and random data corruption. Memory errors can also manifest themselves as random program crashes, negatively impacting productivity. In a worst-case scenario, memory errors can result in corrupted data, causing programs to generate inaccurate results. TotalView helps you manage that risk by ensuring working code and accurate results.
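The following C++ sketch shows the kind of defect this refers to. Both function names are hypothetical. The first version leaks its buffer on an early-return path; the second fixes the leak by giving ownership of the buffer to `std::unique_ptr`.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical example: returns 0 + 1 + ... + (n - 1), or -1 if n exceeds limit.
// Bug: the early return skips delete[], so the buffer leaks on that path.
long sum_leaky(std::size_t n, std::size_t limit) {
    long* buffer = new long[n];
    if (n > limit) {
        return -1;  // leak: buffer is never freed here
    }
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        buffer[i] = static_cast<long>(i);
        total += buffer[i];
    }
    delete[] buffer;
    return total;
}

// Same logic, but the unique_ptr frees the buffer on every return path.
long sum_safe(std::size_t n, std::size_t limit) {
    std::unique_ptr<long[]> buffer(new long[n]);
    if (n > limit) {
        return -1;  // buffer is released automatically
    }
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        buffer[i] = static_cast<long>(i);
        total += buffer[i];
    }
    return total;
}
```

Leaks like the one in `sum_leaky` are easy to miss in review because the function returns the expected value on its common path; they only show up as memory growth over many calls.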
Support for MPI, OpenMP, and other parallel paradigms
TotalView provides comprehensive support for MPI, OpenMP, UPC, and GA. With support for more than 20 implementations of MPI, TotalView has been the debugger of choice in parallel programming courses.
Multithreaded applications / multicore architectures
The era of increasing clock rates has ended. Processor architectures are characterized by multicore and many-core designs. Building a multithreaded application or transitioning from a serial application to a parallel application presents significant challenges. TotalView and ReplayEngine are natively built to help you manage the challenges presented by concurrency, parallelism, and threads.
Race conditions are a common problem, even in a well-tested multithreaded application. You can use locks, semaphores, and atomic operations to avoid race conditions, but they can introduce subtle problems of their own. Our tools provide visibility into the behavior of your code, increasing your understanding of the impact of these problems.
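As a small illustration of the techniques mentioned above, the C++ sketch below increments a shared counter from several threads, once under a mutex and once with an atomic operation. The function names are hypothetical. Without either form of synchronization, the plain `++counter` would be a data race and the final count would be unpredictable.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Each thread takes the lock before touching the shared counter.
long count_with_mutex(int num_threads, int increments_per_thread) {
    long counter = 0;
    std::mutex guard;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, &guard, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                std::lock_guard<std::mutex> lock(guard);
                ++counter;
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}

// The same count using an atomic increment instead of a lock.
long count_with_atomic(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter.load();
}
```

Both functions return `num_threads * increments_per_thread`. The mutex version serializes every increment, which is one of the subtle costs the paragraph above alludes to.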
Unattended batch debugging with TVScript
TVScript is a framework for non-interactive debugging with TotalView. You define a series of events that may occur within the target program; TVScript then loads the program under its control, sets breakpoints as necessary, and runs the program. At each program stop, TVScript gathers data and logs it to a set of output files for you to review when the job has completed. If you call TVScript with no arguments, it provides usage guidelines and a listing of available events and actions. TVScript has been likened to printf on steroids.
Accelerator support
TotalView on Linux x86-64 supports CUDA and OpenACC debugging:
© Copyright 2000-2023 COGITO SOFTWARE CO.,LTD. All rights reserved