How can we, as software developers, make our big data or high-performance computing (HPC) application run faster? And how do we know when we’re done and there’s no more optimization to squeeze out? It turns out there’s been great progress in answering these questions, along with practical feedback on what to look at when we are not done. The technique is called “roofline analysis,” and it can be done manually or with the help of a new tool from Intel that does some of the most tedious work for us.

Roofline analysis is a method for estimating performance with an eye toward showing where bottlenecks exist. In other words, it guides us to where the full potential of the system isn’t being reached. This doesn’t prove that we can improve things, but it does highlight where to consider putting our optimization efforts. It also tells us when the system, and not our application, is limiting performance: a strong indication that further optimizing the application won’t matter (but an algorithm change might).

The focus is on optimization of the algorithm we have, but it can also inspire change

We should never forget that changing to an algorithm that requires less computation, or that consumes a significantly different amount of data, may be the best way to optimize. Such algorithmic insights may be inspired by a roofline analysis, but its real focus is helping us optimize the algorithm we already have. Think of roofline analysis as a guiding light that helps us prioritize our optimization work in a scientific manner and avoid the dead ends that come from “Let’s try this” or “Let’s try that.”

How do we know when we’ve tuned all we can tune?

Roofline analysis helps determine the gap between an application’s performance and the potential of the computer it runs on. That gap guides our optimization work and occasionally inspires algorithm changes.
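To make that gap concrete, here is a minimal sketch of the basic roofline bound as formulated in the Berkeley paper: attainable performance is the smaller of the machine’s peak compute rate and its memory bandwidth multiplied by the kernel’s arithmetic intensity (FLOPs per byte moved). The function names, the machine figures, and the kernel measurements below are all hypothetical and chosen only for illustration. They are not taken from Intel Advisor or from any real system.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: the lower of the compute roof and the memory roof.

    arithmetic_intensity: FLOPs performed per byte moved (FLOP/byte)
    peak_gflops:          peak compute rate of the machine (GFLOP/s)
    bandwidth_gbs:        peak memory bandwidth of the machine (GB/s)
    """
    memory_roof = arithmetic_intensity * bandwidth_gbs
    return min(peak_gflops, memory_roof)


def bound_type(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Say which roof limits a kernel at this arithmetic intensity."""
    # The "ridge point" is the intensity where the sloped memory roof
    # meets the flat compute roof.
    ridge = peak_gflops / bandwidth_gbs
    return "memory-bound" if arithmetic_intensity < ridge else "compute-bound"


# Hypothetical machine (assumed numbers, not a real processor).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

# Hypothetical kernel: 0.25 FLOP/byte, measured at 20 GFLOP/s.
kernel_intensity = 0.25
measured_gflops = 20.0

roof = attainable_gflops(kernel_intensity, PEAK_GFLOPS, BANDWIDTH_GBS)
fraction_of_roof = measured_gflops / roof

print(f"roof: {roof} GFLOP/s, "
      f"limit: {bound_type(kernel_intensity, PEAK_GFLOPS, BANDWIDTH_GBS)}, "
      f"fraction of roof reached: {fraction_of_roof:.0%}")
```

In this made-up case the kernel sits under the memory roof, so raising the machine’s peak compute rate would not help it. Moving less data per FLOP, or changing the algorithm, might.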
Roofline analysis graphs get their name from the shape they all share: a gentle upward slope (like the side of a roof) where the computer is limited by the supply of data, followed by a level (rooftop) cap set by computational performance. The more intense a computation is (the further right it sits on the graph), the more likely we are to be limited by the machine’s compute capabilities rather than its data capabilities. A roofline diagram can show several memory subsystems (cache levels) with different caps, as well as different caps on computational performance depending on our coding/optimization style. These multiple levels set us up for the game of figuring out whether we can make a higher cap apply to our application instead of being held to a lower one.

Intel automated much of the tedious work in doing a roofline analysis

Intel has implemented roofline analysis as a feature of its Intel® Advisor tool. The feature lets you explore the roofline analysis for your own application, and it provides concrete feedback on where the application is performing now as well as insight into its bottlenecks. In other words, this roofline analysis method helps you know what to focus on when the answer to “Are we done tuning our application yet?” is “Not done yet.
There’s more to do.”

To learn more, here are some resources:

New article: The Parallel Universe magazine, “Intel® Advisor Roofline Analysis – A New Way to Visualize Performance Optimization Trade-Offs.”
Video: Roofline Analysis in Intel® Advisor 2017
User’s perspective on the technique and Intel’s tool: “Utilizing Roofline Analysis in the Intel Advisor to Deliver Performance for applications on the Intel Xeon Phi Processor (Code named Knights Landing).”
Trial copy: Information on Intel Advisor tool, including link to “Try” for free
Select free tool programs: Information on ways you might qualify for free tools from Intel
Berkeley paper: Roofline: An Insightful Visual Performance Model for Multicore Architectures
Free 30-day trial: Intel Parallel Studio XE