Are We Done Tuning Our Application Yet? Roofline Analysis Helps Find Useful Answers

opinion

Feb 27, 2017 · 4 mins

How can we, as software developers, make our big data or high-performance computing (HPC) application run faster? And when do we know that we’re done and there’s no more optimization to squeeze out?

It turns out there has been great progress in figuring out how to answer these questions, along with practical feedback on what to look at if we are not done yet. The technique is called “roofline analysis,” and it can be done manually or with the help of a new tool from Intel that does some of the most tedious work for us.
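
Doing the analysis manually starts with a kernel’s arithmetic intensity: the floating-point operations it performs per byte of data it moves to or from memory. The sketch below is a back-of-the-envelope count for two textbook kernels. It assumes 8-byte doubles, a STREAM-style triad that touches each array exactly once (ignoring write-allocate traffic), and an idealized n-by-n matrix multiply in which each matrix crosses the memory bus only once. The function names are hypothetical and are not part of any tool.

```python
# Back-of-the-envelope arithmetic intensity (FLOPs per byte moved).
# Assumptions (hypothetical, for illustration): 8-byte doubles and idealized
# memory traffic, i.e. only compulsory traffic and no write-allocate overhead.

DOUBLE_BYTES = 8


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Return FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved


def triad_intensity(n: int) -> float:
    """a[i] = b[i] + s * c[i]: one multiply and one add per element;
    reads b and c, writes a (three doubles of traffic per element)."""
    flops = 2 * n
    bytes_moved = 3 * DOUBLE_BYTES * n
    return arithmetic_intensity(flops, bytes_moved)


def matmul_intensity(n: int) -> float:
    """C = A @ B for n x n matrices: 2*n**3 FLOPs. In the ideal case A and B
    are read once and C is written once (three n x n matrices of doubles)."""
    flops = 2 * n**3
    bytes_moved = 3 * DOUBLE_BYTES * n**2
    return arithmetic_intensity(flops, bytes_moved)


if __name__ == "__main__":
    print(f"triad:  {triad_intensity(10_000_000):.4f} FLOP/byte")  # 2/24, about 0.0833
    print(f"matmul: {matmul_intensity(1200):.1f} FLOP/byte")       # n/12 = 100.0
```

Even this crude count puts the two kernels about three orders of magnitude apart on a roofline’s horizontal axis, which is why they run into different limits.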

Roofline analysis is a method to estimate performance with an eye on giving feedback about where bottlenecks exist. In other words, it guides us to where the full potential of the system isn’t being reached. This doesn’t prove that we can improve things, but it sure helps highlight where to consider putting our optimization efforts. It also tells us when the system is limiting performance and not our application, a strong indication of where optimizing the application won’t matter (but an algorithm change might).
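
One way to see the “system versus application” distinction is the ridge point: the arithmetic intensity at which the sloped memory-bandwidth roof meets the flat compute roof. A kernel to the left of it can at best reach the bandwidth limit; a kernel to the right can at best reach peak compute. The sketch below assumes a single hypothetical machine with a 1,000 GFLOP/s peak and 100 GB/s of memory bandwidth; those numbers, and the names `Machine` and `limiting_resource`, are illustrative and do not describe any real processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine limits (illustrative numbers, not measurements)."""
    peak_gflops: float    # flat compute roof, GFLOP/s
    bandwidth_gbs: float  # memory bandwidth, GB/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) where the two roofs meet."""
        return self.peak_gflops / self.bandwidth_gbs


def limiting_resource(machine: Machine, intensity: float) -> str:
    """Say which roof caps a kernel with the given arithmetic intensity.
    Exactly at the ridge point both limits apply; this sketch calls it compute-bound."""
    return "memory-bound" if intensity < machine.ridge_point else "compute-bound"


if __name__ == "__main__":
    m = Machine(peak_gflops=1000.0, bandwidth_gbs=100.0)
    print(m.ridge_point)                  # 10.0 FLOP/byte
    print(limiting_resource(m, 1 / 12))   # memory-bound (triad-like kernel)
    print(limiting_resource(m, 100.0))    # compute-bound (matmul-like kernel)
```

A memory-bound kernel that already runs at the bandwidth limit is being held back by the system, not by its code; only raising its arithmetic intensity (often an algorithm change) moves it toward the compute roof.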

The focus is on optimization of the algorithm we have, but it can also inspire change

We should never forget that changing to an algorithm that requires less computation, or consumes a significantly different amount of data, may be the best way to optimize. Such algorithm insights may be inspired by a roofline analysis, but the real focus of roofline analysis is helping us optimize the algorithm we already have.

Think of roofline analysis as a guiding light that helps us prioritize our optimization work scientifically and avoid many of the dead ends that come from “Let’s try this” or “Let’s try that.”

How do we know when we’ve tuned all we can tune?

Roofline analysis helps in determining the gap between an application and the potential of a computer. That ends up guiding our optimization work, and occasionally inspiring algorithm changes.
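
The Berkeley roofline paper listed in the resources below expresses a computer’s potential as a function of a kernel’s arithmetic intensity I (which the paper calls operational intensity, in FLOPs per byte): attainable performance is the lower of peak compute P_peak and peak memory bandwidth B times I. The gap is then the ratio of measured to attainable performance. The worked numbers assume a hypothetical machine (1,000 GFLOP/s, 100 GB/s) and an assumed measurement of 6 GFLOP/s for a kernel with I = 1/12; they are illustrative only.

```latex
\begin{aligned}
P_{\text{attainable}}(I) &= \min\bigl(P_{\text{peak}},\; B \cdot I\bigr) \\
\text{efficiency} &= \frac{P_{\text{measured}}}{P_{\text{attainable}}(I)} \\[4pt]
\text{Example: } P_{\text{peak}} &= 1000~\text{GFLOP/s},\quad B = 100~\text{GB/s},\quad I = \tfrac{1}{12}~\text{FLOP/byte} \\
P_{\text{attainable}} &= \min\bigl(1000,\; 100 \cdot \tfrac{1}{12}\bigr) \approx 8.33~\text{GFLOP/s} \\
\text{efficiency} &= \frac{6}{8.33} \approx 0.72
\end{aligned}
```

At roughly 72% of its roof, this hypothetical kernel still has headroom worth tuning for; a kernel measuring close to 100% has reached the limit the machine imposes at that intensity.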

Roofline graphs get their name from the shape they all share: a gentle upward slope (like the side of a roof) where the computer is limited by the supply of data, followed by a level rooftop where it is capped by computational performance. The more intense a computation is, meaning the more operations it performs per byte of data (moving right on the graph), the more likely we are to be limited by the machine’s compute capabilities rather than its data capabilities. The diagram shows that we can actually have various memory subsystems (cache levels) with different caps, as well as different caps on computational performance based on our coding/optimization styles. These multiple levels set up the game of figuring out whether we can make a higher cap apply to our application instead of being held to a lower one.
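
The multiple caps can be modeled as a set of sloped bandwidth roofs (one per memory level) and a set of flat compute ceilings (for example, scalar versus vectorized code). The sketch below makes a simplifying assumption: a kernel’s data is served from one memory level and its code reaches one compute ceiling. It reports which cap currently binds and which combinations would raise the bound. All level names, numbers, and functions (`attainable`, `binding_cap`, `higher_caps`) are hypothetical placeholders, not figures for any real processor and not output from Intel Advisor.

```python
# Sketch of a multi-level roofline: several bandwidth roofs and compute ceilings.
# All numbers are hypothetical placeholders chosen for illustration.

BANDWIDTH_GBS = {"DRAM": 100.0, "L2": 400.0, "L1": 1600.0}  # sloped roofs, GB/s
COMPUTE_GFLOPS = {"scalar": 125.0, "vector": 1000.0}       # flat ceilings, GFLOP/s


def attainable(intensity: float, memory_level: str, code_style: str) -> float:
    """Roofline bound (GFLOP/s) for a kernel whose data comes from memory_level
    and whose code can reach the code_style compute ceiling."""
    return min(COMPUTE_GFLOPS[code_style], BANDWIDTH_GBS[memory_level] * intensity)


def binding_cap(intensity: float, memory_level: str, code_style: str) -> str:
    """Name the cap that limits the kernel under these assumptions."""
    if BANDWIDTH_GBS[memory_level] * intensity < COMPUTE_GFLOPS[code_style]:
        return f"{memory_level} bandwidth"
    return f"{code_style} compute"


def higher_caps(intensity: float, memory_level: str, code_style: str) -> dict:
    """Every (memory level, code style) combination that would raise the bound."""
    current = attainable(intensity, memory_level, code_style)
    better = {}
    for level in BANDWIDTH_GBS:
        for style in COMPUTE_GFLOPS:
            bound = attainable(intensity, level, style)
            if bound > current:
                better[(level, style)] = bound
    return better


if __name__ == "__main__":
    ai = 0.5  # a kernel doing 0.5 FLOP per byte, streaming from DRAM with scalar code
    print(attainable(ai, "DRAM", "scalar"))   # 50.0
    print(binding_cap(ai, "DRAM", "scalar"))  # DRAM bandwidth
    print(higher_caps(ai, "DRAM", "scalar"))
```

With these made-up numbers, vectorizing alone does not help the DRAM-bound kernel (both code styles give 50 GFLOP/s), while getting its data served from L2 or L1 does. That is the kind of “which higher cap can we reach?” question the multiple levels are meant to prompt.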

Intel automated much of the tedious work in doing a roofline analysis

Intel has implemented roofline analysis as a feature of its Intel® Advisor tool. It lets you explore a roofline analysis of your own application, and it provides concrete feedback on where the application is performing now as well as insight into what bottlenecks exist.

In other words, this roofline analysis method helps you know what to focus on when the answer to “Are we done tuning our application yet?” is “Not done yet. There’s more to do.”

To learn more, here are some resources:
- New article: The Parallel Universe magazine, “Intel® Advisor Roofline Analysis – A New Way to Visualize Performance Optimization Trade-Offs.”
- Video: “Roofline Analysis in Intel® Advisor 2017.”
- User’s perspective on the technique and Intel’s tool: “Utilizing Roofline Analysis in the Intel Advisor to Deliver Performance for applications on the Intel Xeon Phi Processor (Code named Knights Landing).”
- Trial copy: Information on the Intel Advisor tool, including a link to “Try” it for free.
- Select free tool programs: Information on ways you might qualify for free tools from Intel.
- Berkeley paper: “Roofline: An Insightful Visual Performance Model for Multicore Architectures.”

by James Reinders

Software Programmer

James Reinders is a software programmer with a passion for Parallel Programming and Parallel Computer Architecture. He has contributed to the development of some of the world’s fastest computers, and the software tools that make that performance accessible for programmers. James has shared this passion in classes, webinars, articles and has authored eight books for software developers. James enjoyed 10,001 days working at Intel, and now continues to share his passion to help others “Think Parallel.”
