3 Replies Last post: Dec 5, 2008 10:56 AM by clapkow  
Roland_K

Dec 2, 2008 11:07 AM

Building applications to deploy on different levels of IBM POWER hardware and AIX

I have a customer who needs to deploy an application they have developed with XL C/C++ for AIX v8 on an IBM POWER5 and AIX v5.3 system to a number of sites running different levels of IBM POWER hardware and AIX.

They would like to know what compile options to use to maximize the performance of their application on POWER4, POWER5, and POWER6 running AIX 5.3 or AIX 6.1.

They told me they know of -qarch and -qtune. Currently they are using the default setting. They would also like to know how these options could affect applications in areas of correctness and performance if they are not set correctly.

clapkow

It is understandable that choosing the right -qarch and -qtune options is difficult. There are many possible values, and choosing the wrong one might generate object code that cannot be executed on some platform the product is deployed on.

I'll use your example of an application being deployed on POWER4, POWER5, and POWER6 machines and walk us through the tables in the Compiler Reference to try and make the best choice.

Selecting the right -qarch

We need to ensure that the object code produced by the compiler is executable on any POWER4, POWER5 and POWER6 machine. Reading the descriptions for -qarch=pwr4 and -qarch=pwr5 in the section describing the -qarch option, we see that:

-qarch=pwr5 will generate code that can run on a POWER5, POWER5+
-qarch=pwr4 will generate code that can run on a POWER4, POWER5, POWER5+, PowerPC 970
POWER6 is not supported by v8 of XL C/C++ for AIX; however, POWER6 can run any code generated for POWER5 or POWER5+. In the documentation for v9 of XL C/C++ you would see:
-qarch=pwr6 will generate code that can run on POWER6
-qarch=pwr5 will generate code that can run on a POWER5, POWER5+, POWER6
-qarch=pwr4 will generate code that can run on a POWER4, POWER5, POWER5+, POWER6, PowerPC 970

So of the three possibilities, the safe option is -qarch=pwr4, because the code it produces will run on all of the machines the application will be deployed on.

We could also use -qarch=pwr3, because it is described as:

-qarch=pwr3 will generate code that can run on a POWER3, POWER4, POWER5, POWER5+, POWER6, PowerPC 970

In general it is best to specify the most modern machine type for -qarch that still permits the object code to run on all of the machines your application will be deployed on. The compiler reference for v9 has a table, and the compiler reference for v8 a similar one, in the section entitled "Acceptable compiler mode and processor architecture combinations" that can help make this decision.

http://www-949.ibm.com/software/rational/cafe/servlet/JiveServlet/downloadImage/1202/capture.jpg

You can see that specifying -qarch=pwr3 would work, but large page support could not be exploited unless you specify -qarch=pwr4. Similarly, with the default -qarch=ppc, the compiler does not have the option of using the graphics or square root features.

We could not look at this table alone and choose -qarch=pwr6 to take advantage of vector processing support: -qarch=pwr6 generates code that can run only on POWER6, and we want to deploy our application on POWER4 and POWER5 as well.
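The selection rule from this section can be sketched in a few lines of Python. This is only an illustration of the reasoning, not anything the compiler ships: the table copies the v9 compatibility lists quoted above, and the names RUNS_ON, OLDEST_TO_NEWEST and pick_qarch are made up for the example.

```python
# Machines that can run code built with each -qarch value, copied from the
# v9 lists quoted in this post. The names here are hypothetical.
RUNS_ON = {
    "pwr3": {"POWER3", "POWER4", "POWER5", "POWER5+", "POWER6", "PowerPC 970"},
    "pwr4": {"POWER4", "POWER5", "POWER5+", "POWER6", "PowerPC 970"},
    "pwr5": {"POWER5", "POWER5+", "POWER6"},
    "pwr6": {"POWER6"},
}

# Candidate -qarch values ordered from oldest to newest machine type.
OLDEST_TO_NEWEST = ["pwr3", "pwr4", "pwr5", "pwr6"]


def pick_qarch(targets):
    """Return the most modern -qarch whose code runs on every target machine.

    Returns None when no listed value covers all of the targets.
    """
    targets = set(targets)
    for arch in reversed(OLDEST_TO_NEWEST):
        if targets <= RUNS_ON[arch]:
            return arch
    return None


# The deployment described in this thread.
print("-qarch=" + pick_qarch(["POWER4", "POWER5", "POWER6"]))
```

For the POWER4, POWER5 and POWER6 deployment this picks pwr4, and it would move up to pwr5 only once no POWER4 machines remain in the target list.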

Choosing the best -qtune option

The -qtune option does not determine what machines will be able to run your application but instead tells the compiler which machine it should try to make the application run fastest on.

In v9 of the compiler we made this choice easier by adding -qtune=balanced. This value tunes the application to run well across a broad range of processors rather than fastest on a single one.

The alternative to using -qtune=balanced is to choose the processor that represents the majority of your users. In the example, the application will be deployed on POWER4, POWER5 and POWER6 machines. If the majority of your users will have a POWER5 processor, it is probably best to try to make your application run fastest on a POWER5 machine. In that case -qtune=pwr5 is the preferred option.
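As a rough sketch of the "tune for the majority" idea, the snippet below counts the processors in a list of sites and returns the matching -qtune value. Everything in it is an assumption made for illustration: the site counts are invented, and TUNE_FOR and pick_qtune are hypothetical names. The supported_tunes argument stands in for whatever -qtune values your compiler level accepts.

```python
from collections import Counter

# Hypothetical mapping from processor family to its -qtune value.
TUNE_FOR = {"POWER4": "pwr4", "POWER5": "pwr5", "POWER6": "pwr6"}


def pick_qtune(site_processors, supported_tunes):
    """Return the -qtune value for the most common processor.

    Processors whose -qtune value the compiler level does not accept are
    skipped, so the next most common processor is tried instead.
    Returns None if nothing matches.
    """
    counts = Counter(site_processors)
    for processor, _count in counts.most_common():
        tune = TUNE_FOR.get(processor)
        if tune in supported_tunes:
            return tune
    return None


# Invented example: six POWER5 sites, three POWER4 sites, one POWER6 site.
sites = ["POWER5"] * 6 + ["POWER4"] * 3 + ["POWER6"]
print("-qtune=" + pick_qtune(sites, {"pwr4", "pwr5"}))
```

With these made-up numbers the result is pwr5. The choice only affects speed, so a wrong guess here costs performance on some machines but never correctness.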

In summary, your customer should use -qarch=pwr4 and either -qtune=pwr4 or -qtune=pwr5 (-qtune=pwr6 or -qtune=balanced if they switch to v9).

A word of caution:

Higher levels of optimization such as -O4 and -O5 are a very good way of making your application perform better; however, they have the side effect of setting -qarch=auto and -qtune=auto. The word "auto" is replaced with the architecture of the compiling machine. Since your client is compiling on a POWER5 machine, this would result in both -qarch and -qtune being set to pwr5. As we saw in the discussion above, -qarch=pwr5 generates code that can run on POWER5 and POWER6. The compiler may therefore use an instruction that a POWER4 machine cannot execute, and users with a POWER4 machine may get an Illegal Instruction exception.
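To make the -O4/-O5 pitfall concrete, here is a small Python model of it. It is a sketch under stated assumptions: effective_qarch and machines_at_risk are hypothetical helpers, the compatibility lists are the v9 ones quoted earlier, and the explicit_qarch branch assumes that a -qarch given explicitly after -O4 or -O5 overrides the implied auto. That is my understanding of the Compiler Reference, but please confirm it for your compiler level.

```python
# Machines that can run code built with each -qarch value (v9 lists quoted
# earlier in this thread). All names in this sketch are hypothetical.
RUNS_ON = {
    "pwr4": {"POWER4", "POWER5", "POWER5+", "POWER6", "PowerPC 970"},
    "pwr5": {"POWER5", "POWER5+", "POWER6"},
}


def effective_qarch(opt_level, build_host, explicit_qarch=None, default="ppc"):
    """Model which -qarch value ends up in effect.

    ASSUMPTION: an explicit -qarch placed after -O4/-O5 on the command line
    overrides the -qarch=auto that those levels imply.
    """
    if explicit_qarch is not None:
        return explicit_qarch
    if opt_level in ("-O4", "-O5"):
        return build_host  # "auto" resolves to the compiling machine
    return default


def machines_at_risk(qarch, targets):
    """Return the target machines that may hit an illegal instruction."""
    return sorted(set(targets) - RUNS_ON[qarch])


targets = ["POWER4", "POWER5", "POWER6"]

arch = effective_qarch("-O5", build_host="pwr5")
print(arch, machines_at_risk(arch, targets))

arch = effective_qarch("-O5", build_host="pwr5", explicit_qarch="pwr4")
print(arch, machines_at_risk(arch, targets))
```

The first call models building with -O5 alone on the POWER5 machine and flags POWER4 as at risk. The second models adding -qarch=pwr4 after -O5, which leaves no target machine at risk.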

mendell

The short answer is that the default -qarch will run on all machines supported by the version of AIX or Linux that the compiler will run on. If you know that a program will only ever run on a given machine (such as Power 5) or newer, then it will be safe to compile with -qarch=pwr5. The program will then run on Power 5, Power 5+ and Power 6 machines. It might not run on a Power 4 machine, as some instructions introduced in Power 5 might be generated.

-qtune is used to tell the compiler that the program will most often be run on a given class of machine, and that the compiler should optimize specifically for that machine while still generating code that will execute on all machines that the -qarch setting supports.

Starting in V9.0, the default -qtune is balanced, which will tune for recent machines. Currently, this will generate code that will run well on Power 5 and Power 6 machines.

Starting in V10.1, the default -qarch has been changed to ppc64grsq, which supports all IBM PowerPC machines that run AIX or Linux.

clapkow

Also, it has been brought to my attention that we did provide the -qtune=balanced option in a PTF.

In the support section of the IBM XL C/C++ website, the Readme for the February 2008 XL C/C++ Enterprise Edition V8.0 for AIX PTF mentions the addition of -qtune=balanced.
