Commercial Computing with C/C++

2 Posts tagged with the optimization tag

Clients often complain that code which worked with an older release of the compiler is broken with the new release. On closer inspection of the test case provided, we usually find that they were lucky the code ever worked. The breakage is expected because the code violates the ANSI aliasing rules.

Not many of us follow the rules defined in the C and C++ standards^1^ religiously. Although the aliasing rules require that an object be accessed only through lvalues of compatible types, we often break this rule in order to make the code "work".

By default, xlc on z/OS compiles with ANSIALIAS. Working on the assumption that a pointer only accesses objects of a type compatible with its own, the compiler determines which storage locations can be accessed in two or more ways, i.e. are aliased. If, for example, we have a struct s with two members s1 and s2, the storage for s overlaps with the storage for both s.s1 and s.s2, but the storage for s.s1 and s.s2 does not overlap. This knowledge is critical to aggressive compiler optimization. It allows some loads to move up and some stores to move down. This reordering of the execution sequence is desirable because it lets more of the code execute in parallel.
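The struct example above can be sketched in C. This is only an illustration: the member types and the function name update are assumptions made for the example, not taken from any compiler documentation.

```c
/* Illustrative only: the member types and update() are made-up for this sketch. */
struct s {
    int    s1;
    double s2;
};

/*
 * p points to an int and q points to a double. Under ANSIALIAS the
 * compiler may assume that *p and *q never share storage, so it is
 * free to keep the value of *p in a register across the store to *q.
 */
int update(int *p, double *q)
{
    int before = *p;       /* first load of *p */
    *q = 1.5;              /* a store to a double cannot change an int */
    return before + *p;    /* second load of *p may reuse the first */
}
```

Calling update(&v.s1, &v.s2) on a struct s v follows the rules, because the two members never overlap. The assumption only becomes a problem when a cast makes p and q point at the same bytes.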

Casting a pointer so that it points to an object of a different type is a common C practice. For each type mismatch, xlc generates a warning and/or an informational message, which you may not notice if you have set the level of diagnostic messages to error or higher (-qflag=E, S, or U), or if you are redirecting all compiler messages to a hardly-ever-looked-at log file. Often the first time you notice a problem is when you execute the code and get an incorrect result.

You have broken the rules, now what?

You can compile routines that are not ANSI-alias compliant at a low level of optimization, e.g. at OPT(0). The higher the level, the more aggressive the optimizations based on aliasing information. You can also turn off optimization for an individual routine with #pragma option_override(func, "OPT(LEVEL,0)").

You can use the -qnoansialias compile option, or use the cc utility, which passes NOANSIALIAS to the compiler by default. This may not be desirable because it usually results in significant performance degradation, e.g. gcc compiled at -O3 with -qnoansialias runs 20% slower.

You can fix the non-compliance in your source code.

^1^ ISO/IEC 14882:1998(E), Section 3.10, Paragraph 15 states:

If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object
  • a cv-qualified version of the dynamic type of the object
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union)
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object
  • a char or unsigned char type

Example:

/*alias.c*/
int foo(char *c)
{
    char a[100];
    char *cptr = a;
    *(int *)cptr = *(int*)c;
    return 0;
}

xlc -c alias.c -O3 -qlist=./ -qflag=i -qinfo

INFORMATIONAL CCN3495 ./alias.c:5 Pointer type conversion found.
INFORMATIONAL CCN3374 ./alias.c:5 Pointer types "int*" and "char*" are not compatible.
INFORMATIONAL CCN3495 ./alias.c:5 Pointer type conversion found.
INFORMATIONAL CCN3374 ./alias.c:5 Pointer types "int*" and "char*" are not compatible.
INFORMATIONAL CCN3415 ./alias.c:7 The external function definition "foo" is never referenced.
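One way to fix the non-compliance in this example is to copy the bytes instead of dereferencing char storage through an int pointer. The following is a minimal sketch, not the only possible fix. The helper name copy_int_bytes is hypothetical, and it relies on the rule quoted above that any object may be accessed through a char type, which is what memcpy does.

```c
#include <string.h>

/*
 * Hypothetical helper: copy one int's worth of bytes from src to dst.
 * memcpy accesses the storage as characters, which the aliasing rules
 * allow, so no int lvalue is ever applied to char storage.
 */
void copy_int_bytes(char *dst, const char *src)
{
    int value;
    memcpy(&value, src, sizeof value);   /* read the bytes into a real int */
    memcpy(dst, &value, sizeof value);   /* write them out again */
}

/* Alias-compliant variant of foo() from alias.c. */
int foo(const char *c)
{
    char a[100];
    copy_int_bytes(a, c);
    return 0;
}
```

An optimizing compiler will typically turn such small fixed-size memcpy calls into plain loads and stores, so the compliant form need not cost anything at run time.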


z/OS C/C++ Performance Features

Compilers are an important tool in your development environment. A good optimizing compiler generates high-performance code without you having to worry about the low-level details of the OS, the internals of the runtime environment, or the hardware architecture. You can concentrate on the business logic in your application. But optimization can take up a lot of resources, both in compilation time and in memory space. XL C/C++ provides an optimization option with 3 levels (called suboption 1, 2 and 3). Levels 1 and 2 represent a compromise between execution-time performance and compile time. This is the appropriate setting in most cases. But there are situations where you want to let the compiler exploit as many optimization opportunities as it can, regardless of compilation resources. This is what optimization level 3 (suboption 3) does. Because most of a program's time is usually spent in a few areas of the code (the 80-20 rule), one way of using level 3 is to apply it only to those source files that contain the hot spots of the application. The rest of the files, responsible for tasks like initialization, termination, error handling, and user interaction, can be compiled at lower optimization levels, getting the best of both worlds.

This leads to the general direction of giving the programmer more control over the actions taken by the compiler. An important optimization technique is loop unrolling. Unrolling removes some of the loop control checks, which in turn can expose more optimization opportunities across loop iterations. But this is a double-edged sword, as too much unrolling increases code size and gives the application a larger memory footprint. The optimizer normally makes its decisions based on its own analysis of the code. But the programmer often knows which loops are hot and can direct the compiler to unroll specific ones. This is the purpose of the UNROLL option and the corresponding pragma directive. You can use these to control which loops to unroll, and by how much, applying the optimization benefit to the code that is most frequently executed.
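To show what unrolling does to a loop, here is a hand-written sketch in C. The function names and the unroll factor of 4 are assumptions chosen for illustration; the compiler performs the equivalent transformation itself when unrolling is requested.

```c
#include <stddef.h>

/* Straightforward loop: one loop-control check per element. */
long sum_rolled(const int *v, size_t n)
{
    long total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The same loop unrolled by hand by a factor of 4. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Main body: one loop-control check per four elements. */
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }

    /* Remainder loop: handles the last 0 to 3 elements. */
    for (; i < n; i++)
        total += v[i];

    return total;
}
```

The unrolled version trades a larger function body for fewer branch checks, which is the code-size cost described above.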

The idea of execution frequency and its impact on optimization leads to Profile Directed Feedback (PDF). This is an enhancement to inter-procedural analysis (IPA), and is used together with the IPA option. IPA performs whole-program analysis; it looks at code from all source files instead of just one. This leads to many more optimization opportunities than a normal optimizer usually discovers. PDF takes this a step further: the compiler uses profiling information to direct its optimization. The steps to use PDF are as follows:

  1) Build the application with the option PDF1. This results in a load module with instrumentation to collect profiling information.
  2) Run the instrumented module with typical input. The instrumented code produces a data file containing the execution frequency of the code. This is called the training run.
  3) Build the application again with PDF2. This is the production build, where IPA uses the profiling data collected in step 2 to perform aggressive optimization.

The result is a load module tuned to run optimally with the typical input used in the training run. To use this successfully, the input data for the training run must be selected carefully. PDF is most effective when the profile of the production data does not, on average, vary too much from that of the training data.

The above are just a few of the features that can boost your program's performance. You can find out more in the Programming Guide (http://publibz.boulder.ibm.com/epubs/pdf/cbcpg190.pdf, Part 5, chapters 36-42).
