Announcement:

IBM Rational C/C++ and COBOL Development Tools for AIX Beta by cafe at Nov 13, 2009 9:58 AM

We are pleased to announce the IBM Rational C/C++ and COBOL Development Tools for AIX Managed Beta Program.

The Managed Beta program will engage users who are interested in deploying Rational C/C++ and COBOL development tools for AIX to evaluate code and provide feedback on their experiences. Participants will join developers and product management for discussions and have access to a private website. The website will include a tool for reporting problems and feature requests. This program is now open for sponsors to submit nominations for their customers/partners.

Blog Posts


Hello all. Over the last year, a group of Transactional Memory experts from Sun, Intel, and IBM has been getting together every Friday to discuss how to create a uniform syntax for Transactional Memory.

We are happy to release the first version of the Draft Specification of Transactional Language Constructs for C++. This specification is the result of joint work by a group of people from Intel, IBM and Sun, and is based on our experience working with transactional language constructs. We would like to encourage people to implement this specification and we welcome feedback on the document. Please direct any such feedback to the discussion group TM & Languages.

You can find the specification in the Resource Library/Articles, Presentation, Ebooks under the Parallel Section

If you have any comments, I invite you to post them here or in the Discussion forum.

TM Specification Drafting Group

0 Comments Permalink
2

This is my first post on our C/C++ Cafe, and it has been long in coming.

If you are like me, then you are a new z/OS programmer. The learning ride has been quite turbulent and there is a way to go yet. If you are a devoted programmer, then you feel the quiet excitement of your new program almost working, tempered by the chance of another large manual being 'thrown' at you. z/OS is one of those products that has too much of a good thing; that is, there is a LOT of documentation. This fact very quickly becomes an advantage as one gains more experience.

No matter what design patterns you use, almost as soon as you start writing code, you will need to include other files, be they your own include files or third party libraries. For efficiency purposes, these files will in all likelihood reside in datasets. To make things more interesting, the documentation might instruct you to use directory based (HFS) #include, confusing newcomers as to where the files reside.

I want to discuss two tasks that come up when dealing with include files:

  • Finding the include files
  • Dealing with preprocessor macros

If the compiler already found the include file (i.e. successful compile) and you are interested where the file is located (i.e. which version of the library you are actually using), both -qlist and -qsource produce an includes section.

#Will list all included files:
xlc -qsource -c foo.c | sed -n '/I N C L U D E S/,/E N D O F I N C L U D E S/ p'
#Will list the path to NAME_OF_INCLUDE:
xlc -qlist -c foo.c | grep NAME_OF_INCLUDE

The grep command might not return anything because of some z/OS-specific file name translations, so be careful:

  • if NAME.OF.INCLUDE contains dots, the real file name might be translated to INCLUDE.OF.NAME or just NAME
  • if NAME_OF_INCLUDE contains underscores, it might get translated to NAME@OF@INCLUDE

If you are using the V1R11 compiler, there is a new feature, -qmakedep (on USS only), that will produce a list of all included files. It is much easier to remember than the sed command above. For example:

# previous invocation: 'xlc -c foo.c'
# Append -qmakedep
xlc -c foo.c -qmakedep
cat foo.u

For previous releases of the z/OS compiler, have a look at the makedepend utility. It has some other options that might be useful for include file debugging.

-qshowinc is another particularly useful option if you already have the include files. It shows the contents of the included files. It is similar to what the PPONLY option (the -E equivalent) does, except that it writes to the listing and does not strip preprocessor directives. However, be ready to pipe the output to other programs to filter the thousands of lines produced. Most of the time I use less and its search features; sed and grep are sometimes useful as well.

If you are trying to include a file and the compiler cannot find it, there is more research involved. In the general case, the compiler looks for include files in these places, from top to bottom:

pwd
LSEARCH
DD:USERLIB
SEARCH
DD:SYSLIB


  • System includes can only be found in SEARCH and DD:SYSLIB.
  • DD: statements come from the JCL, hence you might need to know how the compiler was invoked.
  • SEARCH and LSEARCH come from the compiler options hence are easy to modify

SEARCH and LSEARCH both contain a list of directories and partial dataset qualifiers. The topic is discussed in detail in our Compiler User Guide in 'Chapter 7: Using include files'. I personally found that the flowcharts in Chapter 7 and the examples in the LSEARCH option explanation cleared up most of the dataset questions that I had. It is worth noting for newcomers that the terms 'z/OS Unix files' and 'HFS files' are equivalent and refer to directory based files (i.e. organized like Linux and Windows file systems), as opposed to DATASETs, which can be sequential or PDS in this discussion.

The -qlist and -qsource options, and the -V and -v flags of c89 and xlc, provide a quick way to find out the values of LSEARCH and SEARCH.

As Visda has discussed before in her blog post, NOSEARCH() and NOLSEARCH() reset the respective option value back to empty.

The last topic I want to touch on is preprocessor macros and the options available for dealing with them.

Michael Wong has posted here a way to find compiler predefined macros on AIX using the SHOWMACROS option. z/OS unfortunately does not have this option until V1R11. The best equivalent is to use the makedepend utility's -Wm,list option and then view depend.lst, which will contain a list of the predefined compiler macros. However, the makedepend utility is deprecated as of V1R11 in favor of the built-in -qmakedep option.

Nevertheless, most macros should be mentioned in the related feature sections of the z/OS manuals. Here is a small list from the manual.

If you already have the macro name you wish to use, have a look at the -qEXPMAC option. This option will show you the value of the macro in the source listing. It is most useful when combined with -qshowinc.

I hope this gets you started on the right track.

2 Comments Permalink
0

Hello -- I'm the technical leader for the IBM MASS math libraries. MASS stands for Mathematical Acceleration Subsystem, and consists of libraries of mathematical functions specifically tuned for optimum performance on various computing platforms.

MASS was originally launched by IBM in 1995, and has been continuously improved and expanded since then. I've been involved with MASS since 2002.

There are currently versions of MASS for all the POWER processors, running AIX or Linux operating systems. There are also versions for BlueGene/L and BlueGene/P, as well as the Cell/B.E.

The libraries contain accelerated implementations of elementary functions such as the trigonometric and hyperbolic functions and their inverses; power, logarithm, exponential, error function, and others. Complete lists are available on the MASS Web page, the URL of which is given below.

There are both scalar and vector libraries, and for Cell/B.E. there is a SIMD library as well.

The libraries are callable from C, C++, or Fortran source programs. The IBM XL C/C++ and IBM XL Fortran compilers are also capable of recognizing opportunities to use MASS to accelerate the source program, and invoking it automatically without the need for source-program changes.

The MASS libraries are packaged with the IBM XL C/C++ and IBM XL Fortran compilers, and are also available free-of-charge on the MASS Web page, for users of other compilers (such as gcc) for the target hardware.

You can find out more about the MASS libraries at the MASS Web page, http://www.ibm.com/software/awdtools/mass . If you have questions about MASS, you can post them on this blog, or send them to me at masslib@ca.ibm.com, both of which I monitor.

Happy numerical computing!

Regards,
Robert


Robert F. Enenkel, Ph.D.,
Compilers/Numerical Computing, IBM Toronto Laboratory,
Mail Stop D2, 8200 Warden Ave., Markham, Ontario, Canada, L6G 1C7
https://www-927.ibm.com/ibm/cas/toronto/people/members/robert.shtml

0 Comments Permalink
0

Transactional Memory (TM) is a high-level abstraction for supporting safe mutable shared state, such that the user does not have to worry about the low-level details of locking and sharing of global resources. It is basically a class of optimistic speculation techniques in which groups of memory operations are bundled into an atomic operation, which can resolve the problems with locks and possibly support composability.

The basic idea is to push your group of operations through atomically, assuming that it will succeed, and to roll back only when a conflict actually occurs.

At the moment, many of the ideas of TM are out there as a way to test the concept, and may be integrated into some future hybrid system. Even practitioners of TM know there is a certain amount of hype that we have to deal with in any new technology before it drops to a trough and rebounds to a realistic plateau.

A number of vendors have planned both hardware and software implementations of Transactional Memory.

The software transactional memory compilers from different vendors all use different syntax, and this creates a basic problem with interoperability, and common porting of code. I will deal with this in the next post.

IBM is also working in this area, and released an AlphaWorks compiler supporting software transactional memory last year.

The IBM XL C/C++ for Transactional Memory for AIX is also accessible from the Resource Library of the C/C++ Cafe.

The public domain STM runtime is compatible with the AlphaWorks XLC STM release.
It was released through the Amino Concurrency Building Blocks project
The source code is here

I will have more to say about the Amino Building Blocks in a future post.

0 Comments Permalink
0

So many times we get clients complaining to us that their code used to work on an older release but is broken with the new release of the compiler. After a closer look at the sample test case provided, we find out they were lucky to have had a working copy of the code. You see, the breakage is expected because they have broken the ANSI aliasing rules.

Not many of us follow the rules defined in the C and C++ standards^1^ religiously. Although the aliasing rules call for accessing an object only through lvalues of compatible types, we often break this rule in order to make the code "work".

By default, xlc on z/OS compiles with ANSIALIAS. Based on the assumption that pointers in the source file access objects of the same type, the compiler determines which storage locations are accessed in two or more ways, i.e. aliased. If, for example, we have a struct s with two members s1 and s2, the storage for s overlaps with the storage for both s.s1 and s.s2, but the storage for s.s1 and s.s2 does not overlap. This knowledge is critical to aggressive compiler optimization. It allows some loads to move up and some stores to move down. Such rearrangement of the execution sequence is desirable because it lets more of the code execute in parallel.

Casting a pointer to point to a different object is a common C practice. For each type mismatch, xlc generates a warning and/or an informational message, which you may not notice if you have set the level of diagnostic messages to error or higher, -qflag=E, S, or U, or if you are redirecting all compiler messages to a hardly-ever-looked-at log file. Often the first time you notice a problem is when you execute the code and get an incorrect result.

You have broken the rules, now what?

You can compile routines that are not ANSI alias compliant with low levels of optimization, e.g. at OPT0. The higher the level, the more aggressive the optimizations based on aliasing information. You can turn off optimization per routine with #pragma option_override(func,"OPT(LEVEL,0)").

You can use the -qnoansialias compile option, or use the cc utility, which passes noansialias to the compiler by default. This may not be desirable because it usually results in significant performance degradation, e.g. gcc compiled at -O3 with -qnoansialias runs 20% slower.

You can fix the non-compliance in your source code.

^1^ ISO/IEC 14882:1998(E), Section 3.10, Paragraph 15 states:

If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behaviour is undefined:

  • the dynamic type of the object
  • a cv-qualified version of the dynamic type of the object
  • a type that is signed or unsigned type corresponding to the dynamic type of the object
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union)
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object
  • a char or unsigned char type

Example:

/*alias.c*/
int foo(char *c)
{
    char a[100];
    char *cptr = a;
    *(int *)cptr = *(int*)c;
    return 0;
}

xlc -c alias.c -O3 -qlist=./ -qflag=i -qinfo

INFORMATIONAL CCN3495 ./alias.c:5 Pointer type conversion found.
INFORMATIONAL CCN3374 ./alias.c:5 Pointer types "int*" and "char*" are not compatible.
INFORMATIONAL CCN3495 ./alias.c:5 Pointer type conversion found.
INFORMATIONAL CCN3374 ./alias.c:5 Pointer types "int*" and "char*" are not compatible.
INFORMATIONAL CCN3415 ./alias.c:7 The external function definition "foo" is never referenced.

0 Comments Permalink
1

I think this option's name (especially the negative one) is pretty confusing. Be that as it may, NOSEARCH wipes out all the previous search specifications and SEARCH sets them.

In METAL C, only a subset of the standard C library is defined; in this context, this option becomes pretty handy.

In short, if strcpy is used in the user code, to avoid unexpected compile time messages complaining about strcpy syntax, incorrect run time behavior or what not, its declaration and definition should be pulled in from the METAL string.h header file.


cat foo.c


#include <string.h>
int main() {
char s[10];
memset(s,0,sizeof(s));
strcpy(s,"hello");
return 55;
}



xlc -S -c -qmetal foo.c -qlist=./
grep 'string.h' foo.lst



1 /usr/include/string.h wrong!!!

xlc -S -c -qmetal -qnosearch -I/usr/include/metal foo.c -qlist=./
grep 'string.h' foo.lst



1 /usr/include/metal/string.h correct!!!


The End!

1 Comments Permalink
2

I apologize for lack of updates recently as an addition to the family has kept me hopping.

I have still been keeping up my parallel programming work, with a recent talk on C++0x Multithreading at BoostCon 09:

http://www.boostcon.com/program#schedule

My two talks (the other one was an overview of C++0x and compiler support, which can be seen here:
http://www-949.ibm.com/software/rational/cafe/blogs/cpp-standard/2009/05/26/the-view-or-trip-report-from-the-mar-2009-c-standard-meeting) seemed packed, with about 60 people in an auditorium for about 90 minutes. Here is a trip report from a fellow speaker, Justin Gottschlich, who attended my talks:

http://par-con.blogspot.com/

The slides and video should be online soon.
A second trip report review is by Emil Dotchevski:

http://revergestudios.com/reblog/index.php?n=ReCode.BoostCon09

Justin gave an excellent talk on a proposed Boost Transactional Memory Library, on which he is collaborating with the Father of TM: Maurice Herlihy.

I have been involved in Transactional Memory for a few years now and know its hype, promise and pitfalls. Still, I felt stirred by Justin's excellent oratory skills in pitching this technology. The idea of doing TM as a pure library is not easy, although it does get the technology into the hands of everyone as fast as possible. Language changes take time to get right. I should know, and will discuss this in a future post.

Please read their excellent trip reports for the details. I will turn my discussion to something else that is also interesting to me personally, but may not have much to do with parallel programming.

BoostCon 09, like previous BoostCons, was an exciting experience. Without intending to toot my own horn, BoostCon 09 is rich in speakers with experience in the field. The organizers choose their speakers carefully from the pantheon of C++. Last year it was Bjarne Stroustrup. This year it was Andrei Alexandrescu, whose topic was Iterators Must Go.

Andrei gave many reasons why iterators, once a good idea, are unsuitable as we move forward. For me, the most interesting argument is that iterators are not well suited to multithreaded programming, because many of the ideas, such as stack pop and push, can only work in a single thread unless we change the interface.

BoostCon, in my opinion, is rapidly becoming the leading C++ conference, in direct competition with SD West and ACCU. All are packed with workshops and knowledgeable speakers.

This year, there was a distinct parallel programming theme, which included:

  • Joel Falcou: High-Level Parallel Programming EDSL - A BOOST libraries use case
  • Stephan T. Lavavej: Parallel Patterns Library in Visual Studio 2010
  • Justin Gottschlich, Jeremy Siek: Boost + Software Transactional Memory
  • Troy Straszheim: Kamasu: Parallel computing on the GPU with boost::proto
and of course, yours truly's:
  • Michael Wong: Multithreaded C++0x: The Dawn of a New Standard

All this makes me want to suggest a special track for parallel programming for Boost in future years.

I attended some of the 0x tutorials and found that I still had things that I didn't know. This is not surprising given the depth of C++0x.

The other interesting part was the Cmake tutorial. Boost build has used bjam since the beginning. This build tool, while interesting in its own right, has many peculiarities which make its adoption a non-trivial task for building Boost on IBM systems. From the stories we heard around BoostCon, the same seems to apply to other environments.

Recently, there has been a move towards using cmake as a truly better build tool. From what I can see from the workshop, they are right and I am eager to move our Boost build to a cmake system, especially if Boost is moving in that direction, to rid us of the problems that bjam has caused.

http://www.cmake.org/

How does cmake differ from the traditional unix make?
Unlike make, it does not actually do the software build, but instead generates standard build scripts (makefiles on Unix, project files for Windows, workspaces for Eclipse/CDT), which makes it easier to adapt to various systems.

Cmake was started as part of the Insight Toolkit (ITK) build from Kitware, and has since migrated into many products. The real explosion occurred with the adoption of cmake by KDE. Since then, even more software has been converting to cmake.

Cmake is so far able to build Boost, but is not yet able to run its built-in tests. This is a problem that I am sure will be rectified.

The beauty is that cmake is available already as a binary on AIX systems.

I worked with Troy Straszheim and Brad King to try getting cmake working on our Boost build. I got half way but found a problem which I hope to resolve.

We support Boost: the IBM XL C++ compilers for AIX and Linux have been tested with Boost, starting with Boost 1.32 on V8, then 1.34 on V9, and 1.34.1 on V10.1.

You can see our Boost test results here:
http://www.ibm.com/support/docview.wss?rs=2239&context=SSJT9L&uid=swg27006911

2 Comments Permalink
0

As a software developer, it's a challenge finding the right Integrated Development Environment (IDE). IBM is working on an exciting new concept in developing applications for the AIX operating system.

As an AIX developer, you are no doubt used to working on the machine you are compiling on in a remote terminal session using a command line editor such as vi or emacs. While these editors are powerful and efficient, modern IDEs provide powerful tools such as indexed searching, code navigation and on the fly error detection. These powerful tools can help you find bugs sooner and save time usually spent waiting for compilers to finish.

IBM has developed an IDE that allows you to leverage the Eclipse platform on a client workstation to edit and compile C/C++ code directly on AIX. You can import existing projects into your workspace and use the existing tool chain. The IDE also provides the capability to edit, compile, and debug COBOL source code. Intelligent marking of errors allows you to jump quickly to the problem. The platform also allows you to navigate remote directories and perform basic file operations.

A beta level trial version of this technology is available here http://www.alphaworks.ibm.com/tech/cremoteide


Go ahead and give it a try and feel free to comment in the Forum section of the above site.

0 Comments Permalink
2

Are you getting the most performance from your IBM hardware investment? Using an up-to-date compiler is key to hardware exploitation. A no-charge, 60-day evaluation copy of the latest IBM XL C/C++ compiler is downloadable from www.ibm.com, but how can you really test it in scenarios such as seeing how your application behaves on newer hardware, with a different operating system, or on a newer level of an operating system? If you are a member of IBM PartnerWorld, the Virtual Loaner Program may be the answer.

The IBM Virtual Loaner Program provides no-charge, self-service Internet access to IBM hardware and middleware for IBM business partners and qualified members of IBM PartnerWorld. The Virtual Loaner Program allows a user to select an environment from multiple hardware and operating system configurations. The dedicated, IBM IES/ITSC-certified environment of a VLP system provides the benefits of a secure loaner machine with greater flexibility in scheduling and less hassle than other loaner programs because IBM manages and hosts the hardware. Instead of month-long engagements, a VLP user makes multiple short reservations for a VLP system. The VLP system "remembers" the user's system image so that he or she can return repeatedly during a longer overall period of time to the same type of system and environment, including all files and work in progress.

On a VLP system, a user might evaluate IBM hardware, tools, and middleware, or develop, test, debug, and port applications and solutions. VLP systems might be used to develop product demonstrations or to allow users to see how their application performs with varying amounts of virtual CPU and RAM. Other features of the VLP usage model are full access to the user's own data and build environments and full root access (QSECOFR), which allows the user to install fixes and software.

The IBM Software Access catalog, accessible from an AIX or Linux partition on a VLP system, includes other IBM compilers, such as IBM XL Fortran.

Platforms and operating systems currently supported
POWER5 and POWER6 dedicated (root access) resources:
  • AIX 5.3, AIX 6.1 with WPAR
  • IBM i 5.4, IBM i 6.1
  • Linux:
    • Red Hat: RHEL4, RHEL5
    • SUSE: SLES9, SLES10

Watch for our development blogs about using the XL C/C++ compiler on VLP systems.

For more information, see:

2 Comments Permalink
0


Come and learn how to leverage the latest IBM Rational compilation technology to improve programmer productivity, application performance, and return on hardware capital investments. This session, "Leverage Compilation Technology to Optimize Return on Investment", will be held on Thursday (June 4). Check out the Conference agenda!

We look forward to seeing you at the conference. Click here to register.

To learn more about IBM Rational Software Conference 2009, please visit the conference web site

0 Comments Permalink
7

In a couple of previous posts ( TOC Overflow: what is it, and why should you care?, Dealing with TOC overflow: the traditional approach ) I have presented the issue of TOC overflow. Now I will discuss some features of the XL compilers that can help bypass TOC overflow while minimizing any negative effects on runtime performance.

1. Minimal TOC: The option -qminimaltoc makes the compiler generate code that uses a single entry in the TOC for each compilation unit (in C/C++ a compilation unit is a source file). In order to do this, a separate level of indirection must be followed in order to access TOC-based variables. This means that the program will be larger and slower than if it did not have TOC overflow, but it will still be faster than using the -bbigtoc option. This is similar to the -mminimal-toc option of gcc.

Furthermore, -qminimaltoc does not need to be used on all compilation units, so you can minimize the performance impact by using this flag only on compilation units that are not relevant for performance.

2. IPA: IPA is short for inter-procedural analysis, a form of compiler optimization that looks at the whole program, not just a single compilation unit. For this, the optimizer is invoked during the linking phase of your application, to perform transformations that can affect multiple compilation units.

Applying this process significantly reduces TOC pressure, and in most cases completely eliminates TOC overflow. It does so by restructuring your program to reduce the number of global symbols. The result is similar to what could be achieved through source changes, but avoiding the widespread manual source changes.

In the XL compilers, IPA is implied at optimization levels -O4 and -O5, but those also include other complex optimizations which may not be as relevant to commercial application development. One good alternative is the option -qipa=level=0, which applies a minimal level of whole-program optimization. This is often sufficient to eliminate TOC overflow, but in very large applications you may need -qipa=level=1 instead, which will perform a more aggressive reduction of the TOC requirements, at the cost of a longer compilation process.

Note that for whole-program analysis to be performed, the -qipa option needs to be specified both at the compile and link command lines. This means that the linking of the program has to be done through the compiler driver (xlc, xlC or cc) instead of directly through the system linker (ld). For maximum effect, all source files should be compiled with -qipa, but it is possible to mix-and-match objects compiled with different options and have them interoperate.

If you try these options please add comments to this post describing your results.

7 Comments Permalink
0

SUSE Linux Enterprise Server 11 (SLES 11) for POWER is now available.

IBM XL C/C++ for Linux, V10.1 and XL Fortran for Linux, V12.1 are enabled to run on SLES 11. To upgrade, you need to download and install the March 2009 Updates from the respective product support pages.

For more information, please visit http://www-01.ibm.com/support/docview.wss?uid=swg21382092

0 Comments Permalink
0

Compilers are expected to make volatiles immune to optimizations that result in incorrect access to the volatile variables, e.g. reducing the number of loads and stores, reordering them, etc.

A recent study on volatiles identified a few bugs with GCC 4.3.0 and LLVM-GCC 2.2. We put our compiler to test and found none of the three bugs identified in this paper applies. Not bad!

The first test case loads a volatile variable in a loop. Although the load is invariant, we expect the compiler to leave x in the loop. The generated pseudo assembly code at O2 and O3 confirms this.

Here is the source code:
const volatile int x;
volatile int y;
void foo(void)
{
for(y=0; y>10; y++)
{
int z=x;
}
}

The assembly listing of the source code above at O3, below, shows the load of x in each iteration of the unrolled loop:

@1L3 DS 0H
L r0,x(r15,r1,0)
L r0,y(r14,r1,0)
AHI r0,H'1'
ST r0,y(r14,r1,0)
L r0,y(r14,r1,0)
CHI r0,H'10'
BNH @1L5
L r0,x(r15,r1,0)
L r0,y(r14,r1,0)
AHI r0,H'1'
ST r0,y(r14,r1,0)
L r0,y(r14,r1,0)
CHI r0,H'10'
BNH @1L5
L r0,x(r15,r1,0)
L r0,y(r14,r1,0)
AHI r0,H'1'
ST r0,y(r14,r1,0)
L r0,y(r14,r1,0)
CHI r0,H'10'
BNH @1L5
L r0,x(r15,r1,0)
L r0,y(r14,r1,0)
AHI r0,H'1'
ST r0,y(r14,r1,0)
L r0,y(r14,r1,0)
CHI r0,H'10'
BH @1L3

The second test accesses a volatile variable on the fall through path of a condition.

Source is:
extern int qux();
volatile int w;
int bar(void)
{
if(qux())
return 0;
else
return w;
}

In the pseudo listing, below, w is correctly accessed when qux() returns zero. This listing generated at O3 is:

L r15,=V(qux)(,r3,66)
L r2,_CEECAA_(,r12,500)
BASR r14,r15
LTR r15,r15
L r1,=Q(w)(,r3,70)
BE @1L1
LA r15,0
B @1L3
@1L1 DS 0H
L r15,w(r1,r2,0)
@1L3 DS 0H

In the last source code a volatile variable is incremented inside the loop.

volatile int a;
void baz(void)
{
int i;
for(i=0; i<3; i++)
{
a += 7;
}
}

We unroll the loop by three and access "a" three times. The listing of the compile at O3 looks like this:

L r0,a(r14,r1,0)
AHI r0,H'7'
ST r0,a(r14,r1,0)
L r0,a(r14,r1,0)
AHI r0,H'7'
ST r0,a(r14,r1,0)
L r0,a(r14,r1,0)
AHI r0,H'7'
ST r0,a(r14,r1,0)

0 Comments Permalink
2

Assembly language is powerful. It is close to the underlying hardware architecture; it allows you to access low level system features, interacting closely with the operating system; and it has no runtime dependencies. In short, it allows you to do almost anything you want.

But assembly language forces you to spell out all the minor details in the code. You need to focus on the leaves and trees, instead of the forest and the landscape. You need to spend time on the low level details and not the high level logic. You need to hand hold the underlying machine every step of the way. It is tedious. An assembly program can take time to write and debug. The resulting code can be difficult to maintain.

Wouldn’t it be nice if you could use a high level language to do low level programming ?

The METAL option of the XL C compiler on z/OS is designed with this in mind. The notion of “metal” is that the program can operate at the operating system level, which is close to the hardware -- i.e. the metal. With the METAL option you can now use C language features to express low level programming logic, for example writing user exits. You are freed from tedious tasks such as managing the registers and developing the correct instruction sequences, while your program can enjoy highly optimized code tuned for the intended hardware.

The XL C compiler generated code requires the Language Environment to establish an overall execution context. C library functions also require the Language Environment in order to provide their services; most noticeably the functions to manage the heap storage and dynamic storage area, to do file input-output, and to handle exception conditions. If your C program needs to get close to the metal, the compiler will have to provide alternate means so that you get these services directly from the operating system, without going through LE. The operating system provides these through assembler macros. But you cannot invoke assembler macros from within C code unless you can embed short sequences of assembly instructions into a C program. This is what the METAL option provides.

The METAL option generates code in assembler source program format. You can then feed the assembly source through the High Level Assembler, like any other assembly program. The resulting code is independent of the Language Environment. It can be used like other assembly code, without being tied to a runtime. You can write a statement like if (x + y * z > 0) x = -x, and the XL C compiler will turn it into assembly instructions. You can quickly try this out with a simple program:

<hello.c>
int abc;
int xyz;

int main() {
    xyz = abc + abc;
    return xyz;
}
</hello.c>

Under z/OS UNIX, you can use the xlc command to compile hello.c as follows:

xlc -S -qMETAL hello.c

The -S flag asks the compiler to generate assembler code; the output file is hello.s. The -qMETAL flag specifies the METAL compiler option. You can then assemble and link-edit hello.s like any other assembler source file.
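
The conditional statement mentioned earlier can be tried the same way. The following is only a minimal sketch: the file name negate.c and the function name negate_if_positive are made up for illustration and do not come from the product documentation.

```c
/* negate.c: illustrative only; the file and function names are hypothetical. */

/* Returns -x when x + y * z is greater than zero, otherwise returns x unchanged. */
int negate_if_positive(int x, int y, int z) {
    if (x + y * z > 0)
        x = -x;
    return x;
}
```

Assuming the same command form shown above for hello.c, compiling this with xlc -S -qMETAL negate.c should produce negate.s, where you can read the instruction sequence the compiler chose for the multiply, add, compare and negate.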

With the METAL option, you can use certain assembler macros directly from within a C program. I will write more about this in the future.

Note: For further information on the METAL compiler option and related features, see z/OS Metal C Programming Guide and Reference (http://publib.boulder.ibm.com/infocenter/zos/v1r9/index.jsp?topic=/com.ibm.zos.r9.cbcux01/metal.htm) and z/OS XL C/C++ User's Guide (http://publib.boulder.ibm.com/infocenter/zos/v1r9/index.jsp?topic=/com.ibm.zos.r9.cbcux01/metal.htm)

2 Comments Permalink
0

The following is a private communication from an IBM engineer, Matthew Markland, who asked a great question. I do not claim great expertise, but I feel there is enough of an opinion piece here that some folks may like to see this discussion or continue it. I have edited the response somewhat, but it is largely intact and is reprinted with Matthew's permission. Note that I have no insight into PGI or any other product other than what I read in public articles, and as such I make no product claims. Any opinions regarding other companies are necessarily my own and are not IBM's position.

Michael:

I hope that the new year finds everything well for you and yours.
I'm enjoying the C/C++ Cafe posts you guys put out immensely. :-)

Please join in the discussion, or even bring this up as other experts will chime in.

I just wanted to get your opinion on some things that have been
going through my mind with respect to the multicore/hybrid
programming models that are being put out by various entities. It
seems that many people believe that the best model is an extension
to the language model, be it a pure language extension like what
CUDA and OpenCL have, or with a new model of pragmas like PGI is

OpenCL/CUDA is mostly a library-based model and a language extension (modulo the 4 memory annotations). But yes, I see where you are going with this ...

adding. I'm wondering, especially in the case of the PGI extensions,
I am assuming these are the pragma directives available in their technology preview:

#pragma acc directive-name [clause [,clause] ...] 


whether they make sense given the existing OpenMP spec
Where do you
So there have been parallel languages that are directive-based, language extensions, and library-based. They usually start off library-based because libraries are easy to port and work with many vendors' compilers. Language-based solutions are harder to implement and cannot be easily corrected if wrong. Directive-based solutions like OpenMP are easy to adopt incrementally, and they keep the base program running even on platforms that don't accept the directives. Today, we have examples of all three. MPI is a pure library-based solution, Cilk is a pure language-based solution, and OpenMP is a directive-based solution (although it too has a library part).

A mostly library-based language like OpenCL is in a sense a step backwards. So PGI is trying a directive-based approach to send the computational kernel to the accelerator/GPGPU. This is a bet on their part. I am familiar with their chief compiler engineer on the OpenMP Committee, Michael Wolfe, and respect his opinion.
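
To illustrate the point about directives keeping the base program running, here is a minimal sketch in C using a standard OpenMP directive. The function name sum_of_squares is made up for this example. A compiler with OpenMP enabled can split the loop across threads, while a compiler that does not recognize the pragma ignores it and runs the same loop serially, with the same result.

```c
/* Illustrative only; the function name is hypothetical. */

/* Returns 1*1 + 2*2 + ... + n*n. */
long sum_of_squares(int n) {
    long sum = 0;
    int i;

    /* With OpenMP enabled, iterations are shared among threads and the
       per-thread partial sums are combined by the reduction clause.
       Without OpenMP, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= n; i++)
        sum += (long)i * i;

    return sum;
}
```

The source is the same in both cases, which is what makes the incremental adoption described above possible: the directive can be added to one loop at a time without touching the rest of the program.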

see this headed from a personal perspective.

Having some involvement in OpenCL, I can see where it falls somewhat short, but it is nevertheless a tremendous accomplishment. It is designed for today's GPGPU architecture, assumes a weak memory model, and implicitly has a dual layer of scheduling policy between the host (outer asynchronous layer) and the thread processors (inner synchronous processors with local memory). On top of that, it is still relatively hard to program (though easier than DirectX or OpenGL), and for people who have to port 100,000 lines of code it is a large commitment to a technology that may not be around. OpenCL is still a stream processing language and as such is limited in the scope of the parallel programs it can speed up. What PGI is probably looking for is a more generalized programming model that works in broader situations. That is why they introduced the scheduling clause and tied it to OpenMP. I would not be surprised if some kind of heterogeneous programming support appears in OpenMP in the future.

I don't have any significant personal insight, but I am also involved in adapting the OpenMP paradigm to fit the next programming model, without knowing where it will go.

In the end (and this is based on Michael Wolfe's excellent analogy in an HPC paper), OpenCL is basically designed for hardware that is like a large wide-body air carrier: it can handle a massive number of passengers in one run, but it requires special airport transportation to get the passengers to the plane because the plane doesn't fit in the terminal. So the speed it has (in terms of passenger-miles) is mitigated by the wait time (DMA access) of loading the plane. It works when everything fits.

If you don't have that many passengers, or have a variable number of passengers, a super wide-body jet doesn't buy you any extra benefit and may penalize you. And there are lots of other kinds of air carriers out there, including the super-fast kind for the payload that just has to get there by 9 am the next day, and the medium-sized ones that can carry your particular amount of load.
As such, there will still be a place for OpenMP, MPI, TBB, futures, UPC, and TM. We are suffering under an alarming number of these so-called parallel languages/extensions/libraries lately, and I can only see more coming as we all search for the right model. At one point, we had the same situation with sequential languages, and over time they have dwindled down to a few general-purpose languages with many domain-specific languages. The same will likely happen in the parallel language world.

0 Comments Permalink