<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:clearspace="http://www.jivesoftware.com/xmlns/clearspace/rss" version="2.0">
  <channel>
    <title>Parallel and Multi-Core Computing with C/C++</title>
    <link>http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore</link>
    <description />
    <pubDate>Mon, 26 Jan 2009 01:26:34 GMT</pubDate>
    <generator>Clearspace 1.10.7 (http://jivesoftware.com/products/clearspace/)</generator>
    <dc:date>2009-01-26T01:26:34Z</dc:date>
    <item>
      <title>Where is OpenCL and OpenMP headed?</title>
      <link>http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/2009/01/25/where-is-opencl-and-openmp-headed</link>
      <description>The following is a private communication from an IBM engineer, Matthew Markland, who asked a great question. I do not claim great expertise, but I feel there is enough of an opinion piece here that some folks may like to see this discussion or continue it. I have edited the response somewhat, but it is largely intact and reprinted with Matthew's permission. Note that I have no insight into PGI or any other product other than what I read in public articles, and as such I make no product claims. Any opinions regarding other companies are necessarily my own and are not IBM's position.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="jive-quote"&gt;Michael:&lt;br /&gt;
&lt;p /&gt;
I hope that the new year finds everything well for you and yours. &lt;br /&gt;
I'm enjoying the C/C++ Cafe posts you guys put out immensely. &lt;img class="jive-emoticon" border="0" src="http://www-949.ibm.com/software/rational/cafe/images/emoticons/happy.gif" alt=":-)" /&gt;&lt;/div&gt;
Please join in the discussion, or raise this topic yourself, so that other experts can chime in.&lt;br /&gt;
&lt;div class="jive-quote"&gt; &lt;br /&gt;
I just wanted to get your opinion on some things that have been &lt;br /&gt;
going through my mind with respect to the multicore/hybrid &lt;br /&gt;
programming models that are being put out by various entities. It &lt;br /&gt;
seems that many people believe that the best model is an extension &lt;br /&gt;
to the language model, be it a pure language extension like what &lt;br /&gt;
CUDA and OpenCL have, or with a new model of pragmas like PGI is &lt;/div&gt;
&lt;br /&gt;
OpenCL/CUDA is mostly a library-based model and a language extension (modulo the 4 memory annotations). But yes, I see where you are going with this ...&lt;br /&gt;
&lt;br /&gt;
&lt;div class="jive-quote"&gt;adding. I'm wondering, especially in the case of the PGI extensions,&lt;/div&gt;
I am assuming these are the pragma directives available in their technology preview:&lt;br /&gt;
&lt;br /&gt;
&lt;pre class="jive-pre"&gt;&lt;code class="jive-code jive-plain"&gt;#pragma acc directive-name [clause [,clause] ...] 
&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;
&lt;br /&gt;
&lt;div class="jive-quote"&gt;whether they make sense given the existing OpenMP spec &lt;br /&gt;
Where do you &lt;/div&gt;
So there have been parallel languages that are directive-based, language-extension-based, and library-based. They usually start off library-based because libraries are easy to port and work with many vendors' compilers. Language-based solutions are harder to implement and cannot be easily corrected if wrong. A directive-based approach like OpenMP can be adopted incrementally, and it keeps the base program running even on platforms that don't accept the directives. Today, we have examples of all three. MPI is a pure library-based solution, Cilk is a pure language-based solution, and OpenMP is a directive-based solution (although it too has a library part).&lt;br /&gt;
&lt;br /&gt;
A mostly library-based language like OpenCL is in a sense a step backwards. So PGI is trying a directive-based approach to send the computational kernel to the accelerator/GPGPU. This is a bet on their part. I am familiar with their chief compiler engineer on the OpenMP Committee, Michael Wolfe, and respect his opinion.&lt;br /&gt;
&lt;br /&gt;
&lt;div class="jive-quote"&gt;see this headed from a personal perspective.&lt;/div&gt;
&lt;br /&gt;
Having some involvement in OpenCL, I can see where it falls somewhat short, but it is nevertheless a tremendous accomplishment. It is designed for today's GPGPU architecture, assumes a weak memory model, and implicitly has a dual layer of scheduling policy between the host (outer asynchronous layer) and the thread processors (inner synchronous processors with local memory). In addition, it is still relatively hard to program (though easier than DirectX or OpenGL), and for people who have to port 100,000 lines of code it is a large commitment to a technology that may not be around. OpenCL is still a stream processing language and as such is limited in the scope of the parallel programs it can speed up. What PGI is probably looking for is a more generalized programming model that works in a broader range of situations. That is why they introduced the scheduling clause and tied it to OpenMP. I would not be surprised if some kind of heterogeneous programming support appeared in OpenMP in the future.&lt;br /&gt;
&lt;br /&gt;
I don't have any significant personal insight, but I am also involved in adapting the OpenMP paradigm to fit the next programming model, without knowing where that will go. &lt;br /&gt;
&lt;br /&gt;
In the end (and this is based on Michael Wolfe's excellent analogy in an HPC paper), OpenCL is basically designed for hardware that resembles a large wide-body airliner: it can carry a massive number of passengers in one run, but requires special airport transportation to get the passengers to the plane because the plane doesn't fit at the terminal. So the speed it has (in terms of number of passenger-miles) is offset by the wait time (DMA access) of loading the plane. It works when everything fits.&lt;br /&gt;
&lt;br /&gt;
If you don't have that many passengers, or have a variable number of passengers, a super wide-body jet doesn't buy you any extra benefit and may penalize you. And there are lots of other kinds of air carriers out there, including the super-fast kind for payloads that just have to get there by 9 am the next day, and the medium-sized ones that can carry your particular amount of load.&lt;br /&gt;
As such, there will still be a place for OpenMP, MPI, TBB, futures, UPC, and TM. We have been suffering under an alarming number of these so-called parallel languages/extensions/libraries lately, and I can only see more coming as we all search for the right model. At one point, we had the same situation with sequential languages, and over time they dwindled down to a few general-purpose languages alongside many domain-specific languages. The same will likely happen in the parallel language world.</description>
      <category domain="http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/tags">openmp</category>
      <category domain="http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/tags">opencl</category>
      <pubDate>Mon, 26 Jan 2009 01:55:49 GMT</pubDate>
      <author>Michael_Wong</author>
      <guid>http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/2009/01/25/where-is-opencl-and-openmp-headed</guid>
      <dc:date>2009-01-26T01:55:49Z</dc:date>
      <clearspace:dateToText>10 months, 2 days ago</clearspace:dateToText>
      <wfw:comment>http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/comment/where-is-opencl-and-openmp-headed</wfw:comment>
      <wfw:commentRss>http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/feeds/comments?blogPostID=1143</wfw:commentRss>
    </item>
    <item>
      <title>OpenCL 1.0 released</title>
      <link>http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/2009/01/05/opencl-10-released</link>
      <description>One of the most important things that happened in the last month of 2008 was the release of the OpenCL specification by Khronos:&lt;br /&gt;
&lt;br /&gt;
&lt;a class="jive-link-external" href="http://www.khronos.org/registry/cl/"&gt;http://www.khronos.org/registry/cl/&lt;/a&gt;&lt;br /&gt;
&lt;br /&gt;
An &lt;a class="jive-link-external" href="http://www.khronos.org/developers/library/overview/opencl_overview.pdf"&gt;in-depth overview&lt;/a&gt;, broken down by clause of the specification, shows some of its capabilities. A shorter &lt;a class="jive-link-external" href="http://www.khronos.org/opencl/presentations/OpenCL_Summary_Nov08.pdf"&gt;summary&lt;/a&gt; provides a quicker overview.&lt;br /&gt;
&lt;br /&gt;
IBM is part of the group that wrote this specification.&lt;br /&gt;
&lt;br /&gt;
What is OpenCL? &lt;br /&gt;
&lt;br /&gt;
The original intent of OpenCL was to raise the abstraction of graphics programming. Game programmers will recall the battle over graphics programming with DirectX and OpenGL, which are specialized graphics languages. Now they can speak OpenCL and never have to learn these specialized graphics languages.&lt;br /&gt;
NVIDIA's vendor-specific language CUDA was meant to do this using C, as were, to some extent, AMD's Close-To-Metal and, of course, Microsoft's DirectX 11 Compute. But an open, royalty-free specification makes it far easier for everyone to invest their programming time in this. Apple, AMD, NVIDIA, and Intel are also members who participated in this specification. I wonder if Intel's Larrabee will support this.&lt;br /&gt;
&lt;br /&gt;
For parallel programming, this specification enables both data-based and task-based parallelism. It enables programmers to exploit the power of Graphics Processing Units as general-purpose computing devices (GPGPU), which have recently been shown to give significant speedups in specific applications.&lt;br /&gt;
&lt;br /&gt;
Unlike other parallel languages, OpenCL is aimed at supporting heterogeneous computing.</description>
      <category domain="http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/tags">opencl</category>
      <category domain="http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/tags">cell</category>
      <pubDate>Mon, 05 Jan 2009 14:22:06 GMT</pubDate>
      <author>Michael_Wong</author>
      <guid>http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/2009/01/05/opencl-10-released</guid>
      <dc:date>2009-01-05T14:22:06Z</dc:date>
      <clearspace:dateToText>10 months, 3 weeks ago</clearspace:dateToText>
      <clearspace:replyCount>1</clearspace:replyCount>
      <wfw:comment>http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/comment/opencl-10-released</wfw:comment>
      <wfw:commentRss>http://www-949.ibm.com/software/rational/cafe/blogs/ccpp-parallel-multicore/feeds/comments?blogPostID=1136</wfw:commentRss>
    </item>
  </channel>
</rss>

