enacted SIMD examples (299) and misc changes (432)

This commit is contained in:
Henry Jin 2015-02-04 16:05:23 -08:00
parent 82b82718fb
commit a633c88c31
50 changed files with 1020 additions and 108 deletions

View File

@ -1,3 +1,47 @@
[2-Feb-2015] Version 4.0.2
Changes from 4.0.1ltx
1. Source code changes (Ticket #342)
2. New addition (Ticket #299)
Examples_SIMD.tex
Example_SIMD.1c.c
Example_SIMD.1f.f
Example_SIMD.2c.c
Example_SIMD.2f.f
Example_SIMD.3c.c
Example_SIMD.3f.f
Example_SIMD.4c.c
Example_SIMD.4f.f
Example_SIMD.5c.c
Example_SIMD.5f.f
Example_SIMD.6c.c
Example_SIMD.6f.f
Example_SIMD.7c.c
Example_SIMD.7f.f
Example_SIMD.8c.c
Example_SIMD.8f.f
3. Other changes
- Move task dependence examples from tasking to a separate chapter.
tasking.15-19 -> task_dep.1-5
- Fix broken links
-Chap-4 (icv), page 11: "According to $"
According to Section 2.3 of the OpenMP 4.0 specification
-Chap-10 (fort_loopvar), page 31: "see $ and $"
see Section 2.7.1 and Section 2.14.1 of the OpenMP 4.0 specification
-Chap-12 (collapse), page 39: "According to $"
According to Section 2.12.8 of the OpenMP 4.0 specification
-Chap-16 (tasking), page 54, 57: "illustrated in $"
illustrated in Section 2.11.3 of the OpenMP 4.0 specification
[6-Jan-2015] Version 4.0.1ltx
Changes from 4.0.1ltx-21Nov-2014

109
Examples_SIMD.tex Normal file
View File

@ -0,0 +1,109 @@
\pagebreak
\chapter{SIMD Constructs}
\label{chap:SIMD}
The following examples illustrate the use of SIMD constructs for vectorization.
Compilers may not vectorize loops when they are complex or potentially have
dependencies, even though the programmer is certain the loop will execute
correctly as a vectorized loop. The \code{simd} construct assures the compiler
that the loop can be vectorized.
\cexample{SIMD}{1c}
\fexample{SIMD}{1f}
When a function can be inlined within a loop, the compiler has an opportunity to
vectorize the loop. By guaranteeing SIMD behavior of a function's operations,
characterizing the arguments of the function and privatizing temporary
variables of the loop, the compiler can often create faster, vector code for
the loop. In the examples below the \code{declare} \code{simd} construct is
used on the \plc{add1} and \plc{add2} functions to enable creation of their
corresponding SIMD function versions for execution within the associated SIMD
loop. The functions characterize two different approaches to accessing data
within the function: by a single variable and as an element in a data array,
respectively. The \plc{add3} C function uses pointer dereferencing.
The \code{declare} \code{simd} constructs also illustrate the use of
\code{uniform} and \code{linear} clauses. The \code{uniform(fact)} clause
indicates that the variable \plc{fact} is invariant across the SIMD lanes. In
the \plc{add2} function \plc{a} and \plc{b} are included in the \code{uniform}
list because the C pointer and the Fortran array references are constant. The
\plc{i} index used in the \plc{add2} function is included in a \code{linear}
clause with a constant-linear-step of 1, to guarantee a unity increment of the
associated loop. In the \code{declare} \code{simd} construct for the \plc{add3}
C function the \code{linear(a,b:1)} clause instructs the compiler to generate
unit-stride loads across the SIMD lanes; otherwise, costly \emph{gather}
instructions would be generated for the unknown sequence of access of the
pointer dereferences.
In the \code{simd} constructs for the loops the \code{private(tmp)} clause is
necessary to assure that each vector operation has its own \plc{tmp}
variable.
\cexample{SIMD}{2c}
\fexample{SIMD}{2f}
A thread that encounters a SIMD construct executes vectorized code for the
iterations. As with a worksharing loop, a loop vectorized with a SIMD construct
must ensure that temporary variables are privatized and that reduction
variables are declared in \code{reduction} clauses. The example below
illustrates the use of \code{private} and \code{reduction} clauses in a SIMD
construct.
\cexample{SIMD}{3c}
\fexample{SIMD}{3f}
A \code{safelen(N)} clause in a \code{simd} construct assures the compiler that
there are no loop-carried dependencies for vectors of size \plc{N} or below. If
the \code{safelen} clause is not specified, then the default safelen value is
the number of loop iterations.
The \code{safelen(16)} clause in the example below guarantees that the vector
code is safe for vectors up to and including size 16. In the loop, \plc{m} must
be 16 or greater for correct code execution; if the value of \plc{m} is less
than 16, the behavior is undefined.
\cexample{SIMD}{4c}
\fexample{SIMD}{4f}
The following SIMD construct instructs the compiler to collapse the \plc{i} and
\plc{j} loops into a single SIMD loop in which SIMD chunks are executed by
threads of the team. Within each thread's chunk of the workshared loop, the
SIMD chunks are executed in the lanes of the vector units.
\cexample{SIMD}{5c}
\fexample{SIMD}{5f}
The following examples illustrate the use of the \code{declare} \code{simd}
construct with the \code{inbranch} and \code{notinbranch} clauses. The
\code{notinbranch} clause informs the compiler that the function \plc{foo} is
never called conditionally in the SIMD loop of the function \plc{myaddint}. On
the other hand, the \code{inbranch} clause for the function \plc{goo} indicates that
the function is always called conditionally in the SIMD loop inside
the function \plc{myaddfloat}.
\cexample{SIMD}{6c}
\fexample{SIMD}{6f}
In the code below, the function \plc{fib()} is called in the main program and
is also called recursively within \plc{fib()} itself, under an \code{if}
condition. The compiler creates a masked vector version and a non-masked vector
version of the function \plc{fib()} while retaining the original scalar
version.
\cexample{SIMD}{7c}
\fexample{SIMD}{7f}

View File

@ -47,7 +47,8 @@ that loop is divided among the threads in the current team. An \code{ordered}
clause is added to the loop construct, because an ordered region binds to the loop
region arising from the loop construct.
According to \$, a thread must not execute more than one ordered region that binds
According to Section 2.12.8 of the OpenMP 4.0 specification,
a thread must not execute more than one ordered region that binds
to the same loop region. So the \code{collapse} clause is required for the example
to be conforming. With the \code{collapse} clause, the iterations of the \code{k}
and \code{j} loops are collapsed into one loop, and therefore only one ordered

View File

@ -5,7 +5,8 @@
In general loop iteration variables will be private, when used in the \plc{do-loop}
of a \code{do} and \code{parallel do} construct or in sequential loops in a
\code{parallel} construct (see \$ and \$). In the following example of a sequential
\code{parallel} construct (see Section 2.7.1 and Section 2.14.1 of
the OpenMP 4.0 specification). In the following example of a sequential
loop in a \code{parallel} construct the loop iteration variable \plc{I} will
be private.

View File

@ -2,7 +2,7 @@
\chapter{Internal Control Variables (ICVs)}
\label{chap:icv}
According to \$, an OpenMP implementation must act as if there are ICVs that control
According to Section 2.3 of the OpenMP 4.0 specification, an OpenMP implementation must act as if there are ICVs that control
the behavior of the program. This example illustrates two ICVs, \plc{nthreads-var}
and \plc{max-active-levels-var}. The \plc{nthreads-var} ICV controls the
number of threads requested for encountered parallel regions; there is one copy

72
Examples_task_dep.tex Normal file
View File

@ -0,0 +1,72 @@
\pagebreak
\chapter{Task Dependences}
\label{chap:task_dep}
\section{Flow Dependence}
In this example we show a simple flow dependence expressed using the \code{depend}
clause on the \code{task} construct.
\cexample{task_dep}{1c}
\fexample{task_dep}{1f}
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program
would have a race condition.
\section{Anti-dependence}
In this example we show an anti-dependence expressed using the \code{depend}
clause on the \code{task} construct.
\cexample{task_dep}{2c}
\fexample{task_dep}{2f}
The program will always print \texttt{"}x = 1\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.
\section{Output Dependence}
In this example we show an output dependence expressed using the \code{depend}
clause on the \code{task} construct.
\cexample{task_dep}{3c}
\fexample{task_dep}{3f}
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.
\section{Concurrent Execution with Dependences}
In this example we show potentially concurrent execution of tasks using multiple
flow dependences expressed using the \code{depend} clause on the \code{task}
construct.
\cexample{task_dep}{4c}
\fexample{task_dep}{4f}
The last two tasks are dependent on the first task. However, there is no dependence
between the last two tasks, which may execute in any order (or concurrently if
more than one thread is available). Thus, the possible outputs are \texttt{"}x
+ 1 = 3. x + 2 = 4. \texttt{"} and \texttt{"}x + 2 = 4. x + 1 = 3. \texttt{"}.
If the \code{depend} clauses had been omitted, then all of the tasks could execute
in any order and the program would have a race condition.
\section{Matrix Multiplication}
This example shows a task-based blocked matrix multiplication. The matrices are
of \plc{N}x\plc{N} elements, and the multiplication is implemented using blocks
of \plc{BS}x\plc{BS} elements.
\cexample{task_dep}{5c}
\fexample{task_dep}{5f}

View File

@ -74,7 +74,8 @@ the task generating loop was in a tied task.
\fexample{tasking}{6f}
The following two examples demonstrate how the scheduling rules illustrated in
\$ affect the usage of \code{threadprivate} variables in tasks. A \code{threadprivate}
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of
\code{threadprivate} variables in tasks. A \code{threadprivate}
variable can be modified by another task that is executed by the same thread. Thus,
the value of a \code{threadprivate} variable cannot be assumed to be unchanged
across a task scheduling point. In untied tasks, task scheduling points may be
@ -101,7 +102,8 @@ task scheduling point.
\fexample{tasking}{8f}
The following two examples demonstrate how the scheduling rules illustrated in
\$ affect the usage of locks and critical sections in tasks. If a lock is held
Section 2.11.3 of the OpenMP 4.0 specification affect the usage of locks
and critical sections in tasks. If a lock is held
across a task scheduling point, no attempt should be made to acquire the same lock
in any code that may be interleaved. Otherwise, a deadlock is possible.
@ -186,73 +188,3 @@ are usually the opposite.
\fexample{tasking}{14f}
\section*{Task Dependences}
\section{Flow Dependence}
In this example we show a simple flow dependence expressed using the \code{depend}
clause on the \code{task} construct.
\cexample{tasking}{15c}
\fexample{tasking}{15f}
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program
would have a race condition.
\section{Anti-dependence}
In this example we show an anti-dependence expressed using the \code{depend}
clause on the \code{task} construct.
\cexample{tasking}{16c}
\fexample{tasking}{16f}
The program will always print \texttt{"}x = 1\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.
\section{Output Dependence}
In this example we show an output dependence expressed using the \code{depend}
clause on the \code{task} construct.
\cexample{tasking}{17c}
\fexample{tasking}{17f}
The program will always print \texttt{"}x = 2\texttt{"}, because the \code{depend}
clauses enforce the ordering of the tasks. If the \code{depend} clauses had been
omitted, then the tasks could execute in any order and the program would have a
race condition.
\section{Concurrent Execution with Dependences}
In this example we show potentially concurrent execution of tasks using multiple
flow dependences expressed using the \code{depend} clause on the \code{task}
construct.
\cexample{tasking}{18c}
\fexample{tasking}{18f}
The last two tasks are dependent on the first task. However, there is no dependence
between the last two tasks, which may execute in any order (or concurrently if
more than one thread is available). Thus, the possible outputs are \texttt{"}x
+ 1 = 3. x + 2 = 4. \texttt{"} and \texttt{"}x + 2 = 4. x + 1 = 3. \texttt{"}.
If the \code{depend} clauses had been omitted, then all of the tasks could execute
in any order and the program would have a race condition.
\section{Matrix Multiplication}
This example shows a task-based blocked matrix multiplication. The matrices are
of \plc{N}x\plc{N} elements, and the multiplication is implemented using blocks
of \plc{BS}x\plc{BS} elements.
\cexample{tasking}{19c}
\fexample{tasking}{19f}

View File

@ -1,7 +1,7 @@
# Makefile for the OpenMP Examples document in LaTex format.
# For more information, see the master document, openmp-examples.tex.
version=4.0.1ltx
version=4.0.2
default: openmp-examples.pdf
@ -24,6 +24,7 @@ CHAPTERS=Title_Page.tex \
Examples_fpriv_sections.tex \
Examples_single.tex \
Examples_tasking.tex \
Examples_task_dep.tex \
Examples_taskgroup.tex \
Examples_taskyield.tex \
Examples_workshare.tex \
@ -57,6 +58,7 @@ CHAPTERS=Title_Page.tex \
Examples_lock_owner.tex \
Examples_simple_lock.tex \
Examples_nestable_lock.tex \
Examples_SIMD.tex \
Examples_target.tex \
Examples_target_data.tex \
Examples_target_update.tex \

View File

@ -17,14 +17,14 @@
\vspace{1.0in}
\textbf{Version 4.0.1.ltx -- February, 2014}
\textbf{Version 4.0.2 -- February, 2015}
\end{center}
\end{adjustwidth}
\vspace{3.0in}
\begin{adjustwidth}{0pt}{1em}\setlength{\parskip}{0.25\baselineskip}%
Copyright © 1997-2014 OpenMP Architecture Review Board.\\
Copyright © 1997-2015 OpenMP Architecture Review Board.\\
Permission to copy without fee all or part of this material is granted,
provided the OpenMP Architecture Review Board copyright notice and
the title of this document appear. Notice is given that copying is by

View File

@ -48,7 +48,7 @@
\documentclass[10pt,letterpaper,twoside,makeidx,hidelinks]{scrreprt}
% Text to appear in the footer on even-numbered pages:
\newcommand{\footerText}{OpenMP Examples Version 4.0.1 - February 2014}
\newcommand{\footerText}{OpenMP Examples Version 4.0.2 - February 2015}
% Unified style sheet for OpenMP documents:
\input{openmp.sty}
@ -91,6 +91,7 @@
\input{Examples_fpriv_sections}
\input{Examples_single}
\input{Examples_tasking}
\input{Examples_task_dep}
\input{Examples_taskgroup}
\input{Examples_taskyield}
\input{Examples_workshare}
@ -124,6 +125,7 @@
\input{Examples_lock_owner}
\input{Examples_simple_lock}
\input{Examples_nestable_lock}
\input{Examples_SIMD}
\input{Examples_target}
\input{Examples_target_data}
\input{Examples_target_update}

14
sources/Example_SIMD.1c.c Normal file
View File

@ -0,0 +1,14 @@
/*
* @@name: SIMD.1c
* @@type: C
* @@compilable: yes
* @@linkable: no
* @@expect: success
*/
void star( double *a, double *b, double *c, int n, int *ioff )
{
int i;
#pragma omp simd
for ( i = 0; i < n; i++ )
a[i] *= b[i] * c[i+ *ioff];
}

17
sources/Example_SIMD.1f.f Normal file
View File

@ -0,0 +1,17 @@
! @@name: SIMD.1f
! @@type: F-free
! @@compilable: yes
! @@linkable: no
! @@expect: success
subroutine star(a,b,c,n,ioff_ptr)
implicit none
double precision :: a(*),b(*),c(*)
integer :: n, i
integer, pointer :: ioff_ptr
!$omp simd
do i = 1,n
a(i) = a(i) * b(i) * c(i+ioff_ptr)
end do
end subroutine

62
sources/Example_SIMD.2c.c Normal file
View File

@ -0,0 +1,62 @@
/*
* @@name: SIMD.2c
* @@type: C
* @@compilable: yes
* @@linkable: yes
* @@expect: success
*/
#include <stdio.h>
#pragma omp declare simd uniform(fact)
double add1(double a, double b, double fact)
{
double c;
c = a + b + fact;
return c;
}
#pragma omp declare simd uniform(a,b,fact) linear(i:1)
double add2(double *a, double *b, int i, double fact)
{
double c;
c = a[i] + b[i] + fact;
return c;
}
#pragma omp declare simd uniform(fact) linear(a,b:1)
double add3(double *a, double *b, double fact)
{
double c;
c = *a + *b + fact;
return c;
}
void work( double *a, double *b, int n )
{
int i;
double tmp;
#pragma omp simd private(tmp)
for ( i = 0; i < n; i++ ) {
tmp = add1( a[i], b[i], 1.0);
a[i] = add2( a, b, i, 1.0) + tmp;
a[i] = add3(&a[i], &b[i], 1.0);
}
}
int main(){
int i;
const int N=32;
double a[N], b[N];
for ( i=0; i<N; i++ ) {
a[i] = i; b[i] = N-i;
}
work(a, b, N );
for ( i=0; i<N; i++ ) {
printf("%d %f\n", i, a[i]);
}
return 0;
}

48
sources/Example_SIMD.2f.f Normal file
View File

@ -0,0 +1,48 @@
! @@name: SIMD.2f
! @@type: F-free
! @@compilable: yes
! @@linkable: yes
! @@expect: success
program main
implicit none
integer, parameter :: N=32
integer :: i
double precision :: a(N), b(N)
do i = 1,N
a(i) = i-1
b(i) = N-(i-1)
end do
call work(a, b, N )
do i = 1,N
print*, i,a(i)
end do
end program
function add1(a,b,fact) result(c)
!$omp declare simd(add1) uniform(fact)
implicit none
double precision :: a,b,fact, c
c = a + b + fact
end function
function add2(a,b,i, fact) result(c)
!$omp declare simd(add2) uniform(a,b,fact) linear(i:1)
implicit none
integer :: i
double precision :: a(*),b(*),fact, c
c = a(i) + b(i) + fact
end function
subroutine work(a, b, n )
implicit none
double precision :: a(n),b(n), tmp
integer :: n, i
double precision, external :: add1, add2
!$omp simd private(tmp)
do i = 1,n
tmp = add1(a(i), b(i), 1.0d0)
a(i) = add2(a, b, i, 1.0d0) + tmp
a(i) = a(i) + b(i) + 1.0d0
end do
end subroutine

19
sources/Example_SIMD.3c.c Normal file
View File

@ -0,0 +1,19 @@
/*
* @@name: SIMD.3c
* @@type: C
* @@compilable: yes
* @@linkable: no
* @@expect: success
*/
double work( double *a, double *b, int n )
{
int i;
double tmp, sum;
sum = 0.0;
#pragma omp simd private(tmp) reduction(+:sum)
for (i = 0; i < n; i++) {
tmp = a[i] + b[i];
sum += tmp;
}
return sum;
}

18
sources/Example_SIMD.3f.f Normal file
View File

@ -0,0 +1,18 @@
! @@name: SIMD.3f
! @@type: F-free
! @@compilable: yes
! @@linkable: no
! @@expect: success
subroutine work( a, b, n, sum )
implicit none
integer :: i, n
double precision :: a(n), b(n), sum, tmp
sum = 0.0d0
!$omp simd private(tmp) reduction(+:sum)
do i = 1,n
tmp = a(i) + b(i)
sum = sum + tmp
end do
end subroutine work

14
sources/Example_SIMD.4c.c Normal file
View File

@ -0,0 +1,14 @@
/*
* @@name: SIMD.4c
* @@type: C
* @@compilable: yes
* @@linkable: no
* @@expect: success
*/
void work( float *b, int n, int m )
{
int i;
#pragma omp simd safelen(16)
for (i = m; i < n; i++)
b[i] = b[i-m] - 1.0f;
}

15
sources/Example_SIMD.4f.f Normal file
View File

@ -0,0 +1,15 @@
! @@name: SIMD.4f
! @@type: F-free
! @@compilable: yes
! @@linkable: no
! @@expect: success
subroutine work( b, n, m )
implicit none
real :: b(n)
integer :: i,n,m
!$omp simd safelen(16)
do i = m+1, n
b(i) = b(i-m) - 1.0
end do
end subroutine work

19
sources/Example_SIMD.5c.c Normal file
View File

@ -0,0 +1,19 @@
/*
* @@name: SIMD.5c
* @@type: C
* @@compilable: yes
* @@linkable: no
* @@expect: success
*/
void work( double **a, double **b, double **c, int n )
{
int i, j;
double tmp;
#pragma omp for simd collapse(2) private(tmp)
for (i = 0; i < n; i++) {
for (j = 0; j < n; j++) {
tmp = a[i][j] + b[i][j];
c[i][j] = tmp;
}
}
}

19
sources/Example_SIMD.5f.f Normal file
View File

@ -0,0 +1,19 @@
! @@name: SIMD.5f
! @@type: F-free
! @@compilable: yes
! @@linkable: no
! @@expect: success
subroutine work( a, b, c, n )
implicit none
integer :: i,j,n
double precision :: a(n,n), b(n,n), c(n,n), tmp
!$omp for simd collapse(2) private(tmp)
do j = 1,n
do i = 1,n
tmp = a(i,j) + b(i,j)
c(i,j) = tmp
end do
end do
end subroutine work

37
sources/Example_SIMD.6c.c Normal file
View File

@ -0,0 +1,37 @@
/*
* @@name: SIMD.6c
* @@type: C
* @@compilable: yes
* @@linkable: no
* @@expect: success
*/
#pragma omp declare simd linear(p:1) notinbranch
int foo(int *p){
*p = *p + 10;
return *p;
}
int myaddint(int *a, int *b, int n)
{
#pragma omp simd
for (int i=0; i<n; i++){
a[i] = foo(&b[i]); /* foo is not called under a condition */
}
return a[n-1];
}
#pragma omp declare simd linear(p:1) inbranch
float goo(float *p){
*p = *p + 18.5f;
return *p;
}
int myaddfloat(float *x, float *y, int n)
{
#pragma omp simd
for (int i=0; i<n; i++){
x[i] = (x[i] > y[i]) ? goo(&y[i]) : y[i];
/* goo is called under the condition (or within a branch) */
}
return x[n-1];
}

54
sources/Example_SIMD.6f.f Normal file
View File

@ -0,0 +1,54 @@
! @@name: SIMD.6f
! @@type: F-free
! @@compilable: yes
! @@linkable: no
! @@expect: success
function foo(p) result(r)
!$omp declare simd(foo) notinbranch
implicit none
integer :: p, r
p = p + 10
r = p
end function foo
function myaddint(a, b, n) result(r)
implicit none
integer :: a(*), b(*), n, r
integer :: i
integer, external :: foo
!$omp simd
do i=1, n
a(i) = foo(b(i)) ! foo is not called under a condition
end do
r = a(n)
end function myaddint
function goo(p) result(r)
!$omp declare simd(goo) inbranch
implicit none
real :: p, r
p = p + 18.5
r = p
end function goo
function myaddfloat(x, y, n) result(r)
implicit none
real :: x(*), y(*), r
integer :: n
integer :: i
real, external :: goo
!$omp simd
do i=1, n
if (x(i) > y(i)) then
x(i) = goo(y(i))
! goo is called under the condition (or within a branch)
else
x(i) = y(i)
endif
end do
r = x(n)
end function myaddfloat

37
sources/Example_SIMD.7c.c Normal file
View File

@ -0,0 +1,37 @@
/*
* @@name: SIMD.7c
* @@type: C
* @@compilable: yes
* @@linkable: yes
* @@expect: success
*/
#include <stdio.h>
#include <stdlib.h>
#define N 45
int a[N], b[N], c[N];
#pragma omp declare simd inbranch
int fib( int n )
{
if (n <= 2)
return n;
else {
return fib(n-1) + fib(n-2);
}
}
int main(void)
{
int i;
#pragma omp simd
for (i=0; i < N; i++) b[i] = i;
#pragma omp simd
for (i=0; i < N; i++) {
a[i] = fib(b[i]);
}
printf("Done a[%d] = %d\n", N-1, a[N-1]);
return 0;
}

38
sources/Example_SIMD.7f.f Normal file
View File

@ -0,0 +1,38 @@
! @@name: SIMD.7f
! @@type: F-free
! @@compilable: yes
! @@linkable: yes
! @@expect: success
program fibonacci
implicit none
integer,parameter :: N=45
integer :: a(0:N-1), b(0:N-1)
integer :: i
integer, external :: fib
!$omp simd
do i = 0,N-1
b(i) = i
end do
!$omp simd
do i=0,N-1
a(i) = fib(b(i))
end do
write(*,*) "Done a(", N-1, ") = ", a(N-1)
! 44 1134903168
end program
recursive function fib(n) result(r)
!$omp declare simd(fib) inbranch
implicit none
integer :: n, r
if (n <= 2) then
r = n
else
r = fib(n-1) + fib(n-2)
endif
end function fib

48
sources/Example_SIMD.8c.c Normal file
View File

@ -0,0 +1,48 @@
/*
* @@name: SIMD.8c
* @@type: C
* @@compilable: yes
* @@linkable: yes
* @@expect: success
*/
#include <stdio.h>
#include <math.h>
int P[1000];
float A[1000];
float do_work(float *arr)
{
float pri;
#pragma omp simd lastprivate(pri)
for (int i = 0; i < 999; ++i) {
int j = P[i];
pri = 0.5f;
if (j % 2 == 0) {
pri = A[j+1] + arr[i];
}
A[j] = pri * 1.5f;
pri = pri + A[j];
}
return pri;
}
int main(void)
{
float pri, arr[1000];
for (int i = 0; i < 1000; ++i) {
P[i] = i;
A[i] = i * 1.5f;
arr[i] = i * 1.8f;
}
pri = do_work(&arr[0]);
if (pri == 8237.25) {
printf("passed: result pri = %7.2f (8237.25) \n", pri);
}
else {
printf("failed: result pri = %7.2f (8237.25) \n", pri);
}
return 0;
}

54
sources/Example_SIMD.8f.f Normal file
View File

@ -0,0 +1,54 @@
! @@name: SIMD.8f
! @@type: F-free
! @@compilable: yes
! @@linkable: yes
! @@expect: success
module work
integer :: P(1000)
real :: A(1000)
contains
function do_work(arr) result(pri)
implicit none
real, dimension(*) :: arr
real :: pri
integer :: i, j
!$omp simd private(j) lastprivate(pri)
do i = 1, 999
j = P(i)
pri = 0.5
if (mod(j-1, 2) == 0) then
pri = A(j+1) + arr(i)
endif
A(j) = pri * 1.5
pri = pri + A(j)
end do
end function do_work
end module work
program simd_8f
use work
implicit none
real :: pri, arr(1000)
integer :: i
do i = 1, 1000
P(i) = i
A(i) = (i-1) * 1.5
arr(i) = (i-1) * 1.8
end do
pri = do_work(arr)
if (pri == 8237.25) then
print 2, "passed", pri
else
print 2, "failed", pri
endif
2 format(a, ": result pri = ", f7.2, " (8237.25)")
end program

View File

@ -7,7 +7,7 @@
*/
void foo ()
{
int A[30];
int A[30], *p;
#pragma omp target data map( A[0:10] )
{
p = &A[0];

View File

@ -33,6 +33,7 @@ end interface
!$omp end task
end do
!$omp taskwait
print*, z
end subroutine pipedF

View File

@ -6,12 +6,15 @@
* @@expect: success
*/
#include <stdlib.h>
#include <omp.h>
#pragma omp declare target
extern void init(float *, float *, int);
#pragma omp end declare target
extern void foo();
extern void output(float *, int);
void vec_mult(float *p, float *v1, float *v2, int N, int dev)
{
int i;
init(p, N);
#pragma omp task depend(out: v1, v2)
#pragma omp target device(dev) map(v1, v2)
{
@ -20,7 +23,7 @@ void vec_mult(float *p, float *v1, float *v2, int N, int dev)
abort();
v1 = malloc(N*sizeof(float));
v2 = malloc(N*sizeof(float));
init(v1,v2);
init(v1, v2, N);
}
foo(); // execute other work asynchronously
#pragma omp task depend(in: v1, v2)
@ -32,8 +35,8 @@ void vec_mult(float *p, float *v1, float *v2, int N, int dev)
#pragma omp parallel for
for (i=0; i<N; i++)
p[i] = v1[i] * v2[i];
output(p, N);
free(v1);
free(v2);
}
output(p, N);
}

View File

@ -5,6 +5,15 @@
* @@linkable: no
* @@expect: success
*/
#include <iostream>
#include <exception>
#define N 10000
extern void causes_an_exception();
extern void phase_1();
extern void phase_2();
void example() {
std::exception *ex = NULL;
#pragma omp parallel shared(ex)
@ -15,7 +24,7 @@ void example() {
try {
causes_an_exception();
}
catch (const std::exception *e) {
catch (std::exception *e) {
// still must remember exception for later handling
#pragma omp atomic write
ex = e;

View File

@ -12,7 +12,7 @@ subroutine example(n, dim)
! ...
!$omp do private(s, B)
do i=1, n
!$omp cancellation point
!$omp cancellation point do
allocate(B(dim(i)), stat=s)
if (s .gt. 0) then
!$omp atomic write

View File

@ -5,6 +5,11 @@
* @@linkable: no
* @@expect: success
*/
typedef struct binary_tree_s {
int value;
struct binary_tree_s *left, *right;
} binary_tree_t;
binary_tree_t *search_tree(binary_tree_t *tree, int value, int level) {
binary_tree_t *found = NULL;
if (tree) {

View File

@ -15,22 +15,18 @@ contains
type(binary_tree), intent(in), pointer :: tree
integer, intent(in) :: value, level
type(binary_tree), pointer :: found
type(binary_tree), pointer :: found_left => NULL(), &
found_right => NULL()
if (.not. associated(found)) then
allocate(found)
endif
type(binary_tree), pointer :: found_left => NULL(), found_right => NULL()
if (associated(tree)) then
if (tree%value .eq. value) then
found = tree
found => tree
else
!$omp task shared(found) if(level<10)
call search_tree(tree%left, value, level+1, found_left)
if (associated(found_left)) then
!$omp atomic write
found = found_left
!$omp critical
found => found_left
!$omp end critical
!$omp cancel taskgroup
endif
@ -39,8 +35,9 @@ contains
!$omp task shared(found) if(level<10)
call search_tree(tree%right, value, level+1, found_right)
if (associated(found_right)) then
!$omp atomic write
found = found_right
!$omp critical
found => found_right
!$omp end critical
!$omp cancel taskgroup
endif
@ -56,9 +53,7 @@ contains
integer, intent(in) :: value
type(binary_tree), pointer :: found
if (associated(found)) then
allocate(found)
endif
found => NULL()
!$omp parallel shared(found, tree, value)
!$omp master
!$omp taskgroup

View File

@ -8,12 +8,13 @@
struct typeX
{
int a;
}
};
class typeY
{
int foo() { return a^0x01;}
int a;
}
public:
int foo() { return a^0x01;}
};
#pragma omp declare target
struct typeX varX; // ok
class typeY varY; // ok if varY.foo() not called on target device

View File

@ -16,7 +16,7 @@ extern void init_vars(float *, float *, int);
extern void output(float *, int);
void foo()
{
N = init_vars(&p, &v1, &v2);
init_vars(v1, v2, N);
#pragma omp target device(42) map(p[:N], v1[:N], v2[:N])
{
vec_mult(p, v1, v2, N);
@ -26,7 +26,7 @@ void foo()
void vec_mult(float *p, float *v1, float *v2, int N)
{
int i;
int nthreads = omp_is_initial_device() ? 8 : 1024;
int nthreads;
if (!omp_is_initial_device())
{
printf("1024 threads on target device\n");

View File

@ -6,8 +6,10 @@
* @@expect: success
*/
#include <math.h>
void gramSchmidt(restrict float Q[][COLS], const int rows, const int cols)
#define COLS 100
void gramSchmidt(float Q[][COLS], const int rows)
{
int cols = COLS;
#pragma omp target data map(Q[0:rows][0:cols])
for(int k=0; k < cols; k++)
{

View File

@ -19,6 +19,7 @@ void foo(float *p0, float *v1, float *v2, int N)
}
void vec_mult(float *p1, float *v3, float *v4, int N)
{
int i;
#pragma omp target map(to: v3[0:N], v4[:N]) map(from: p1[0:N])
#pragma omp parallel for
for (i=0; i<N; i++)

View File

@ -0,0 +1,21 @@
/*
* @@name: task_dep.1c
* @@type: C
* @@compilable: yes
* @@linkable: yes
* @@expect: success
*/
#include <stdio.h>
int main()
{
int x = 1;
#pragma omp parallel
#pragma omp single
{
#pragma omp task shared(x) depend(out: x)
x = 2;
#pragma omp task shared(x) depend(in: x)
printf("x = %d\n", x);
}
return 0;
}

View File

@ -0,0 +1,19 @@
! @@name: task_dep.1f
! @@type: F-free
! @@compilable: yes
! @@linkable: yes
! @@expect: success
program example
integer :: x
x = 1
!$omp parallel
!$omp single
!$omp task shared(x) depend(out: x)
x = 2
!$omp end task
!$omp task shared(x) depend(in: x)
print*, "x = ", x
!$omp end task
!$omp end single
!$omp end parallel
end program

View File

@ -0,0 +1,21 @@
/*
* @@name: task_dep.2c
* @@type: C
* @@compilable: yes
* @@linkable: yes
* @@expect: success
*/
#include <stdio.h>
int main()
{
int x = 1;
#pragma omp parallel
#pragma omp single
{
#pragma omp task shared(x) depend(in: x)
printf("x = %d\n", x);
#pragma omp task shared(x) depend(out: x)
x = 2;
}
return 0;
}

View File

@ -0,0 +1,19 @@
! @@name: task_dep.2f
! @@type: F-free
! @@compilable: yes
! @@linkable: yes
! @@expect: success
program example
integer :: x
x = 1
!$omp parallel
!$omp single
!$omp task shared(x) depend(in: x)
print*, "x = ", x
!$omp end task
!$omp task shared(x) depend(out: x)
x = 2
!$omp end task
!$omp end single
!$omp end parallel
end program

View File

@ -0,0 +1,23 @@
/*
* @@name: task_dep.3c
* @@type: C
* @@compilable: yes
* @@linkable: yes
* @@expect: success
*/
#include <stdio.h>
int main()
{
int x;
#pragma omp parallel
#pragma omp single
{
#pragma omp task shared(x) depend(out: x)
x = 1;
#pragma omp task shared(x) depend(out: x)
x = 2;
#pragma omp taskwait
printf("x = %d\n", x);
}
return 0;
}

View File

@ -0,0 +1,20 @@
! @@name: task_dep.3f
! @@type: F-free
! @@compilable: yes
! @@linkable: yes
! @@expect: success
program example
integer :: x
!$omp parallel
!$omp single
!$omp task shared(x) depend(out: x)
x = 1
!$omp end task
!$omp task shared(x) depend(out: x)
x = 2
!$omp end task
!$omp taskwait
print*, "x = ", x
!$omp end single
!$omp end parallel
end program

View File

@ -0,0 +1,23 @@
/*
* @@name: task_dep.4c
* @@type: C
* @@compilable: yes
* @@linkable: yes
* @@expect: success
*/
#include <stdio.h>
int main()
{
int x = 1;
#pragma omp parallel
#pragma omp single
{
#pragma omp task shared(x) depend(out: x)
x = 2;
#pragma omp task shared(x) depend(in: x)
printf("x + 1 = %d. ", x+1);
#pragma omp task shared(x) depend(in: x)
printf("x + 2 = %d\n", x+2);
}
return 0;
}

View File

@ -0,0 +1,22 @@
! @@name: task_dep.4f
! @@type: F-free
! @@compilable: yes
! @@linkable: yes
! @@expect: success
program example
integer :: x
x = 1
!$omp parallel
!$omp single
!$omp task shared(x) depend(out: x)
x = 2
!$omp end task
!$omp task shared(x) depend(in: x)
print*, "x + 1 = ", x+1, "."
!$omp end task
!$omp task shared(x) depend(in: x)
print*, "x + 2 = ", x+2, "."
!$omp end task
!$omp end single
!$omp end parallel
end program

View File

@ -0,0 +1,25 @@
/*
* @@name: task_dep.5c
* @@type: C
* @@compilable: yes
* @@linkable: no
* @@expect: success
*/
// Assume BS divides N perfectly
void matmul_depend(int N, int BS, float A[N][N], float B[N][N], float C[N][N])
{
int i, j, k, ii, jj, kk;
for (i = 0; i < N; i+=BS) {
for (j = 0; j < N; j+=BS) {
for (k = 0; k < N; k+=BS) {
#pragma omp task depend ( in: A[i:BS][k:BS], B[k:BS][j:BS] ) \
depend ( inout: C[i:BS][j:BS] )
for (ii = i; ii < i+BS; ii++ )
for (jj = j; jj < j+BS; jj++ )
for (kk = k; kk < k+BS; kk++ )
C[ii][jj] = C[ii][jj] + A[ii][kk] * B[kk][jj];
}
}
}
}

View File

@ -0,0 +1,27 @@
! @@name: task_dep.5f
! @@type: F-free
! @@compilable: yes
! @@linkable: no
! @@expect: success
subroutine matmul_depend (N, BS, A, B, C)
integer :: N, BS, BM
real, dimension(N, N) :: A, B, C
integer :: i, j, k, ii, jj, kk
BM = BS - 1
do i = 1, N, BS
do j = 1, N, BS
do k = 1, N, BS
!$omp task depend ( in: A(i:i+BM, k:k+BM), B(k:k+BM, j:j+BM) ) &
!$omp depend ( inout: C(i:i+BM, j:j+BM) )
do ii = i, i+BM
do jj = j, j+BM
do kk = k, k+BM
C(ii,jj) = C(ii,jj) + A(ii,kk) * B(kk,jj)
end do
end do
end do
!$omp end task
end do
end do
end do
end subroutine

View File

@ -9,7 +9,6 @@
integer var
contains
subroutine work
use globals
!$omp task
! do work here
!$omp task

View File

@ -10,7 +10,7 @@ implicit none
sum = 0.0e0
!$omp target map(to: B, C)
!$omp teams num_teams(num_teams) thread_limit(block_threads) &
reduction(+:sum)
!$omp& reduction(+:sum)
!$omp distribute
do i0=1,N, block_size
!$omp parallel do reduction(+:sum)

View File

@ -6,7 +6,7 @@
* @@expect: success
*/
#define N 1024*1024
float dotprod(float B[], float C[], int N)
float dotprod(float B[], float C[])
{
float sum = 0;
int i;