%\pagebreak \section{OpenMP Memory Model} \label{sec:mem_model} The following examples illustrate two major concerns for concurrent thread execution: ordering of thread execution and memory accesses that may or may not lead to race conditions. In the following example, at Print 1, the value of \ucode{xval} could be either 2 or 5, depending on the timing of the threads. The \kcode{atomic} directives are necessary for the accesses to \ucode{x} by threads 1 and 2 to avoid a data race. If the atomic write completes before the atomic read, thread 1 is guaranteed to see 5 in \ucode{xval}. Otherwise, thread 1 is guaranteed to see 2 in \ucode{xval}. \index{flushes!implicit} \index{atomic construct@\kcode{atomic} construct} \index{constructs!atomic@\kcode{atomic}} The barrier after Print 1 contains implicit flushes on all threads, as well as a thread synchronization, so the programmer is guaranteed that the value 5 will be printed by both Print 2 and Print 3. Since neither Print 2 or Print 3 are modifying \ucode{x}, they may concurrently access \ucode{x} without requiring \kcode{atomic} directives to avoid a data race. \cexample[3.1]{mem_model}{1} \ffreeexample[3.1]{mem_model}{1} \pagebreak \index{flushes!flush construct@\kcode{flush} construct} \index{flush construct@\kcode{flush} construct} \index{constructs!flush@\kcode{flush}} The following example demonstrates why synchronization is difficult to perform correctly through variables. The write to \ucode{flag} on thread 0 and the read from \ucode{flag} in the loop on thread 1 must be atomic to avoid a data race. When thread 1 breaks out of the loop, \ucode{flag} will have the value of 1. However, \ucode{data} will still be undefined at the first print statement. Only after the flush of both \ucode{flag} and \ucode{data} after the first print statement will \ucode{data} have the well-defined value of 42. \cexample[3.1]{mem_model}{2} \fexample[3.1]{mem_model}{2} \pagebreak \index{flushes!flush with a list} The next example demonstrates why synchronization is difficult to perform correctly through variables. As in the preceding example, the updates to \ucode{flag} and the reading of \ucode{flag} in the loops on threads 1 and 2 are performed atomically to avoid data races on \ucode{flag}. However, the code still contains data race due to the incorrect use of ``flush with a list'' after the assignment to \ucode{data1} on thread 1. By not including \ucode{flag} in the flush-set of that \kcode{flush} directive, the assignment can be reordered with respect to the subsequent atomic update to \ucode{flag}. Consequentially, \ucode{data1} is undefined at the print statement on thread 2. \cexample[3.1]{mem_model}{3} \fexample[3.1]{mem_model}{3} The following two examples illustrate the ordering properties of the \plc{flush} operation. The \plc{flush} operations are strong flushes that are applied to the specified flush lists. However, use of a \kcode{flush} construct with a list is extremely error prone and users are strongly discouraged from attempting it. In the codes the programmer intends to prevent simultaneous execution of the protected section by the two threads. The atomic directives in the codes ensure that the accesses to shared variables \ucode{a} and \ucode{b} are atomic write and atomic read operations. Otherwise both examples would contain data races and automatically result in unspecified behavior. In the following incorrect code example, operations on variables \ucode{a} and \ucode{b} are not ordered with respect to each other. For instance, nothing prevents the compiler from moving the flush of \ucode{b} on thread 0 or the flush of \ucode{a} on thread 1 to a position completely after the protected section (assuming that the protected section on thread 0 does not reference \ucode{b} and the protected section on thread 1 does not reference \ucode{a}). If either re-ordering happens, both threads can simultaneously execute the protected section. Any shared data accessed in the protected section is not guaranteed to be current or consistent during or after the protected section. \cexample[3.1]{mem_model}{4a} \ffreeexample[3.1]{mem_model}{4a} The following code example correctly ensures that the protected section is executed by only one thread at a time. Execution of the protected section by neither thread is considered correct in this example. This occurs if both flushes complete prior to either thread executing its \bcode{if} statement for the protected section. The compiler is prohibited from moving the flush at all for either thread, ensuring that the respective assignment is complete and the data is flushed before the \bcode{if} statement is executed. \cexample[3.1]{mem_model}{4b} \ffreeexample[3.1]{mem_model}{4b}