UC3M

Telematic/Audiovisual Syst./Communication Syst. Engineering

Systems Architecture

September 2017 - January 2018

11.5.  Activities

11.5.1.  Concurrent array processing (with threads)

Work Plan

This exercise analyzes a program that processes a shared table concurrently. It uses a simple concurrent programming pattern: the work is split among the threads of the application, and each thread processes one part of the table. On the multicore processors found in many current machines, these threads can run in parallel on different cores, which reduces the execution time.

The basic idea is to process a large data structure by dividing the work among several threads:

  1. In the example there are 5 threads. Each one processes a fifth of the table and stores its partial result in an intermediate array (mysums).

  2. Once all threads have finished executing and returned from do_work, the main function adds up the partial results of every thread. To wait for them, the main thread calls pthread_join once for each thread.

  3. Note that the code is free of data races because each thread works on a block of memory that belongs exclusively to it (you can check your code with Valgrind's helgrind tool).

  4. Combining the partial results of the threads is also safe, because the main function sums them only after pthread_join has confirmed that every thread has finished, so no thread is still running while the sum is computed.
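The listing below assumes that ARRAYSIZE is an exact multiple of NTHREADS (100000000 / 5 = 20000000 elements per thread). As a minimal sketch under a different assumption, the hypothetical helper chunk_bounds (not part of the exercise) computes each thread's half-open range [start, end) so that the ranges still cover the whole table when the division is not exact:

```c
#include <stddef.h>

/* Hypothetical helper (not part of the original exercise): computes the
 * half-open range [*start, *end) that thread `tid` out of `nthreads`
 * should process in an array of `n` elements. The first n % nthreads
 * threads receive one extra element, so the ranges cover the whole
 * array even when n is not a multiple of nthreads. */
void chunk_bounds(size_t tid, size_t nthreads, size_t n,
                  size_t *start, size_t *end)
{
    size_t base = n / nthreads;   /* elements every thread gets */
    size_t extra = n % nthreads;  /* leftover elements to spread */

    /* number of threads before `tid` that received an extra element */
    size_t shifted = tid < extra ? tid : extra;

    *start = tid * base + shifted;
    *end = *start + base + (tid < extra ? 1 : 0);
}
```

With n = 10 and 3 threads the ranges are [0, 4), [4, 7) and [7, 10); with the values of the exercise every thread gets exactly 20000000 elements, as in the original code.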

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS      5
#define ARRAYSIZE   100000000
/* Elements per thread; ARRAYSIZE must be a multiple of NTHREADS.
   The parentheses make the macro expand safely inside expressions. */
#define ITERATIONS  (ARRAYSIZE / NTHREADS)

double sum = 0.0;
double a[ARRAYSIZE];          /* shared table (800 MB of doubles) */
double mysums[NTHREADS];      /* one partial result per thread */

void *do_work(void *tid)
{
  int i, start, *mytid, end;
  double mysum = 0.0;

  /* Initialize my part of the global array and keep a local sum */
  mytid = (int *) tid;
  start = *mytid * ITERATIONS;
  end = start + ITERATIONS;
  printf("[Thread %5d] Doing iterations \t%10d to \t%10d\n", *mytid, start, end - 1);
  for (i = start; i < end; i++)
    {
      a[i] = i * 1.0;
      mysum = mysum + a[i];
    }
  /* Each thread writes only its own slot, so no locking is needed */
  mysums[*mytid] = mysum;
  printf("[Thread %5d] Sum %e\n", *mytid, mysum);
  pthread_exit(NULL);
}


int main(void)
{
  int i, rc, tids[NTHREADS];
  pthread_t threads[NTHREADS];
  pthread_attr_t attr;

  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
  for (i = 0; i < NTHREADS; i++) {
    tids[i] = i;   /* each thread gets a pointer to its own id */
    rc = pthread_create(&threads[i], &attr, do_work, (void *) &tids[i]);
    if (rc != 0) {
      fprintf(stderr, "pthread_create: %s\n", strerror(rc));
      exit(EXIT_FAILURE);
    }
  }

  /* Wait for all threads to complete */
  for (i = 0; i < NTHREADS; i++) {
    pthread_join(threads[i], NULL);
  }

  /* Safe: every thread has finished, so mysums is no longer being written */
  sum = 0.0;
  for (i = 0; i < NTHREADS; i++)
  {
    sum = sum + mysums[i];
  }
  printf("[MAIN] Total Sum= %e\n", sum);

  /* Clean up and exit */
  pthread_attr_destroy(&attr);
  return 0;
}
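Since every element is set to a[i] = i, the expected output can be checked by hand. Writing N = ARRAYSIZE and P = NTHREADS, thread k sums the indices from kN/P to (k+1)N/P - 1, and the partial sums add up to an arithmetic series:

$$
\sum_{k=0}^{P-1} \; \sum_{i=kN/P}^{(k+1)N/P-1} i \;=\; \sum_{i=0}^{N-1} i \;=\; \frac{N(N-1)}{2} \;=\; \frac{10^{8}\,(10^{8}-1)}{2} \;=\; 4\,999\,999\,950\,000\,000
$$

The %e format rounds this value to 5.000000e+15. All partial and running sums are integers below 2^53, so they are represented exactly as double and the total does not change with the number of threads (for example, with a single thread).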
 

Now it is time to experience the performance benefits offered by multicore processors. To do so, make the following changes to the code:

  1. Check your code, compile it with gcc (adding the -pthread option), and run it while measuring its execution time. You can use the Linux command time for this: for example, if your executable is called concurrent_loop, time ./concurrent_loop reports the total runtime of your program.

  2. Modify the code so that the application uses only one thread, and measure the runtime again. Why is this time higher than in the previous case?

  3. Change the code so that there are no global variables and all data is passed to the threads through the pointer they receive at creation (the last argument of pthread_create). All tables should be created in the main function, and all the data a thread needs should be packed into a structure.

  4. Finally, modify the code again so that it no longer uses concurrency (threads) and becomes a traditional sequential program. A simple way to achieve this is to remove all references to threads and call the work function directly from the main function. Measure how long this new version takes to run. Does its time fall between the two previous times? Why do you think you are getting these results?
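For step 1, time reports the real (wall-clock) time together with the user and sys CPU time; in a multithreaded run the user time can exceed the real time because several cores work at once. To time only the processing part from inside the program, a minimal sketch assuming a POSIX system (now_seconds is a hypothetical helper name) reads the monotonic clock with clock_gettime:

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: returns the current value of a monotonic clock
 * in seconds. CLOCK_MONOTONIC is not affected by changes to the system
 * date, so the difference of two readings is a reliable elapsed time. */
double now_seconds(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
}
```

A possible use is to store double t0 = now_seconds(); before creating the threads and, after the last pthread_join, print now_seconds() - t0 with printf("Elapsed: %.3f s\n", ...).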
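For step 3, one possible layout, offered as a sketch rather than the expected solution, packs everything a thread needs into a structure that the caller allocates and passes as the last argument of pthread_create. The names struct work_args and parallel_sum are hypothetical. Passing NULL as the attribute creates joinable threads, which is the POSIX default, so the explicit pthread_attr_t of the original is not needed here:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical argument block: everything one worker needs,
   so no global variables are required. */
struct work_args {
    double *a;      /* shared table, allocated by the caller */
    size_t start;   /* first index of this thread's block */
    size_t end;     /* one past the last index of the block */
    double sum;     /* partial result written by the worker */
};

/* Fills a[start..end) with a[i] = i and stores the partial sum.
   Each thread receives its own struct, so no memory is written by two threads. */
void *do_work(void *arg)
{
    struct work_args *w = arg;
    double mysum = 0.0;

    for (size_t i = w->start; i < w->end; i++) {
        w->a[i] = (double) i;
        mysum += w->a[i];
    }
    w->sum = mysum;
    return NULL;
}

/* Splits a table of n elements among nthreads threads (nthreads >= 1),
   waits for all of them and stores the total in *total.
   Returns 0 on success and -1 if memory or a thread could not be obtained. */
int parallel_sum(double *a, size_t n, size_t nthreads, double *total)
{
    pthread_t *threads = malloc(nthreads * sizeof *threads);
    struct work_args *args = malloc(nthreads * sizeof *args);
    size_t chunk = n / nthreads;
    size_t created = 0;

    *total = 0.0;
    if (threads == NULL || args == NULL) {
        free(threads);
        free(args);
        return -1;
    }
    for (size_t t = 0; t < nthreads; t++) {
        args[t].a = a;
        args[t].start = t * chunk;
        /* the last thread also takes the remainder of n / nthreads */
        args[t].end = (t == nthreads - 1) ? n : (t + 1) * chunk;
        args[t].sum = 0.0;
        if (pthread_create(&threads[t], NULL, do_work, &args[t]) != 0)
            break;
        created++;
    }
    /* join only the threads that were actually started, then add their results */
    for (size_t t = 0; t < created; t++) {
        pthread_join(threads[t], NULL);
        *total += args[t].sum;
    }
    free(threads);
    free(args);
    return created == nthreads ? 0 : -1;
}
```

In main, the table could then be obtained with malloc(ARRAYSIZE * sizeof(double)) instead of being a global array. For step 4, the same do_work can be called directly on a single struct work_args covering the whole table, with no threads at all.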