AUTH's THMMY "Parallel and distributed systems" course assignments.

Parallel & Distributed Computer Systems HW3
January, 2025
Write a program that sorts $N$ integers in ascending order, using CUDA.
The program must perform the following tasks:
- The user specifies a positive integer $q$.
- Start a process with an array of $N = 2^q$ random integers.
- Sort all $N$ elements in ascending order.
- Check the correctness of the final result.
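The tasks above need a small amount of host-side scaffolding around the sort itself. What follows is a minimal sketch, not reference code from the course: the helper names `make_random_input`, `is_ascending` and `matches_reference`, the fixed seed, the upper bound on $q$, and the choice of `std::mt19937` are all assumptions made for illustration. The GPU sort is deliberately left out here.

```cuda
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper: build N = 2^q random integers on the host.
// The cap of 30 is an assumption that keeps N within a 32-bit range.
std::vector<int> make_random_input(unsigned int q, std::uint32_t seed = 42) {
    if (q == 0 || q > 30) throw std::invalid_argument("q must be in [1, 30]");
    const std::size_t n = std::size_t{1} << q;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist;  // default range: [0, INT_MAX]
    std::vector<int> data(n);
    for (int &x : data) x = dist(rng);
    return data;
}

// Hypothetical helper: true if no element is greater than its successor.
bool is_ascending(const std::vector<int> &data) {
    for (std::size_t i = 1; i < data.size(); ++i)
        if (data[i - 1] > data[i]) return false;
    return true;
}

// Hypothetical helper: stronger check that also catches lost or duplicated
// values, by comparing against std::sort applied to a copy of the input.
bool matches_reference(std::vector<int> input, const std::vector<int> &output) {
    std::sort(input.begin(), input.end());
    return input == output;
}
```

Checking only that the output is ascending would accept an array of all zeros, which is why the sketch also compares against a serially sorted copy.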
Your implementation should be based on the following steps:
V0. A kernel where each thread only compares and exchanges. This "eliminates" the 1:n innermost loop. Easy to write, but too many function calls and global synchronizations.
V1. Include the k inner loop in the kernel function. How do we handle the synchronization? Fewer calls, fewer global synchronizations. Faster than V0!
V2. Modify the kernel of V1 to work with local memory instead of global.
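The steps above can be sketched as follows, under several assumptions that the brief does not state. The sorting network is taken to be bitonic sort, the "k inner loop" is read as the loop over partner distances (called `j` here), and "local memory" is read as CUDA shared memory. All function names are hypothetical. The first kernel is a V0-style single compare-exchange. The second keeps the inner loop inside the kernel (V1) and works on a shared-memory tile (V2). This is possible because `__syncthreads()` only synchronizes one block, so only the steps whose partners fall inside the same block can be fused.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// V0-style step: each thread does one compare-exchange at partner distance j.
__global__ void compare_exchange_global(int *data, std::size_t n,
                                        std::size_t j, std::size_t k) {
    const std::size_t i = blockIdx.x * (std::size_t)blockDim.x + threadIdx.x;
    const std::size_t partner = i ^ j;
    if (i >= n || partner <= i) return;    // the lower index does the work
    const bool ascending = (i & k) == 0;   // direction of this subsequence
    const int a = data[i], b = data[partner];
    if ((a > b) == ascending) { data[i] = b; data[partner] = a; }
}

// V1/V2-style tail: every step with j < blockDim.x, run on a shared tile.
__global__ void compare_exchange_shared(int *data, std::size_t k,
                                        unsigned int j_start) {
    extern __shared__ int tile[];
    const unsigned int t = threadIdx.x;
    const std::size_t i = blockIdx.x * (std::size_t)blockDim.x + t;
    tile[t] = data[i];                     // one global read per element
    __syncthreads();
    const bool ascending = (i & k) == 0;
    for (unsigned int j = j_start; j > 0; j >>= 1) {
        const unsigned int p = t ^ j;
        if (p > t) {
            const int a = tile[t], b = tile[p];
            if ((a > b) == ascending) { tile[t] = b; tile[p] = a; }
        }
        __syncthreads();                   // block barrier instead of a relaunch
    }
    data[i] = tile[t];                     // one global write per element
}

// Sorts n = 2^q ints already on the device; returns the final sync status.
cudaError_t bitonic_sort_device(int *d_data, std::size_t n) {
    const unsigned int threads = n < 1024 ? (unsigned int)n : 1024u;
    const unsigned int blocks = (unsigned int)(n / threads);
    for (std::size_t k = 2; k <= n; k <<= 1) {
        std::size_t j = k >> 1;
        for (; j >= threads; j >>= 1)      // partners sit in different blocks
            compare_exchange_global<<<blocks, threads>>>(d_data, n, j, k);
        if (j > 0)                         // partners fit inside one block
            compare_exchange_shared<<<blocks, threads, threads * sizeof(int)>>>(
                d_data, k, (unsigned int)j);
    }
    return cudaDeviceSynchronize();
}
```

A pure V0 version would launch `compare_exchange_global` for every `(k, j)` pair. The sketch keeps that kernel only for the distances that cross block boundaries, where a kernel relaunch acts as the global synchronization.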
You must deliver:
- A report (about 3-4 pages) that describes your parallel algorithm and implementation.
- Your comments on the speed of your parallel program compared to the serial sort, after trying your program on aristotelis for $q = [20:27]$.
- The source code of your program uploaded online.
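For the speed comparison in the deliverables, one possible way to collect the numbers is sketched below. It assumes `std::sort` as the serial baseline and CUDA events for the GPU side. Neither choice is prescribed by the brief, and the helper names `time_serial_sort_ms` and `time_gpu_ms` are hypothetical.

```cuda
#include <algorithm>
#include <chrono>
#include <cuda_runtime.h>
#include <vector>

// Wall-clock time of std::sort on a private copy of the input, in ms.
double time_serial_sort_ms(std::vector<int> data) {
    const auto t0 = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());
    const auto t1 = std::chrono::steady_clock::now();
    // Read one element so the sorted result counts as used.
    volatile int sink = data.empty() ? 0 : data[0];
    (void)sink;
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Elapsed time, in ms, of work enqueued on the default stream,
// measured between two CUDA events.
template <typename F>
float time_gpu_ms(F &&enqueue_work) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    enqueue_work();                // e.g. launch the sorting kernels
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);    // wait until the enqueued work is done
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

Whether host-to-device and device-to-host copies are placed inside the timed region changes the measured speedup considerably, so the report should state which of the two was measured.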
Ethics: If you use code found on the web or generated by an LLM, you should mention your source and the changes you made. You may work in pairs; both partners must submit a single report with both names.
Deadline: 2 February, 2025.