AUTH's THMMY "Parallel and distributed systems" course assignments.

Parallel & Distributed Computer Systems HW3

January, 2025

Write a program that sorts $N$ integers in ascending order, using CUDA.

The program must perform the following tasks:

  • The user specifies a positive integer $q$.

  • Start a process that holds an array of $N = 2^q$ random integers.

  • Sort all $N$ elements in ascending order.

  • Check the correctness of the final result.
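
The tasks above could be wired together on the host roughly as follows. This is only a sketch: the helper names `make_random_input`, `matches_reference` and `sort_and_check`, the fixed seed, and the use of `std::sort` as a stand-in for the GPU sort are all assumptions made for illustration, not part of the assignment.

```cuda
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: N = 2^q pseudo-random integers (fixed seed, repeatable runs).
std::vector<int> make_random_input(int q, unsigned seed = 42) {
    std::vector<int> a(static_cast<std::size_t>(1) << q);
    std::srand(seed);
    for (int &x : a) x = std::rand();
    return a;
}

// Hypothetical helper: true iff `result` is the ascending permutation of `original`.
// Comparing against std::sort checks the order and that no element was lost.
bool matches_reference(std::vector<int> original, const std::vector<int> &result) {
    std::sort(original.begin(), original.end());
    return original == result;
}

// Stand-in driver: replace the std::sort call with the CUDA sort under test.
bool sort_and_check(int q) {
    std::vector<int> input = make_random_input(q);
    std::vector<int> output = input;
    std::sort(output.begin(), output.end());  // placeholder for the GPU sort
    return matches_reference(input, output);
}
```

Checking against a sorted copy is stricter than checking only that neighbouring elements are ordered, since it also catches duplicated or dropped values.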

Your implementation should be based on the following steps:

V0. A kernel where each thread only compares and exchanges. This “eliminates” the 1:n innermost loop. Easy to write, but too many function calls and global synchronizations.
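
A minimal sketch of what V0 might look like, assuming the intended algorithm is bitonic sort (the text does not name it). The kernel name, the loop variables `k` and `j`, the block size of 256 and the wrapper `sort_on_gpu_v0` are illustrative assumptions; error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// V0 sketch: one thread performs one compare-exchange for a fixed (k, j) step.
// k is the size of the bitonic sequences being merged, j the partner distance.
__global__ void bitonic_step_v0(int *data, unsigned n, unsigned j, unsigned k) {
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned partner = i ^ j;
    if (partner > i) {                  // the lower index of each pair does the work
        bool ascending = (i & k) == 0;  // direction of this k-sized subsequence
        int a = data[i], b = data[partner];
        if (ascending ? (a > b) : (a < b)) {
            data[i] = b;
            data[partner] = a;
        }
    }
}

// The host keeps both outer loops; every launch boundary acts as a global barrier.
void bitonic_sort_v0(int *d_data, unsigned n) {
    const unsigned threads = 256;  // assumed block size
    const unsigned blocks = (n + threads - 1) / threads;
    for (unsigned k = 2; k <= n; k <<= 1)
        for (unsigned j = k >> 1; j > 0; j >>= 1)
            bitonic_step_v0<<<blocks, threads>>>(d_data, n, j, k);
    cudaDeviceSynchronize();
}

// Hypothetical convenience wrapper: copy in, sort, copy out.
std::vector<int> sort_on_gpu_v0(std::vector<int> h) {
    int *d = nullptr;
    size_t bytes = h.size() * sizeof(int);
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, h.data(), bytes, cudaMemcpyHostToDevice);
    bitonic_sort_v0(d, static_cast<unsigned>(h.size()));
    cudaMemcpy(h.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}
```

Under these assumptions the host issues $q(q+1)/2$ launches, which is 378 for $q = 27$; this is the cost the text refers to as too many calls and global synchronizations.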

V1. Include the k inner loop in the kernel function. How do we handle the synchronization? Fewer calls, fewer global synchronizations. Faster than V0!
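
One possible answer to the synchronization question, again assuming bitonic sort: `__syncthreads()` only synchronizes threads of the same block, so the inner loop can move into the kernel only for partner distances that stay inside one block. The sketch below uses hypothetical names (`compare_exchange`, `bitonic_step_global`, `bitonic_inner_v1`), its own loop variables, and an assumed block size of 256; larger distances still get one launch each.

```cuda
#include <cuda_runtime.h>

constexpr unsigned THREADS = 256;  // assumed block size (must be a power of two)

__device__ void compare_exchange(int *data, unsigned i, unsigned j, unsigned k) {
    unsigned partner = i ^ j;
    if (partner > i) {
        bool ascending = (i & k) == 0;
        int a = data[i], b = data[partner];
        if (ascending ? (a > b) : (a < b)) { data[i] = b; data[partner] = a; }
    }
}

// Distances j >= THREADS pair elements owned by different blocks:
// these steps keep one launch each, as in V0.
__global__ void bitonic_step_global(int *data, unsigned n, unsigned j, unsigned k) {
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) compare_exchange(data, i, j, k);
}

// Distances j < THREADS stay inside one block, so the loop over j runs in the
// kernel with __syncthreads() as the barrier between steps.
__global__ void bitonic_inner_v1(int *data, unsigned n, unsigned j_start, unsigned k) {
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    for (unsigned j = j_start; j > 0; j >>= 1) {
        if (i < n) compare_exchange(data, i, j, k);
        __syncthreads();  // reached by every thread of the block (no early return)
    }
}

void bitonic_sort_v1(int *d_data, unsigned n) {
    const unsigned blocks = (n + THREADS - 1) / THREADS;
    for (unsigned k = 2; k <= n; k <<= 1) {
        unsigned j = k >> 1;
        for (; j >= THREADS; j >>= 1)  // cross-block steps
            bitonic_step_global<<<blocks, THREADS>>>(d_data, n, j, k);
        bitonic_inner_v1<<<blocks, THREADS>>>(d_data, n, j, k);  // in-block steps
    }
    cudaDeviceSynchronize();
}
```

With these assumed parameters the launch count for $q = 27$ drops from 378 to 217, because all steps with distance below 256 are folded into a single launch per value of `k`.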

V2. Modify the kernel of V1 to work with local memory instead of global.
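
A sketch of the V2 idea, assuming that "local memory" here means CUDA shared memory (memory local to a thread block) and that the algorithm is bitonic sort. Each block copies its tile into shared memory once, runs all in-block steps there, and writes the tile back. Kernel names, loop variables and the tile size of 256 are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

constexpr unsigned THREADS = 256;  // assumed block size = tile size (power of two)

// Cross-block steps still operate on global memory, one launch per step.
__global__ void bitonic_step_global(int *data, unsigned n, unsigned j, unsigned k) {
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned partner = i ^ j;
    if (i < n && partner > i) {
        bool ascending = (i & k) == 0;
        int a = data[i], b = data[partner];
        if (ascending ? (a > b) : (a < b)) { data[i] = b; data[partner] = a; }
    }
}

// In-block steps: load the tile once, exchange in shared memory, store once.
__global__ void bitonic_inner_v2(int *data, unsigned n, unsigned j_start, unsigned k) {
    __shared__ int tile[THREADS];
    unsigned t = threadIdx.x;
    unsigned i = blockIdx.x * blockDim.x + t;
    if (i < n) tile[t] = data[i];
    __syncthreads();
    for (unsigned j = j_start; j > 0; j >>= 1) {
        unsigned partner = t ^ j;           // partner index inside the tile
        if (i < n && partner > t) {
            bool ascending = (i & k) == 0;  // direction still uses the global index
            int a = tile[t], b = tile[partner];
            if (ascending ? (a > b) : (a < b)) { tile[t] = b; tile[partner] = a; }
        }
        __syncthreads();
    }
    if (i < n) data[i] = tile[t];
}

void bitonic_sort_v2(int *d_data, unsigned n) {
    const unsigned blocks = (n + THREADS - 1) / THREADS;
    for (unsigned k = 2; k <= n; k <<= 1) {
        unsigned j = k >> 1;
        for (; j >= THREADS; j >>= 1)
            bitonic_step_global<<<blocks, THREADS>>>(d_data, n, j, k);
        bitonic_inner_v2<<<blocks, THREADS>>>(d_data, n, j, k);
    }
    cudaDeviceSynchronize();
}
```

A further refinement, not shown here, would be to keep the tile in shared memory across all values of `k` up to the tile size, so the first stages need a single launch in total.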

You must deliver:

  • A report (about 3-4 pages) that describes your parallel algorithm and implementation.

  • Your comments on the speed of your parallel program compared to the serial sort, after trying your program on aristotelis for $q = [20:27]$.

  • The source code of your program uploaded online.
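
For the speed comparison requested above, timings could be collected along the following lines. This is a sketch under assumptions: the assignment does not say which serial sort to use, so `std::sort` is taken as the baseline here, and the helper names `time_gpu_ms` and `time_serial_sort_ms` are hypothetical.

```cuda
#include <algorithm>
#include <chrono>
#include <cuda_runtime.h>
#include <vector>

// Times whatever `gpu_work` enqueues on the default stream, in milliseconds,
// using CUDA events so that asynchronous kernel launches are fully counted.
template <typename F>
float time_gpu_ms(F gpu_work) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    gpu_work();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until all enqueued work has finished
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

// Assumed serial baseline: std::sort on a copy of the input, in milliseconds.
double time_serial_sort_ms(std::vector<int> data) {
    auto t0 = std::chrono::steady_clock::now();
    std::sort(data.begin(), data.end());
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

A call such as `time_gpu_ms([&] { my_gpu_sort(d_data, n); })`, where `my_gpu_sort` stands for whichever version is being measured, times only the device work; the report should state whether host-to-device transfers are included in the reported figures.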

Ethics: If you use code found on the web or generated by an LLM, you should mention your source and the changes you made. You may work in pairs; both partners must submit a single report with both names. Deadline: 2 February 2025.