How a GPU works
00 Setting up ILGPU
ILGPU should work on any 64-bit platform that .NET supports.
I have even used it on the inexpensive NVIDIA Jetson Nano with pretty decent CUDA performance.
1. Install the most recent .NET SDK for your chosen platform.
2. Create a new C# project: dotnet new console
3. Add the ILGPU package: dotnet add package ILGPU
01 A GPU is not a CPU
CPU | SIMD: Single Instruction Multiple Data
CPUs have a trick for parallel programs called SIMD.
This is a set of instructions, each of which performs the same operation on multiple pieces of data at once.
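To make the SIMD idea concrete, here is a minimal sketch using .NET's System.Numerics.Vector<T>, which maps to the CPU's SIMD instructions where the hardware supports them. The names SimdSketch and AddArrays are hypothetical and are not part of ILGPU. The number of elements handled per instruction (Vector<int>.Count) depends on the machine.

```csharp
using System;
using System.Numerics;

static class SimdSketch
{
    // Hypothetical helper: adds two equal-length arrays element by element.
    public static int[] AddArrays(int[] a, int[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int[] result = new int[a.Length];
        int width = Vector<int>.Count; // elements per vector, hardware dependent
        int i = 0;

        // Vector loop: a single add operates on `width` elements at once.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<int>(a, i);
            var vb = new Vector<int>(b, i);
            (va + vb).CopyTo(result, i);
        }

        // Scalar loop: handle the elements that did not fill a whole vector.
        for (; i < a.Length; i++)
            result[i] = a[i] + b[i];

        return result;
    }

    static void Main()
    {
        int[] a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        int[] b = { 10, 10, 10, 10, 10, 10, 10, 10, 10, 10 };
        Console.WriteLine(string.Join(", ", AddArrays(a, b)));
    }
}
```

Note that only the addition is done in parallel here. The loop control and the leftover scalar elements still run one step at a time, which is the limitation SIMT addresses.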
GPU | SIMT: Single Instruction Multiple Threads
SIMT is the same idea as SIMD, but instead of just doing the math instructions in parallel, why not do all the instructions in parallel?
using System;
using System.Threading.Tasks;
static void TestParallelFor()
{
    // Load the data
    int[] data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    int[] output = new int[10_000];
    // Load the action and execute
    Parallel.For(0, output.Length,
        (int i) =>
        {
            output[i] = data[i % data.Length];
        });
}
-
using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.CPU;
static void TestILGPU_CPU()
{
    // Initialize ILGPU.
    Context context = Context.CreateDefault();
    Accelerator accelerator = context.CreateCPUAccelerator(0);
    // Load the data.
    var deviceData = accelerator.Allocate1D(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 });
    var deviceOutput = accelerator.Allocate1D<int>(10_000);
    // Load / compile the kernel.
    var loadedKernel = accelerator.LoadAutoGroupedStreamKernel(
        (Index1D i, ArrayView<int> data, ArrayView<int> output) =>
        {
            output[i] = data[i % data.Length];
        });
    // Tell the accelerator to start computing the kernel.
    loadedKernel((int)deviceOutput.Length, deviceData.View, deviceOutput.View);
    // Wait for the accelerator to be finished with whatever it's doing;
    // in this case it just waits for the kernel to finish.
    accelerator.Synchronize();
    accelerator.Dispose();
    context.Dispose();
}
02 Memory and bandwidth and threads
Computers need memory, and memory is slow.
GPUs have memory, and memory is slow.