ILGPU Tutorials Primers

2024. 1. 30. 13:32

ILGPU - A Modern GPU Compiler for .NET Programs: https://ilgpu.net/

A modern, lightweight & fast GPU compiler for high-performance .NET programs.

Documentation (ILGPU Tutorials): https://ilgpu.net/docs/

How a GPU works

00 Setting up ILGPU

ILGPU should work on any 64-bit platform that .NET supports.

I have even used it on the inexpensive NVIDIA Jetson Nano with pretty decent CUDA performance.

1. Install the most recent .NET SDK for your chosen platform.

2. Create a new C# project: dotnet new console

3. Add the ILGPU package: dotnet add package ILGPU

01 A GPU is not a CPU

CPU | SIMD: Single Instruction Multiple Data

CPUs have a trick for parallel programs called SIMD.

SIMD is a set of instructions that let a single instruction operate on multiple pieces of data at once.
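As a rough illustration of the SIMD idea on the CPU side, here is a minimal sketch using System.Numerics.Vector<T> from the .NET base library. The names SimdSketch and Add are hypothetical and not part of the tutorial. Whether the vector operations actually map to hardware SIMD instructions depends on the runtime and CPU (Vector.IsHardwareAccelerated reports this).

```csharp
using System;
using System.Numerics;

// Hypothetical helper, not from the ILGPU tutorial.
public static class SimdSketch
{
    // Adds two arrays element-wise, processing Vector<int>.Count elements per step.
    public static int[] Add(int[] a, int[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int[] result = new int[a.Length];
        int width = Vector<int>.Count; // number of int lanes in one vector
        int i = 0;

        // Vectorized part: one "+" operates on `width` elements at once.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<int>(a, i);
            var vb = new Vector<int>(b, i);
            (va + vb).CopyTo(result, i);
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < a.Length; i++)
        {
            result[i] = a[i] + b[i];
        }

        return result;
    }
}
```

Calling SimdSketch.Add(x, y) on two equal-length int arrays returns their element-wise sum. The scalar tail loop is needed because the array length is usually not a multiple of the vector width.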

GPU | SIMT: Single Instruction Multiple Threads

SIMT takes the same idea as SIMD one step further: instead of running only the math instructions in parallel, why not run all the instructions in parallel?

using System;
using System.Threading.Tasks;

static void TestParallelFor()
{
    // Load the data
    int[] data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    int[] output = new int[10_000];
    
    // Load the action and execute
    Parallel.For(0, output.Length,
        (int i) =>
        {
            output[i] = data[i % data.Length];
        });
}

using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.CPU;

static void TestILGPU_CPU()
{
    // Initialize ILGPU.
    Context context = Context.CreateDefault();
    Accelerator accelerator = context.CreateCPUAccelerator(0);
    
    // Load the data.
    var deviceData = accelerator.Allocate1D(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 });
    var deviceOutput = accelerator.Allocate1D<int>(10_000);
    
    // load / compile the kernel
    var loadedKernel = accelerator.LoadAutoGroupedStreamKernel(
        (Index1D i, ArrayView<int> data, ArrayView<int> output) =>
        {
            output[i] = data[i % data.Length];
        });
    
    // tell the accelerator to start computing the kernel
    loadedKernel((int)deviceOutput.Length, deviceData.View, deviceOutput.View);
    
    // wait for the accelerator to be finished with whatever it's doing
    // in this case it just waits for the kernel to finish.
    accelerator.Synchronize();
    
    accelerator.Dispose();
    context.Dispose();
}

02 Memory and bandwidth and threads

Computers need memory, and memory is slow.

GPUs have memory, and memory is slow.

License: Attribution, NonCommercial, ShareAlike

dozob