https://ilgpu.net/

 

ILGPU - A Modern GPU Compiler for .Net Programs

A modern, lightweight & fast GPU compiler for high-performance .NET programs.


 

https://ilgpu.net/docs/

 

Documentation

ILGPU Tutorials


 

How a GPU works

00 Setting up ILGPU

ILGPU should work on any 64-bit platform that .NET supports.

I have even used it on the inexpensive NVIDIA Jetson Nano with pretty decent CUDA performance.

 

1. Install the most recent .NET SDK for your chosen platform.

2. Create a new C# project: dotnet new console

3. Add the ILGPU package: dotnet add package ILGPU

 

01 A GPU is not a CPU

CPU | SIMD: Single Instruction Multiple Data

CPUs have a trick for parallel programs called SIMD.

SIMD is a set of instructions that let a single instruction operate on multiple pieces of data at once.
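As a rough illustration of SIMD on the CPU side, the sketch below adds two arrays using .NET's System.Numerics.Vector<T>, which maps to hardware SIMD instructions when the runtime supports them. The class and method names (SimdSketch, Add) are made up for this example and do not come from the ILGPU tutorial.

```csharp
using System;
using System.Numerics;

static class SimdSketch
{
    // Adds two equally sized arrays. The main loop handles
    // Vector<int>.Count elements per vector operation; the tail
    // loop handles whatever is left over one element at a time.
    public static int[] Add(int[] a, int[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var result = new int[a.Length];
        int width = Vector<int>.Count; // number of int lanes per vector
        int i = 0;

        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<int>(a, i); // load 'width' elements of a
            var vb = new Vector<int>(b, i); // load 'width' elements of b
            (va + vb).CopyTo(result, i);    // one add covers all lanes
        }

        for (; i < a.Length; i++)
            result[i] = a[i] + b[i];

        return result;
    }
}
```

The lane count depends on the machine, so the code asks Vector<int>.Count at run time instead of hard-coding a width.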

 

GPU |  SIMT: Single Instruction Multiple Threads

SIMT is the same idea as SIMD, but instead of running only the math instructions in parallel, it runs all of the instructions in parallel.

 

using System;
using System.Threading.Tasks;

static void TestParallelFor()
{
    // Load the data
    int[] data = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    int[] output = new int[10_000];
    
    // Load the action and execute
    Parallel.For(0, output.Length,
        (int i) =>
        {
            output[i] = data[i % data.Length];
        });
}

-

using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.CPU;

static void TestILGPU_CPU()
{
    // Initialize ILGPU.
    Context context = Context.CreateDefault();
    Accelerator accelerator = context.CreateCPUAccelerator(0);
    
    // Load the data.
    var deviceData = accelerator.Allocate1D(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 });
    var deviceOutput = accelerator.Allocate1D<int>(10_000);
    
    // load / compile the kernel
    var loadedKernel = accelerator.LoadAutoGroupedStreamKernel(
        (Index1D i, ArrayView<int> data, ArrayView<int> output) =>
        {
            output[i] = data[i % data.Length];
        });
    
    // tell the accelerator to start computing the kernel
    loadedKernel((int)deviceOutput.Length, deviceData.View, deviceOutput.View);
    
    // wait for the accelerator to be finished with whatever it's doing
    // in this case it just waits for the kernel to finish.
    accelerator.Synchronize();
    
    accelerator.Dispose();
    context.Dispose();
}

 

 

02 Memory and bandwidth and threads

Computers need memory, and memory is slow.

 

GPUs have memory, and memory is slow.

 

 

 

 

https://mangkyu.tistory.com/85

 

[CUDA] Advanced CUDA Programming Examples (2/2)


 

CUDA: a parallel computing platform and API model created by NVIDIA

 

GPGPU (General-Purpose Computing on Graphics Processing Units)

 

The CUDA platform is a software layer that gives access to the GPU's virtual instruction set and parallel computational elements.

 

 

Host: CPU and its memory (Host Memory)

Device: GPU and its memory (Device Memory)

 

1. Data copied from CPU to GPU

2. Launch VectorAdd kernel on the GPU

3. Resulting data copied from GPU to CPU
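The three steps above can be sketched with ILGPU in C#. This is a minimal sketch, assuming the ILGPU 1.x API and the CPU accelerator so that it runs without a GPU; the names HostDeviceSketch, VectorAdd and VectorAddKernel are hypothetical.

```csharp
using System;
using ILGPU;
using ILGPU.Runtime;
using ILGPU.Runtime.CPU;

static class HostDeviceSketch
{
    // Device code: each thread adds one pair of elements.
    static void VectorAddKernel(
        Index1D i, ArrayView<int> a, ArrayView<int> b, ArrayView<int> c)
    {
        c[i] = a[i] + b[i];
    }

    // Host code: assumes hostA and hostB have the same length.
    public static int[] VectorAdd(int[] hostA, int[] hostB)
    {
        using Context context = Context.CreateDefault();
        using Accelerator accelerator = context.CreateCPUAccelerator(0);

        // 1. Copy the input data from host memory to device memory.
        using var deviceA = accelerator.Allocate1D<int>(hostA.Length);
        using var deviceB = accelerator.Allocate1D<int>(hostB.Length);
        using var deviceC = accelerator.Allocate1D<int>(hostA.Length);
        deviceA.CopyFromCPU(hostA);
        deviceB.CopyFromCPU(hostB);

        // 2. Launch the kernel on the device, one thread per element.
        var kernel = accelerator.LoadAutoGroupedStreamKernel<
            Index1D, ArrayView<int>, ArrayView<int>, ArrayView<int>>(VectorAddKernel);
        kernel((int)deviceC.Length, deviceA.View, deviceB.View, deviceC.View);
        accelerator.Synchronize();

        // 3. Copy the result from device memory back to host memory.
        return deviceC.GetAsArray1D();
    }
}
```

Only the accelerator creation line should need to change to target a real GPU; the host/device copy pattern stays the same.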

 

 

Device code is compiled by nvcc (NVIDIA's compiler), and it requires a few keywords.

The __global__ keyword marks a function that is called from the host and runs on the device.

 

Blocks and Threads

The units by which GPU code is processed in parallel.

One block consists of N threads.

 

kernel <<< BlockCount, Threads-per-Block >>>(...)
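The launch configuration decides how many threads run and which element each one handles. The host-side C# sketch below only reproduces that arithmetic; LaunchGeometry and its methods are illustrative names, not part of CUDA or ILGPU.

```csharp
static class LaunchGeometry
{
    // Number of blocks needed so that blocks * threadsPerBlock >= n.
    // Rounds up, so a final partially filled block is still counted.
    public static int BlockCount(int n, int threadsPerBlock)
        => (n + threadsPerBlock - 1) / threadsPerBlock;

    // Global element index of one thread, mirroring the CUDA expression
    // threadIdx.x + blockIdx.x * blockDim.x.
    public static int GlobalIndex(int blockIdx, int blockDim, int threadIdx)
        => threadIdx + blockIdx * blockDim;
}
```

For example, 2048 * 2048 elements with 512 threads per block gives BlockCount = 8192, and thread 3 of block 2 handles element 3 + 2 * 512 = 1027. When n is not a multiple of the block size, the kernel also needs an index < n guard.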

 

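#include <stdio.h>
#include <stdlib.h>
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#define N                   (2048 * 2048)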
#define THREADS_PER_BLOCK   512

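__global__ void dot(int *a, int *b, int *c)
{
    // per-block scratch space, shared by all threads of the block
    __shared__ int temp[THREADS_PER_BLOCK];
    // global element index of this thread
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];
    
    // wait until every thread of the block has stored its product
    __syncthreads();
    
    // thread 0 sums the block's products and adds them to the global result
    if (threadIdx.x == 0)
    {
        int sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; ++i)
            sum += temp[i];
        // many blocks update *c concurrently, so the add must be atomic
        atomicAdd(c, sum);
    }
}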

int main(void)
{
    int *p_a, *p_b, *p_c;
    int *p_dev_a, *p_dev_b, *p_dev_c;
    int cbSize = N * sizeof(int);
    
    // allocate host memories
    p_a = (int *)malloc(cbSize);
    p_b = (int *)malloc(cbSize);
    p_c = (int *)malloc(sizeof(int));
    
    // allocate device memories
    cudaMalloc(&p_dev_a, cbSize);
    cudaMalloc(&p_dev_b, cbSize);
    cudaMalloc(&p_dev_c, sizeof(int));
    
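    // initialize variables
    // note: i * i no longer fits in a 32-bit int once i > 46340, so with
    // these inputs the printed total wraps around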
    for (int i = 0; i < N; ++i)
    {
        p_a[i] = i;
        p_b[i] = i;
    }
    
    // copy host memories to device memories
    cudaMemcpy(p_dev_a, p_a, cbSize, cudaMemcpyHostToDevice);
    cudaMemcpy(p_dev_b, p_b, cbSize, cudaMemcpyHostToDevice);
    
    // the kernel accumulates into *p_dev_c with atomicAdd, so zero it first
    cudaMemset(p_dev_c, 0, sizeof(int));
    
    // run dot with N threads
    dot<<< N / THREADS_PER_BLOCK, THREADS_PER_BLOCK >>>(p_dev_a, p_dev_b, p_dev_c);
    
    // copy device memories sum result(p_dev_c) to host memories(p_c)
    cudaMemcpy(p_c, p_dev_c, sizeof(int), cudaMemcpyDeviceToHost);
    
    printf("Total Sum: %d\n", *p_c);
    
    free(p_a);
    free(p_b);
    free(p_c);
    cudaFree(p_dev_a);
    cudaFree(p_dev_b);
    cudaFree(p_dev_c);
    
    return 0;
}

 

 

Atomic Operations

  • atomicAdd()
  • atomicSub()
  • atomicMin()
  • atomicMax()
  • atomicInc()
  • atomicDec()
  • atomicExch()
  • atomicCAS()
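Atomic operations matter whenever many threads update the same memory location, as the dot kernel does with atomicAdd. As a CPU-side analogy only (this is .NET's Interlocked class, not the CUDA functions listed above), the sketch below sums an array from many threads into one shared total; AtomicSketch and SumWithInterlocked are hypothetical names.

```csharp
using System.Threading;
using System.Threading.Tasks;

static class AtomicSketch
{
    // Every loop iteration may run on a different thread. Interlocked.Add
    // performs the read-modify-write on 'total' as one indivisible step,
    // so no update is lost. A plain 'total += values[i]' would race.
    public static int SumWithInterlocked(int[] values)
    {
        int total = 0;
        Parallel.For(0, values.Length, i =>
        {
            Interlocked.Add(ref total, values[i]);
        });
        return total;
    }
}
```

Funnelling every update through one atomic is correct but serializes the threads, which is why the CUDA kernel sums each block locally first and issues one atomicAdd per block.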

 

1. Basic HttpClientFactory Usage

 

// 1. Register HttpClientFactory
public void ConfigureServices(IServiceCollection services)
{
    // ... other service registrations ...
    services.AddHttpClient();
    
    // Register your custom services
    services.AddScoped<IMyApiService, MyApiService>();
}

// 2. Create a Service Interface
public interface IMyApiService
{
    Task<string> GetApiResponseAsync();
}

// 3. Implement the Service Using HttpClientFactory
public class MyApiService : IMyApiService
{
    readonly IHttpClientFactory m_httpClientFactory;
    
    public MyApiService(IHttpClientFactory httpClientFactory)
    {
        m_httpClientFactory = httpClientFactory;
    }
    
    public async Task<string> GetApiResponseAsync()
    {
        // Create a named client or use the default client
        HttpClient client = m_httpClientFactory.CreateClient();
        
        // Make a HTTP GET request
        HttpResponseMessage response = await client.GetAsync("https://api.example.com/api/");
        
        if (response.IsSuccessStatusCode)
        {
            var sRespBody = await response.Content.ReadAsStringAsync();
            return sRespBody;
        }
        
        // Handle error
        return "Error occurred.";
    }
}

// 4. Use the Service in a Controller
public class MyController : Controller
{
    readonly IMyApiService m_myApiService;
    
    public MyController(IMyApiService myApiService)
    {
        m_myApiService = myApiService;
    }
    
    public async Task<IActionResult> Index()
    {
        var sApiResponse = await m_myApiService.GetApiResponseAsync();
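        // Cast to object so the string is passed as the model;
        // View(string) would treat it as a view name.
        return View((object)sApiResponse);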
    }
}

 

2. Using Named HttpClient Instances

private static void ConfigureServices(IServiceCollection services)
{
    services.AddHttpClient("WebApiClient", config =>
    {
        config.BaseAddress = new Uri("https://localhost:5001/api/");
        config.Timeout = TimeSpan.FromSeconds(30);
        config.DefaultRequestHeaders.Clear();
    });
    
    // ... other service registrations ...
    services.AddScoped<IHttpClientFactoryService, HttpClientFactoryService>();
}

private async Task<List<TItem>> GetItemsWithHttpClientFactory<TItem>()
{
    var httpClient = m_httpClientFactory.CreateClient("WebApiClient");
    var response = await httpClient.GetAsync("items",
        HttpCompletionOption.ResponseHeadersRead);
    using (response)
    {
        response.EnsureSuccessStatusCode();
        
        var stream = await response.Content.ReadAsStreamAsync();
        var items = await JsonSerializer.DeserializeAsync<List<TItem>>(stream, m_jsopts);
        return items;
    }
}

 

3. Using Typed HttpClient Instances

// 1. Create the custom client
public class ItemsClient
{
    public HttpClient Client { get; }
    
    public ItemsClient(HttpClient client)
    {
        this.Client = client;
        this.Client.BaseAddress = new Uri("https://localhost:5001/api/");
        this.Client.Timeout = TimeSpan.FromSeconds(30);
        this.Client.DefaultRequestHeaders.Clear();
    }
}

// 2. Register
private static void ConfigureServices(IServiceCollection services)
{
    // ... other service registrations ...
    services.AddHttpClient<ItemsClient>();
}


public class HttpClientFactoryService : IHttpClientFactoryService
{
    readonly IHttpClientFactory m_httpClientFactory;
    readonly ItemsClient m_itemsClient;
    readonly JsonSerializerOptions m_jsopts;
    
    public HttpClientFactoryService(
        IHttpClientFactory httpClientFactory,
        ItemsClient itemsClients)
    {
        m_httpClientFactory = httpClientFactory;
        m_itemsClient = itemsClients;
        
        m_jsopts = new JsonSerializerOptions { PropertyNameCaseInsensitive = true };
    }
    
    private async Task<List<TItem>> GetItemsWithTypedClient<TItem>()
    {
        var response = await m_itemsClient.Client.GetAsync("items",
            HttpCompletionOption.ResponseHeadersRead);
        using (response)
        {
            response.EnsureSuccessStatusCode();
            
            var stream = await response.Content.ReadAsStreamAsync();
            var items = await JsonSerializer.DeserializeAsync<List<TItem>>(stream, m_jsopts);
            return items;
        }
    }
}

 


https://download.tek.com/document/37W_17249_6_Fundamentals_of_Real-Time_Spectrum_Analysis1.pdf

 

GLOSSARY

Acquisition An integer number of time-contiguous samples.
Acquisition Time The length of time represented by one acquisition
Amplitude The magnitude of an electrical signal.
Amplitude Modulation (AM) The process in which the amplitude of a sine wave (the carrier) is varied in accordance with the instantaneous voltage of a second electrical signal (the modulating signal).
Analysis Time A subset of time-contiguous samples from one block, used as input to an analysis view.
Analysis View The flexible window used to display real-time measurement results.
Carrier The RF signal upon which modulation resides.
Carrier Frequency  The frequency of the CW component of the carrier signal.
Center Frequency The frequency corresponding to the center of the frequency span of the spectrum analyzer display.
CZT-Chirp-Z transform A computationally efficient method of computing a Discrete Fourier Transform (DFT). CZTs offer more flexibility than the conventional FFT, for example in selecting the number of output frequency points, at the expense of additional computations.
CW Signal Continuous wave signal. A sine wave.
dBfs A unit to express power level in decibels referenced to full scale. Depending on the context, this is either the full scale of the display screen or the full scale of the ADC.
dBm A unit to express power level in decibels referenced to 1 milliwatt.
dBmV A unit to express voltage levels in decibels referenced to 1 millivolt.
Decibel (dB) Ten times the logarithm of the ratio of one electrical power to another.
Discrete Fourier transform A mathematical process to calculate the frequency spectrum of a sampled time domain signal.
Display Line A horizontal or vertical line on a waveform display, used as a reference for visual (or automatic) comparison with a given level, time, or frequency.
Distortion Degradation of a signal, often a result of nonlinear operations, resulting in unwanted frequency components. Harmonic and intermodulation distortions are common types.
DPX Digital Phosphor analysis - A signal analysis and compression methodology that allows the live view of time-changing signals, allowing the discovery of rare transient events.
DPX Spectrum DPX technology applied to spectrum analysis. DPX Spectrum provides a Live RF view as well as the observation of frequency domain transients.
Dynamic Range The maximum ratio of the levels of two signals simultaneously present at the input which can be measured to a specified accuracy.
Fast Fourier Transform A computationally efficient method of computing a Discrete Fourier Transform (DFT). A common FFT algorithm requires that the number of input and output samples are equal and a power of 2 (2,4,8,16,…).
Frequency The rate at which a signal oscillates, expressed as hertz or number of cycles per second.
Frequency Domain View The representation of the power of the spectral components of a signal as a function of frequency; the spectrum of the signal.
Frequency Drift Gradual shift or change in a signal's frequency over the specified time, where other conditions remain constant. Expressed in hertz per second.
Frequency Mask Trigger  A flexible real-time trigger based on specific events that occur in the frequency domain. The triggering parameters are defined by a graphical mask.
Frequency Modulation (FM) The process in which the frequency of an electrical signal (the carrier) is varied according to the instantaneous voltage of a second electrical signal (the modulating signal).
Frequency Range The range of frequencies over which a device operates, with lower and upper bounds.
Frequency Span A continuous range of frequencies extending between two frequency limits.
Marker  A visually identifiable point on a waveform trace, used to extract a readout of domain and range values represented by that point.
Modulate To vary a characteristic of a signal, typically in order to transmit information.
Noise Unwanted random disturbances superimposed on a signal which tend to obscure that signal.
Noise Floor The level of noise intrinsic to a system that represents the minimum limit at which input signals can be observed; ultimately limited by thermal noise (kTB).
Noise Bandwidth (NBW) The exact bandwidth of a filter that is used to calculate the absolute power of noise or noise-like signals in dBm/Hz.
Probability of Intercept The certainty to which a signal can be detected within defined parameters.
Real-Time Bandwidth The frequency span over which real-time seamless capture can be performed, which is a function of the digitizer and the IF bandwidth of a Real-Time Spectrum Analyzer.
Real-Time Seamless Capture The ability to acquire and store an uninterrupted series of time domain samples that represent the behavior of an RF signal over a long period of time.
Real-Time Spectrum Analysis A spectrum analysis technique based on Discrete Fourier Transforms (DFT) that is capable of continuously analyzing a bandwidth of interest without time gaps. Real-Time Spectrum Analysis provides 100% probability of display and trigger of transient signal fluctuations within the specified span, resolution bandwidth and time parameters.
Real-Time Spectrum Analyzer Instrument capable of measuring elusive RF events in RF signals, triggering on those events, seamlessly capturing them into memory, and analyzing them in the frequency, time, and modulation domains.
Reference Level The signal level represented by the uppermost graticule line of the analyzer display.
Resolution Bandwidth (RBW) The width of the narrowest measurable band of frequencies in a spectrum analyzer display. The RBW determines the analyzer’s ability to resolve closely spaced signal components.
Sensitivity Measure of a spectrum analyzer’s ability to display minimum level signals, usually expressed as Displayed Average Noise Level (DANL).
Spectrogram Frequency vs. time vs. amplitude display where the frequency is represented on the x-axis and time on the y-axis. The power is expressed by the color.
Spectrum The frequency domain representation of a signal showing the power distribution of its spectral component versus frequency.
Spectrum Analysis Measurement technique for determining the frequency content of an RF signal.
Vector Signal Analysis Measurement technique for characterizing the modulation of an RF signal. Vector analysis takes both magnitude and phase into account.
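The decibel-based units in the glossary (dB, dBm, dBmV) all follow from the same logarithmic ratio. A minimal C# sketch of the conversions, with class and method names chosen only for illustration; voltage ratios use a factor of 20 because power is proportional to voltage squared at a fixed impedance.

```csharp
using System;

static class DecibelSketch
{
    // dB: ten times the base-10 logarithm of a ratio of two powers.
    public static double PowerRatioToDb(double power, double referencePower)
        => 10.0 * Math.Log10(power / referencePower);

    // dBm: power level in decibels referenced to 1 milliwatt.
    public static double WattsToDbm(double watts)
        => 10.0 * Math.Log10(watts / 1e-3);

    // dBmV: voltage level in decibels referenced to 1 millivolt.
    public static double VoltsToDbmV(double volts)
        => 20.0 * Math.Log10(volts / 1e-3);
}
```

For example, 1 mW is 0 dBm, 1 W is 30 dBm, and doubling a power adds about 3.01 dB.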

 

Acronym Reference

ACP Adjacent Channel Power
ADC Analog-to-Digital Converter
AM Amplitude Modulation
BH4B Blackman-Harris 4B Window
BW Bandwidth
CCDF Complementary Cumulative Distribution Function
CDMA Code Division Multiple Access
CW Continuous Wave
dB Decibel
dBfs dB Full Scale
DDC Digital Downconverter
DFT Discrete Fourier Transform
DPX Digital Phosphor Display, Spectrum, etc.
DSP Digital Signal Processing
EVM Error Vector Magnitude
FFT Fast Fourier Transform 
FM Frequency Modulation 
FSK Frequency Shift Keying 
IF Intermediate Frequency
IQ In-Phase Quadrature
LO Local Oscillator
NBW Noise Bandwidth
OFDM Orthogonal Frequency Division Multiplexing
PAR Peak-Average Ratio
PM Phase Modulation
POI Probability of Intercept
PRBS Pseudorandom Binary Sequence
PSK Phase Shift Keying
QAM Quadrature Amplitude Modulation
QPSK Quadrature Phase Shift Keying
RBW Resolution Bandwidth
RF Radio Frequency
RMS Root Mean Square
RTSA Real-Time Spectrum Analyzer
SA Spectrum Analyzer
VSA Vector Signal Analyzer
