GPU Support
GeneralizedGrossPitaevskii.jl provides seamless GPU acceleration through KernelAbstractions.jl, enabling the same code to run efficiently on both CPU and GPU hardware. The package automatically detects array types and dispatches to appropriate compute backends, making GPU usage as simple as providing GPU arrays as initial conditions.
This page explains how to enable GPU support, configure backends, and optimize performance for GPU simulations.
KernelAbstractions.jl
KernelAbstractions.jl provides a unified interface for writing GPU kernels that work across different backends (CUDA, ROCm, oneAPI, etc.). GeneralizedGrossPitaevskii.jl leverages this abstraction to provide hardware-agnostic simulations. Beyond CPU, the package has only been tested with the CUDA backend. The other backends are untested.
Specifying the GPU backend
Different GPU backends require different array packages and setup procedures. The choice of backend depends on your hardware and software environment.
For NVIDIA GPUs, use CUDA.jl, which needs to be installed separately:
using CUDA, GeneralizedGrossPitaevskii
# Check GPU availability
CUDA.functional() # Should return true
# Create GPU arrays
u0 = ((CUDA.zeros(ComplexF64, 512, 512)),)
# Standard problem setup - automatically uses GPU
prob = GrossPitaevskiiProblem(u0, lengths; dispersion, nonlinearity, param)
ts, sol = solve(prob, StrangSplitting(), tspan; dt, nsaves)The other backends should work similarly, but have not been tested.
Performance considerations
Floating Point Precision
GPU performance often benefits significantly from reduced precision, such as Float32 or ComplexF32, compared to Float64 or ComplexF64.
# Double precision (slower, higher accuracy)
u0_double = (CuArray(zeros(ComplexF64, N, N)),)
# Single precision (faster, sufficient for most applications)
u0_single = (CuArray(zeros(ComplexF32, N, N)),)Make sure to also change the types in the parameters and functions you provide (e.g., dispersion, nonlinearity) to match the precision of your arrays.
Consider the following factors when choosing precision:
- Accuracy requirements: Some applications may require higher precision
- Memory usage: Lower precision reduces memory consumption, allowing larger problems
Performance Tips
- Use appropriate precision: Float32 for most applications, Float64 only when necessary.
- Choose optimal grid sizes: Powers of 2 often perform better for FFTs
- Ensemble simulations: GPU parallelism is ideal for multiple stochastic trajectories. Be sure to use a large enough ensemble size to fully utilize the GPU.
- Tune workgroup sizes: A
workgroup_sizeparameter is available in thesolvefunction to allow tuning of workgroup sizes for better performance.