469 questions
359
votes
8
answers
351k
views
Different CUDA versions shown by nvcc and NVIDIA-smi
I am very confused by the different CUDA versions shown by running which nvcc and nvidia-smi. I have both cuda9.2 and cuda10 installed on my ubuntu 16.04. Now I set the PATH to point to cuda9.2. So ...
150
votes
3
answers
162k
views
How do I choose grid and block dimensions for CUDA kernels?
This is a question about how to determine the CUDA grid, block and thread sizes. This is an additional question to the one posted here.
Following this link, the answer from talonmies contains a code ...
58
votes
8
answers
38k
views
Swing rendering appears broken in JDK 1.8, correct in JDK 1.7
I have installed IntelliJ IDEA (13.1.1 #IC-135.480) and JDK 1.8.0 (x64) and I generated some GUI with the GUI Form designer.
Then I ran the code and realized that something is not alright.
Here is ...
69
votes
6
answers
278k
views
Error Message : Cannot find or open the PDB file
I tried running the sample programs provided on NVIDIA's official site. Most of the programs ran smoothly, except for a few where I get similar error messages. How can I fix that? Here's a sample of error ...
33
votes
2
answers
119k
views
What is the correct version of CUDA for my nvidia driver?
I am using Ubuntu 14.04. I want to install CUDA, but I don't know which version is right for my laptop. I checked my driver version:
$cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 ...
5
votes
1
answer
22k
views
How to create NVIDIA OpenCL project
I want to write an application with NVIDIA OpenCL in Visual Studio 2017 but don't know how to create a project for this purpose.
I have GPUs from NVIDIA (GeForce 940M) and Intel (HD Graphics 5500) and ...
28
votes
2
answers
15k
views
How is CUDA memory managed?
When I run my CUDA program, which allocates only a small amount of global memory (below 20 M), I get an "out of memory" error. (From other people's posts, I think the problem is related to memory ...
184
votes
2
answers
86k
views
How do CUDA blocks/warps/threads map onto CUDA cores?
I have been using CUDA for a few weeks, but I have some doubts about the allocation of blocks/warps/threads.
I am studying the architecture from a didactic point of view (university project), so ...
25
votes
3
answers
20k
views
How to measure the inner kernel time in NVIDIA CUDA?
I want to measure elapsed time inside a GPU kernel. How can I measure it in NVIDIA CUDA?
e.g.
__global__ void kernelSample()
{
some code here
get start time
some code here
get stop time
some ...
13
votes
1
answer
31k
views
How do I use Nvidia Multi-process Service (MPS) to run multiple non-MPI CUDA applications?
Can I run non-MPI CUDA applications concurrently on NVIDIA Kepler GPUs with MPS? I'd like to do this because my applications cannot fully utilize the GPU, so I want them to co-run together. Is there ...
11
votes
3
answers
6k
views
Why are NVIDIA Pascal GPUs slow at running CUDA kernels when using cudaMallocManaged?
I was testing the new CUDA 8 along with the Pascal Titan X GPU and was expecting a speedup for my code, but for some reason it ends up being slower. I am on Ubuntu 16.04.
Here is the minimum code that ...
187
votes
8
answers
622k
views
How do I select which GPU to run a job on?
In a multi-GPU computer, how do I designate which GPU a CUDA job should run on?
As an example, when installing CUDA, I opted to install the NVIDIA_CUDA-<#.#>_Samples then ran several instances ...
180
votes
2
answers
178k
views
Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation) [closed]
How are threads organized to be executed by a GPU?
130
votes
5
answers
69k
views
What is a bank conflict? (Doing Cuda/OpenCL programming)
I have been reading the programming guide for CUDA and OpenCL, and I cannot figure out what a bank conflict is. They just sort of dive into how to solve the problem without elaborating on the subject ...
118
votes
2
answers
94k
views
nvidia-smi Volatile GPU-Utilization explanation?
I know that nvidia-smi -l 1 will give the GPU usage every second (similar to the following). However, I would appreciate an explanation of what Volatile GPU-Util really means. Is that the number ...
19
votes
4
answers
19k
views
128 bit integer on cuda?
I just managed to install the CUDA SDK under Ubuntu Linux 10.04. My graphics card is an NVIDIA GeForce GT 425M, and I'd like to use it for some heavy computational problems.
What I wonder is: is there ...
85
votes
9
answers
53k
views
Horrible redraw performance of the DataGridView on one of my two screens
I've actually solved this, but I'm posting it for posterity.
I ran into a very odd issue with the DataGridView on my dual-monitor system. The issue manifests itself as an EXTREMELY slow repaint of ...
4
votes
1
answer
5k
views
Cuda kernel returning vectors
I have a list of words, and my goal is to match each word in a very, very long phrase.
I have no problem matching each word; my only problem is returning a vector of structures containing ...
639
votes
21
answers
1.5m
views
How do I check if PyTorch is using the GPU?
How do I check if PyTorch is using the GPU? The nvidia-smi command can detect GPU activity, but I want to check it directly from inside a Python script.
11
votes
3
answers
26k
views
What can I do against 'CUDA driver version is insufficient for CUDA runtime version'?
When I go to /usr/local/cuda/samples/1_Utilities/deviceQuery and execute
moose@pc09 /usr/local/cuda/samples/1_Utilities/deviceQuery $ sudo make clean
rm -f deviceQuery deviceQuery.o
rm -rf ../../bin/...
45
votes
5
answers
59k
views
How does CUDA assign device IDs to GPUs?
When a computer has multiple CUDA-capable GPUs, each GPU is assigned a device ID. By default, CUDA kernels execute on device ID 0. You can use cudaSetDevice(int device) to select a different device.
...
44
votes
5
answers
35k
views
OpenGL without X.org in linux
I'd like to open an OpenGL context without X in Linux. Is there any way at all to do it?
I know it's possible for integrated Intel graphics card hardware, though most people have Nvidia cards in ...
39
votes
3
answers
25k
views
How can I make tensorflow run on a GPU with capability 2.x?
I've successfully installed tensorflow (GPU) on Linux Ubuntu 16.04 and made some small changes in order to make it work with the new Ubuntu LTS release.
However, I thought (who knows why) that my GPU ...
4
votes
2
answers
9k
views
CUDA program causes nvidia driver to crash
My Monte Carlo pi calculation CUDA program is causing my NVIDIA driver to crash when I exceed around 500 trials and 256 full blocks. It seems to be happening in the monteCarlo kernel function. Any help ...
71
votes
5
answers
103k
views
CUDA determining threads per block, blocks per grid
I'm new to the CUDA paradigm. My question is in determining the number of threads per block, and blocks per grid. Does a bit of art and trial play into this? What I've found is that many examples have ...
65
votes
9
answers
117k
views
Error compiling CUDA from Command Prompt
I'm trying to compile a CUDA test program on Windows 7 via Command Prompt.
I'm using this command:
nvcc test.cu
But all I get is this error:
nvcc fatal : Cannot find compiler 'cl.exe' in PATH
What may ...
33
votes
3
answers
26k
views
Are CUDA kernel calls synchronous or asynchronous?
I read that one can use kernel launches to synchronize different blocks, i.e., if I want all blocks to complete operation 1 before they go on to operation 2, I should place operation 1 in one kernel ...
28
votes
2
answers
16k
views
Forcing NVIDIA GPU programmatically in Optimus laptops
I'm programming a DirectX game, and when I run it on an Optimus laptop the Intel GPU is used, resulting in horrible performance. If I force the NVIDIA GPU using the context menu or by renaming my ...
7
votes
2
answers
3k
views
Force system with nVidia Optimus to use the real GPU for my application?
I want my application to always run using the real gpu on nVidia Optimus laptops.
From "Enabling High Performance Graphics Rendering on Optimus Systems", (http://developer.download.nvidia.com/devzone/...
4
votes
2
answers
4k
views
CUFFT error handling
I'm using the following macro for CUFFT error handling:
#define cufftSafeCall(err) __cufftSafeCall(err, __FILE__, __LINE__)
inline void __cufftSafeCall(cufftResult err, const char *file, const ...
2
votes
1
answer
3k
views
cuda 11 kernel doesn't run
Here is a demo.cu that aims to call printf from the GPU device:
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
__global__ void hello_cuda() {
...
0
votes
1
answer
4k
views
GPU is not detected in Tensorflow
I am using Tensorflow on Windows, and I am trying to use my GPU. But Tensorflow seems unable to detect my GPU.
I created a Python virtual environment and installed Python (3.8) and TensorFlow. My ...
599
votes
19
answers
992k
views
Nvidia NVML Driver/library version mismatch [closed]
When I run nvidia-smi, I get the following message:
Failed to initialize NVML: Driver/library version mismatch
An hour ago I received the same message and uninstalled my CUDA library and I was able ...
138
votes
10
answers
333k
views
Is it possible to run CUDA on AMD GPUs?
I'd like to extend my skill set into GPU computing. I am familiar with raytracing and realtime graphics (OpenGL), but the next generation of graphics and high performance computing seems to be in GPU ...
112
votes
4
answers
91k
views
Streaming multiprocessors, Blocks and Threads (CUDA)
What is the relationship between a CUDA core, a streaming multiprocessor and the CUDA model of blocks and threads?
What gets mapped to what, and what is parallelized and how? And what is more ...
52
votes
10
answers
215k
views
How do I run nvidia-smi on Windows?
nvidia-smi executed in a Command Prompt (CMD) in Windows returns the following error
C:\Users>nvidia-smi
'nvidia-smi' is not recognized as an internal or external command,
operable program or batch ...
51
votes
8
answers
99k
views
Tensorflow not running on GPU
I have already spent a considerable amount of time digging around on Stack Overflow and elsewhere looking for the answer, but couldn't find anything.
Hi all,
I am running Tensorflow with Keras on top.
I am 90% ...
32
votes
4
answers
39k
views
How can I get number of Cores in cuda device?
I am looking for a function that counts the number of cores on my CUDA device. I know each microprocessor has a specific number of cores, and my CUDA device has 2 microprocessors.
I searched a lot to find a property ...
15
votes
6
answers
7k
views
Forcing hardware accelerated rendering
I have an OpenGL library written in c++ that is used from a C# application using C++/CLI adapters. My problem is that if the application is used on laptops with Nvidia Optimus technology the ...
13
votes
1
answer
16k
views
Why do we need cudaDeviceSynchronize(); in kernels with device-printf?
__global__ void helloCUDA(float f)
{
printf("Hello thread %d, f=%f\n", threadIdx.x, f);
}
int main()
{
helloCUDA<<<1, 5>>>(1.2345f);
cudaDeviceSynchronize();
return ...
3
votes
1
answer
3k
views
Calculation on GPU leads to driver error "stopped responding"
I have this little nonsense script here which I am executing in MATLAB R2013b:
clear all;
n = 2000;
times = 50;
i = 0;
tCPU = tic;
disp 'CPU::'
A = rand(n, n);
B = rand(n, n);
disp '::Go'
for i = ...
2
votes
1
answer
6k
views
Cuda Random Number Generation
I was wondering what was the best way to generate one pseudo random number between 0 and 49k that would be the same for each thread, by using curand or something else.
I prefer to generate the ...
99
votes
6
answers
297k
views
GPU-accelerated video processing with ffmpeg [closed]
I want to use ffmpeg to accelerate video encode and decode with an NVIDIA GPU.
From NVIDIA's website:
NVIDIA GPUs contain one or more hardware-based decoder and encoder(s) (separate from the CUDA ...
59
votes
3
answers
30k
views
Running more than one CUDA application on one GPU
The CUDA documentation does not specify how many CUDA processes can share one GPU. For example, if I launch more than one CUDA program as the same user with only one GPU card installed in the system, what is ...
19
votes
5
answers
18k
views
How to run CUDA without a GPU using a software implementation?
My laptop doesn't have an NVIDIA graphics card, and I want to work on CUDA. The website says that CUDA can be used in emulation mode on non-CUDA hardware too. But when I tried installing CUDA drivers ...
15
votes
2
answers
22k
views
Using constants with CUDA
Which is the best way of using constants in CUDA?
One way is to define constants in constant memory, like:
// CUDA global constants
__constant__ int M;
int main(void)
{
...
...
15
votes
4
answers
23k
views
Compile cuda code for CPU
I'm studying CUDA 5.5 but I don't have an NVIDIA GPU. Old versions of nvcc had a --multicore flag to compile CUDA code for the CPU.
In the new version of nvcc, what is the equivalent option? I'm working on ...
15
votes
2
answers
11k
views
How to interrupt or cancel a CUDA kernel from host code
I am working with CUDA and I am trying to stop my kernel's work (i.e. terminate all running threads) after a certain if block is hit. How can I do that? I am really stuck here.
8
votes
2
answers
12k
views
C# Performance Counter Help, Nvidia GPU
So I've been experimenting with the performance counter class in C# and have had great success probing the CPU counters and almost everything I can find in the windows performance monitor. HOWEVER, I ...
8
votes
1
answer
19k
views
CUDA5 Examples: Has anyone translated some cutil definitions to CUDA5?
Has anyone started to work with the CUDA5 SDK?
I have an old project that uses some cutil functions, but they've been abandoned in the new one.
The solution was that most functions can be translated ...