Step by step, let's modify the OP's code as it appears now (Mar 17 at 0600 Eastern), after the OP applied a few edits:
STEP 1:
Consider snippet 1, the C++ file that contains int main( ). You want squre_array( ) to be accessible -- i.e., callable -- from main( ), and the linker will try to make that happen. For that to work, this C++ file must #include the header file that declares squre_array( ) as a C-language function rather than a C++ function; that is the one and only crucial point in this whole process. (Why? Because the compiler names and stores C-language symbols differently from C++ symbols, so when the linker comes along, the C-type symbol defined in the C source does not match the C++-type symbol referenced in main( ).) Now, is that header file named cuda.h? Let's assume it is. Once the header supplies the declaration, the "extern void squre_array( )" line is superfluous and confusing, so take it out of this source file:
#include <string>
#include <iostream>
#include <stdio.h>
#include <cuda.h> // <-- add this line
//extern void squre_array(); // <-- delete this line: we'll declare squre_array( ) in cuda.h
using namespace std;
int main() {
squre_array();
}
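To see why the two kinds of symbol differ, here is a minimal sketch that is separate from the OP's code. C++ lets several functions share one name as long as their parameter types differ, so the compiler has to fold the parameter types into each symbol. C has no overloading, so a C symbol is essentially the bare name. The names square, square_c and check_squares are made up for this illustration.

```cpp
#include <assert.h>

// C++ linkage: overloads are legal, so the compiler must encode the
// parameter types into each symbol name (name mangling).
int square(int x) { return x * x; }
float square(float x) { return x * x; }

// C linkage: the symbol is the plain name, so a second C-linkage
// function called square_c with other parameters could not coexist.
extern "C" float square_c(float x) { return x * x; }

// Hypothetical helper that exercises all three functions.
void check_squares() {
    assert(square(3) == 9);          // picks the int overload
    assert(square(1.5f) == 2.25f);   // picks the float overload
    assert(square_c(1.5f) == 2.25f); // the C-linkage function
}
```

A C++ caller that has only seen a plain C++ declaration of a function will ask the linker for the mangled name, which a C-style definition never provides.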
STEP 2:
Now consider snippet 2, which defines the squre_array( ) function. We treat this as plain old C code, so we have to bracket all of it with two sets of three lines each. These six lines (total) tell the compiler, whenever the file is compiled as C++, that the symbols in the bracketed code are C-type symbols rather than C++-type "mangled" symbols. Once both sides agree on C-type symbols, the linker can link the squre_array( ) function into your main program:
// insert magic three lines here, way up at the top of your .c file
#ifdef __cplusplus //if we are compiling as C++, tell
extern "C" { //the compiler that this stuff is plain old C
#endif
#include <stdio.h>
#include <cuda.h> // <-- remember this "glue" file: we'll change it in step 3
//__global__ void square_array(float *a, int N) // <-- remove the declaration,
void square_array(float *a, int N) { // <-- but retain the definition
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < N)
a[idx] = a[idx] * a[idx];
}
void squre_array()
{
float *a_h, *a_d;
...
cudaFree(a_d);
}
// close magic three lines
#ifdef __cplusplus //
} // closing curly bracket
#endif
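The bracketing idiom from step 2 can be tried without a GPU. The sketch below is host-only and assumes a CPU stand-in named square_array_host (a hypothetical name, not in the OP's code); it does not cover the kernel itself or the <<< >>> launch syntax, which need nvcc. Each loop iteration plays the role of one GPU thread, and the loop bound replaces the idx < N check.

```cpp
#include <assert.h>

#ifdef __cplusplus   /* only a C++ compiler understands extern "C" */
extern "C" {
#endif

/* Hypothetical CPU stand-in for the kernel body: squares a[0..N-1]
   in place. */
void square_array_host(float *a, int N) {
    assert(N <= 0 || a != 0);   /* a non-empty array needs a real pointer */
    for (int idx = 0; idx < N; ++idx) {
        a[idx] = a[idx] * a[idx];
    }
}

#ifdef __cplusplus
}   /* closes extern "C" */
#endif
```

Because the guard tests __cplusplus, the same file still compiles with a C compiler, which simply never sees the extern "C" lines.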
STEP 3:
The important thing missing from the OP's understanding is that squre_array( ) (and square_array( ), if you want) must be declared, and that the declaration(s) must be enclosed within the same pair of magic three lines. (OP: why must that be?) We decided in step 1 that the declaration would go in cuda.h. It could go in any .h file, but wherever it is declared, that .h file has to be #included in the file where main( ) resides (OP: again, why is this?). So let's fix up cuda.h:
// magic three lines again
#ifdef __cplusplus
extern "C" {
#endif
void squre_array();
void square_array(float *a, int N);
// close magic three lines, just like before
#ifdef __cplusplus //
} // closing curly bracket
#endif
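As a rough answer to the two "why" questions: each source file is compiled on its own, so the compiler only knows a function has C linkage if that file has already seen a declaration saying so. The sketch below folds the three roles (shared header, defining file, calling file) into one file so it can be compiled as is. The names squared_sum and call_from_cpp are hypothetical and are not part of the OP's program.

```cpp
#include <assert.h>

/* --- what the shared header would contain --- */
#ifdef __cplusplus
extern "C" {
#endif
float squared_sum(const float *a, int N);   /* hypothetical function */
#ifdef __cplusplus
}
#endif

/* --- what the defining file would contain --- */
/* The declaration above has already been seen, so this definition
   keeps C linkage even though extern "C" is not repeated here. */
float squared_sum(const float *a, int N) {
    assert(N <= 0 || a != 0);
    float total = 0.0f;
    for (int idx = 0; idx < N; ++idx) {
        total += a[idx] * a[idx];
    }
    return total;
}

/* --- what the calling file would contain --- */
/* It sees the same declaration, so it asks the linker for the
   C-style symbol that the definition provides. */
float call_from_cpp() {
    float v[3] = {1.0f, 2.0f, 3.0f};
    return squared_sum(v, 3);
}
```

If the calling file declared the function itself without the extern "C" wrapper, it would request a C++-style symbol that no object file defines, and the link would fail.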
And that's it. Now your program will link.
-- pete
square_array <<< n_blocks, block_size >>> (a_d, N);? It doesn't look like C code.