CUDA Matrix Addition


I'm 勤奋大象, a blogger on 靠谱客 (kaopuke). This post covers matrix addition in CUDA, and I hope it is a useful reference.

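Implementing Matrix Addition
Depending on your setup, the required CUDA headers or libraries may not be picked up automatically; if so, add them to your project yourself.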

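#include <stdio.h>
#include <cuda_runtime.h>              // CUDA runtime API (functions prefixed with "cuda")
#include <device_launch_parameters.h>  // declares blockIdx, threadIdx, etc.; nvcc includes it implicitly, but Visual Studio IntelliSense needs it

#define N 1024   // matrix dimension (N x N)
#define TPB 16   // threads per block along each dimension

// Each thread computes one element of C = A + B.
__global__ void MatAdd(int A[N][N], int B[N][N], int C[N][N])
{
    // Global (absolute) coordinates of this thread in the 2D grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    // The grid is rounded up to whole blocks, so some threads fall outside the matrix.
    if (i < N && j < N)
        // threadIdx.x varies fastest, so using i as the column (contiguous) index
        // lets neighboring threads touch neighboring addresses.
        C[j][i] = A[j][i] + B[j][i];
}

// Large arrays are global so they are not placed on the stack (see the note below).
int a[N][N], b[N][N], c[N][N];

int main()
{
    int (*dev_a)[N], (*dev_b)[N], (*dev_c)[N];

    // Initialize the inputs on the host: y is the row, x is the column.
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < N; ++x)
        {
            a[y][x] = x + y * N;
            b[y][x] = x * x + y * N;
        }

    // Allocate device memory and copy the inputs to the GPU.
    cudaMalloc(&dev_a, N * N * sizeof(int));
    cudaMalloc(&dev_b, N * N * sizeof(int));
    cudaMalloc(&dev_c, N * N * sizeof(int));
    cudaMemcpy(dev_a, a, N * N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dev_b, b, N * N * sizeof(int), cudaMemcpyHostToDevice);

    // TPB x TPB threads per block, and enough blocks (rounded up) to cover N x N.
    dim3 threadsPerBlock(TPB, TPB);
    dim3 numBlocks((N + TPB - 1) / threadsPerBlock.x, (N + TPB - 1) / threadsPerBlock.y);
    MatAdd<<<numBlocks, threadsPerBlock>>>(dev_a, dev_b, dev_c);

    // Report launch errors such as an invalid configuration.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
    {
        printf("Kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // cudaMemcpy waits for the kernel to finish before copying the result back.
    cudaMemcpy(c, dev_c, N * N * sizeof(int), cudaMemcpyDeviceToHost);

    // Check every element against the same sum computed on the CPU.
    int temp = 0;
    int errors = 0;
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < N; ++x)
        {
            temp = a[y][x] + b[y][x];
            if (temp != c[y][x])
            {
                printf("Failure at %d %d\n", x, y);
                ++errors;
            }
        }

    // temp now holds the sum for the last element, c[N-1][N-1].
    printf("Last element sum is %d, %d mismatches\n", temp, errors);

    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
    return 0;
}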

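In this code, each element of the matrix is computed by its own thread. The lines

int i = blockIdx.x * blockDim.x + threadIdx.x;
int j = blockIdx.y * blockDim.y + threadIdx.y;

compute the current thread's absolute (global) coordinates in the grid, and these coordinates correspond exactly to a position in the matrix.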

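Note that when a C/C++ program uses very large arrays, it is best to define them as global variables. Local (automatic) arrays are placed on the stack, and if they exceed the stack size limit the program can crash with an error such as:
Segmentation fault: 11
Here each of a, b and c holds 1024 × 1024 ints, i.e. 4 MiB with a 4-byte int and 12 MiB together, which is more than typical default stack limits (commonly 8 MiB on Linux and macOS, 1 MiB on Windows).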

Finally

That is everything 勤奋大象 has recently collected on CUDA matrix addition. For more on this topic, search for other articles on 靠谱客.

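This content was contributed by users or collected from the web and is provided for learning and reference; copyright belongs to the original author.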

Category: CUDA学习日记 (CUDA study notes)
Published: 2024-01-18 17:06:14
Link: https://www.kaopuke.com/article/k-p-k_13_u_23_okfw_13__7__10_4.html
