Ubuntu 22.04 + CUDA 12.1 + torch 2.1

1. Change sources. Domestic mirrors have a clear speed advantage; commonly used ones include Tsinghua University and Alibaba. 1) System software sources: # vim /etc/apt/sources.list # Source-package mirrors are commented out by default to speed up apt update; uncomment them yourself if needed. deb https://mirrors.tuna.tsinghua.edu.cn/ubuntu/ jammy main restricted universe […]

Offline installation of CUDA and cuDNN on Linux, with machine configuration and environment packaging and migration

CUDA installation. CUDA version compatibility: check which CUDA version your machine supports [you can skip this step if you are installing the CUDA toolkit on a supercomputing platform]. CUDA toolkit download: download the CUDA toolkit from the official website, then upload the downloaded .run executable to the platform for offline installation. $ cd /uploaded directory $ chmod […]

Why using shared memory can make a CUDA kernel slower instead of faster, and how to fix it

I wrote several operators, for adding two images and for image filtering, each optimized with shared memory. #include <stdio.h> #include <cuda_runtime.h> #include "helper_cuda.h" #include "helper_timer.h" #define BLOCKX 32 #define BLOCKY 32 #define BLOCK_SIZE 1024 #define PADDING 2 __global__ void filter5x5(float* in, float* out, int nW, int nH) { // Thread index --> Global memory […]
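The macros in the excerpt above (a 32×32 thread block and `PADDING 2`) are consistent with a 5×5 filter that needs a two-pixel halo around each tile. The following is a minimal sketch of the bookkeeping only, assuming the kernel stages a `(BLOCKX + 2*PADDING) × (BLOCKY + 2*PADDING)` tile of floats in shared memory; the truncated excerpt does not show this, and the helper name `tile_stats` is hypothetical.

```python
# Values copied from the #define lines in the excerpt.
BLOCKX = 32
BLOCKY = 32
PADDING = 2


def tile_stats(block_x: int, block_y: int, padding: int, bytes_per_elem: int = 4) -> dict:
    """Estimate the shared-memory cost of one halo tile (assumed layout)."""
    tile_w = block_x + 2 * padding          # tile width including left/right halo
    tile_h = block_y + 2 * padding          # tile height including top/bottom halo
    tile_elems = tile_w * tile_h            # elements staged per block
    threads = block_x * block_y             # one thread per output pixel
    taps = (2 * padding + 1) ** 2           # filter taps read per output pixel
    return {
        "tile_elems": tile_elems,
        "shared_bytes": tile_elems * bytes_per_elem,
        "staging_loads_per_thread": tile_elems / threads,
        "taps_per_pixel": taps,
    }


if __name__ == "__main__":
    stats = tile_stats(BLOCKX, BLOCKY, PADDING)
    for key, value in stats.items():
        print(f"{key}: {value}")
```

Under these assumptions each block stages 1296 floats (5184 bytes) and each thread performs about 1.27 staging loads before reading its 25 taps from shared memory. Whether that trade pays off depends on how well the hardware caches already serve the overlapping global reads, which may be one reason a shared-memory version ends up slower.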

CMake: easier mixed-language programming with CUDA and other languages (C, C++, Rust)

In CUDA programming, especially in mixed-language projects such as C/C++ with CUDA or Rust with CUDA, building large projects by invoking the NVCC compiler directly is cumbersome. Configuring the build with CMake is comparatively simple and powerful. CMake is a powerful automated build-configuration tool that is open […]

Solving problems encountered when running UniAD with newer versions of CUDA and PyTorch

UniAD (https://github.com/OpenDriveLab/UniAD) is a unified large model for driving planning that integrates multi-task modules: perception (object detection and tracking), mapping (not mapping the environment as SLAM does, but real-time panoptic segmentation of roads and dividers in images), trajectory planning, and occupancy prediction. The installation instructions on the official website […]

Crop and Resize images using CV_CUDA

Maybe I'm using it incorrectly, but calling the C++ OpenCV API directly is much faster than using CV_CUDA. /* * SPDX-FileCopyrightText: Copyright (c) 2022-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved. * SPDX-License-Identifier: Apache-2.0 * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance […]

[Resolved] RuntimeError: CUDA error: device-side assert triggered. CUDA kernel errors might be asynchronous

Problem description: the specific error message is ./aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed. ../aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed. ../aten/src/ATen/native/cuda/Loss.cu:240: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 & […]
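The assertion in the log above fails when a target class index `t` falls outside the range `[0, n_classes)`. A minimal sketch, in plain Python, of a check that could be run on the labels before they reach the loss function; the function name `find_invalid_labels` and the sample data are hypothetical and do not come from the original post. The optional `ignore_index` argument mirrors the parameter of the same name in PyTorch's NLL and cross-entropy losses, whose targets equal to that value are skipped.

```python
from typing import Iterable, List, Optional, Tuple


def find_invalid_labels(
    labels: Iterable[int],
    n_classes: int,
    ignore_index: Optional[int] = None,
) -> List[Tuple[int, int]]:
    """Return (position, label) pairs that violate 0 <= label < n_classes."""
    bad = []
    for pos, t in enumerate(labels):
        # Targets equal to ignore_index are skipped by the loss, so skip them here too.
        if ignore_index is not None and t == ignore_index:
            continue
        # Same condition as the device-side assertion: t >= 0 && t < n_classes.
        if not (0 <= t < n_classes):
            bad.append((pos, t))
    return bad


if __name__ == "__main__":
    sample_labels = [0, 2, 3, -1, 1]  # hypothetical labels for a 3-class problem
    print(find_invalid_labels(sample_labels, n_classes=3))  # [(2, 3), (3, -1)]
```

Running such a check on the CPU reports the offending positions directly, whereas the device-side assert may surface at a later, unrelated API call because CUDA kernel errors can be reported asynchronously.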

Deploying a YOLOv8 TensorRT model on Jetson Xavier NX (JetPack 5.1.2, CUDA 11.4, cuDNN 8.6.0, TensorRT 8.5.2)

Table of contents: Preface; Jetson Xavier NX environment configuration; 1. TensorRT-Alpha source code download (1. Source code download, 2. File settings); 2. YOLOv8 model deployment (1. Export the YOLOv8 ONNX model, 2. Use TensorRT to convert the ONNX file to a TRT file, 3. Source code modification, 4. Compile, 5. Run); Summary; References. Preface: Records of the deployment process and […]

Ubuntu 20.04 + cuda-11.8 + cudnn-8.6 + TensorRT-8.6

1. Install the graphics card driver. Reference: ubuntu20.04 + cuda10.0 + cudnn7.6.4 (CSDN blog post). View supported driver versions: check which drivers are available for the local graphics card. lu@host:/usr/local$ ubuntu-drivers devices == /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 == modalias : pci:v000010DEd000021C4sv000010DEsd000021C4bc03sc00i00 vendor : NVIDIA Corporation model : TU116 [GeForce GTX 1660 SUPER] driver : nvidia-driver-525-open - distro non-free driver : nvidia-driver-450-server […]

GPT in Practice series: ChatGLM3 local deployment, a practical solution with CUDA 11 + 1080 Ti + 24 GB graphics card

Table of Contents: 1. ChatGLM3 model; 2. Resource requirements; 3. Deployment and installation (environment configuration, installation process, low-cost configuration and deployment solution); 4. Start ChatGLM3; 5. Functional testing. Freshly released: the domestic GPT-style models have had another round of updates. The Tsinghua team has just released ChatGLM3, and just before the Yunqi Conference, Baichuan also released Baichuan2-192K, […]