Introduction
PyTorch and the underlying libtorch are only of use once our data has been transferred into them.
NB!: No matter which source and target are chosen, make sure that the memory is contiguous!
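How contiguity can be checked and enforced on both sides is sketched below (a minimal C++ sketch, assuming <opencv2/core.hpp> and <torch/torch.h> are available as in the snippets that follow; the variables are just placeholders):
// cv::Mat: a region-of-interest view is usually not contiguous
cv::Mat image = cv::Mat::zeros(64, 64, CV_8UC1);
cv::Mat roi = image(cv::Rect(8, 8, 32, 32));
if (!roi.isContinuous())
{
    roi = roi.clone(); // clone() yields a contiguous copy
}

// torch::Tensor: e.g. a permuted tensor is no longer contiguous
torch::Tensor t = torch::rand({3, 4, 5}).permute({2, 0, 1});
if (!t.is_contiguous())
{
    t = t.contiguous(); // contiguous() copies into contiguous memory
}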
C++
We may assume two base scenarios here:
- moving data from a C++ standard structure
- moving data from a cv::Mat
Data to torch::Tensor and back
Moving data from C++ primitives to libtorch is certainly less common in computer vision applications but much more common elsewhere; it may also be done via thrust or Eigen::Matrix.
NB!: If data is moved via a pointer and not copied, it can still be manipulated through the original variable (e.g. an array), and the value changes in the tensor as well.
Array to torch::Tensor and back
Moving a standard C++ array to a torch tensor requires using the from_blob functionality. Copying it back to an array requires usage of std::memcpy.
std::cout << "cpp array to libtorch tensor" << std::endl;
uint8_t cpp_arr[3][3][3] = {
    {
        {0, 10, 20},
        {30, 40, 50},
        {60, 70, 80}
    },
    {
        {10, 10, 20},
        {30, 40, 50},
        {60, 70, 80}
    },
    {
        {20, 10, 20},
        {30, 40, 50},
        {60, 70, 80}
    }
};
torch::Tensor torch_tensor_from_cpp_arr =
    torch::from_blob(cpp_arr, {3, 3, 3},
                     torch::TensorOptions()
                         .dtype(torch::kU8)
                         .device(torch::kCPU));

std::cout << "libtorch tensor to cpp array" << std::endl;
uint8_t cpp_arr_back[3][3][3];
std::memcpy(cpp_arr_back,
            torch_tensor_from_cpp_arr.data_ptr(),
            sizeof(uint8_t) * torch_tensor_from_cpp_arr.numel());
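Note that from_blob only wraps the existing buffer, it does not copy it. If an independent tensor is required, a .clone() detaches it from the array. A minimal sketch reusing cpp_arr from above:
// without clone(): writing to the array also changes the tensor
cpp_arr[0][0][0] = 99;
std::cout << static_cast<int>(torch_tensor_from_cpp_arr[0][0][0].item<uint8_t>()) << std::endl; // 99

// with clone(): the tensor owns an independent copy of the data
torch::Tensor torch_tensor_copy = torch_tensor_from_cpp_arr.clone();
cpp_arr[0][0][0] = 42;
std::cout << static_cast<int>(torch_tensor_copy[0][0][0].item<uint8_t>()) << std::endl; // still 99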
std::vector to torch::Tensor and back
torch::from_blob expects contiguous input data. Therefore, flattening any nested std::vector beforehand prevents many potential issues. Another approach would be to pre-allocate a tensor, iterate over slices, and load each slice using ::from_blob (see the sketch after the example below).
std::vector<uint8_t> vec_in = {0, 10, 20, 30, 40, 50, 60, 70, 80};
torch::Tensor torch_tensor_from_std_vec =
    torch::from_blob(vec_in.data(),
                     { static_cast<int64_t>(vec_in.size()) },
                     torch::TensorOptions()
                         .dtype(torch::kU8)
                         .device(torch::kCPU));
std::cout << torch_tensor_from_std_vec << std::endl;

vec_in[5] = 19; // modify a value (just for demonstration)
std::cout << torch_tensor_from_std_vec << std::endl;
It seems like the most reliable method of converting data from a torch::Tensor back to a std::vector is to iterate over the tensor like this:
auto tensor_length = torch_tensor_from_std_vec.sizes()[0];
std::vector<uint8_t> vec_out(tensor_length);
for (int i = 0; i < tensor_length; ++i)
{
    vec_out[i] = torch_tensor_from_std_vec[i].item<uint8_t>();
}
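The slice-based alternative mentioned above could look roughly like the following. This is only a sketch, assuming a nested std::vector<std::vector<uint8_t>> in which all inner vectors have the same length:
std::vector<std::vector<uint8_t>> nested = {
    {0, 10, 20},
    {30, 40, 50},
    {60, 70, 80}
};
auto options = torch::TensorOptions().dtype(torch::kU8).device(torch::kCPU);
torch::Tensor torch_tensor_from_nested = torch::empty({3, 3}, options);
for (int64_t i = 0; i < static_cast<int64_t>(nested.size()); ++i)
{
    // wrap each inner vector with from_blob and copy it into the pre-allocated row
    torch_tensor_from_nested[i].copy_(
        torch::from_blob(nested[i].data(),
                         { static_cast<int64_t>(nested[i].size()) },
                         options));
}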
cv::Mat to torch::Tensor and back
The most common thing to do in computer vision ;)
Data to cv::Mat
Sometimes cv::Mat is handy, as OpenCV provides some nice implementations of algorithms that are useful beyond computer vision. It might be worth moving some basic data to cv::Mat before passing it on to torch::Tensor.
Array to cv::Mat and back
uint8_t arr[3][3] = {
    {0, 10, 20},
    {30, 40, 50},
    {60, 70, 80}
};
cv::Mat cv_mat_from_array(3, 3, CV_8UC1, &arr);
// and back
uchar *arr_back = cv_mat_from_array.data;
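Keep in mind that arr_back above is only a pointer into the Mat's buffer, not a copy. If an actual copy is needed, std::memcpy works analogously to the tensor example above (a small sketch):
uint8_t arr_copy[3][3];
std::memcpy(arr_copy,
            cv_mat_from_array.data,
            cv_mat_from_array.total() * cv_mat_from_array.elemSize());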
std::vector to cv::Mat and back
Using a flattened std::vector makes it a lot easier:
std::vector<uint8_t> vec_flat = {0, 10, 20, 30, 40, 50, 60, 70, 80};
cv::Mat cv_mat_from_vec(3, 3, CV_8UC1, vec_flat.data());
cv_mat_from_vec = cv_mat_from_vec.clone(); // deep copy so the Mat owns its (contiguous) data

std::vector<uint8_t> vec_from_mat;
vec_from_mat.assign(cv_mat_from_vec.data,
                    cv_mat_from_vec.data + cv_mat_from_vec.total() * cv_mat_from_vec.channels());
cv::Mat to libtorch and back
Converting a cv::Mat to a torch::Tensor is probably the most common conversion when we deploy a libtorch-based solution to a computer vision problem.
torch::Tensor tensor_from_mat = torch::from_blob(
    cv_mat_from_vec.data,
    {static_cast<int64_t>(cv_mat_from_vec.rows),  // H
     static_cast<int64_t>(cv_mat_from_vec.cols)}, // W
    torch::TensorOptions()
        .dtype(torch::kU8)
        .device(torch::kCPU));

cv::Mat mat_from_tensor(tensor_from_mat.size(0), tensor_from_mat.size(1), CV_8UC1);
std::memcpy((void *)mat_from_tensor.data,
            tensor_from_mat.data_ptr(),
            sizeof(uint8_t) * tensor_from_mat.numel());
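The example above assumes a single-channel (CV_8UC1) Mat. For a typical 3-channel BGR image the channel dimension has to be added and, for most models, the layout permuted from HWC to CHW. A rough sketch, using a hypothetical placeholder image called bgr_image:
cv::Mat bgr_image = cv::Mat::zeros(480, 640, CV_8UC3); // placeholder image
torch::Tensor image_tensor = torch::from_blob(
    bgr_image.data,
    {bgr_image.rows, bgr_image.cols, 3}, // H x W x C
    torch::TensorOptions().dtype(torch::kU8).device(torch::kCPU));
// HWC -> CHW and an independent, contiguous copy
image_tensor = image_tensor.permute({2, 0, 1}).contiguous();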
cv::cuda::GpuMat to torch::Tensor and back
OpenCV provides some limited CUDA capabilities, with many important features not exposed via the public API. Have a look at OpenCV CUDA Integration - OpenCV GpuMat and libtorch for a detailed explanation of how to convert a cv::cuda::GpuMat to a torch::Tensor and back.
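As a rough idea of what is covered there: a cv::cuda::GpuMat uses pitched (row-padded) device memory, so its row step has to be passed to from_blob as a stride. A minimal sketch for a single-channel 8-bit GpuMat, assuming OpenCV was built with CUDA support (the linked article covers the details and the way back):
cv::cuda::GpuMat gpu_mat(480, 640, CV_8UC1);
torch::Tensor gpu_tensor = torch::from_blob(
    gpu_mat.data,
    {gpu_mat.rows, gpu_mat.cols},
    {static_cast<int64_t>(gpu_mat.step), 1}, // strides in elements; 1 byte per element here
    torch::TensorOptions().dtype(torch::kU8).device(torch::kCUDA));
gpu_tensor = gpu_tensor.contiguous(); // copy that drops the row padding (still on the GPU)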
Python
PyTorch (the Python API) requires far fewer considerations, and data transfer is pretty easy.
NumPy to PyTorch and back
Converting a numpy array to a torch tensor is straightforward; note that torch.as_tensor shares memory with the numpy array where possible instead of copying it:
np_array = np.array([[1,2,3],[4,5,6],[7,8,9]])
pt_tensor = torch.as_tensor(np_array)
Probably the most common way to transfer results back to a numpy array is the following:
# assuming the tensor is detached!
results_np = results_pt.cpu().numpy()
List to PyTorch and back
This is straightforward as well:
pt_tensor = torch.as_tensor([[1,2,3],[4,5,6],[6,7,8]])
py_list = pt_tensor.tolist()
PyTorch and the CUDA Array Interface
As long as an object provides the CUDA Array Interface, the process is straightforward:
pt_tensor = torch.as_tensor(cupy_array, device='cuda')
For details on using this with OpenCV, please look here.