
Let's say I have a NumPy array arr = np.array([1, 2, 3]) and a PyTorch tensor tnsr = torch.zeros(3,).

Is there a way to read the data contained in arr into the tensor tnsr, which already exists, rather than simply creating a new tensor like tnsr1 = torch.tensor(arr)?

This is a simplified example of the problem, since I am using a dataset that contains nearly 17 million entries.

EDIT: I know I can manually loop through each entry in the array. With 17 million entries, that would take quite a while, I believe.
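
For illustration, here is a rough sketch of the element-by-element loop the edit refers to (the approach the question is trying to avoid), using the same arr and tnsr as above:

import numpy as np
import torch

arr = np.array([1, 2, 3])
tnsr = torch.zeros(3)

# Naive approach: one Python-level assignment per entry.
# Fine for 3 elements, but very slow for ~17 million entries.
for i in range(len(arr)):
    tnsr[i] = arr[i]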

  • For now, I'm using a little workaround that's not necessarily the most efficient and/or elegant solution. Basically, tnsr[:] = torch.tensor(arr)[:] (sketched below). Commented Jul 2, 2021 at 18:51
  • If anyone knows a more elegant solution I'm all ears :) Commented Jul 2, 2021 at 18:51
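
A runnable form of that workaround, as I read it (same arr and tnsr as in the question); note that torch.tensor(arr) still builds a temporary copy of the data, which the slice assignment then writes into tnsr:

import numpy as np
import torch

arr = np.array([1, 2, 3])
tnsr = torch.zeros(3)

# torch.tensor(arr) copies arr into a temporary tensor; the slice
# assignment then copies those values into tnsr's existing storage.
tnsr[:] = torch.tensor(arr)[:]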

1 Answer


You can do that using torch.from_numpy(arr). Here is an example that shows that the data is not being copied.

import numpy as np
import torch

arr = np.random.randint(0, high=10**6, size=(10**4, 10**4))  # 10**4 x 10**4 = 100 million integers
%timeit arr.copy()

tells me that it took 492 ms ± 6.54 ms to copy the array of random integers. On the other hand

%timeit torch.from_numpy(arr)

tells me that it took 1.14 µs ± 131 ns to turn it into a tensor. So there is no way that the 100 million integers could have been copied. PyTorch is still using the same data.
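
To make the sharing explicit, here is a small check (my addition, not part of the timing runs above): a change made through the NumPy array is visible through the tensor returned by from_numpy.

t = torch.from_numpy(arr)
arr[0, 0] = -1
print(t[0, 0])  # tensor(-1): the tensor sees the change, so the data was not copied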

Finally, your version, i.e.

%timeit torch.tensor(arr)

gives 201 ms ± 4.08 ms, which is quite surprising to me: if it were copying, it should not be faster than NumPy's copy, but if it is not copying, what takes it a fifth of a second? Maybe it's doing a shallow copy. Maybe somebody else can tell us what's going on exactly.
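
To tie this back to the original question of filling a tensor that already exists: assuming the goal is to reuse tnsr's existing storage rather than rebind the name, one option (my sketch, not something spelled out above) is to wrap the array with from_numpy and copy it in with Tensor.copy_, which also handles the dtype conversion:

arr = np.array([1, 2, 3])
tnsr = torch.zeros(3)

# from_numpy wraps arr without copying; copy_ then writes the values
# into tnsr's existing storage, converting int64 to float32 on the way.
tnsr.copy_(torch.from_numpy(arr))
print(tnsr)  # tensor([1., 2., 3.])

If reusing the existing storage is not required, tnsr = torch.from_numpy(arr) alone avoids any copy at all, at the cost of sharing memory with arr.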
