
So, I have a model where the theoretical justification for the update procedure relies on having a batch size of 1. (For those curious, it's called Bayesian Personalized Ranking for recommender systems.)

Now, I have some standard code written. My input is an Nx3 tf.placeholder, and I run it as normal with feed_dict. This is perfectly fine if I want N to be, say, 30K. However, if I want N to be 1, the feed_dict overhead really slows down my code.

For reference, I implemented the gradients by hand in pure Python, and it runs at about 70K iterations/second. In contrast, GradientDescentOptimizer runs at about 1K iterations/second. As you can see, this is just far too slow. So, as I said, I suspect the problem is that feed_dict has too much overhead when it is called once per single-sample update.

Here is the actual session code:

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for iteration in range(100):
    samples = data.generate_train_samples(1000000)
    # One sess.run call (with its own feed_dict) per single sample.
    for sample in tqdm(samples):
        cvalues = sess.run([trainer, obj], feed_dict={input_data: [sample]})
    print("objective = " + str(cvalues[1]))

Is there a better way to do a single update at a time?


1 Answer


Your code probably runs much slower for two reasons:

  1. You copy your data to GPU memory (if you use a GPU) only when you run the session, and you do it many times, which is really time consuming.
  2. You do the feeding in a single thread.

Luckily, TensorFlow has the tf.data API, which helps to solve both problems. You can try something like:

inputs = tf.placeholder(tf.float32, your_shape)
labels = tf.placeholder(tf.float32, labels_shape)
dataset = tf.data.Dataset.from_tensor_slices((inputs, labels))

iterator = dataset.make_initializable_iterator()

# Feed the whole training set once, when initializing the iterator.
sess.run(iterator.initializer, {inputs: your_inputs, labels: your_labels})

Then, to get the next entry from the dataset, you just use iterator.get_next().
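
Here is a minimal sketch of how that could fit the question's single-sample setup. Since the question's input is Nx3 rows with no separate labels, only one placeholder is sliced; build_bpr_loss is a hypothetical stand-in for however you currently build obj, and the shapes, dtypes, and learning rate are assumptions:

import tensorflow as tf

# Assumed dtype/shape; the question's input is Nx3, with no separate labels.
inputs = tf.placeholder(tf.float32, shape=[None, 3])

dataset = tf.data.Dataset.from_tensor_slices(inputs)
dataset = dataset.shuffle(buffer_size=10000).prefetch(1)

iterator = dataset.make_initializable_iterator()
sample = iterator.get_next()  # one row per training step, no per-step feed_dict

# Hypothetical: build your BPR objective from the single sample.
obj = build_bpr_loss(sample)
trainer = tf.train.GradientDescentOptimizer(0.01).minimize(obj)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for iteration in range(100):
        samples = data.generate_train_samples(1000000)
        # The only feed_dict call: hand over the whole epoch's data at once.
        sess.run(iterator.initializer, feed_dict={inputs: samples})
        while True:
            try:
                _, cvalue = sess.run([trainer, obj])
            except tf.errors.OutOfRangeError:
                break
        print("objective = " + str(cvalue))

The per-step sess.run calls are still there, but they no longer copy data from Python on every step; if the Python call overhead itself turns out to dominate, tf.data alone won't remove it.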

If that's what you need, TensorFlow has exhaustive documentation on importing data with the tf.data API, where you can find an example suitable for your use case: documentation


3 Comments

I appreciate the response. I am not using a GPU, which unfortunately cannot be helped right now. My CPU does have 4 cores with 2 threads each, and my code is indeed using only 1 core, so I can speed things up by using all 4 cores. But I still run into the fundamental problem that, even with that fix, this computes at a maximum of 4K iterations/second, versus 70K iterations/second for my vanilla Python implementation. feed_dict is still a bottleneck.
@anon what makes you think feed_dict is the bottleneck? Have you profiled the code?
I have not. It's the logical conclusion from the fact that when my batch size is large enough, performance is totally fine (>100K it/sec) despite using a single thread on a single core, but when the batch size is down to 1, performance drops to ~1K it/sec. The only fundamental difference is the number of calls to feed_dict and to sess.run. Maybe the overhead is in sess.run?
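
A rough way to test that hypothesis (a sketch with trivial stand-in ops, not the actual model) is to time a bare sess.run against one that also feeds a single row, which separates the Python call overhead from the feed_dict copy:

import time
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 3])
no_feed_op = tf.constant(1.0)   # fetch that needs no feed at all
fed_op = tf.reduce_sum(x)       # fetch that needs one fed row

with tf.Session() as sess:
    row = np.zeros((1, 3), dtype=np.float32)

    start = time.time()
    for _ in range(10000):
        sess.run(no_feed_op)
    print("sess.run only:        %.0f calls/sec" % (10000 / (time.time() - start)))

    start = time.time()
    for _ in range(10000):
        sess.run(fed_op, feed_dict={x: row})
    print("sess.run + feed_dict: %.0f calls/sec" % (10000 / (time.time() - start)))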
