
I am using Protobuf (v3.5.1) in a Python project I'm working on. My situation can be simplified to the following:

// Proto file

syntax = "proto3";

message Foo {
    Bar bar = 1;
}

message Bar {
    bytes lotta_bytes_here = 1;
}

# Python excerpt
def MakeFooUsingBar(bar):
    foo = Foo()
    foo.bar.CopyFrom(bar)  # copies bar's fields into foo's own Bar sub-message
    return foo

I am worried about the memory cost of .CopyFrom(): if I understand correctly, it copies the contents rather than the reference. In C++, by contrast, I could use something like:

Foo foo;
Bar* bar = new Bar();
bar->set_lotta_bytes_here("abcd");
foo.set_allocated_bar(bar);

Judging by the generated source, this does not need to copy anything:

inline void Foo::set_allocated_bar(::Bar* bar) {
  ::google::protobuf::Arena* message_arena = GetArenaNoVirtual();
  if (message_arena == NULL) {
    delete bar_;
  }
  if (bar) {
    ::google::protobuf::Arena* submessage_arena = NULL;
    if (message_arena != submessage_arena) {
      bar = ::google::protobuf::internal::GetOwnedMessage(
          message_arena, bar, submessage_arena);
    }

  } else {

  }
  bar_ = bar;
  // @@protoc_insertion_point(field_set_allocated:Foo.bar)
}

Is there something similar available in Python? I have looked through the Python generated sources, but found nothing applicable.

  • You're just worried about the performance? Did you measure it to see whether it will be a problem for your application? Commented Feb 23, 2018 at 23:03
  • @GregHewgill I am aware that while I am copying the resource, two instances exist in memory. My application uses large resources (tens or hundreds of megabytes), and I want to avoid the overhead, especially since my intention is not to copy the resource but simply to move it. I understand this can be seen as premature optimization, but if there is built-in functionality I could use, I don't see a reason not to use it. Commented Feb 23, 2018 at 23:09
  • Oh, well if you're copying hundreds of megabytes of stuff, then sure, this is worth investigating the performance aspects. :) Commented Feb 23, 2018 at 23:40
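Following up on the suggestion to measure, here is a minimal, stdlib-only sketch for checking whether an operation duplicates a large buffer. The helper name `peak_traced_bytes` is hypothetical. The sketch relies on `tracemalloc`, which only sees allocations made through Python's memory allocators, so memory allocated directly by a C/C++ protobuf backend may not be counted.

```python
import tracemalloc


def peak_traced_bytes(fn):
    """Return the peak number of bytes traced by tracemalloc while fn() runs.

    Only allocations made through Python's allocators are traced; memory a
    C/C++ extension allocates on its own may not show up here.
    """
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


payload = b"x" * (10 * 1024 * 1024)  # 10 MiB, allocated before tracing starts

# Keeping another reference to the same bytes object allocates almost nothing.
shared = peak_traced_bytes(lambda: [payload])
# Building a new object with the same content duplicates the whole buffer.
copied = peak_traced_bytes(lambda: bytes(bytearray(payload)))

print(f"shared: {shared} bytes, copied: {copied} bytes")
```

To look at `CopyFrom` itself, one could pass something like `lambda: foo.bar.CopyFrom(bar)`. As noted above, the numbers are only meaningful for the allocations Python can trace.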

1 Answer


When it comes to large string or bytes objects, Protobuf seems to handle the situation fairly well. The following test passes. This means that although a new Bar object is created, the bytes value itself is shared by reference rather than copied (Python bytes are immutable, so this makes sense):

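def test_copy_from_with_large_bytes_field(self):
    bar = Bar()
    bar.lotta_bytes_here = b'12345'
    foo = Foo()
    foo.bar.CopyFrom(bar)

    # A distinct Bar message is created...
    self.assertIsNot(bar, foo.bar)
    # ...but both messages hold the very same bytes object.
    self.assertIs(bar.lotta_bytes_here, foo.bar.lotta_bytes_here)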
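Sharing is safe here because bytes cannot be mutated, and Python's own `copy` module makes the same choice. The small illustration below is plain Python and does not involve protobuf. Whether the `assertIs` check above passes may also depend on which protobuf backend is active (reported by `google.protobuf.internal.api_implementation.Type()`); that part is not verified here.

```python
import copy

data = b"12345"

# bytes is immutable, so the copy module hands back the very same object
# instead of duplicating the buffer.
assert copy.copy(data) is data
assert copy.deepcopy(data) is data

# A mutable buffer is different: copying it must duplicate the contents,
# otherwise a change made through one name would be visible through the other.
buf = bytearray(data)
buf_copy = copy.copy(buf)
assert buf_copy is not buf
buf[0] = ord("9")
assert bytes(buf_copy) == b"12345"
```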

This solves my problem with large bytes objects. However, if someone's problem lies in nested or repeated fields, this will not help: such fields are copied field by field. That makes sense. If one copies a message, they want the two messages to be independent. Otherwise, modifying the original message would also modify the copy (and vice versa).
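The independence argument can be made concrete with a toy model in plain Python. `ToyMessage`, `alias_into`, and `copy_into` are hypothetical illustrations of the two semantics and are not how protobuf implements `CopyFrom` internally. The aliasing version copies nothing, but it produces exactly the coupling described above.

```python
import copy


class ToyMessage:
    """Hypothetical stand-in for a message with a repeated field; not protobuf code."""

    def __init__(self, items=None):
        self.items = list(items or [])


def alias_into(dst, src):
    """Move-like: dst shares src's list object, so nothing is copied."""
    dst.items = src.items


def copy_into(dst, src):
    """CopyFrom-like: dst gets its own list with element-by-element copies."""
    dst.items = [copy.deepcopy(item) for item in src.items]


original = ToyMessage([bytearray(b"a")])

aliased = ToyMessage()
alias_into(aliased, original)

copied = ToyMessage()
copy_into(copied, original)

# Mutate the original after the fact...
original.items.append(bytearray(b"b"))
original.items[0][0] = ord("z")

# ...the change shows through the alias, but not through the independent copy.
print(aliased.items)  # [bytearray(b'z'), bytearray(b'b')]
print(copied.items)   # [bytearray(b'a')]
```

A true move, like C++'s `set_allocated_...()`, avoids this coupling by transferring ownership: the source is no longer used afterwards, so nothing else can observe the shared data.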

If Python protobuf had anything akin to the C++ move semantics (https://github.com/google/protobuf/issues/2791) or set_allocated_...(), that would solve it. However, I am not aware of such a feature.
