After an initial search on this, I'm a bit lost.
I want to use a buffer object to hold a sequence of Unicode code points. I only need to scan and extract tokens from that sequence, so it is essentially a read-only buffer. What I need is a way to advance a pointer within the buffer and to extract sub-segments. The buffer object should also support the usual regex and search operations on strings.
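To make the requirements concrete, here is a minimal sketch of the kind of interface described, assuming Python 3 and a plain str as the backing store. The class TokenScanner and its method names are hypothetical. It relies on two real standard-library features: a compiled regex's match(string, pos) and str.find(sub, start) both accept a start index, so scanning from the cursor does not require slicing the text first.

```python
import re


class TokenScanner:
    """Hypothetical read-only scanner over a str with a movable cursor."""

    def __init__(self, text):
        self.text = text  # the underlying string is never sliced for scanning
        self.pos = 0      # cursor: an index into self.text (code points)

    def advance(self, n):
        # Move the cursor forward without copying the text.
        self.pos = min(self.pos + n, len(self.text))

    def match(self, pattern):
        # Anchored match at the cursor. Compiled patterns take a pos argument,
        # so no substring copy is made before matching. Note that '^' does not
        # automatically match at pos; match() itself is already anchored there.
        m = pattern.match(self.text, self.pos)
        if m:
            self.pos = m.end()
        return m

    def find(self, sub):
        # str.find also accepts a start index, so searching needs no copy.
        return self.text.find(sub, self.pos)

    def segment(self, start, end):
        # Extracting a token does copy, but only the token itself.
        return self.text[start:end]


WORD = re.compile(r"\w+")
SPACE = re.compile(r"\s+")

scanner = TokenScanner("héllo wörld")
print(scanner.match(WORD).group())   # héllo
scanner.match(SPACE)
print(scanner.match(WORD).group())   # wörld
```

The cursor is just an integer, so "advancing" is constant time regardless of buffer size; only the extracted tokens are copied.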
An ordinary Unicode string could be used for this, but the problem is that simulating an advancing pointer means creating a substring copy at every step. That seems very inefficient, especially for larger buffers, unless there is some workaround.
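For illustration, the copy-based approach described above might look like this sketch (Python 3; the function name tokenize_by_slicing is hypothetical). Every token rebinds the text to a fresh slice of the remainder, so with many small tokens the total copying grows roughly quadratically with the buffer length.

```python
import re


def tokenize_by_slicing(text, pattern):
    """Copy-based tokenizer: each step rebinds text to a new substring."""
    tokens = []
    while text:
        m = pattern.match(text)
        if not m or m.end() == 0:
            break  # no match or an empty match: stop instead of looping forever
        tokens.append(m.group())
        text = text[m.end():]  # copies the entire remaining text on every token
    return tokens


TOKEN = re.compile(r"\w+|\s+")
print(tokenize_by_slicing("ab cd", TOKEN))  # ['ab', ' ', 'cd']
```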
I can see that there's a memoryview object that would be suitable, but it does not support Unicode strings (?).
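A quick check in Python 3 suggests the suspicion is right: memoryview() only wraps objects that expose the buffer protocol (bytes, bytearray, array.array, and so on), and str is not one of them. The helper accepts_memoryview below is hypothetical. The sketch also shows that a memoryview slice shares memory with its source, and that it indexes bytes rather than code points, so over UTF-8 data the offsets no longer line up with string positions.

```python
def accepts_memoryview(obj):
    """Return True if memoryview() can wrap obj (i.e. it exposes a buffer)."""
    try:
        memoryview(obj)
    except TypeError:
        return False
    return True


text = "héllo wörld"
encoded = text.encode("utf-8")

print(accepts_memoryview(text))     # False: a str is not a buffer object
print(accepts_memoryview(encoded))  # True: bytes are

# A memoryview slice shares memory with its source instead of copying it...
raw = bytearray(encoded)
tail = memoryview(raw)[7:]          # byte 7 is the 'w' of "wörld"
raw[7] = ord("W")
print(bytes(tail[:1]))              # b'W': the change is visible via the view

# ...but it indexes bytes, not code points: 'é' and 'ö' each take two bytes
# in UTF-8, so the encoded buffer is longer than the string.
print(len(text), len(encoded))      # 11 13
```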
What else can I use to provide the above functionality, in either Py2 or Py3?