
I recently asked this question and the answers increased my understanding, but they didn't solve the actual problem I had. So, I will try to ask a similar but different question as follows.

Suppose that I want to access an arbitrary rune element of a string. One way is:

func RuneElement(str string, idx int) rune {
  n := 0 // rune position; the i from range would be a byte offset, not a rune index
  for _, c := range str {
    if n == idx {
      return c
    }
    n++
  }
  return 0 // out of range -> proper handling is needed
}

What if I want to call such a function a lot of times? I guess what I am looking for is an operator/function like str[i] (which returns a byte) that returns the rune element at the i-th position. Why can this element be accessed using for ... range but not through a function like str.At(i), for example?

  • If you don't want to convert a string to []rune in every call, you need to use []rune. Commented Jun 13, 2017 at 16:45
  • @JimB But, my input is a string and I try to avoid conversion of string to []rune Commented Jun 13, 2017 at 16:51
  • My point is that you need to convert a string to a []rune in order to index it as such. If you don't want to repeatedly convert the string, then use a []rune as the argument type, and convert it once. Commented Jun 13, 2017 at 17:03
  • Or are you simply looking for: play.golang.org/p/RdH7oMCHIZ? Commented Jun 13, 2017 at 17:08
  • @JimB Yes, that is what I am looking for, but without conversion from string to []rune. It seems that it's not possible though because of the way string is designed as @icza mentions here Commented Jun 13, 2017 at 17:12

1 Answer


string values in Go store the UTF-8 encoded byte sequence of the text. This is a design decision that has been made and it won't change.

If you want to get the rune at an arbitrary rune index, you have to decode the bytes up to that point; there is no way around that (for ... range does exactly this decoding). There is no "shortcut": the chosen representation just doesn't provide this out of the box.
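A minimal sketch of that decoding, done explicitly with unicode/utf8 instead of for ... range (runeAtIndex is a hypothetical name). Each call walks from the start of the string, so it costs O(idx):

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAtIndex (hypothetical helper) returns the rune at rune position idx
// in s by decoding UTF-8 from the beginning. ok is false if idx is out of
// range. Invalid bytes decode as utf8.RuneError with size 1, same as range.
func runeAtIndex(s string, idx int) (r rune, ok bool) {
	if idx < 0 {
		return 0, false
	}
	for pos := 0; len(s) > 0; pos++ {
		c, size := utf8.DecodeRuneInString(s)
		if pos == idx {
			return c, true
		}
		s = s[size:] // skip the bytes of the rune just decoded
	}
	return 0, false
}

func main() {
	fmt.Println(runeAtIndex("héllo", 1)) // 233 true ('é' is U+00E9)
	fmt.Println(runeAtIndex("héllo", 9)) // 0 false
}
```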

If you have to do this frequently, you should change your input and use a []rune instead of a string: it's a slice, so it can be indexed in constant time. string in Go is not []rune. string in Go is effectively a read-only []byte (UTF-8). Period.
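As an illustration of "convert once, index many times", here is a sketch with a made-up workload (isPalindrome is not from the original answer) that needs random access from both ends. The caller decodes the string once; every rs[i] afterwards is a plain slice index:

```go
package main

import "fmt"

// isPalindrome (illustrative example) takes a []rune so that each rs[i]
// is an O(1) slice access rather than a fresh UTF-8 decode.
func isPalindrome(rs []rune) bool {
	for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
		if rs[i] != rs[j] {
			return false
		}
	}
	return true
}

func main() {
	s := "rötör"
	rs := []rune(s)               // decode the UTF-8 string once...
	fmt.Println(isPalindrome(rs)) // true
	fmt.Println(rs[1] == 'ö')     // true: ...then index by rune position
	fmt.Println(s[1] == 'ö')      // false: s[1] is the byte 0xC3
}
```

Comparing the bytes of "rötör" end to end would fail, since the two bytes of each 'ö' appear in reversed order when read from the back.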

If you can't change the input type, you may build an internal cache that maps each string to its []rune:

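var cache = map[string][]rune{}

// RuneAt returns the rune at rune index idx of s, or 0 if idx is out of range.
func RuneAt(s string, idx int) rune {
    rs, ok := cache[s]
    if !ok {
        rs = []rune(s) // decode once, reuse on later calls
        cache[s] = rs
    }
    if idx < 0 || idx >= len(rs) {
        return 0
    }
    return rs[idx]
}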

Whether this is worth it depends on the use case: if RuneAt() is called with a small set of strings, this may improve performance a lot. If the passed strings are more or less unique, this will result in worse performance and a lot of memory usage. Also, this implementation is not safe for concurrent use.
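If the cache has to be shared between goroutines, one option is to guard the map with a mutex. This is a sketch under that assumption (runeCache is a hypothetical type, not part of any library); like the map version, it still grows without bound:

```go
package main

import (
	"fmt"
	"sync"
)

// runeCache (hypothetical) guards the string -> []rune map with a mutex,
// so RuneAt may be called from multiple goroutines. The zero value is ready to use.
type runeCache struct {
	mu sync.Mutex
	m  map[string][]rune
}

// RuneAt returns the rune at rune index idx of s, or 0 if idx is out of range.
func (c *runeCache) RuneAt(s string, idx int) rune {
	c.mu.Lock()
	rs, ok := c.m[s]
	if !ok {
		if c.m == nil {
			c.m = make(map[string][]rune)
		}
		rs = []rune(s) // decode once per distinct string
		c.m[s] = rs
	}
	c.mu.Unlock()

	// rs is never modified after being stored, so reading it unlocked is safe.
	if idx < 0 || idx >= len(rs) {
		return 0
	}
	return rs[idx]
}

func main() {
	var c runeCache
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.RuneAt("héllo", 1) // concurrent lookups share one cache entry
		}()
	}
	wg.Wait()
	fmt.Println(string(c.RuneAt("héllo", 1))) // é
}
```

A sync.RWMutex would let concurrent cache hits proceed in parallel, at the cost of a second lookup under the write lock on a miss.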


2 Comments

Just curious what may be the rationale behind the design decision.
@AlexanderItes I guess the authors thought the advantages outweigh the disadvantages. Some operations just don't care about the representation (e.g. equality checks). For operations that do need runes, decoding them from UTF-8 bytes is faster than you'd think, and very often the UTF-8 representation is needed in the end anyway (e.g. when transmitting textual data), so storing it directly makes such operations simpler and faster.
