Our topic for today is telling apart three Go types: byte, rune, and string. In short, a byte can hold any ASCII character, while a rune can represent any Unicode character (code point), which is what UTF-8 encodes. Interestingly, Go and UTF-8 share a co-inventor: Ken Thompson, who also co-created Unix. Back to our topic: byte is a basic type, and it is an alias for uint8.
// builtin.go
// ...
// byte is an alias for uint8 and is equivalent to uint8 in all ways. It is
// used, by convention, to distinguish byte values from 8-bit unsigned
// integer values.
type byte = uint8
Now let’s take a look at an example:
package main
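func main() {
var a byte = '文' // does not compile: '文' is 25991, but a byte only holds 0 to 255
_ = a // use a so that the overflow is the only error
}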
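Compiling the code above fails with an error such as: constant 25991 overflows byte (the exact wording varies between Go versions). The code point of '文' is 25991, far beyond the 0 to 255 range of a byte. The correct way is to use rune. A rune is an alias for int32, so every rune value occupies 4 bytes in memory; what changes with the character is the number of bytes needed to encode it in UTF-8. We use utf8.RuneLen() to get that encoded length.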
package main
import (
"fmt"
"unicode/utf8"
)
func main() {
var a rune = 'a'
var b rune = 'à'
var c rune = '文'
size := utf8.RuneLen(a)
fmt.Println(size) // 1
size = utf8.RuneLen(b)
fmt.Println(size) // 2
size = utf8.RuneLen(c)
fmt.Println(size) // 3
}
That's enough about byte and rune on their own. Now consider a question: is a string a sequence of bytes or of runes?
The answer is bytes: len() on a string returns its length in bytes, not in characters.
package main
import "fmt"
func main() {
var s = "Hello, World"
fmt.Println(len(s)) // 12
s = "中文"
fmt.Println(len(s)) // 6: each of these two characters takes 3 bytes in UTF-8
}
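Since len counts bytes, counting characters takes a different call. A sketch, assuming only the standard library: utf8.RuneCountInString and a []rune conversion both count characters, ranging over a string decodes one rune at a time (the index is the byte offset), and indexing a string yields a single byte. The helper counts is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the length of s in bytes and in runes (characters).
func counts(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "中文"
	b, r := counts(s)
	fmt.Println(b, r)           // 6 2
	fmt.Println(len([]rune(s))) // 2: converting to []rune also counts characters

	// range decodes UTF-8: i is the byte offset, ch is a rune.
	for i, ch := range s {
		fmt.Println(i, string(ch)) // "0 中", then "3 文"
	}

	// Indexing a string yields one byte, here the first byte of '中'.
	fmt.Println(s[0]) // 228 (0xe4)
}
```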
We can think of it this way: the basic unit of both an encoded rune and a string is the byte. A rune takes one to four bytes when encoded in UTF-8, and a string is a sequence of such bytes. The difference is that a rune represents a single character, while a string represents a sequence of characters.
package main
import (
"fmt"
"unicode/utf8"
)
func main() {
var s = "H"
fmt.Println(len(s)) // 1
var c = 'H'
fmt.Println(utf8.RuneLen(c)) // 1
}