Our topic for today is to tell these three apart: byte, rune, and string. In short, a byte can hold any ASCII character, while a rune can represent any Unicode code point. Interestingly, Go and UTF-8 share a designer: Ken Thompson co-designed both, and he is also one of the creators of Unix. Back to our topic: byte is a basic type, and it is the same as uint8.

// builtin.go
// ...
// byte is an alias for uint8 and is equivalent to uint8 in all ways. It is
// used, by convention, to distinguish byte values from 8-bit unsigned
// integer values.
type byte = uint8

Now let’s take a look at an example:

package main

func main() {
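	var a byte = '文' // does not compile: '文' is code point 25991, a byte holds at most 255
	_ = a            // avoids the unrelated "declared and not used" error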
}

Compiling the code above fails with an error: constant 25991 overflows byte. The correct way is to use rune. A rune is an alias for int32, so the value itself always occupies four bytes; what changes with the character is the number of bytes its UTF-8 encoding needs. We use utf8.RuneLen() to show that encoded length.

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	var a rune = 'a'
	var b rune = 'à'
	var c rune = '文'
	size := utf8.RuneLen(a)
	fmt.Println(size)      // 1
	size = utf8.RuneLen(b)
	fmt.Println(size)      // 2
	size = utf8.RuneLen(c)
	fmt.Println(size)      // 3
}

Enough of byte and rune. Now consider a question: is a string a sequence of bytes or of runes?

The answer is byte: len() on a string returns the number of bytes, not the number of characters.

package main

import "fmt"

func main() {
	var s = "Hello, World"
	fmt.Println(len(s)) // 12
	s = "中文"
	fmt.Println(len(s)) // 6
}

We can think of it this way: the basic unit of both rune and string is the byte. A rune is encoded in UTF-8 as one to four bytes, and a string is a sequence of bytes that usually holds UTF-8 encoded text. The difference between rune and string is that a rune represents a single character (one code point), while a string represents a sequence of characters.

package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	var s = "H"
	fmt.Println(len(s)) // 1
	var c = 'H'
	fmt.Println(utf8.RuneLen(c)) // 1
}