Our topic for today is telling byte, rune, and string apart. In short, a byte can hold any ASCII character, while a rune can hold any Unicode code point. Interestingly, Go and UTF-8 share a creator: Ken Thompson co-designed both, and he is also a co-creator of Unix. Back to our topic: byte is a basic type, and it is the same as uint8.
// builtin.go
// ...
// byte is an alias for uint8 and is equivalent to uint8 in all ways. It is
// used, by convention, to distinguish byte values from 8-bit unsigned
// integer values.
type byte = uint8
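To make the alias concrete, here is a minimal sketch showing that a byte value can be used wherever a uint8 is expected, with no conversion. The helper name sum is made up for this illustration and is not part of any library.

```go
package main

import "fmt"

// sum is a hypothetical helper that adds up a slice of uint8 values.
// Because byte is an alias for uint8, a []byte can be passed in directly.
func sum(values []uint8) int {
	total := 0
	for _, v := range values {
		total += int(v)
	}
	return total
}

func main() {
	var b byte = 'A' // ASCII 'A' is 65, which fits in the range 0..255
	var u uint8 = b  // no conversion needed: byte and uint8 are the same type

	fmt.Println(u)                 // 65
	fmt.Printf("%T\n", b)          // uint8
	fmt.Println(sum([]byte("AB"))) // 131 (65 + 66)
}
```

Note that %T prints uint8 rather than byte, since an alias does not create a new type.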
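Now let's take a look at an example:
package main
func main() {
	var a byte = '文' // '文' is code point 25991, too large for a byte (0..255)
	_ = a // avoids an additional "declared and not used" error
}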
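Compiling the code above fails with an error: constant 25991 overflows byte (the exact wording varies between Go versions). The correct way is to use rune, which is an alias for int32 and holds a Unicode code point. A rune value always occupies 4 bytes in memory; what changes from character to character is the number of bytes needed to encode it in UTF-8. We use utf8.RuneLen() to show that encoded length.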
package main
import (
	"fmt"
	"unicode/utf8"
)
func main() {
	var a rune = 'a'
	var b rune = 'à'
	var c rune = '文'
	size := utf8.RuneLen(a)
	fmt.Println(size) // 1
	size = utf8.RuneLen(b)
	fmt.Println(size) // 2
	size = utf8.RuneLen(c)
	fmt.Println(size) // 3
}
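As a rough illustration of the difference between a rune's size in memory and its encoded length, the sketch below prints both for the same three characters. The helper name encode is an assumption made for this example; utf8.EncodeRune, utf8.UTFMax, and unsafe.Sizeof are standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
	"unsafe"
)

// encode is a hypothetical helper that returns the UTF-8 bytes for r.
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest possible encoding
	n := utf8.EncodeRune(buf, r)     // n is the number of bytes actually written
	return buf[:n]
}

func main() {
	for _, r := range []rune{'a', 'à', '文'} {
		// unsafe.Sizeof(r) is 4 for every rune, because rune is int32.
		fmt.Printf("%c U+%04X memory=%d utf8=% x\n", r, r, unsafe.Sizeof(r), encode(r))
	}
}
```

Each line reports a memory size of 4, while the UTF-8 column grows from one byte (61) to two (c3 a0) to three (e6 96 87).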
Enough of byte and rune. Now consider a question: is a string a sequence of bytes or of runes?
The answer is bytes. len() reports the number of bytes in a string, not the number of characters:
package main
import "fmt"
func main() {
	var s = "Hello, World"
	fmt.Println(len(s)) // 12
	s = "中文"
	fmt.Println(len(s)) // 6
}
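To see both views of the same string, the following sketch compares the byte length with the rune count, then walks the string once by index and once with range. The helper name counts is made up for this example; utf8.RuneCountInString is from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper returning the byte length and rune count of s.
func counts(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	s := "中文"
	nBytes, nRunes := counts(s)
	fmt.Println(nBytes, nRunes) // 6 2

	// Indexing a string yields individual bytes.
	for i := 0; i < len(s); i++ {
		fmt.Printf("%x ", s[i])
	}
	fmt.Println()

	// range decodes UTF-8 and yields each rune with its starting byte offset.
	for i, c := range s {
		fmt.Printf("%d:%c ", i, c)
	}
	fmt.Println()
}
```

The index loop visits six bytes, while the range loop visits two runes, at byte offsets 0 and 3.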
We can look at it this way: the basic unit of a string is the byte, and a rune takes one to four bytes once it is encoded as UTF-8. The difference between rune and string is that a rune represents a single character (a Unicode code point), while a string represents a sequence of characters stored in their UTF-8 encoded form.
package main
import (
	"fmt"
	"unicode/utf8"
)
func main() {
	var s = "H"
	fmt.Println(len(s)) // 1
	var c = 'H'
	fmt.Println(utf8.RuneLen(c)) // 1
}
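The relationship also shows up when converting between the three types. The sketch below is only an illustration: reverse is a hypothetical helper, and it works per code point, so it makes no attempt to keep combining characters together.

```go
package main

import "fmt"

// reverse is a hypothetical helper that reverses s rune by rune,
// so multi-byte characters are not split apart.
func reverse(s string) string {
	rs := []rune(s) // decode the UTF-8 bytes into code points
	for i, j := 0, len(rs)-1; i < j; i, j = i+1, j-1 {
		rs[i], rs[j] = rs[j], rs[i]
	}
	return string(rs) // encode the code points back into UTF-8
}

func main() {
	s := "H文"
	fmt.Println(len([]byte(s))) // 4 (1 byte for 'H', 3 bytes for '文')
	fmt.Println(len([]rune(s))) // 2
	fmt.Println(reverse(s))     // 文H
}
```

Reversing the raw bytes instead would scramble the three bytes of '文' and produce invalid UTF-8, which is why the conversion to []rune comes first.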