Sam Stites

Data.Text is encoded as UTF-16!

April 12, 2016

Surprise! UTF-16 is faster to parse than UTF-8. This is because UTF-8 requires either 8, 16, 24 or 32 bits (one to four octets) while UTF-16 requires either 16 or 32 bits to encode a character. As an aside, UTF-32 always requires 32 bits.

As far as processing time is concerned, text with variable-length encoding such as UTF-8 or UTF-16 is harder to process if there is a need to find the individual code units, however between the two UTF-16 is less variable and thus has more predictable memory access —  making it the faster of the two.