All posts tagged unicode

Is a random set of bytes a UTF-8 string?

UTF-8 is an encoding for Unicode text with several nice properties: any Unicode string can be encoded; for pure ASCII text (characters under 128), the UTF-8 encoded form is the same as the ASCII form; it’s pretty well universally recognised as a text-interchange format on the Internet; random byte streams are unlikely to be misinterpreted […]
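
As a rough illustration of the question in the title, here is a minimal Python sketch that estimates how often a random run of bytes happens to decode as valid UTF-8. The helper names `is_valid_utf8` and `fraction_valid`, the trial count, and the fixed seed are all assumptions made for this example, not anything taken from the post itself.

```python
import random


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def fraction_valid(length: int, trials: int = 10_000, seed: int = 0) -> float:
    """Estimate the share of random byte strings of `length` that are valid UTF-8.

    Hypothetical helper: the seed is fixed only so repeated runs agree.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Build one random byte string, each byte uniform over 0..255.
        sample = bytes(rng.getrandbits(8) for _ in range(length))
        if is_valid_utf8(sample):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Pure ASCII text encodes to the same bytes in UTF-8 and ASCII.
    print("hello".encode("utf-8") == "hello".encode("ascii"))
    for n in (1, 2, 4, 8, 16, 32):
        print(n, fraction_valid(n))
```

A single random byte is valid about half the time, because only the 128 ASCII values stand alone as complete sequences. The estimate should fall quickly as the length grows, which is the sense in which random byte streams are unlikely to be misread as UTF-8.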