Many email clients now offer some support for Unicode. Some use Unicode by default, while many others choose between a legacy encoding and Unicode depending on the mail's content, either automatically or when the user requests it.
Technical requirements for sending messages containing non-ASCII characters by email include:
- encoding certain header fields (subject, sender's and recipient's names, sender's organization and reply-to name) and, optionally, the body in a content-transfer encoding
- encoding non-ASCII characters in one of the Unicode transformation formats
- negotiating the use of UTF-8 encoding in email addresses and reply codes (SMTPUTF8)
- sending information about the content-transfer encoding and the Unicode transformation format used, so that the message can be displayed correctly by the recipient (see Mojibake).
If the sender's or recipient's email address contains non-ASCII characters, sending a message also requires encoding these characters in a format that mail servers can understand.
Unicode support in protocol
- RFC 6531 provides a mechanism for allowing non-ASCII email addresses to be encoded as UTF-8 in the SMTP and LMTP protocols
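As a rough illustration of the negotiation RFC 6531 describes, the sketch below decides whether an envelope needs the SMTPUTF8 extension and builds the corresponding MAIL command. The function names and the `server_extensions` argument (the keywords a server announced in its EHLO reply) are hypothetical and chosen only for this example; real mail libraries handle this step internally.

```python
def needs_smtputf8(*addresses: str) -> bool:
    """Return True if any address contains a non-ASCII character."""
    return any(not address.isascii() for address in addresses)


def mail_from_command(sender: str, recipients: list[str],
                      server_extensions: set[str]) -> str:
    """Build the SMTP MAIL command for an envelope (illustrative helper).

    The SMTPUTF8 parameter is added only when an address needs it, and only
    if the server advertised the extension in its EHLO reply.
    """
    if not needs_smtputf8(sender, *recipients):
        return f"MAIL FROM:<{sender}>"
    if "SMTPUTF8" not in {ext.upper() for ext in server_extensions}:
        raise ValueError("server did not advertise SMTPUTF8")
    return f"MAIL FROM:<{sender}> SMTPUTF8"


# An ASCII-only envelope needs no extension; a non-ASCII one does.
plain = mail_from_command("alice@example.com", ["bob@example.com"], set())
utf8 = mail_from_command("josé@example.com", ["bob@example.com"], {"SMTPUTF8"})
```

If the server does not offer the extension, a client has to fall back to an ASCII-only address or report the message as undeliverable; the sketch simply raises an error in that case.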
Unicode support in message header
To use Unicode in certain email header fields, e.g. the subject line or the sender's and recipient's names, the Unicode text has to be encoded as a MIME "Encoded-Word" with a Unicode encoding as the charset. To use Unicode in the domain part of an email address, IDNA encoding has traditionally been required. Alternatively, SMTPUTF8 allows the use of UTF-8 encoding in email addresses (both in the local part and in the domain name) as well as in the mail header section. Various standards have been created to retrofit the handling of non-ASCII data onto the originally ASCII-only email protocol:
- RFC 2047 provides support for encoding non-ASCII values such as real names and subject lines in email headers
- RFC 5890 provides support for encoding non-ASCII domain names in the Domain Name System
- RFC 6532 allows the use of UTF-8 in the mail header section
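The two traditional header mechanisms can be sketched with Python's standard library. The helper names and the sample subject and domain are assumptions made for illustration. Note also that Python's built-in `idna` codec implements the older IDNA 2003 rules rather than the IDNA 2008 rules of RFC 5890, so it only approximates the current standard, although the two agree for the simple domain used here.

```python
from email.header import Header, decode_header, make_header


def encode_encoded_word(text: str) -> str:
    """Encode header text as RFC 2047 encoded-word(s) with UTF-8 as charset."""
    return Header(text, charset="utf-8").encode()


def decode_encoded_word(value: str) -> str:
    """Decode a header value that may contain RFC 2047 encoded-words."""
    return str(make_header(decode_header(value)))


def domain_to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (Punycode) form.

    Uses the built-in codec, which follows IDNA 2003 (an approximation).
    """
    return domain.encode("idna").decode("ascii")


encoded_subject = encode_encoded_word("Grüße aus Köln")
ascii_domain = domain_to_ascii("bücher.example")

print(encoded_subject)                       # ASCII-only encoded-word
print(decode_encoded_word(encoded_subject))  # original text restored
print(ascii_domain)
```

The encoded subject consists only of ASCII characters and names its charset, so a receiving client can restore the original text; the converted domain can be used wherever an ASCII-only host name is expected.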
Unicode support in message body
As with all encodings other than US-ASCII, when Unicode text is used in an email, MIME must be used to specify which Unicode transformation format is used for the text.
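A minimal sketch of such a declaration, using Python's standard `email` package: the addresses, subject and body are placeholders, and base64 is chosen here only as one possible content-transfer encoding.

```python
from email.message import EmailMessage


def build_text_message(body: str) -> EmailMessage:
    """Build a plain-text message whose MIME headers declare UTF-8."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"   # placeholder address
    msg["To"] = "bob@example.com"       # placeholder address
    msg["Subject"] = "Unicode test"
    # charset names the Unicode transformation format of the body;
    # cte selects the content-transfer encoding used on the wire.
    msg.set_content(body, charset="utf-8", cte="base64")
    return msg


msg = build_text_message("Grüße aus Köln\n")
print(msg["Content-Type"])               # declares the charset
print(msg["Content-Transfer-Encoding"])  # declares the transfer encoding
```

The recipient's client reads both headers to reverse the transfer encoding and then decode the bytes as UTF-8, which is what prevents mojibake.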
UTF-7, although sometimes considered deprecated, has an advantage over other Unicode encodings in that it does not require a transfer encoding to fit within the seven-bit limit of many legacy Internet mail servers. UTF-16, on the other hand, must be transfer encoded to fit the SMTP data format. Although it is not strictly required, UTF-8 is usually also transfer encoded to avoid problems with seven-bit mail servers. MIME transfer encoding of UTF-8 makes it either unreadable as plain text (in the case of base64) or, for some languages and types of text, very inefficient in size (in the case of quoted-printable).
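The size trade-off between the two transfer encodings can be checked with a short experiment. The sample strings and the helper name are arbitrary choices for illustration; the sketch compares the length of UTF-8 text after base64 and after quoted-printable encoding, and also confirms that UTF-7 output is already seven-bit clean.

```python
import base64
import quopri


def transfer_sizes(text: str) -> dict[str, int]:
    """Return the size of UTF-8 text before and after transfer encoding."""
    raw = text.encode("utf-8")
    return {
        "utf-8 bytes": len(raw),
        "base64": len(base64.b64encode(raw)),
        "quoted-printable": len(quopri.encodestring(raw)),
    }


# Mostly ASCII text: quoted-printable stays readable and compact.
mostly_ascii = transfer_sizes("Meet me at the café tomorrow")
# Cyrillic text: every UTF-8 byte becomes three characters in quoted-printable.
cyrillic = transfer_sizes("Привет мир")

# UTF-7 produces only ASCII bytes, so no transfer encoding is needed.
utf7 = "Привет мир".encode("utf-7")

print(mostly_ascii)
print(cyrillic)
print(utf7.isascii())
```

For text that is mostly ASCII, quoted-printable comes out smaller than base64, while for text in a non-Latin script it comes out considerably larger, which matches the inefficiency described above.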
Some document formats, such as HTML, PostScript and Rich Text Format, have their own 7-bit encoding schemes for non-ASCII characters and can therefore be sent without any special email encoding. For example, HTML email can use HTML entities to represent characters from anywhere in Unicode even if the HTML source text of the email is in a legacy encoding (e.g. 7-bit ASCII). For details, see Unicode and HTML. The rest of this article deals with email messages where the actual raw text (whether markup or plain text) is in an encoding that covers all of Unicode.
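The HTML case can be sketched as follows. The helper name and sample text are made up for this example, and the sketch uses numeric character references (one form of what the text calls HTML entities) because Python's `xmlcharrefreplace` error handler generates them directly.

```python
import html


def to_ascii_html(text: str) -> str:
    """Return an HTML fragment for text that contains only ASCII characters.

    Markup-significant characters are escaped first; every remaining
    non-ASCII character is then replaced by a numeric character reference.
    """
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


original = "Zażółć gęślą jaźń & 日本語"
fragment = to_ascii_html(original)

print(fragment)                 # 7-bit clean HTML source
print(html.unescape(fragment))  # what an HTML renderer would display
```

Because the fragment is pure ASCII, it can pass through seven-bit mail servers unchanged, while an HTML-capable client still displays the original characters.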
See also
- Email client comparison
- List of Unicode fonts
- Free Unicode font software
- International email
External links
- SIL's freeware fonts, editors and documentation
Source of the article : Wikipedia