RFC 5322 Standard Email Format
When deduplicating email messages prior to production, a common step is to standardize all email in the RFC 5322 format. This is the Internet Message Format, the syntax for the messages that SMTP transfers between computers. RFC 5322 replaced RFC 2822, which in turn replaced RFC 822. An email consists of a header section and a body, separated by an empty line. The header can include date, from, sender, to, cc, bcc, subject, comments, keywords, message-id, and references fields, but only the date and from fields are required; the sender field is required only when the from field lists more than one mailbox. Each header field must begin a line with the field name, followed by a colon and the field body, and be terminated by CRLF (carriage return / line feed).
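As a rough illustration of the header rules, the sketch below uses Python's standard email package to parse a message and report which mandatory fields are absent. RFC 5322 itself makes only the origination date and From fields mandatory (Sender is needed only when From lists more than one mailbox), so those are the two fields checked. The function name missing_required_fields and the sample message are made up for this example; this is not a full conformance check.

```python
from email import message_from_string
from email.policy import default
from typing import List

# Fields that RFC 5322 requires in every message.
REQUIRED_FIELDS = ("Date", "From")


def missing_required_fields(raw_message: str) -> List[str]:
    """Return the required header fields that the message lacks."""
    msg = message_from_string(raw_message, policy=default)
    # Header lookup is case-insensitive and yields None when the field is absent.
    return [name for name in REQUIRED_FIELDS if msg[name] is None]


# Hypothetical sample: header fields, an empty line, then the body.
sample = (
    "Date: Mon, 01 Jan 2024 09:00:00 +0000\r\n"
    "From: alice@example.com\r\n"
    "Subject: Test\r\n"
    "\r\n"
    "Hello.\r\n"
)

if __name__ == "__main__":
    print(missing_required_fields(sample))
```

A message that fails this check may still be worth keeping in a review set; the check only flags that it cannot be treated as a well-formed RFC 5322 message without repair.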
The local part of an email address (e.g., the localname in localname@domain.com) may contain upper and lower case Latin letters, the digits 0-9, and the characters !#$%&'*+-/=?^_`{|}~. Other special characters are allowed only inside a quoted string. A dot can be used as well, but it cannot begin or end the local part, and two dots cannot appear in a row. The domain name has to meet the requirements for a hostname, or it can be an IP address enclosed in square brackets. A hostname is a series of labels separated by dots. Each label can be up to 63 characters, and the entire hostname can't be longer than 253 characters.
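The address rules can be sketched as a small Python check. This is a simplified approximation, not a complete RFC 5322 address parser: it handles only the unquoted form of the local part and ordinary hostnames, so quoted local parts and bracketed IP addresses are rejected. It also assumes the usual hostname convention that labels use letters, digits, and hyphens and do not start or end with a hyphen. The name is_plausible_address is hypothetical.

```python
import re

# Characters allowed in an unquoted local part, other than the dot.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# One or more runs of allowed characters separated by single dots,
# so the local part cannot start or end with a dot or contain "..".
LOCAL_RE = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")
# A hostname label: 1 to 63 letters, digits, or hyphens, no hyphen at either end.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_plausible_address(address: str) -> bool:
    """Apply the simplified local-part and hostname rules to one address."""
    local, sep, domain = address.rpartition("@")
    if not sep or not LOCAL_RE.fullmatch(local):
        return False
    if len(domain) > 253:
        return False
    return all(LABEL_RE.fullmatch(label) for label in domain.split("."))


if __name__ == "__main__":
    for candidate in ("localname@domain.com", ".local@domain.com", "a..b@domain.com"):
        print(candidate, is_plausible_address(candidate))
```

For deduplication work, a check like this is better suited to flagging suspicious addresses for review than to discarding messages outright.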
The body of the message must consist of US-ASCII characters. No line should be more than 78 characters, and no line can be more than 998 characters; both limits exclude the terminating CRLF. A line is ended with a CR immediately followed by a LF. CR and LF cannot appear independently of one another in the email.
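The line rules lend themselves to a quick mechanical check. The following Python sketch scans the raw bytes of a message and reports non-ASCII bytes, bare CR or LF characters, and lines over the recommended (78) and absolute (998) limits, counting each line without its CRLF. The function name check_lines and the keys of the returned dictionary are invented for this example.

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def check_lines(raw: bytes) -> dict:
    """Report violations of the character and line-length rules in raw message bytes."""
    # Splitting on CRLF gives each line without its terminator.
    lines = raw.split(b"\r\n")
    return {
        "non_ascii": any(byte > 127 for byte in raw),
        "bare_cr_or_lf": BARE_CR_OR_LF.search(raw) is not None,
        "over_78": sum(len(line) > 78 for line in lines),
        "over_998": sum(len(line) > 998 for line in lines),
    }


if __name__ == "__main__":
    message = b"Hello\r\n" + b"x" * 80 + b"\r\n"
    print(check_lines(message))
```

Lines counted under over_78 break only the recommendation, while anything counted under over_998 or flagged as a bare CR or LF breaks a hard requirement.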
Emails sent between users of the same MS Exchange network do not use SMTP, and .msg files will not necessarily follow RFC 5322. The .eml format used by Windows Live Mail is plain text that complies with RFC 5322.