Email

Email ought to be simple. But it isn’t.

Overview

Email has the following key components:

A way of creating a message and addressing it to its intended recipients
A way of taking a message and transferring it ultimately to a recipient
A way of retrieving delivered messages and reading them

The first and third of these processes are handled by an end-user program such as Microsoft Outlook or a webmail service. The second is the focus of this note. Systems which do this are called MTAs (Mail Transfer Agents).

In order to take a message and transfer it to its intended recipients, the first things to consider are the interfaces between the MTA and the rest of the world.

Typically, the interface into an MTA is a socket listening on port 25 or port 587 ( the mail submission port ) and expecting the SMTP protocol. Typically, the interface out of an MTA consists of storing received mail in a database or in files in a filesystem ( or a combination of both ).

Internally, during the MTA process, it is very common for messages to be passed around between different processes and indeed to be sent on to other, intermediate MTAs. An MTA which is neither the first nor the last MTA to handle a message is called a relay. All SMTP software can act as a relay, but it’s important that it is configured to relay only messages it trusts. Accidentally configuring an ‘open relay’ on the public internet means it can be used by anyone to send anything to anyone else, and it will be hijacked by spammers almost instantly.
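To make the open-relay point concrete, here is a minimal Python sketch of the decision a relay has to make for each recipient. It is roughly the logic Postfix expresses with permit_mynetworks, permit_sasl_authenticated and reject_unauth_destination, but the network ranges and the domain example.org are hypothetical placeholders, and a real MTA applies many more checks.

```python
import ipaddress

# Hypothetical settings: substitute your own trusted networks and domains.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("127.0.0.0/8"),  # localhost
    ipaddress.ip_network("10.8.0.0/24"),  # e.g. a private VPN range
]
OUR_DOMAINS = {"example.org"}


def may_accept(client_ip: str, sasl_authenticated: bool, rcpt_to: str) -> bool:
    """Decide whether to accept one recipient without acting as an open relay."""
    addr = ipaddress.ip_address(client_ip)
    # Trusted machines and authenticated users may relay to any destination.
    # (An IPv6 address is simply never "in" an IPv4 network, and vice versa.)
    if sasl_authenticated or any(addr in net for net in TRUSTED_NETWORKS):
        return True
    # Everyone else may only send to domains we are the destination for.
    domain = rcpt_to.rpartition("@")[2].lower()
    return domain in OUR_DOMAINS


print(may_accept("203.0.113.5", False, "someone@elsewhere.com"))  # False
```

The order matters: checking the recipient domain last means that strangers on the internet can still deliver mail addressed to us, but cannot use us to reach anyone else.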

So what does an MTA do these days? Well, provided it speaks SMTP and ultimately delivers mail either to another MTA or to a mailstore, it can do more or less whatever it wants. In practice, though, the typical roles of a modern MTA such as Postfix are as follows:

Accept SMTP input from a variety of sources, such as the public internet, various trusted machines or local agents.
Support authentication, so that input from untrusted machines can be identified as coming from trusted users.
Based on the source of the message, the authentication status of the connection, and the list of recipients, decide quickly whether to reject the message.
Possibly based on a more detailed examination of the message, decide quickly whether to reject the message or accept it.
After acceptance, perform further tests on the message, both headers and content, to establish whether and where to send it next – possibly multiple destinations, and whether and where to send notifications such as non-delivery receipts, delivery receipts &c.
Add one or more headers to the message as appropriate.
Deliver the message to its next destinations, either local or further SMTP hops.
The ‘quickly’ in the above points refers to the fact that SMTP times out if a message is not either accepted or rejected within a certain time limit. Depending on the amount of mail received and the size of the mail infrastructure, a decision has to be made about which checks to perform during the SMTP conversation and which to leave until afterwards.

One of the reasons why mail systems can get so complicated is that, for efficiency reasons, mail systems try to split up the logic of these steps into small, efficient tests performed at different times by different bits of the system. As a result, a perfectly sensible intention can turn into a bewildering array of subtle configuration decisions.

My requirements

Accept inbound mail on my public internet, private VPN and localhost addresses on port 25
Accept outbound mail on port 587 always from private VPN and localhost, and also from the public internet if SASL authenticated

For inbound mail

Require helo
check against sender_access file
reject senders without fqdn
reject destination domain which is not one of ours
check various blacklists
check recipient against various lists and regular expressions
potentially greylist based on above results
run through various milters for dkim, spf, dmarc, and amavis
within amavis look up authenticated senders and whitelisted destinations before running spam checks
run destinations via virtual_alias_maps to change destination routing
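The first few inbound checks in that list could be sketched in Python as below. This is only an illustration: OUR_DOMAINS and the reply texts are hypothetical (modelled loosely on the kind of replies Postfix gives), and the later steps (blacklists, greylisting, milters) are left as a comment.

```python
# Hypothetical: the domains we accept mail for.
OUR_DOMAINS = {"example.org"}


def has_fqdn(address: str) -> bool:
    """True if the address has a domain part containing at least one dot."""
    _, at, domain = address.rpartition("@")
    return bool(at) and "." in domain.strip(".")


def check_inbound(helo: str, mail_from: str, rcpt_to: str):
    """Return None to carry on with later checks, or a rejection reply."""
    if not helo:
        return "503 5.5.1 Send HELO/EHLO first"
    # The null sender <> is used by bounces, so it must not be rejected here.
    if mail_from and not has_fqdn(mail_from):
        return "504 5.5.2 Sender address rejected: need fully-qualified address"
    domain = rcpt_to.rpartition("@")[2].lower()
    if domain not in OUR_DOMAINS:
        return "554 5.7.1 Relay access denied"
    return None  # next: blacklists, greylisting, milters, amavis ...
```

Cheap checks like these come first because they need no DNS lookups or content scanning, which keeps the SMTP conversation well within its time limits.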

SMTP key information

A typical SMTP conversation, showing only the client’s side ( the server answers each command with a numeric reply code ), consists of:

 EHLO my.identifying.fqdn
 MAIL FROM:<sender@my.mailfrom.address>
 RCPT TO:<intendedrecipient@recipient.email.address>
 DATA
 Headers
 <blank line>
 Message
 .
 QUIT
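The client’s side of that conversation can be generated mechanically. The sketch below (the function name is illustrative, not from any library) also shows dot-stuffing: because a line containing just "." ends the DATA section, any message line that starts with "." gets an extra "." prepended, which the receiving server strips again. Strictly, RFC 5321 puts no space after the colon in MAIL FROM: and RCPT TO:. In practice, Python’s smtplib does all of this for you.

```python
def smtp_client_lines(helo, mail_from, recipients, message):
    """Return the lines a client sends for one SMTP transaction.

    Server replies are not modelled; a real client must read and check
    the reply code after every command.
    """
    lines = [f"EHLO {helo}", f"MAIL FROM:<{mail_from}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    lines.append("DATA")
    for line in message.split("\n"):
        # Dot-stuffing, so the line cannot be mistaken for the end marker.
        lines.append("." + line if line.startswith(".") else line)
    lines += [".", "QUIT"]
    return lines
```

On the wire each line is terminated by CRLF, i.e. "\r\n".join(lines) + "\r\n".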

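The headers should contain

 Message-Id: <some-id-which-looks-like-mail@somewhere-who-cares>
 From: <from@my.from.address>
 To: <intendedrecipient@recipient.email.address> ( one To: header only; if sent to multiple people, list them comma-separated )
 Date:

and can contain almost anything else. Strictly, RFC 5322 only requires Date: and From:, but Message-Id: is strongly recommended and To: is almost always present.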
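Python’s standard email package can build a message with these headers. A minimal sketch, with the placeholder addresses from above; note that several recipients go into one To: header as a comma-separated list.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_message(sender, recipients, subject, body):
    """Build a message carrying the headers listed above."""
    msg = EmailMessage()
    msg["Message-ID"] = make_msgid(domain="my.from.address")
    msg["From"] = sender
    # One To: header holding all recipients, comma-separated.
    msg["To"] = ", ".join(recipients)
    msg["Date"] = formatdate(localtime=True)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


print(build_message("from@my.from.address",
                    ["intendedrecipient@recipient.email.address"],
                    "Hello", "Hi there").as_string())
```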

As soon as it’s accepted by the first SMTP server, the server will add a Received: header to the top of the message, and the server which makes final delivery will add a Return-Path: header above everything else:

 Return-Path: the MAIL FROM
 Received: which contains something like from: ip/dns of immediate sender by: my hostname with: my mta  for: the RCPT TO

Servers may add more.

As the mail is transferred between servers going from hop to hop, each new server will add extra Received: headers along with anything else they feel like. These headers are placed immediately above the most recent Received header. So the typical format of the finally delivered email will be:

 Return-Path:
 Received: .... by the final smtp host
 other headers put there by that host
 Received: .... by the second last smtp host
 ....
 Received: .... by the first SMTP host
 ....
 The original message

So to understand a finally delivered email, start at the bottom with the last set of headers. These correspond to the original message and will tell you to whom it was originally sent and who the sender claimed to be. Working upwards, the Received: headers ( in particular their for clauses ) will usually tell you which recipient this particular copy was destined for, and the Return-Path: at the very top records the envelope sender ( the MAIL FROM ).
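Reading the Received: headers from the bottom up gives the hops in the order they happened. A small Python sketch using only the standard library (the function name is illustrative):

```python
import email


def hops_oldest_first(raw_message: str) -> list:
    """Return the Received: headers, oldest (first hop) first."""
    msg = email.message_from_string(raw_message)
    received = msg.get_all("Received") or []
    # Each server prepends its header, so header order is newest first.
    # Folded continuation lines are collapsed to single spaces.
    return [" ".join(h.split()) for h in reversed(received)]
```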

Within the original message, the following information is particularly important in identifying the source:

 From: - is the email address of the logical originator of the message
 Reply-To: - is the optional email address of the mailbox to which replies should be sent
 Sender: - is the optional email address of the actual sender of the message

If Sender and Reply-To are absent, they effectively default to From, provided From contains a single address. However, a From header may list more than one address, in which case a Sender header is mandatory.

This can all get rather confusing, because there can be four different fields in the email, each apparently related to the sender. This table may help:

Return-Path	The MAIL FROM claim in the original SMTP conversation
From	The one or more From addresses of the people or entities which wanted the email sent
Reply-To	The address to which any reply should be sent
Sender	The email address of the person or entity which actually sent the email
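A minimal Python sketch that pulls these four fields out of a delivered message, applying the defaults described above: Reply-To falls back to From, and Sender falls back to From only when there is a single From address. The function name is hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses


def sender_fields(raw_message: str) -> dict:
    """Return the four sender-related fields as lists of bare addresses."""
    msg = message_from_string(raw_message)

    def addrs(name):
        return [addr for _, addr in getaddresses(msg.get_all(name, []))]

    from_ = addrs("From")
    return {
        # Envelope sender (MAIL FROM), recorded at final delivery.
        "Return-Path": addrs("Return-Path"),
        "From": from_,
        # No Reply-To: replies go to the From address(es).
        "Reply-To": addrs("Reply-To") or from_,
        # No Sender: the single author is also the sender. With several
        # From addresses a Sender header is mandatory, so don't guess.
        "Sender": addrs("Sender") or (from_ if len(from_) == 1 else []),
    }
```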

If you want to know what’s wrong with SMTP, the basic answer is that every one of the four pieces of information above can be forged – either by the original sender, or by an intermediate system. SMTP was designed in days when it was assumed that anyone sending email wanted to get things right and there was no point in worrying about deliberate attempts to mislead. These days, that assumption is way off the mark. And as a result, email systems have various ways of trying to protect themselves.

Delivery information

When email was first being introduced, the internet didn’t yet have a significant role, and most email was sent through a complex chain of hops between machines connected in weird and wonderful ways. So the fact that an email was successfully sent out of your machine told you little about its eventual delivery: it could get lost or stuck at any point thereafter. To deal with this, the system provided a way for mail servers to report back to the sender with information about delivery or non-delivery well after the original send. Unfortunately, these mechanisms use the Return-Path information to decide where to send these reports, and as we’ve mentioned, that may turn out to have been forged. The result is that non-delivery notifications either clog up the receiving server’s outgoing queue, or are sent to an innocent third party who will quite understandably treat them as spam ( this is known as backscatter ) and may end up getting you blacklisted.

Note that notifications of successful delivery are less problematic – spammers who want to know if you received or read your spam need to provide a legitimate return address! (Although requesting a delivery receipt to the wrong address could be used as a DoS attack?)

So we can’t afford to send non-delivery notifications any more. How can we tell legitimate senders if something went wrong – like trying to send mail to the wrong email address ( typing errors etc )? There is only one way – we have to reject the message during the initial SMTP conversation – and leave it up to the sending server to know what to do.

This is fine, provided our email infrastructure can cope with the volumes we receive and still perform all the checks needed ( in particular anti-spam and virus checks ) within the time limits set by SMTP for a single conversation. But it means that the technique, still widely recommended in howtos and other documents, of accepting mail first and then checking it for spam and viruses asynchronously via a queue can no longer tell senders when it rejects a message, because if the sender was forged, the rejection notice becomes backscatter.

For example, if you use Postfix with Amavis for spam/virus checking, you need either to run Amavis as a milter ( meaning it does its work during the original SMTP conversation ), or, if you run it as a content_filter after accepting the mail, you must never BOUNCE messages: you have to DISCARD or reroute them instead.
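The rule can be summarised as a tiny decision function. This is a hypothetical Python sketch, not amavis or Postfix configuration: the point is that the only safe way to tell a sender "no" is an SMTP error reply while they are still connected.

```python
from enum import Enum


class Phase(Enum):
    IN_SMTP = "milter"              # client still connected, nothing accepted yet
    AFTER_QUEUE = "content_filter"  # we have already replied 250 OK


def handle_spam(phase: Phase, can_quarantine: bool) -> str:
    """Decide what to do with a message a filter has judged to be spam."""
    if phase is Phase.IN_SMTP:
        # The connected client receives the error; we never generate a
        # bounce ourselves, so a forged sender never sees backscatter.
        return "550 5.7.1 Message rejected as spam"
    # After acceptance a bounce would go to the (possibly forged)
    # Return-Path, so never bounce: quarantine or silently discard instead.
    return "quarantine" if can_quarantine else "discard"


print(handle_spam(Phase.AFTER_QUEUE, False))  # discard
```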

SPF, DKIM and DMARC

As a result of the problems with SMTP, several attempts have been made to help resolve matters. Of these, at the time of writing, the only ones which have achieved widespread acceptance are SPF, DKIM and DMARC (which builds on the other two ). There’s lots of good stuff about them on the web but basically here’s the lowdown.

SPF

SPF works by having a domain publish a special DNS TXT record which tells anyone who wants to know which IP addresses are allowed to send email whose envelope sender ( the MAIL FROM address ) is in that domain. In the simplest cases this works well, because typically email is sent between organisations ( whether corporate or ISP ) in a single hop from the sender’s mail servers to the recipient’s. SPF is a way of telling the recipient that it can trust specific mail servers to belong to the sender. This doesn’t guarantee that the mail is valid or safe, but it does ensure that the owner of the domain trusts the sending machine to send on its behalf.
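The core of an SPF check can be sketched in Python as below. It is deliberately simplified: the record would normally be fetched from the domain’s DNS TXT record, and only the ip4, ip6 and all mechanisms are handled, whereas real SPF (RFC 7208) also has a, mx, include and others that need further DNS lookups. The example record is hypothetical.

```python
import ipaddress


def spf_check(record: str, client_ip: str) -> str:
    """Evaluate only the ip4:, ip6: and all mechanisms of an SPF record."""
    results = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = term[0] if term[0] in results else "+"  # default is pass
        mechanism = term.lstrip("+-~?").lower()
        if mechanism == "all":
            return results[qualifier]
        kind, _, value = mechanism.partition(":")
        if kind in ("ip4", "ip6") and value:
            net = ipaddress.ip_network(value, strict=False)
            if ip.version == net.version and ip in net:
                return results[qualifier]
        # Anything else (a, mx, include, modifiers...) is ignored here.
    return "neutral"  # nothing matched and there was no "all"


print(spf_check("v=spf1 ip4:1.2.3.0/24 -all", "1.2.3.4"))  # pass
```

Mechanisms are tried left to right and the first match wins, which is why "-all" (reject everything else) normally comes last.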

DKIM

DKIM also relies on a special DNS record, in which the domain publishes a public key. The sending system uses the matching private key to sign the message cryptographically, ensuring that key information it contains cannot be tampered with, and allowing a recipient to check that the signer holds the private key, which proves that they are trusted by whoever controls the domain’s DNS.
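Verifying a DKIM signature needs a cryptographic library (for example the third-party dkimpy package), but the first step is easy to show. The DKIM-Signature header is a list of tag=value pairs, and its d= (domain) and s= (selector) tags say where to find the public key: at the DNS name selector._domainkey.domain. A Python sketch, with a hypothetical signature whose selector sel1 is made up for illustration:

```python
def parse_tag_list(value: str) -> dict:
    """Parse a DKIM tag=value list such as 'v=1; a=rsa-sha256; d=aol.com'."""
    tags = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")
        if sep:
            # Whitespace inside values (e.g. a folded b= signature) is ignored.
            tags[name.strip()] = "".join(val.split())
    return tags


def dkim_key_name(signature_header: str) -> str:
    """DNS name where the signer's public key is published."""
    tags = parse_tag_list(signature_header)
    return f"{tags['s']}._domainkey.{tags['d']}"


# Hypothetical signature; real b= and bh= values are long base64 strings.
sig = "v=1; a=rsa-sha256; d=aol.com; s=sel1; h=From:To:Subject; bh=abc=; b=def=="
print(dkim_key_name(sig))  # sel1._domainkey.aol.com
```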

DMARC

DMARC builds on SPF and DKIM by allowing domains to publish a DNS record which tells recipients how to interpret the results of SPF and DKIM tests. One problem with SPF and DKIM is that since they are voluntary, a recipient cannot tell how to interpret an email which arrives without either – is this a forgery or just a domain which allows but does not enforce SPF or DKIM on its senders ( and there are legitimate reasons why a domain may choose not to use these techniques )? The idea behind DMARC is that a sending domain can publish a record which:

tells recipients whether to accept messages without DKIM and SPF validation
allows ( but does not require ) recipients to send status reports back to the purported sending domain with information about emails it has rejected, to permit analysis.

By implementing DMARC for your domain, you provide some assurance to the recipient that mail you really did send was from you. And by checking the DMARC status of mail you receive, you benefit from that assurance. Again, remember that this doesn’t guarantee that the mail is safe, just that it almost certainly came from who it says it did.
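A DMARC record lives in DNS at _dmarc.<domain>. The Python sketch below parses one and decides whether a message passes: DMARC passes if SPF or DKIM passes for a domain that aligns with the From: header domain. It assumes relaxed alignment (the same organisational domain) and, as a loud simplification, takes the last two labels of a name as its organisational domain; real implementations use the Public Suffix List, so this sketch is wrong for domains such as example.co.uk. The record shown is hypothetical.

```python
def parse_dmarc(record: str) -> dict:
    """Parse 'v=DMARC1; p=reject; rua=mailto:...' into a dict of tags."""
    tags = {}
    for part in record.split(";"):
        name, sep, value = part.partition("=")
        if sep:
            tags[name.strip().lower()] = value.strip()
    return tags


def org_domain(domain: str) -> str:
    # SIMPLIFICATION: real DMARC uses the Public Suffix List; taking the
    # last two labels is wrong for e.g. example.co.uk.
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def dmarc_result(header_from, spf_result, spf_domain, dkim_result, dkim_domain):
    """Pass if SPF or DKIM passed for a domain aligned (relaxed) with From:."""
    target = org_domain(header_from)
    spf_ok = spf_result == "pass" and org_domain(spf_domain) == target
    dkim_ok = dkim_result == "pass" and org_domain(dkim_domain) == target
    return "pass" if spf_ok or dkim_ok else "fail"


def disposition(record: str, result: str) -> str:
    """What the domain's policy (p=) asks us to do with this message."""
    return "none" if result == "pass" else parse_dmarc(record).get("p", "none")


record = "v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com"
print(disposition(record, "fail"))  # reject
```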

Forwarding

The techniques above are straightforward enough in the common case where an email is sent directly from the originating domain to the receiving domain ( for example from aol.com to gmail.com ). Unfortunately, this is by no means the only case. Some email systems act as forwarders – for example, I may have an email address at an organisation I belong to, and want mail sent there to be delivered to my gmail account. Another common situation is where I subscribe to a mailing list, which accepts email from subscribers and sends them on to a list of addressees.

Let’s look at basic forwarding first. Suppose fred@aol.com sends a mail to jim@myorg.com, and jim wants to receive all his mail as jim@gmail.com. aol signs its outgoing mail with DKIM and publishes SPF and DMARC records, so the original mail will carry a DKIM signature from aol.com, and will be successfully received by myorg.com.

Now myorg.com wants to send it on to gmail.com. If it just sends the message as-is, gmail will say: hmm, this message’s envelope sender is at aol.com, but aol’s SPF record doesn’t authorise myorg.com’s servers to send its mail, so it fails SPF, and a receiver may well reject mail on that basis. So goodbye.

Fortunately, there’s a solution to this one: a documented way of rewriting the sender’s address called SRS ( the Sender Rewriting Scheme ). It makes the envelope sender of the forwarded email appear to be at myorg.com, so it passes SPF, but encodes the original sender’s address in a special format which myorg.com can unpack, if a bounce comes back, so as to recover the details of the original sender. The message’s legitimacy can still be validated using the original sender’s DKIM signature. SRS seems to work nicely. In the case above, the final email received by gmail will contain:

 Return-Path: <SRS0=2pVA=K3=aol.com=fred@myorg.com>
 Received: from mail.myorg.com (mail.myorg.com. [1.2.3.4])
        by mx.google.com with ESMTP id pe3si23708903wjb.62.2015.10.23.02.01.58
        for <jim@gmail.com>;
        Fri, 23 Oct 2015 02:01:58 -0700 (PDT)
 Received-SPF: pass (google.com: domain of SRS0=2pVA=K3=aol.com=fred@myorg.com designates 1.2.3.4 as permitted sender) client-ip=1.2.3.4;
 Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of SRS0=2pVA=K3=aol.com=fred@myorg.com designates 1.2.3.4 as permitted sender)  smtp.mailfrom=SRS0=2pVA=K3=aol.com=fred@myorg.com;
       dkim=pass header.i=@mx.aol.com;
       dmarc=pass (p=REJECT dis=NONE) header.from=aol.com

Note in particular that the DKIM check is against the original aol message signed by aol, the SPF check is against myorg.com and its mail servers, and the DMARC policy check is against aol.com.

So this tells us that we can trust that this message came from aol.com originally, and that the system which sent it to us owns the server which did the sending.
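A receiving program can pull those verdicts out of the Authentication-Results header. A naive Python sketch: it splits on semicolons and ignores the finer points of the header’s grammar (for instance, a semicolon inside a comment would confuse it). Applied to the header in the example, it reports pass for spf, dkim and dmarc.

```python
import re


def auth_results(header_value: str) -> dict:
    """Map each method (spf, dkim, dmarc, ...) to its first reported result."""
    results = {}
    # The part before the first ';' is the authserv-id (e.g. mx.google.com).
    for clause in header_value.split(";")[1:]:
        m = re.match(r"\s*([a-z0-9-]+)\s*=\s*([a-z]+)", clause, re.IGNORECASE)
        if m:
            results.setdefault(m.group(1).lower(), m.group(2).lower())
    return results
```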

Of course, if myorg.com were badly configured, it might be fooled into accepting an email from someone pretending to be aol, because it doesn’t check SPF, or more likely doesn’t treat a DMARC fail as conclusive. But if it were to forward the fake message on, gmail would still not be fooled: even though it can’t see how the message got to myorg.com, it checks aol’s DKIM signature for itself, and only aol could have produced a valid one.
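Finally, the SRS0 address in the example’s Return-Path can be unpacked. The sketch below follows the SRS0=hash=timestamp=domain=local@forwarder layout seen there. The two timestamp characters encode the day number modulo 1024 in base32; for 23 October 2015 that gives K3, as in the example. The hash construction is a hypothetical stand-in: real implementations such as libsrs2 define their own, and a real forwarder would recompute the hash and check that the timestamp is recent before trusting a bounce, which srs_reverse here does not do.

```python
import base64
import hashlib
import hmac
import time

B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def srs_timestamp(when=None) -> str:
    """Two base32 characters encoding days since the epoch, modulo 1024."""
    days = int((time.time() if when is None else when) // 86400) % 1024
    return B32[days >> 5] + B32[days & 31]


def srs_forward(address, forward_domain, secret: bytes, when=None) -> str:
    """Rewrite address as an SRS0 address at forward_domain."""
    local, _, domain = address.rpartition("@")
    ts = srs_timestamp(when)
    # HYPOTHETICAL hash: HMAC-SHA1, base64, first 4 characters. It will
    # not interoperate with any real SRS implementation.
    mac = hmac.new(secret, f"{ts}{domain}{local}".lower().encode(), hashlib.sha1)
    tag = base64.b64encode(mac.digest()).decode()[:4]
    return f"SRS0={tag}={ts}={domain}={local}@{forward_domain}"


def srs_reverse(srs_address: str):
    """Split an SRS0 address into (hash, timestamp, original address)."""
    local, _, _ = srs_address.rpartition("@")
    prefix, tag, ts, domain, orig_local = local.split("=", 4)
    if prefix.upper() != "SRS0":
        raise ValueError("not an SRS0 address")
    return tag, ts, f"{orig_local}@{domain}"


print(srs_reverse("SRS0=2pVA=K3=aol.com=fred@myorg.com"))
# ('2pVA', 'K3', 'fred@aol.com')
```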