2009-05-01
Abstract
John Levine discusses the ways in which a DKIM-authenticated domain fits into a mail-handling system, and looks at related technologies that build on DKIM to help recognize good mail senders and deter phishing.
Copyright © 2009 Virus Bulletin
In last month’s article (see VB, April 2009, p.S1), we learned about the mechanics of DKIM, DomainKeys Identified Mail, a message authentication system that has recently been standardized by the IETF. DKIM allows a signer to add a DKIM-Signature header to a mail message. The header includes a hash of the message body and headers, and a cryptographic signature that can only be decoded by a key stored in the signing domain’s DNS. The recipient(s) of the message can check the signature to verify both that the message has not been modified since it was signed and, using the decoding key from the DNS, that it is a genuine signature from the signing domain.
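To make the tamper-detection idea concrete, here is a minimal Python sketch of the body-hash half of a signature. It assumes DKIM's 'simple' body canonicalization and SHA-256, and the function names are illustrative only. A real verifier must also hash the signed headers and check the cryptographic signature against the key in the DNS.

```python
import base64
import hashlib


def simple_body_canon(body: bytes) -> bytes:
    """'Simple' body canonicalization: ignore empty lines at the end of
    the body and make sure it ends with exactly one CRLF."""
    while body.endswith(b"\r\n"):
        body = body[:-2]
    return body + b"\r\n"


def body_hash(body: bytes) -> str:
    """Base64 SHA-256 digest of the canonicalized body, i.e. the kind of
    value a signer would place in the bh= tag (illustrative helper)."""
    digest = hashlib.sha256(simple_body_canon(body)).digest()
    return base64.b64encode(digest).decode("ascii")


original = b"Your statement is ready.\r\n"
padded = b"Your statement is ready.\r\n\r\n\r\n"
altered = b"Your statement is ready!\r\n"

same_after_padding = body_hash(original) == body_hash(padded)
same_after_edit = body_hash(original) == body_hash(altered)
```

Trailing blank lines leave the hash unchanged, but any change to the text produces a different hash, which is why a modified message no longer matches its signature.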
It is surprisingly tricky to integrate DKIM into mail filtering setups, not because the authentication system is inherently hard to use, but because most current filtering technology is based on weeding out unwanted mail, and DKIM doesn’t fit into that model. DKIM will be most useful for recognizing mail from known good senders who sign their mail. This can be achieved by whitelisting mail with those senders’ signatures.
The key to effective use of DKIM is for each message to have a signature that the recipient recognizes. If a message has a signature with a d= domain that is included on a recipient’s whitelist, the recipient’s mail server can safely skip the spam filters and just deliver it. This works even when the d= domain doesn’t match the domain on the From: header line. For example, on my system, my users have about a dozen personal domains that they use in their outgoing mail, but I know them all and am confident that they will behave themselves, so I put a signature on all of the outgoing mail with my own domain. This lets them all benefit from a receiver’s single whitelist entry.
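A whitelist check of this kind can be sketched in a few lines of Python. This is only an illustration: the whitelist contents, the function names and the sample header are invented, and the sketch assumes that the signatures passed in have already been cryptographically verified by a DKIM library.

```python
def parse_tag_list(value):
    """Split a DKIM-Signature tag=value list (';' separated) into a dict."""
    tags = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")
        if sep:
            # Drop folding whitespace inside the value.
            tags[name.strip()] = "".join(val.split())
    return tags


# Hypothetical local whitelist of signing domains.
TRUSTED_SIGNERS = {"example.net"}


def skip_spam_filters(valid_signature_headers, whitelist=TRUSTED_SIGNERS):
    """Return True if any already-verified signature carries a d= domain
    on the whitelist. The From: address is deliberately not consulted."""
    for header in valid_signature_headers:
        domain = parse_tag_list(header).get("d", "").lower()
        if domain in whitelist:
            return True
    return False


sig = "v=1; a=rsa-sha256; d=Example.NET; s=mail; h=from:subject; bh=abc=; b=xyz"
deliver_directly = skip_spam_filters([sig])
```

Because the decision keys on d= alone, a dozen different From: domains signed by the same trusted signer all match one whitelist entry.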
Large mail systems already maintain reputation databases that track the sources of good and bad mail. When an AOL user clicks the spam button to complain about a message, AOL updates its reputation database about the source of the message. Currently message sources are tracked by IP address, since that’s the only reliable source identity available, but as DKIM signatures become more common, they will become the preferred identity.
Domains are much better identity handles than IP addresses. They are more stable and don’t change if a sender switches ISPs, or if a change in circumstances requires new mail hosts with new IP addresses. They also provide a more useful granularity in situations where a group of senders share a small set of IP addresses, such as at a shared web host or email service provider. When a domain has been in use for a while and is widely recognized as having a good reputation, the domain’s value will increase and its owner will be encouraged to be careful with its mail to preserve that value.
A message may arrive with multiple signatures, some of which validate and some of which don’t. Which signatures should a receiver use? DKIM has not been around long enough to provide an answer based on experience, but if using it for whitelisting, the logical answer is to use the valid signature with the best reputation.
There are both innocent and malicious reasons why a message might have invalid signatures. Intermediate mail hosts may mutate a message in ways that break a signature, such as tidying up header lines, or the message may have been sent through a mailing list (a topic addressed later).
Creating a correct DKIM signature involves some tricky programming, so a new signer may just have buggy code that sometimes generates broken signatures. A test event in late 2008, attended by many DKIM developers including several of those who wrote RFC 4871 [1], found a lot of obscure compatibility issues and generated 15 separate RFC errata clarifying parts of the spec [2].
Just as bad guys add fake Received: headers to spam to try to disguise the true source of their messages, they will also likely add fake DKIM signatures. Trying to guess the difference between innocent broken signatures and forged signatures would be just as hard as any other kind of heuristic spam detection, so recipients should simply ignore broken signatures.
Having limited our attention to valid signatures, the other part of the question is what to do if there is more than one. Imagine a message that has two signatures, the first of which is from someone you trust and the second of which is from someone you don’t. You can’t tell whether they were both applied at the same time, or whether they were applied sequentially as the message passed through different mail systems. But if you really trust the first signer only to sign mail you want to receive, why does it matter if the message passed through a bad neighbourhood on its way to you? Since the signature validated, you know that the message you got was the same one they signed, so the message should be good.
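The selection rule described here (ignore broken signatures, then use the valid signature with the best reputation) might look like the following Python sketch. The reputation table, its scores and the Signature type are hypothetical, and verification is assumed to have been done elsewhere.

```python
from typing import NamedTuple, Optional


class Signature(NamedTuple):
    domain: str  # the d= value of the signature
    valid: bool  # outcome of verification performed by a DKIM library


# Hypothetical local reputation scores (higher is better).
REPUTATION = {"trusted.example": 0.9, "unknown-relay.example": 0.1}


def best_signer(signatures, reputation=REPUTATION) -> Optional[str]:
    """Return the valid signing domain with the best known reputation,
    or None if no valid signature comes from a known domain."""
    valid = [s.domain.lower() for s in signatures if s.valid]
    known = [d for d in valid if d in reputation]
    if not known:
        return None
    return max(known, key=lambda d: reputation[d])


message_signatures = [
    Signature("trusted.example", True),
    Signature("unknown-relay.example", True),
    Signature("bigbank.example", False),  # broken or forged: ignored
]
chosen = best_signer(message_signatures)
```

The second, poorly rated signature does not count against the message; it simply loses to the trusted one.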
While large ISPs can afford to maintain their own whitelists and reputation databases, doing so is beyond the ability of small operators. For many years, receivers have used shared DNS blacklists of IP addresses that send unwanted mail – examples include the Spamhaus SBL and XBL. Shared blacklists of bad domains are unlikely to be useful, however, since the supply of domains is unlimited and bad guys will just discard any that appear on blacklists. On the other hand, good domains are quite stable, so shared whitelists of good domains will be of great use.
A small consortium called the Domain Assurance Council (of which I was one of the directors) has designed a shared whitelisting system called Vouch by Reference (VBR) [3], which is scheduled to be issued as an IETF standards track RFC. VBR can be used with other authentication schemes such as Sender-ID, but we designed it to work with DKIM.
VBR provides a very simple way to publish a list of domains about which the publisher wants to make a positive statement. There is no standard term for what VBR does, but it is most often called certification. If a sender expects a certifier to vouch for its mail, it puts a VBR-Info: header into each message:
VBR-Info: md=bigbank.com; mc=transaction; mv=certifier.com:certifier-b.com;
The md= domain must match the d= domain on a valid DKIM signature on the message. The mc= field is a message category asserted by the sender. This can be ‘transaction’, ‘list’, or ‘all’, to say that the message is related to a transaction, that it has been sent to a mailing list, or that it is some other kind of mail. The mv= field is a list of domains of certifiers that the sender expects to vouch for them.
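As an illustration, the header above can be picked apart with a few lines of Python. The function names are invented for this sketch, and the set of valid DKIM domains is assumed to come from a separate verification step.

```python
def parse_vbr_info(value):
    """Parse the value of a VBR-Info: header into its three fields."""
    fields = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")
        if sep:
            fields[name.strip().lower()] = val.strip().lower()
    return {
        "md": fields.get("md", ""),
        "mc": fields.get("mc", ""),
        # mv= is a colon-separated list of certifier domains.
        "mv": [v.strip() for v in fields.get("mv", "").split(":") if v.strip()],
    }


def md_is_authenticated(info, valid_dkim_domains):
    """True if md= matches the d= domain of some valid DKIM signature."""
    return info["md"] in {d.lower() for d in valid_dkim_domains}


info = parse_vbr_info(
    "md=bigbank.com; mc=transaction; mv=certifier.com:certifier-b.com;"
)
```

Here info["mv"] holds the two certifier domains in the order the sender listed them.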
VBR publishers have a set of VBR records, one for each domain they certify. Each VBR record is just a DNS TXT record whose name is the certified domain, the token _vouch, and the voucher’s domain:
bigbank.com._vouch.certifier.com TXT "transaction"
The content of the record is a space-separated list of words from the set ‘transaction’, ‘list’, or ‘all’, to indicate that the publisher vouches for transactional mail, list mail, or all mail from that domain.
To check whether the sender of a message is certified, a receiver first ensures that there is a valid DKIM signature with an appropriate domain. Then it checks to see if any of the certifiers in the mv= list are certifiers that it trusts. (The mv= list is an optimization to avoid having to check certifiers that aren’t likely to have vouching data. Since bad guys can set up their own certifiers, receivers should only check certifiers they know.) Assuming there’s a good signature and at least one known certifier, then the receiver looks up the VBR record, and if it exists and its content agrees with the mc= category, the certifier has vouched for the sender. This sounds complex, but it is quite a fast process since it involves at most a single DNS lookup per publisher.
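The checking procedure can be sketched in Python as follows. This is a simplified illustration, not a reference implementation: the function names are invented, the DNS lookup is passed in as a caller-supplied function (a real system would use its resolver), and treating 'all' in the record as matching any category is my reading of the record format described above.

```python
def vbr_query_name(sender_domain, certifier):
    """Name of the TXT record in which a certifier vouches for a domain."""
    return f"{sender_domain}._vouch.{certifier}".lower()


def is_vouched(md, mc, mv, valid_dkim_domains, trusted_certifiers, lookup_txt):
    """Return True if a trusted certifier vouches for this message.

    lookup_txt(name) is a hypothetical hook returning the TXT record
    content as a string, or None if the record does not exist.
    """
    # Step 1: md= must match a valid DKIM signature's d= domain.
    if md.lower() not in {d.lower() for d in valid_dkim_domains}:
        return False
    for certifier in mv:
        # Step 2: only consult certifiers the receiver already trusts.
        if certifier.lower() not in trusted_certifiers:
            continue
        # Step 3: one DNS lookup per certifier.
        record = lookup_txt(vbr_query_name(md, certifier))
        if record is None:
            continue
        vouched_for = record.lower().split()
        if "all" in vouched_for or mc.lower() in vouched_for:
            return True
    return False


# Stand-in for DNS, for demonstration only.
fake_dns = {"bigbank.com._vouch.certifier.com": "transaction"}

result = is_vouched(
    md="bigbank.com",
    mc="transaction",
    mv=["certifier.com", "certifier-b.com"],
    valid_dkim_domains={"bigbank.com"},
    trusted_certifiers={"certifier.com"},
    lookup_txt=fake_dns.get,
)
```

Certifiers that the receiver does not trust are skipped before any lookup is made, so a sender cannot gain anything by naming a certifier of its own.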
Some VBR publishers might want to assert that all the mail from the domains in their lists will be worthy of delivery, but I expect VBR to be more useful to identify groups of organizations of a particular type. For example, the FDIC (the agency that insures banks in the United States) might publish a list of the domains of its member banks and vouch for their transactional mail. The FDIC can’t promise that its banks won’t send you unwanted ads, since that’s legal in the US, but they can at least assert that a signed message from a domain in their list is really from the bank and isn’t a phish.
Each receiving system can choose which VBR publisher to use to whitelist signed mail by domain, just as it chooses now which blacklists it trusts to block mail by IP. The current version of VBR effectively communicates, one bit per lookup, that a domain’s mail is good. We considered more sophisticated VBR data such as reputation scores, but decided that at this point we don’t understand reputation systems well enough to design a scoring system that would be broadly useful.
One of the most confusing application areas for DKIM is mailing lists. Some are ‘announcement’ lists, where all the messages are sent by a single party, while others are ‘discussion’ lists, where members can send in messages which are then passed on to the entire list. Each presents its own challenges.
Announcement lists, particularly those used for advertising, are often outsourced to specialist companies known as Email Service Providers (ESPs) which handle the mechanics of list management and delivery issues. Depending on the ESP, the mail may appear to come directly from the ESP’s client, with the involvement of the ESP visible only by looking at mail headers, or the ESP may co-brand the mail with the client. Large clients tend to do the former, small clients the latter.
In each case the ESP (usually with the client’s advice) needs to decide what signatures to put on each message, and for signatures in the client’s domain, how to manage the signature keys. For the relatively invisible ESPs, the signature is typically the client’s, which means that the DKIM validation key has to be installed in the DNS under the client’s domain. One way to do this would be for the ESP to generate the key records and give them to the client to install, but that doesn’t work well, since clients’ DNS management skills vary widely, to put it politely. A more workable approach is for the client to delegate part of their DNS tree to the ESP.
As a real example, online travel agent Orbitz uses ESP Responsys to manage its weekly online newsletter. The company has delegated the subdomain my.orbitz.com to Responsys’ name servers. The newsletters have a return address of [email protected], and the DKIM signature d= domain is also my.orbitz.com. This allows Responsys to handle all of the DKIM mechanics, while maintaining orbitz.com as the responsible party. In particular, if Orbitz were to switch ESPs, the company would take the reputation of my.orbitz.com with it, since it ultimately controls its delegation.
At the other end of the spectrum, Constant Contact is an ESP that provides a service to tens of thousands of mostly tiny businesses with small lists and small mailings. In their case, it makes sense to sign mail with both the client’s domain and constantcontact.com, since many individual clients will have too small a mail volume to get much of a reputation, while the ESP’s aggregate volume is large enough and its list management is good enough that many receivers would be willing to whitelist mail that it has signed. (I don’t believe it has worked out the mechanics of its client signing yet.)
Discussion lists present a different set of identity issues, since each message sent through such a list has a From: address of the original contributor, even though the list sent it to the list members. This has engendered a great deal of confusion among DKIM implementers. Some lists make few enough changes to the messages they pass through that a DKIM signature on incoming messages might still be valid when received by list members. DKIM includes a few features for list mail, such as an optional message length field which is intended to let recipients skip the footers that are added by list software. One theory says that list software should refrain from making any changes that will break signatures, so recipients can apply their reputation and filtering rules based on the original senders. This seriously misunderstands the way that DKIM works (in my opinion at least), and is unworkable with modern list software anyway.
It is a rare list package that doesn’t break the signatures on its messages. Something as simple as adding the list name to the Subject line will do so, and modern list software often rewrites message bodies, deleting attachments, turning HTML into plain text and vice versa, and in some cases such as Yahoo Groups, rewriting the HTML of the message to add a message footer. Fortunately, there’s no need to preserve the signatures on the messages, because for the recipients, the signature and reputation that matter are those of the list, not of the individual contributors.
When someone subscribes to a list, they do so (presumably) because of the list’s contents, and they depend on the list’s operator to control what mail the list sends. It is perfectly reasonable for the list management software to perform DKIM checks on its incoming mail as part of the process of deciding what messages to accept, but once a message is accepted, the list software puts its own signature on its outgoing mail, and that’s what the recipients use. Advocates of preserving incoming signatures ask ‘what if bad guys send forged mail to lists?’, to which the reasonable answer is that list managers will deal with it, just as they’ve dealt with other kinds of abuse over the past 40 years.
At the moment, most of this argument remains hypothetical since relatively few lists do anything with DKIM at all, but we are starting to see lists sign their outgoing mail with a list signature, which should encourage recipients to use that signature in their mail management.
Some domains are subject to heavy phishing attacks, some of the most notable examples being PayPal and eBay, and online greeting card sites like Blue Mountain and American Greetings. In the former cases the phish is trying to steal credentials; in the latter it is trying to trick users into clicking a link that will install malware on their PCs. If one knew that all of a domain’s legitimate mail were signed, it would be possible to reject some phishes by rejecting mail purporting to be from that domain but without a signature.
ADSP, Author Domain Signing Practices, is an add-on to DKIM that allows a domain to publish its practices and state that it signs all mail that includes its domain on the From: line (‘all’), or that it signs all of its mail and it considers itself to be a phishing target so it wants you to throw away unsigned mail (‘discardable’). ADSP is currently in the midst of design arguments. These are partly about its basic utility, since there’s little reason to believe that the domains that would publish ‘discardable’ ADSP would all be or even mostly be actual phish targets. The other arguments are over whether the i= field should be used in the signature to force it to match the entire From: address rather than just the domain.
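A receiver’s handling of the two published practices might be sketched in Python as below. The action names are invented for illustration, and since ADSP is still under discussion the details may well change. The practice string is assumed to have been fetched from the From: domain’s DNS already.

```python
def adsp_action(practice, has_valid_author_signature):
    """Suggest what to do with a message, given the From: domain's
    published practice and whether a valid signature from that domain
    is present. Return values are hypothetical labels."""
    if has_valid_author_signature:
        return "pass"  # signed as promised: nothing to object to
    if practice == "discardable":
        return "discard"  # the domain asks for unsigned mail to be dropped
    if practice == "all":
        return "suspicious"  # should have been signed, but no request to drop
    return "no-opinion"  # nothing useful published


unsigned_results = [
    adsp_action("discardable", False),
    adsp_action("all", False),
    adsp_action(None, False),
]
```

Only the 'discardable' case gives the receiver an explicit instruction. For 'all', the missing signature is just one more input to whatever filtering the receiver already does.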
Whether or not ADSP is published, there are a few heavily phished domains that really do sign all of their mail, paypal.com being the prime example. Recent versions of the popular SpamAssassin filtering package have an ADSP option that uses a short built-in list of phishing targets to mark unsigned mail as spam.
Discarding unsigned mail from phishing targets is unlikely to make much practical difference, since it’s easy to send phishes without using the target’s own return address. For example, with a From: line like the one below, many popular mail programs will display the PayPal address in the comment, rather than the actual rotten.biz return address that could be signed with an ADSP-compatible signature:
From: [email protected] <[email protected]>
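The gap between what the user sees and what a signature can cover is easy to demonstrate with Python’s standard email.utils.parseaddr. The addresses below are made up for this sketch (they are not the ones from the example above), and the helper name is illustrative.

```python
from email.utils import parseaddr


def from_line_mismatch(from_header):
    """Return (shown_domain, actual_domain) when the display name looks
    like an address in a different domain from the real address,
    otherwise None."""
    display, address = parseaddr(from_header)
    if "@" not in display or "@" not in address:
        return None
    shown = display.rpartition("@")[2].strip().lower()
    actual = address.rpartition("@")[2].strip().lower()
    return (shown, actual) if shown != actual else None


# Hypothetical phish: the display name imitates the target's address.
phish = '"service@paypal.example" <bob@rotten.example>'
mismatch = from_line_mismatch(phish)
```

A signature from rotten.example on this message would be perfectly valid and would satisfy any practice that rotten.example chose to publish, yet the recipient would still see the PayPal-style address.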
Effective measures against phishing will depend on highlighting the good mail, perhaps with an enhanced VBR that shows a brand logo (e.g. ‘look for the golden dollar sign’ on mail from an FDIC member bank) so people come to understand that if it’s not highlighted, it’s not really from a bank, or from eBay, or from the greeting card company. DKIM can authenticate the real mail, but it’s just one part of a total package.
DKIM is an authentication system that provides an effective way to assign a stable identity to mail messages beyond ad-hoc identities based on IP addresses and message From: addresses. It shows signs of wide adoption, already being used by Yahoo, Google’s Gmail, and many email service providers. In combination with whitelists, certification and reputation systems, it will be a key tool to separate mail that recipients want from mail they don’t want.