2012-03-01
Abstract
In early February, a new security project known as DMARC (Domain-Based Message Authentication, Reporting and Conformance) hit the headlines. The project involves some of the best known companies on the Internet and attempts to reduce email-based abuse by solving a couple of long-standing issues related to email authentication protocols. John Levine - who is, among other things, the designated liaison between DMARC and MAAWG - has all the details.
Copyright © 2012 Virus Bulletin
In early February, a new group called DMARC (Domain-Based Message Authentication, Reporting and Conformance) received a great deal of press attention. Some of the breathless reporting suggested that this was the FUSPP (Final, Ultimate Solution to the Phishing Problem) – needless to say, it isn’t. DMARC is a modest, but interesting security project involving some of the best known companies on the Internet. (See http://www.dmarc.org/ for background information, a list of participating organizations and the current draft spec.)
Some of the big names involved in the group include: Google, Yahoo!, Microsoft, AOL, Comcast, Facebook, American Greetings, LinkedIn, PayPal, Bank of America and Fidelity Investments, along with infrastructure companies such as Cloudmark and Return Path. Having all these big gorillas on board means that whatever DMARC does is likely to have fairly widespread adoption. Google is already checking DMARC and sending status reports (described in more detail later).
Phishing is a huge problem for the institutions that are targeted in phishing campaigns, and indirectly for ISPs whose users fall for them and who have to help clean up the mess. Authentication schemes, notably DKIM and SPF, now provide tools to verify that a message was sent by the apparent sender (or more specifically, from a certain domain), but until now the ability to use that knowledge to deter phishing has been limited.
Part of the problem is that SPF and DKIM offer (by design) only limited tools for handling phishy email. Senders can tell recipients that they authenticate all their mail (the SPF -all option, and DKIM ADSP all and discardable), but that doesn’t translate directly into useful advice for receivers. Furthermore, most large senders of email have a hodgepodge of sending systems, and it is a challenge to achieve 100% authentication coverage across all of them. DMARC provides some support for senders with less than perfect authentication, and provisions for feedback so they can see how they’re doing.
DMARC limits itself to what it calls ‘domain phishing’ – that is, phishes that use the exact domain name of the target, such as paypal.com or americangreetings.com. A lot of phishes use ‘cousin’ domains, which are similar but not identical to the target. I asked some of the DMARC group whether cousins would make DMARC irrelevant, and they told me that a surprising fraction of phishes actually use the exact domain. Since domain phishes are a technically much more tractable problem than cousins, that’s where DMARC is starting.
DMARC consists of three interrelated parts: an authentication framework, a way for domains to publish their policies, and a system for receivers to send feedback to senders. The draft specification (which is on the DMARC website at http://www.dmarc.org/) is subject to change, although I don’t expect it to change much.
The only identifier that DMARC authenticates is the domain of the address on the From: line, not Sender:, Resent-From:, or anything else. There are two ways to authenticate that domain, SPF and DKIM. The domain is authenticated if there is a successful SPF or DKIM check of a domain that matches the From: domain.
Authenticated domain matches can be either strict or relaxed, as determined by the sender. A strict match is an exact match – if the return address is [email protected], the authenticated domain must be mktg.bigbank.com. A relaxed match only requires that the ‘organizational domains’ match. Roughly, that is the domain at the level at which it was registered with an external registry – such as bigbank.com or bigbank.co.uk. While there is no exact way to identify organizational domains, in practice it seems unlikely that this will be a problem since there aren’t a lot of major phishing targets in domains with obscure registration points.
For SPF authentication, the receiver makes the usual SPF check on the envelope MAIL FROM address. If the check passes, and the domain in that address matches the one in the From: line, the domain is authenticated.
For DKIM authentication, the receiver performs the usual DKIM validation of any DKIM signatures on the message. If a valid signature has a d= domain that matches the one in the From: line, the domain is authenticated.
A From: domain is authenticated if any of the authentication methods (just SPF and DKIM at this point) succeed. There’s no way for a sender to state which methods it uses – if it doesn’t use one, it won’t publish verification records so the method will fail, but that doesn’t matter if another method succeeds.
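The matching rules above can be sketched in a few lines of Python. This is only an illustration, not the spec's algorithm: the function names are mine, the list of registry suffixes is a tiny hypothetical stand-in (as noted, there is no exact way to identify organizational domains), and the default of relaxed matching is just a choice made for the sketch.

```python
# Hypothetical, deliberately tiny list of registry suffixes. A real receiver
# would need a far more complete list; this one exists only for the sketch.
REGISTRY_SUFFIXES = {"com", "org", "net", "co.uk"}


def organizational_domain(domain):
    """Return the registry suffix plus one label, e.g. bigbank.co.uk."""
    labels = domain.lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so that "co.uk" is found before a bare "uk" would be.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in REGISTRY_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return ".".join(labels[-2:])  # fallback guess: the last two labels


def domains_match(from_domain, authenticated_domain, mode="relaxed"):
    """Strict: exact match. Relaxed: organizational domains match."""
    a = from_domain.lower().rstrip(".")
    b = authenticated_domain.lower().rstrip(".")
    if mode == "strict":
        return a == b
    return organizational_domain(a) == organizational_domain(b)


def from_domain_authenticated(from_domain, spf_pass, mail_from_domain,
                              dkim_valid_domains,
                              spf_mode="relaxed", dkim_mode="relaxed"):
    """True if SPF or any valid DKIM signature matches the From: domain."""
    spf_ok = bool(spf_pass) and domains_match(
        from_domain, mail_from_domain, spf_mode)
    dkim_ok = any(domains_match(from_domain, d, dkim_mode)
                  for d in dkim_valid_domains)
    return spf_ok or dkim_ok


# SPF failed, but a valid signature with d=bigbank.com is a relaxed match.
print(from_domain_authenticated(
    "mktg.bigbank.com", False, "bounce.other.net", ["bigbank.com"]))  # True
```

Note that a passing SPF check on an unrelated MAIL FROM domain does not help: the sketch requires both the pass and the match before it counts.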
Senders can publish DMARC policy records to describe their signing policy, offer advice about what to do with mail that fails authentication, and ask for feedback reports.
A domain’s DMARC record is a DNS TXT record named _dmarc.<domain>, where the domain is the domain in the From: line of mail the sender sends. The format of a DMARC record is similar to that for a DKIM record. Here’s one of mine:
v=DMARC1; p=none; rf=afrf; rua=mailto:[email protected]; ruf=mailto:[email protected]
It starts with a version tag, followed by a list of tag=value clauses. The p= tag must come first. Others are optional and can appear in any order. P stands for policy and indicates what the sender wants receivers to do with unauthenticated mail. The options are none, quarantine and reject. Quarantine is a request to turn up the filters, put the unauthenticated message into a spam folder, or otherwise treat it with extra scepticism, but still accept it. Reject is a request to reject the message at the end of the SMTP session, and not deliver it at all. None, which is the default, indicates that the receiver should handle the message however it would have been handled otherwise. It’s up to a receiver how much attention it pays to the sender’s suggestions, if any, since there’s no way to tell whether an unknown sender’s policy statement accurately represents what the sender really does. (This is a well known problem for SPF -all and ADSP discardable.) An optional sp= tag has the same values as p=, to be applied to subdomains.
To manage authentication, the aspf= and adkim= options specify whether to use relaxed or strict domain matching on SPF and DKIM, respectively.
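To make the record format concrete, here is a minimal parsing sketch in Python. It follows the layout described above (version tag, then p=, then any other tags) and is not a validator for the draft spec; the function name and the example.com report addresses are hypothetical placeholders.

```python
VALID_POLICIES = ("none", "quarantine", "reject")


def parse_dmarc_record(txt):
    """Split a _dmarc TXT record into a dict of tag -> value."""
    tags = []
    for part in txt.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        name, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"not a tag=value clause: {part!r}")
        tags.append((name.strip().lower(), value.strip()))

    if not tags or tags[0] != ("v", "DMARC1"):
        raise ValueError("record must start with v=DMARC1")
    if len(tags) < 2 or tags[1][0] != "p":
        raise ValueError("p= must come directly after the version tag")

    record = dict(tags)
    for tag in ("p", "sp"):  # sp= takes the same values as p=
        if tag in record and record[tag] not in VALID_POLICIES:
            raise ValueError(f"unknown {tag}= value: {record[tag]!r}")
    return record


# Placeholder addresses, not real report mailboxes.
example = ("v=DMARC1; p=none; rf=afrf; "
           "rua=mailto:dmarc-a@example.com; ruf=mailto:dmarc-f@example.com")
record = parse_dmarc_record(example)
print(record["p"], record["rua"])  # none mailto:dmarc-a@example.com
```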
The DMARC spec is a little vague about which DMARC record(s) a receiver should look up if the domain in a From: line or in the SPF or DKIM check is a subdomain of an organizational domain. That is, if the From: address is [email protected], the receiver looks up _dmarc.sales.example.com, but if that’s not found the receiver is then supposed to look up the organizational domain, _dmarc.example.com. Or, if the From: address is [email protected] and DKIM is d=sales.example.com, and _dmarc.example.com isn’t found, it’s not clear whether the receiver is supposed to look up _dmarc.sales.example.com. The draft spec mentions using DNS wildcards, but _dmarc.*.example.com doesn’t do what one might hope. There are ways around this, but none is particularly elegant.
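One reading of the lookup order is sketched below, under the assumption that the receiver tries the From: domain first and falls back to the organizational domain only when the two differ. The function name is hypothetical, and the organizational domain is passed in rather than computed, since identifying it is a separate problem.

```python
def dmarc_lookup_names(from_domain, org_domain):
    """Return the DNS names to try, in order, for a From: domain."""
    from_domain = from_domain.lower().rstrip(".")
    org_domain = org_domain.lower().rstrip(".")
    names = ["_dmarc." + from_domain]
    if org_domain != from_domain:
        # Fall back to the organizational domain's record.
        names.append("_dmarc." + org_domain)
    return names


print(dmarc_lookup_names("sales.example.com", "example.com"))
# ['_dmarc.sales.example.com', '_dmarc.example.com']
```

This covers only the first case described above; it says nothing about what to do when the DKIM d= domain is a subdomain and the From: domain is not.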
Since a site sending a lot of mail may take a while to get its authentication under control, two clauses in the policy record allow senders to try out policies while limiting the damage if they’re wrong. The pct=NN clause specifies that the DMARC policy should be applied to only NN% of the mail that arrives from the domain, e.g. pct=5 would check and potentially quarantine or reject only about one message in 20. The pct= clause doesn’t affect reporting; any reports are supposed to include all mail received.
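A receiver could implement pct= by random sampling, roughly as in the Python sketch below. The function name is hypothetical, and treating the unsampled messages as if the policy were none is an assumption of the sketch rather than something the description above settles.

```python
import random


def policy_to_apply(policy, pct, rng=random):
    """Pick the policy for one message that failed authentication.

    The sender's policy is used for roughly pct% of messages; the rest
    are handled as p=none (an assumption made for this sketch).
    """
    # rng.random() is uniform on [0, 1), so pct=100 always applies the
    # policy and pct=0 never does.
    if rng.random() * 100 < pct:
        return policy
    return "none"


# With pct=5, about one failing message in 20 gets the real policy.
rng = random.Random(2012)  # seeded only to make the demonstration repeatable
sampled = sum(policy_to_apply("reject", 5, rng) == "reject"
              for _ in range(10000))
print(f"{sampled} of 10000 messages would be rejected")
```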
The rest of the DMARC spec is about receivers sending reports back to senders – both reports of individual authentication failures and daily (or more frequent) aggregate reports. In the _dmarc record, a sender can include an ruf=URI tag to tell receivers where to send individual failure reports, and an rua=URI tag to tell them where to send aggregate reports.
Individual reports can be in either IODEF (RFC 5070) or AFRF (Authentication Failure using ARF, still in draft form; the current draft of AFRF, the spec for ARF authentication failure reports, is at http://tools.ietf.org/html/draft-ietf-marf-authfailure-report, and is likely to become an RFC in mid 2012). My impression is that most reports will be AFRF, since it is specifically designed to include elements needed to diagnose an SPF or DKIM failure.
Aggregate reports take the form of XML files compressed into a ZIP file, because reports for busy domains can be quite large. They are normally sent once a day, but the ri=NN tag can be used to request a reporting interval of NN seconds, such as ri=3600 for hourly reports. The XML includes a copy of the fields from the _dmarc record used to generate the report, together with a summary of all the sources that sent mail with the domain’s From: address and the authentication results. Google is now sending daily reports – so far, it is the only receiver to do so. In one of my more heavily forged domains, a daily report included 672 entries, each of which was an IP address that sent one or more (often many more) messages purporting to be from my domain, along with information about DKIM signatures and the MAIL FROM domains checked by SPF, and what Gmail did with them. The reports are voluminous, and not easy for humans to read, but they are eminently suited to being parsed and put into a database. They can help to find both people forging one’s domain, and equally important, legitimate mail that failed to authenticate.
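As a sketch of how such a report might be parsed for loading into a database, the Python below unzips a report in memory and pulls one row per source IP. The element names (record, row, source_ip, count, policy_evaluated and so on) are my assumptions about the XML layout and should be checked against an actual report; the sample data and the helper make_zip are invented for the demonstration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Invented sample using documentation IP addresses; not a real report.
SAMPLE_XML = b"""<feedback>
  <record><row>
    <source_ip>192.0.2.1</source_ip><count>12</count>
    <policy_evaluated>
      <disposition>none</disposition><dkim>fail</dkim><spf>fail</spf>
    </policy_evaluated>
  </row></record>
  <record><row>
    <source_ip>198.51.100.7</source_ip><count>3</count>
    <policy_evaluated>
      <disposition>none</disposition><dkim>pass</dkim><spf>fail</spf>
    </policy_evaluated>
  </row></record>
</feedback>"""


def make_zip(xml_bytes, name="report.xml"):
    """Build an in-memory ZIP, standing in for a mailed attachment."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, xml_bytes)
    return buf.getvalue()


def read_aggregate_report(zip_bytes):
    """Return one dict per <record> in the first XML file of the ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        xml_name = next(n for n in zf.namelist()
                        if n.lower().endswith(".xml"))
        root = ET.fromstring(zf.read(xml_name))
    rows = []
    for record in root.iter("record"):
        rows.append({
            "source_ip": record.findtext("row/source_ip"),
            "count": int(record.findtext("row/count", default="0")),
            "disposition": record.findtext(
                "row/policy_evaluated/disposition"),
            "dkim": record.findtext("row/policy_evaluated/dkim"),
            "spf": record.findtext("row/policy_evaluated/spf"),
        })
    return rows


for row in read_aggregate_report(make_zip(SAMPLE_XML)):
    print(row["source_ip"], row["count"], row["dkim"], row["spf"])
```

Rows where both DKIM and SPF fail are the likely forgeries; rows from one's own servers that fail are the legitimate mail that needs fixing.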
The spec allows reporting URLs to be either mailto: (to send the report as a mail attachment) or http: (to upload it to a web server). At this point, Google only supports mailed aggregate reports, and as far as I can tell, nobody is sending failure reports at all. I’ve published DMARC records for most of the domains that my mail server handles, and have received lots of aggregate reports from Google, but no individual reports yet.
DMARC is a work in progress, but an interesting one. The aggregate reports are worth getting, and I’d encourage anyone who cares whether their mail is delivered to publish a _dmarc record to collect daily reports. Most senders should publish a p=none policy (don’t do anything special when the mail arrives, just send reports).
A few parts of DMARC still need to be cleaned up. One of those is the issue of subdomains and wildcards, as I mentioned above, to clarify what policy records apply to what subdomains.
Currently, a sender can put any email address or URL into the ruf= or rua= clauses, which offers a way to remotely mail bomb someone. My DNS server currently handles DNS for about 50 domains, so I’ve published 50 _dmarc records and get 50 daily reports from Google every morning. That’s fine, since I want the reports and they go to a special mailbox I set up, but if I accidentally or deliberately misdirected the reports, and added ri=3600 to the _dmarc records so that the reports went out hourly, that could send over a thousand messages a day to an unwilling recipient. This is straightforward to fix, either by requiring that reports be sent back to the same domain as they’re about, or by providing a way for the targets of the reports to publish their own DNS records to say that they want them. Since the reason they allow arbitrary addresses is probably to make it easy to send reports to third-party analysis services, the latter fix is more likely.
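The arithmetic behind that figure is simple enough to check. The short Python sketch below assumes a single receiver that honours ri= exactly; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def reports_per_day(domains, ri_seconds):
    """Aggregate reports per day from one receiver honouring ri=."""
    return domains * (SECONDS_PER_DAY // ri_seconds)


print(reports_per_day(50, SECONDS_PER_DAY))  # 50 (the daily default)
print(reports_per_day(50, 3600))             # 1200 (hourly reports)
```

Every additional receiver that starts sending reports multiplies the total again.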
DMARC is designed to be extensible, so it’s possible that other authentication schemes will be added, perhaps S/MIME, as well as finer-grained reporting. A huge gap, which the DMARC group acknowledges, is that it deals only with exact From: domain matches. If a message comes from [email protected], there’s no way to tie that to a policy published by bankofamerica.com. Also, many mail programs display the From: line comment rather than the address, allowing spoofs like
From: PayPal Security <[email protected]>
These are vastly harder problems to address, so it makes sense that DMARC is starting with the low hanging fruit. It may well turn out that those problems are insoluble, and the only way to separate the real from the fake is to keep manual whitelists of known legitimate domains, put a gold star next to authenticated mail from them, and try to teach users that if it doesn’t have a star, it’s not your bank. But in order for that to happen, mail has to be authenticated in the first place, and DMARC is a small step towards making authentication work.