2011-01-01
Abstract
Which sender authentication techniques work in real life? Which ones don’t quite measure up? Can we use authentication to mitigate spoofing? Can we use it to guarantee authenticity? And how do we authenticate email, anyway? Terry Zink concludes his series of articles with a look at Author Domain Signing Policies.
Copyright © 2011 Virus Bulletin
In the last article in this series (see VB, December 2010, p.12), we looked at how digital signatures in email are accomplished through the use of DomainKeys Identified Mail, or DKIM. While DKIM does have its niche applications, and is useful for whitelisting and identification in the positive case, one of the barriers to mass implementation is that it is less useful for detecting spoofing. This is because the protocol states that the failure of a DKIM validation should be treated as if there were no DKIM signature at all in the message. Since the receiver of a message doesn’t know whether or not a domain should even have a DKIM signature, the lack of one doesn’t indicate that a message is spoofed. There are any number of legitimate reasons why it might not have one. If a broken signature is just as ‘valid’ as the lack of a signature, then for most receivers who are primarily concerned about filtering spam, the usefulness of DKIM is minimal.
Author Domain Signing Policies, or ADSP [1], is an optional extension to DKIM that allows senders and receivers to specify whether a domain signs all of its mail or only some of its mail with a DKIM signature.
The protocol is simple. Any domain that signs with DKIM can publish one of three values in the dkim= field of a DNS TXT record at _adsp._domainkey.<domain>:
unknown – the domain might sign some or all of its outbound mail. If a message from this domain arrives and is signed, it can be treated as authoritative. However, if a message arriving from this domain is not signed, then nothing can be concluded about the source of the message.
An ‘unknown’ result is similar to the neutral result of an SPF check. In SPF/SenderID, a neutral result means that the message should be treated as if the domain had no SPF record at all. You cannot use this result for detection of spoofing, only for positive identification of authorized IPs. In a similar manner, the ‘unknown’ value means that you can only validate a signature when one exists; its absence tells you nothing.
discardable – all mail from the domain is signed with an Author Domain Signature. The lack of a DKIM signature from this domain means that you can discard the message (mark it as spam, reject it during the SMTP transaction before the 250, etc.).
A ‘discardable’ result is similar to an SPF hard fail. SPF hard fails are strong assertions about the message and indicative of spoofing. Or rather, they are supposed to be indicative of spoofing, but there are many cases when legitimate messages hard fail SPF checks. With DKIM, if a message has no signature and the domain says it is discardable, you can safely reject the message. Regardless of the reason for the lack of signature, the message cannot be trusted and should be discarded.
Note that you can only discard the message for the lack of a signature. The protocol does not say that you should discard a message with a signature that does not validate.
all – all mail from the domain is signed with an Author Domain Signature. The case of ‘all’ is ambiguous. The protocol indicates what a domain that publishes ‘dkim=all’ is asserting, but not what action you should take when a message arrives without a signature. Did the sender forget to sign the message? If so, he hasn’t used the ‘discardable’ option to specify that a message without a signature can be discarded. If he was confident that he signed every single message, he would have used ‘discardable’. But he didn’t – so what should we do with this message?
In this case, the ‘all’ option is best treated in the same way as an SPF soft fail. The soft fail indicates that the message should be accepted, but can be used as a low weight in a spam filter, perhaps as part of the content filter. The best course of action is to do the same thing with any message that fails the ‘all’ test – the lack of a signature means it can be used as a low weight in the content filter. If it passes, then take the normal action that you would typically take.
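The three values and the handling suggested above can be sketched in a few lines of Python. This is a minimal illustration, not a complete ADSP implementation: the function names and the action labels ('no-conclusion', 'low-spam-weight', 'discard') are hypothetical, the record parsing is simplified, and the DNS query itself is left out.

```python
def adsp_query_name(author_domain: str) -> str:
    """Build the DNS name at which a domain publishes its ADSP TXT record."""
    return "_adsp._domainkey." + author_domain.strip(".").lower()


def parse_adsp_practice(txt_record: str) -> str:
    """Return the dkim= value of an ADSP record.

    Simplified parsing: a missing or unrecognised value is treated as 'unknown'.
    """
    for tag in txt_record.split(";"):
        name, sep, value = tag.partition("=")
        if sep and name.strip().lower() == "dkim":
            practice = value.strip().lower()
            if practice in ("unknown", "all", "discardable"):
                return practice
    return "unknown"


# Hypothetical receiver actions for a message that arrives WITHOUT a signature,
# following the treatment suggested in the text above.
UNSIGNED_ACTION = {
    "unknown": "no-conclusion",     # like an SPF neutral result
    "all": "low-spam-weight",       # like an SPF soft fail
    "discardable": "discard",       # like an SPF hard fail
}


def action_for_unsigned(txt_record: str) -> str:
    """Map a published ADSP record to the action for unsigned mail."""
    return UNSIGNED_ACTION[parse_adsp_practice(txt_record)]
```

A receiver would fetch the TXT record at adsp_query_name(domain) with whatever resolver it already uses and pass the record text to action_for_unsigned().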
The use of ADSP nets some advantages but also imposes some constraints on the sender. One of the flexibilities of DKIM is that it allows a sender to send on behalf of someone else. The signing domain is specified in the d= field in the DKIM signature header. Thus, the From: field can specify the domain that the sender wants the recipient to see in their inbox, while reputation checks can be performed on the signing domain. For example, if travel company Oceanic contracts out the sending of its marketing messages to Big Communications, Inc., Big Communications will sign the message and put ‘bigcommunications.com’ in the d= field, but ‘[email protected]’ in the From: field. The receiver performs the reputation check not on oceanic.com but on bigcommunications.com. Big Communications is taking responsibility for the quality of the message.
However, in order to use ADSP, the domain in the sender’s From: field must be the same as the domain in the d= field. The reason is that the signing policy is looked up before checking whether the DKIM-Signature field exists. The mail receiver first performs the lookup for the domain in the From: field to see whether or not it has a signing policy, and if it does, the receiver extracts data from the DKIM-Signature field (if it exists). If the field does not exist, then the action specified by the ADSP record is applied.
If the domain in the From: field were allowed to differ from the signing domain, this would defeat the entire point of ADSP: a spammer could spoof the From: address and then sign the message with a different signing domain. For example, suppose the From: field is [email protected], but the signing domain is d=spammer.com. Even if PayPal says that it signs every message, to a mail receiver the signature in the DKIM-Signature field will check out, because a spammer can easily set up his own public/private key pair for his own domain.
Thus, if a domain wishes to use ADSP, then it cannot have mail sent by others on its behalf. (This is required by the protocol in RFC 5617 section 2.7; the domain in the d= field must match the domain in the From: field.)
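The matching requirement can be illustrated with a short Python sketch. The helper names and the .example addresses are hypothetical, and the code only compares domains; it does not verify the signature cryptographically, which a real receiver would have to do first.

```python
from email.utils import parseaddr


def from_domain(from_header: str) -> str:
    """Extract the lower-cased domain of the From: address, or '' if none."""
    _, addr = parseaddr(from_header)
    if "@" not in addr:
        return ""
    return addr.rpartition("@")[2].lower()


def signing_domain(dkim_signature: str) -> str:
    """Extract the d= tag from a DKIM-Signature header value, or '' if absent."""
    for tag in dkim_signature.split(";"):
        name, sep, value = tag.partition("=")
        if sep and name.strip().lower() == "d":
            return value.strip().lower()
    return ""


def is_author_domain_signature(from_header: str, dkim_signature: str) -> bool:
    """True when the signing domain equals the domain in the From: field.

    Assumes the signature has already been validated elsewhere.
    """
    author = from_domain(from_header)
    return bool(author) and author == signing_domain(dkim_signature)
```

With a third-party sender, the From: domain and the d= domain differ, so the signature does not count as an Author Domain Signature even though it may be perfectly valid.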
This leads to some interesting nuances. Consider the following sequence of events:
Extract the domain from the From: address.
Extract the domain from the d= field in the DKIM-Signature field.
If they are the same, then check the ADSP record.
What about the first time a sender transmits mail to a receiver and doesn’t have a DKIM-Signature, but does have an ADSP record? In this case, the receiver would extract the domain in the From: field but would be unable to extract the domain from the d= field since the d= field doesn’t exist. If this sender has an ADSP record that says ‘discardable’, the receiver would be unable to discard the message because it can’t check the ADSP record as it has only one of the two required variables. This is important for the case of spoofing and phishing. If a phisher spoofs the From: address but does not include the DKIM-Signature, even if the spoofee has an ADSP record saying that unsigned mail should be discarded, this will not help the receiver. They don’t know what action to take because an unsigned message is just that – an unsigned message. The receiver must rely on traditional spam filtering techniques.
Thus, from a logistical point of view, in order to make use of ADSP a receiver needs the following:
It has first to see an actual, signed message from the sender and look up its ADSP record.
It must remember that original ADSP record and compare it to future messages that purport to be from the sender.
It must periodically update its memory of the ADSP record to check that it hasn’t changed.
From that point forward, all incoming messages for this particular sender are checked against this local copy and the ADSP policy enforced against it (DNS cache is one mechanism for achieving this, although it isn’t the only one).
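The bookkeeping described in these steps could look something like the following Python sketch. It is one possible shape under stated assumptions: the class name, the injected lookup function and the one-hour default refresh interval are all hypothetical, and a DNS cache honouring record TTLs would be an equally valid mechanism.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AdspPolicyCache:
    """Remember each sender's ADSP practice and refresh it periodically."""

    def __init__(
        self,
        lookup: Callable[[str], str],
        max_age_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup          # hypothetical: returns 'unknown', 'all' or 'discardable'
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def remember(self, domain: str) -> str:
        """Look up the domain's practice and store it with a timestamp."""
        practice = self._lookup(domain)
        self._entries[domain.lower()] = (practice, self._clock())
        return practice

    def practice_for(self, domain: str) -> Optional[str]:
        """Return the remembered practice, refreshing it if stale.

        Returns None for a sender that has never been seen.
        """
        entry = self._entries.get(domain.lower())
        if entry is None:
            return None
        practice, stored_at = entry
        if self._clock() - stored_at > self._max_age:
            practice = self.remember(domain)
        return practice
```

The receiver would call remember() when it first sees a signed message from a sender, and practice_for() on every later message that purports to come from that sender.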
As we have seen, the requirement for the d= field to match the From: field adds a serious constraint to DKIM. What sorts of senders benefit from using ADSP?
The most useful ADSP record is the ‘discardable’ record because it is the only one that allows a receiver to combat the problem of spoofing. But in order to use ADSP with this record, only the domain itself can send out mail for its brand – the organization cannot outsource any of its mail to marketers (or rather, if it does, it adds a lot of risk by lending its reputation to a third party).
Furthermore, the organization must control all of its outbound servers and have a strong sense of ownership. For a small organization located in one city or state, this isn’t difficult as uniform IT policies can be enforced across the organization. For global organizations with IT departments in multiple countries in multiple time zones, perhaps spread across multiple continents, this imposes logistical difficulties. It is non-trivial for an IT department to control and maintain physical resources spread across different locales. Servers get upgraded, personnel come and go, and software changes. People go on vacation and sometimes software upgrades are missed; if this happens and servers are misconfigured, then it means that legitimate mail coming from a single locale that is out of spec compared to the rest of the organization can end up being discarded. Global synchronization can be accomplished, but it takes a lot of time and effort. The larger an organization becomes, the tighter its security policies must be if it wants to use ADSP.
Financial institutions benefit directly from tight control over their brand identity and suffer greatly when it is abused. If a phisher spoofs a financial organization and succeeds in tricking the recipient into giving up their credentials, then that customer can lose funds. More and more these days, banks are starting to protect their users from phishing losses but this means that the bank (in many cases) absorbs the loss. However, it is not just banks that benefit from anti-spoofing; credit card companies also benefit. Credit card companies frequently advertise anti-fraud protection after the first $50 or so. Typically, they will fight with the vendors to get merchandise charges revoked, or they absorb the loss (or use insurance to offset the damages). But by ensuring that their identity is protected such that spam filters discard spoofed mail, they are reducing their vulnerability. No technology can wipe out the threat of spoofing, but organizations should be using whatever means they can to make it more difficult for phishers to trick their users.
Some of the most phished brands worldwide are PayPal, HSBC, Bank of America, and eBay [2]. These are all organizations that are strongly associated with money. They also send email communication to their user base directly instead of outsourcing it to a third party, and therefore, these organizations are excellent candidates for the implementation of DKIM and ADSP.
However, it’s not just money that attracts phishers and spoofers. Any organization that protects identity or is very popular and has a massive user base is a beneficiary of ADSP. For example, Facebook is one of the top 10 most phished brands. It has 500 million users. Malware authors target Facebook all the time and they do this because Facebook is so popular. The odds of being able to successfully trick at least one Facebook user with a spoofed email are fairly high. The odds shrink when ADSP is used because receivers can look up and see that Facebook signs (or could sign) all of its mail, and that mail that isn’t signed should be marked as spam.
In addition, organizations that have shared outbound IP space benefit from the use of ADSP. If one organization using the shared IP becomes infected with malware and starts to spoof another organization, then a recipient that only performs SPF checks cannot detect this as spoofing. The reason is that both organizations put the shared IP space into their SPF records, but a recipient cannot walk back through the Received headers to see which was the originating organization. Since both organizations use the same outbound IP space, the message will pass an SPF check even though it was spoofed. However, using ADSP, a receiver would be able to determine that the source of the message was not the organization it claimed to be.
Organizations that need to outsource their mail campaigns don’t benefit as much from ADSP. Because they cannot assert ‘all’ or ‘discardable’, they are forced into the less useful ‘unknown’ option. (When I say ‘less useful’ I mean from an anti-spoofing viewpoint, not an authentication viewpoint.) An airline, a small, independent bookstore, a dance studio, or a flower shop each might decide to use an email service provider to carry out their marketing campaigns. This drives down the costs for the organization in question, but it prevents it from making a strong assertion about whether or not it signs mail for its domain. However, in these cases, it is not as important as it is for some other types of organization – an independent bookstore does not protect its users’ financial assets, nor does a flower shop know its customers’ social security numbers. While they all might have credit card numbers or similar, they do not have the footprint of a large financial institution or a social networking site.
It ultimately comes down to a cost/benefit ratio. When an organization becomes large enough to attract the attention of spammers and phishers, the chances of its customers being phished, and how much that will cost the organization, need to be weighed against the cost of bringing its email campaigns in house.
ADSP and SPF hard and soft fails accomplish similar things. How can they be used in conjunction with one another? What happens when we toss SenderID in there?
On the sending side, if an organization wants to use ADSP with ‘all’ or ‘discardable’, then it makes sense to complement it with SPF hard fails. The reason is that a hard fail implicitly states that you know all the IPs that you will ever send from. Ergo, this means that you have tight control over the mail that originates from your organization.
Since you know where your mail originates and want receivers to discard any mail from you that isn’t signed, there is a very good chance that you also know where all of your outbound IP addresses are that relay mail to the Internet. By using ‘discardable’, you cannot have anyone send mail on your behalf. This means that no one can ever send mail as you from anything other than your own email servers, and therefore you should use SPF hard fails in your SPF record.
On the flip side, if you use SPF hard fails, should you also use ‘all’ or ‘discardable’? It’s actually fairly complicated, so let’s look at some possible combinations.
The easy case is when a message passes both the SPF check and the DKIM check: the message is ‘doubly authenticated’. For a case such as this, you might treat the message the way you would treat any authenticated message – pass it through to the fast track of filtering, or simply collect statistics.
When a message hard fails the SPF check and also fails the ADSP check for a domain that publishes ‘discardable’, the receiver should mark the message as spam. Both identity checks are failing and therefore the message should be assumed to be spoofed. It could be a misconfiguration on the sender’s side, but the sender is explicitly telling the world to reject the message.
An SPF hard fail combined with a DKIM pass can occur when an organization has just brought up a new set of outbound IPs but has not yet added them to its SPF record, while the new servers are already signing with DKIM. This is possible if the organization has a role for its outbound servers (i.e. a pre-defined image that Operations simply needs to deploy), perhaps in a new data centre. In this case, the role for the servers already knows where to look up the private key and grab it to sign the mail. However, the SPF records have not yet been updated.
In this case, even though there is a contradiction between the permitted IPs in SPF records and the DKIM result, the results of a DKIM check are stronger than an SPF check, at least in the positive authentication case. Since DKIM is stronger, then the result of a DKIM pass should override the result of an SPF hard fail. What is important is that the mail originated from the organization, and DKIM asserts that.
The reverse situation, an SPF pass combined with an ADSP failure, is similar to the above. An organization might start sending mail outside of its normal range of IP addresses but have previously added these new IP addresses to its SPF record (e.g. it had originally allocated more than it needed). However, what if it has not yet configured all of its new outbound mail servers to DKIM sign all of its mail?
Here, the ADSP fail should take priority. DKIM is stronger than SPF, but that’s not the main reason a receiver should reject the mail. As discussed previously, an organization might share outbound IP space with other organizations if they are using another service as a relay (such as an ISP). However, they won’t share their private key with other organizations. If another organization on the shared IP spoofs the MAIL FROM address of the first organization, this could result in an SPF pass. However, if the first organization uses an ADSP record, that check would then fail and a spoof would be detected. Since organizations that use ADSP are trying to protect against spoofing, and shared IP space is a weakness of SPF, then when an ADSP check fails and SPF passes, the mail should still be marked as spam or at least a very heavy weight applied to it.
When an organization asserts ‘all’ for its ADSP record, they are not issuing as strong a statement about whether or not they sign all of their mail. Should the receiver mark all mail from them without a DKIM signature as spam? Earlier, I said that a receiver should assign it a weight. I still stand by that, and if an SPF hard or soft fail occurs, then the two of them taken together should be assigned a heavier weight than if either one were to occur in isolation.
These are not the only scenarios but they do illustrate some possibilities for harmonizing the two technologies.
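The combinations discussed above can be collected into a single decision function. The Python sketch below is one hypothetical encoding: the function name, the input labels and the returned verdict strings are illustrative assumptions, and the fallback for cases the article does not discuss simply defers to the receiver's other filters.

```python
def combined_verdict(spf: str, dkim_author_pass: bool, adsp: str) -> str:
    """Combine SPF and DKIM/ADSP results into one suggested action.

    spf: 'pass', 'softfail', 'hardfail' or 'none'
    dkim_author_pass: True if the message carries a valid signature whose
        d= domain matches the From: domain
    adsp: the sender's published practice: 'unknown', 'all' or 'discardable'
    """
    if dkim_author_pass:
        # A DKIM pass overrides even an SPF hard fail; with an SPF pass
        # as well, the message is 'doubly authenticated'.
        return "authenticated"
    if adsp == "discardable":
        # The ADSP failure takes priority even when SPF passes,
        # because shared IP space can produce a misleading SPF pass.
        return "spam"
    if adsp == "all":
        # Unsigned mail gets a weight; an SPF failure on top makes it heavier.
        if spf in ("hardfail", "softfail"):
            return "heavy-weight"
        return "low-weight"
    # 'unknown': nothing can be concluded from the missing signature.
    return "other-filters"
```

Returning a label rather than rejecting outright leaves the final decision to the spam filter, which is consistent with treating the ‘all’ case as a weight rather than a verdict.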
In this series on authentication, I have described the two major authentication technologies as well as the pros and cons of each. Ultimately, authentication is all about establishing identity and it is by no means limited to email; DNSSEC is another technology that is used to authenticate identity. One of the reasons why there is so much abuse on the Internet is the Internet’s inherent anonymity. When it was originally designed, its creators did not foresee that it would become as popular as it did. Thus, when they built it, they simply wanted to get it up and running as soon as possible and this meant people could send mail as anyone. However, because the ability to actually transmit mail (or perform any Internet transaction) was limited to a small set of people, abuse was relatively rare. To put it one way, geeks all trusted each other to act ethically.
As the Internet grew, its vulnerability increased because malicious players also started to use it. They account for a small proportion of Internet users but they are able to do a lot of damage because of the inherent insecurity of the underlying protocols of the Internet. If the creators had to do it all over again, it is unlikely that they would allow the anonymity that is permitted today and they would likely implement some mechanism of identity.
However, the Internet is not only a technological phenomenon, it is also a cultural one; in particular it reflects western values (this next part represents my personal views). In the west, freedom of speech is one of the most treasured values, and the Internet is viewed as a mode of communication. Thus, to the west, the Internet is seen as a mechanism for transmitting one’s points of view, be it for entertainment, economic or political purposes.
That last one is important because one of the United States’ values is the minimization of government lest it become too large. To US citizens (and, to a lesser extent, those of other western nations), the ability to speak out against repressive governments requires anonymity. The Internet is the perfect tool to communicate on a massive scale while still preserving that anonymity. Thus, if the technology sector ever wanted to end the anonymity of the Internet in order to end the scale of widespread abuse, they would encounter significant pushback from the political sector – both grassroots and organized movements. How are dissidents supposed to speak out against their governments without the safety of their anonymity?
The technology sector would claim that anonymity was not the original intent of the Internet. But regardless of whether it was intended, a lot of infrastructure and dependencies have been built up on top of anonymity such that its removal is virtually impossible. That is the current reality.
Will we ever see a cultural shift that allows us to value identity over anonymity? It’s difficult to say. It’s likely that with the deployment of IPv6, identity will be required by receivers in order to filter mail as IP blacklists will lose their effectiveness – the theoretical hiding spots for spamming IPs will become nearly infinite. Mail receivers will start pushing for everyone on IPv6 to start authenticating their mail so they can implement technical shortcuts for reputation filtering (perhaps allow instead of reject).
If email stays on IPv4 for many more years, and organizations send mail out of a common set of shared IPs, then identity will become even more valuable as receivers will insist that senders start signing their mail in order to avoid the collateral damage of blocking good senders who are forced to share an IP with bad senders. If that occurs, then technical requirements could force a shift in values.
In any case, the value of using SPF and DKIM, at this point, should be clear. On the sending side, by establishing your identity, you enable receivers to trust you and better facilitate communication between you and your recipients. On the receiving side, it allows you to differentiate organizations and apply different policies, and it can even help you detect spoofing. The benefits of sender authentication can certainly outweigh the time and effort spent implementing it.
[1] ADSP is defined in RFC 5617, http://tools.ietf.org/html/rfc5617.