Copyright © 2004 Internet Systems Consortium, Inc. All Rights Reserved.
The Domain Name System is nearly 20 years old. It has served its purpose well.
New Internet services place new demands on the DNS. It has been common in the past
to satisfy those demands by abject hackery: exploiting loopholes in its specification. We
propose that a more-sound implementation technique for a new service would be a
combination of using a new top-level domain (TLD) and using that TLD to identify
the principal service type, so that it would not be confused with existing services
such as the World Wide Web or email. While any Internet host can offer any service,
the mental association of service type with name has been found to be useful to
the people that the Internet ultimately serves.
1. Introduction
2. The Architect's Dilemma
3. System Architecture and the Domain Name System
3.1. Limits to the Domain Name System
3.2. On Naming Structures
3.3. Encoding Type Information in a Domain Name
3.4. Global services that do not use the Domain Name System
4. On Top-Level Domains
5. Conclusion
1. Introduction
The Internet is, conceptually, so simple and pervasive that it is easy to lose track of what it is and how it got that way. During its early formative years, the Internet was used primarily by its creators or by people who understood the principles behind its creation. By the advent of the 21st century, most users of the Internet could not distinguish it from Internet Explorer. If the Internet is to be extended and improved, we technologists face the challenge of preserving not the Internet itself, but the popular illusion of what it is.
The Internet, by itself, is not useful. It is useful only because it enables various services and applications, which are themselves useful, and which could not have existed without the substrate of the Internet on which to be formed. It is easy for people whose exposure to the Internet has been limited to one or two of those applications to lose track of the difference between the Internet and the applications that it supports.
As the Internet continues its evolution and innovation, it is imperative that we who are helping to guide that evolution not lose track of the principles that brought it to what it is today, nor of the difference between the Internet and its currently popular services. While it is important that base Internet technology be able to support those services, we must collectively ensure that no one ever confuses the Internet with the services that it supports. Otherwise there is a real danger that the Internet will be doomed to stagnation, its evolution no longer determined by principle and innovation but by market pressures. We must not waver from the global vision, no matter how entrancing its local consequences.
One of the founding principles of the Internet was "We reject kings, presidents, and voting. We believe in rough consensus and running code." [1] As the Internet matures, it develops more inertia, which is a lot like voting. If the Internet is not to stagnate, we (the Internet technical community) must ensure that we never ask for more complete approval than a rough consensus, and at the end of an experiment we never settle for less than running code.
While the true essence of the Internet is not its implementation but its guiding principles, we assert that its essential functional core (that which is not part of any service or application but which exists to enable all of them) is its small collection of protocols for naming, addressing, and routing. A name identifies something, an address tells us where it is, and a route tells us how to get there [2].
In this Technical Note, we focus on naming issues. Specifically, we focus on the role of naming issues in the innovation of new Internet-based services while preserving the essence of its being.
2. The Architect's Dilemma
Museums of anthropology often show the history of the use of tools, but rarely their misuse. The urge to solve a problem by any means that works, rather than by staying inside the design criteria of the tools and supplies, has been a human trait since the beginning of written history. It is therefore not surprising that software components and tools are misused.
It is not difficult to change the tire on an automobile without a jack, by using two stacks of bricks and a crowbar. The crowbar can raise the car enough that one of the bricks can be turned on its side to support the new height. Then the crowbar is moved to the other stack of bricks, and the incremental lifting is continued, back and forth, until the tire is off the ground. A person who used this technique to change a tire under emergency conditions would be called creative or resourceful. An automotive engineer who specified that a new car should contain some bricks and a crowbar instead of a jack would be called other words, less printable. This example is somewhat far-fetched in order that there be no question as to the correct solution to the design problem: of course one does not load up a new car with bricks and a crowbar. But in conventional system design, the architect faces similar decisions every day: should an existing mechanism be used as-is, extended, or replaced with a new one? If the old mechanism can be used without modification, how does one decide when it is time to create a new one?
The architect of any computer system is responsible for deciding how each of its components and tools will be used. If a system architecture is based on a quirk or flaw in one of its components, there can be no guarantee that the system will continue to work. Recent technology history is rife with examples of designers depending on properties outside the specification of a component, including these oft-cited examples:
- Software that assumed an integer and a pointer were the same size, an assumption that failed when code moved from 32-bit to 64-bit platforms [3].
- The assumption that any domain name beginning with "www." identifies a World Wide Web server [4].
- C programs that depended on a particular compiler's treatment of memory accesses rather than declaring their intent with the "volatile" qualifier [5].
A good number of early architectural failures similar to these examples have been traced to issues of types and typelessness [6]. Programming languages that did not support strong typing encouraged programmers to use one built-in type to implement several different abstract types. The computer science community responded in the 1960s and early 1970s by developing strongly-typed languages, which allowed the programmer to make the design intent explicit [7,8]. Strongly-typed programming languages have not solved all architectural-failure problems, but requiring consistency in the use of data types has made programs portable across transitions that would previously have been difficult, such as the 32-bit to 64-bit conversion mentioned above [9].
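To make the point concrete, here is a minimal sketch of the kind of mix-up that strong typing makes visible. It is in Python, with a static checker such as mypy assumed to be run separately, and the type names are invented for illustration: two abstract types built on the same primitive type are given distinct identities, so using one where the other is expected becomes a detectable design error rather than a silent one.

    # Two abstract types, both implemented with the built-in int.
    from typing import NewType

    UserId = NewType("UserId", int)
    Port = NewType("Port", int)

    def connect(port: Port) -> None:
        print(f"connecting on port {port}")

    connect(Port(8080))    # design intent is explicit: this int is a Port
    connect(UserId(8080))  # runs, but a static checker rejects it:
                           # a UserId is not a Port, even though both are ints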
In the following sections we argue that exploiting loopholes in the Domain Name System is an inappropriate use of it, and suggest that it is within the scope of the Internet's basic design principles to encode primitive type information in a DNS reply. We then assert that if the DNS is used to encode type information, it should be through the front door, with an explicit revision of the protocol specification, rather than by using some quirk or loophole of the existing DNS specification.
3. System Architecture and the Domain Name System
The Internet Domain Name System (DNS) was devised in 1987 [10] as a means of mapping host names to IP addresses that did not require central administration but could provide global uniqueness. Before that time, a single text file of name-to-address mappings was released at intervals by the Network Information Center, and all hosts on the Internet took copies of that file so that they could resolve names.
At the time that the Domain Name System was designed, there were three principal services offered on the Internet: telnet, ftp, and mail. It is therefore not surprising that the initial specification for the DNS was very suitable for those services. But the designers of DNS realized that the Internet would evolve to support other services, and therefore the DNS protocols were made extensible. A DNS lookup would return a set of records as an answer, rather than a fixed-format or fixed-type answer. Called Resource Records (RRs), these records are the building blocks of DNS zones: stored in DNS servers and returned upon request to DNS clients. The initial set of RR types provided full support for those three applications, but the ability to add new RR types ensured that future applications could get expanded support from the DNS.
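As an illustration of that extensibility, a single DNS name can carry records of many independent types, and a client asks for exactly the types it needs. Here is a minimal sketch using the third-party dnspython package; the name queried is merely an example:

    # Minimal sketch: one name, several Resource Record (RR) types.
    # Requires the third-party "dnspython" package (pip install dnspython).
    import dns.resolver

    def show_records(name, rrtypes=("A", "MX", "NS", "TXT")):
        """Print whatever records of the listed types exist for the name."""
        for rrtype in rrtypes:
            try:
                answer = dns.resolver.resolve(name, rrtype)
            except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
                continue  # no records of this type; the RR model makes that normal
            for record in answer:
                print(f"{name} {answer.rrset.ttl} IN {rrtype} {record}")

    show_records("isc.org")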
During the 17 years since its original design, the DNS has had new protocols added [11,12], new RR types added [13,14], new supporting structures such as DNSSEC [15], and numerous implementations. Its fundamental character as a coherent distributed database with decentralized administration has not changed even though many of the details have changed. Used properly, it provides global uniqueness, global coherence, distributed authority, and arbitrary levels of redundancy. There are, of course, various ways that it can be used improperly, under which circumstances it might not exhibit those characteristics. In the next section we discuss reasons why the DNS might be used improperly.
3.1. Limits to the Domain Name System
The design for any successful support mechanism will evolve to have limitations simply because, in being successful, a support mechanism must be tightly specified. Constant innovation (so much a part of the Internet since its inception) will broaden the demand on the mechanism to the point where a decision must be made as to whether the mechanism should be revised or a new mechanism created, leaving the old in place.
There is always room to revise and improve any protocol, any specification, any design. In order to improve anything, we must know which way is up and which way is down: not every change is an improvement; not every extension is an enhancement. To know how to improve anything, we must know its problems and limits, and we must know how to tell good from evil in the design process.
The most commonly identified limit to the Domain Name System is its lack of support for local or localized (context-sensitive) names. A local name is one that exists in some places but not others and can be freely duplicated without worrying about conflicts. A localized name is one that exists globally, but resolves differently depending on the IP address of the resolving client. DNS was designed to provide a globally-unique naming system embedded in a coherent distributed database. All DNS names are global; this is an explicit design goal of the DNS [10].
Web-based systems such as Google [16] or Akamai's edge content system [17] that implement a form of localized name resolution are exploiting a loophole in the DNS specification. They use tiny Time-To-Live values to prevent result caching, thereby forcing all requests to come to authoritative servers, which can give different replies to different clients. While one can argue that this technique works, it is an example of exploiting an architectural quirk rather than staying inside the design specifications of DNS.
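To make the mechanism concrete, here is a minimal Python sketch of the server-side half of the technique; the client networks, addresses, and selection table are invented for illustration (using RFC 5737 documentation ranges), and real edge systems use far richer logic:

    # Sketch of the technique described above, not of any vendor's actual
    # system: an authoritative server picking an answer by client address
    # and attaching a tiny TTL so that caches cannot mask the choice.
    import ipaddress

    # Hypothetical mapping from client networks to "nearby" server addresses.
    EDGE_SERVERS = {
        ipaddress.ip_network("192.0.2.0/24"): "198.51.100.10",
        ipaddress.ip_network("203.0.113.0/24"): "198.51.100.20",
    }
    DEFAULT_ANSWER = "198.51.100.1"
    TINY_TTL = 20  # seconds: low enough that nearly every lookup reaches us

    def answer_for(client_ip):
        """Return (address, ttl) tailored to the asking client."""
        addr = ipaddress.ip_address(client_ip)
        for network, server in EDGE_SERVERS.items():
            if addr in network:
                return server, TINY_TTL
        return DEFAULT_ANSWER, TINY_TTL

    print(answer_for("192.0.2.55"))   # ('198.51.100.10', 20)
    print(answer_for("198.18.0.1"))   # ('198.51.100.1', 20)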
If DNS is to be used for local or localized names, support for them ought to be through the front door, designed into the protocol, documented in an RFC, and implemented consistently. Other improvements, to give better functionality, better performance, and less real-time dependence on root servers, ought to be designed in at the same time. But it would be a mistake to propose a package of improvements that took as long to reach consensus and fruition as, say, DNSSEC, a decade in gestation.
3.2. On Naming Structures
A naming structure, sometimes also called a naming scheme, is a plan for the systematic use of a portion of a DNS hierarchy. A historical example of a naming structure is the original plan for the use of the .US domain [18], in which .US names were structured according to the state, city, and/or county of the named object. Another example of a naming structure is the USENET hierarchy defined in the 1980s, by which newsgroups would be named according to their function and purpose (sci, rec, net, comp, talk, etc.) and then hierarchically named within those categories. Both of these examples of naming structures were deemed to be failures, because the individual naming decisions were made by people who either did not understand the naming structure or who chose to ignore it.
Another example of a naming structure that has not lived up to its original goals is the top-level domain .GOV. Before its management was assumed by the US General Services Administration in 1997, names defined in .GOV were at the choice of the registrant, who often did not know about the naming structure or chose to ignore it. As an example of the inconsistency in the implementation of this naming structure, the name BELMONT.GOV belongs to the City of Belmont, California; the name CA.GOV belongs to the State of California; and the name OHIO.GOV belongs to the State of Ohio.
The historical record shows that naming structures do not work very well unless they are rigidly enforced. Rigid enforcement requires centralization, and can be made to work within an organization, but it has not been possible to achieve compliance with any naming structure in which naming decisions are made in a decentralized fashion. The word "cesspool" in the context of the Internet usually refers to the chaos of ungoverned name spaces such as USENET (reserving the word "swamp" for another meaning in the realm of routing).
3.3. Encoding Type Information in a Domain Name
During the lifetime of the Domain Name System it has been called upon to support numerous architectures and protocols. For example, in about 1990 the World Wide Web first appeared as an Internet application, and it used the Domain Name System to implement references. Since a DNS name was globally unique, a World Wide Web hyperlink could be reliably assumed to be globally unique. Another example was the Internet fax system [19] that encoded destination telephone numbers in the TPC.INT domain.
In early use of the World Wide Web, many server operators wished to encode, in the domain name, a statement that the name identified a web site. Since the DNS protocol had no way of encoding type information, an informal convention developed that a name prefixed with "www." was a site on the World Wide Web, and could be expected to be running an HTTP server. Because the use of that prefix was never a standard nor even a formal convention, but just a habit, at best it served as a hint that the name probably identified an HTTP server. Many website owners chose to operate their servers so that the results were the same with or without the www prefix; for example, http://www.nytimes.com/ and http://nytimes.com/ refer to the same website. But http://bbc.co.uk/ forwards to http://www.bbc.co.uk/, while http://www.news.bbc.co.uk/ is undefined; the BBC News website is accessible only at http://news.bbc.co.uk/. There is no consistency, because there has been no requirement for consistency. As a further example, http://w3c.org/ reaches the website of the W3C consortium, but http://www.w3c.org/ forwards to http://www.w3.org/, which is the same content as at w3c.org but a different domain name. (These observations were made on September 3, 2004, and may not still be true on future dates.)
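The inconsistency is easy to observe at the DNS layer. The following standard-library sketch checks whether a name and its "www."-prefixed form resolve to the same addresses. Note that HTTP-level forwarding, such as the bbc.co.uk example above, happens after name resolution and is invisible to this test, and the results are as time-dependent as the observations above:

    # Sketch: does the informal "www." convention hold for a given name?
    import socket

    def addresses(name):
        """Return the set of IPv4 addresses for the name, or None on failure."""
        try:
            return {info[4][0]
                    for info in socket.getaddrinfo(name, 80, socket.AF_INET)}
        except socket.gaierror:
            return None

    for base in ("nytimes.com", "bbc.co.uk"):
        bare, www = addresses(base), addresses("www." + base)
        print(f"{base}: same={bare is not None and bare == www}")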
The use of the "ftp." prefix to denote an FTP server, while not as widespread as the use of the "www." prefix, is another example of encoding type within a domain name. Many Internet users consider secure HTTP sites, used for their banks and private email, to be another form of data type. The question as to whether or not a website is "secure" is typically answered by looking for a type indicator not in the site name, but in the access protocol. The secure site for Wells Fargo Bank is at https://online.wellsfargo.com/; its security is identified by the use of the HTTPS access protocol. There is no guarantee that http://whatever and https://whatever will deliver the same website; it is merely a convention.
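The point can be shown in a few lines: the security indicator is visible to a URL parser but not to the DNS, because it lives entirely in the scheme. A minimal standard-library sketch:

    # Sketch: the "secure" type indicator lives in the access scheme,
    # not in the domain name, so the DNS never sees it.
    from urllib.parse import urlparse

    for url in ("https://online.wellsfargo.com/",
                "http://online.wellsfargo.com/"):
        parts = urlparse(url)
        print(parts.hostname, "->",
              "secure" if parts.scheme == "https" else "not secure")
    # Both lines print the same hostname; nothing in the name itself
    # distinguishes the secure service from the insecure one.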
There is a certain tendency for people to assume that the World Wide Web and the Internet are the same thing, but of course they are not. The World Wide Web is a service offered via the Internet. We presume that there will be other global information services delivered via the Internet, perhaps not as ubiquitous as the World Wide Web but still innovative and valuable. Since encoding type information in the registrant-controlled parts of domain names has not worked, we offer, in the next section, the observation that type information can be more reliably encoded in a top-level domain.
3.4. Global services that do not use the Domain Name System
There is no requirement that a successful global service be based on the Domain Name System. Perhaps the best-known example of such a service is Instant Messenger [20,21]. The name space for Instant Messenger was originally the name space for AOL subscribers, and remains closely bonded to it. Extensions and alternatives to the original Instant Messenger service typically work by including an @domain string at the right of the IM identifier, defaulting back to AOL's namespace in the absence of the @. This means that in multi-vendor applications produced for markets such as Instant Messenger, names are marked with their organizational or group identity--the identity of the issuing or owning organization--via a domain name. This use of a domain name to mark ownership of "domainless" names--those that do not make operational use of the DNS--is an important example of the use of a domain name as branding, and increases the brand value of the domain without a concomitant load on the name servers.
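A minimal sketch of the naming rule just described, assuming a hypothetical parsing function; the default-namespace constant stands in for AOL's subscriber name space:

    # Sketch: parsing an "@domain"-extended IM identifier. The function
    # name and default are invented to illustrate the fallback rule.
    DEFAULT_NAMESPACE = "aol.com"  # assumption: bare names default to AOL's space

    def parse_im_identifier(identifier):
        """Split an IM identifier into (name, owning domain)."""
        name, sep, domain = identifier.rpartition("@")
        if not sep:  # no "@": the whole string is a name in the default space
            return domain, DEFAULT_NAMESPACE
        return name, domain

    print(parse_im_identifier("screenname"))            # ('screenname', 'aol.com')
    print(parse_im_identifier("alice@jabber.example"))  # ('alice', 'jabber.example')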
We have seen no evidence that there is any technical value in building an Internet service around names that are not part of the DNS. There may well be marketing or business-positioning reasons for doing so, but there is no credible technical one. Maintaining the Instant Messenger namespace as names within an ordinary domain is, while taxing to many implementations of DNS, entirely feasible.
A centrally-controlled namespace, such as that used for Instant Messenger, is an effective business technique to lock out competition to an established base. The test for whether or not a namespace is centrally controlled is whether or not a randomly-chosen party with sufficient resources can define and implement a name in that namespace without the permission of the namespace owner. Internet names that are implemented within a domain that does not enjoy adequate freedom of name creation are effectively under central control, even though from a theoretical standpoint, DNS names are not centrally controlled.
4. On Top-Level Domains
The original design for the structure of the Domain Name System [22] specified a number of top-level domains, such as .COM, .ORG, .NET, and .GOV, and also permitted country-code top-level domains, such as .DE or .FR, based on the ISO 3166 2-letter country codes. The original purpose of having multiple top-level domains was to divide the administrative burden of domain name management. RFC920 states:
Domains are administrative entities. The purpose and expected use of domains is to divide the name management required of a central administration and assign it to sub-administrations. There are no geographical, topological, or technological constraints on a domain. The hosts in a domain need not have common hardware or software, nor even common protocols. Most of the requirements and limitations on domains are designed to ensure responsible administration.
The domain system is a tree-structured global name space that has a few top level domains. The top level domains are subdivided into second level domains. The second level domains may be subdivided into third level domains, and so on.
The administration of a domain requires controlling the assignment of names within that domain and providing access to the names and name related information (such as addresses) to users both inside and outside the domain.
In other words, the design purpose of having multiple top-level domains was to associate with a domain the ownership or administrative information about it. The provisions for creation of new TLDs were based on the administrative structure of the named entities--"Multiorganizations"--and not on any characteristic of their content.
As new services are developed, it is increasingly seen as valuable to encode a small amount of type information in the domain name. The previous section of this Technical Note discusses this concept. Encoding type information in the portion of a domain name that is specific to its registrant has been found not to be reliable (see Section 3.2, above). Therefore it seems rational to explore the use of top-level domains to encode type information about a name. The alternative is either to assume that the Internet and the World Wide Web are the same thing (and thus all names identify websites) or to assume that individual registrants will be consistent in their use of naming conventions. Neither of those assumptions is valid.
If an innovative or experimental service is being developed for the Internet, and that service is not part of an existing service, e.g. not part of the World Wide Web, or of email, or ftp, or time service, it makes perfect sense to create a new top-level domain for naming of entities in that service. If the service ends up being a failure, the entire top-level domain can quietly fade away; if the service ends up being a success, then servers and clients will be able to bank on the name encoding the service type.
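As a sketch of what "banking on the name" might look like in client software (the .newservice TLD and the mapping table below are hypothetical, invented purely for illustration):

    # Sketch: a client using the TLD as a service-type marker. No such
    # TLD exists; the table is an assumption, not part of any specification.
    SERVICE_BY_TLD = {
        "newservice": "experimental-service-v1",
    }

    def service_hint(domain_name):
        """Return the service type implied by the name's TLD, or None."""
        tld = domain_name.rstrip(".").rsplit(".", 1)[-1].lower()
        return SERVICE_BY_TLD.get(tld)

    print(service_hint("host.example.newservice"))  # 'experimental-service-v1'
    print(service_hint("www.example.com"))          # None: no type encoded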
Since domains are intended to be administrative entities, and since the technical structure of the Domain Name System is the same at every point in its hierarchy, it stands to reason that the principal difference between implementing a new Internet service under a new Top-Level Domain and implementing it under an existing domain would be administrative. Allocation of names within a TLD can be regulated by ICANN; allocation of names within a domain belonging to some specific organization is in general controlled by that organization. Regardless, therefore, of any technical considerations of using a new Top-Level Domain to brand a new service, its use is most likely to offer open access to everyone. If this assertion seems preposterous, here is a brief reprise of our derivation:
- Domains are administrative entities, with no geographical, topological, or technological constraints on their content [22].
- The technical structure of the DNS is the same at every level of the hierarchy, so the choice between a new TLD and an ordinary domain is administrative rather than technical.
- Names within an organization's domain are created only at the pleasure of that organization, while allocation of names within a TLD can be regulated by ICANN.
- A new service named under its own TLD therefore enjoys freedom of name creation, and is more likely to offer open access to everyone than a service named inside one organization's domain.
5. Conclusion
In this Technical Note we have made three related claims. The first is that, from an architectural standpoint, it is better to update a specification through a formal modification process than to take advantage of loopholes in it. The second is that previous attempts to encode type information in domain names have failed because they were neither under central control nor part of any formal specification. The third is a prediction: encoding type information in a top-level domain is more likely to be effective than the previous techniques for doing so.
Taken together, these three points lead us to the conclusion that any innovative new service, one that is not part of the World Wide Web, and whose architecture might expect to be able to use local or localized names, should be in its own TLD and should, from the beginning, work to update the DNS protocol to support localization rather than exploiting weaknesses in the DNS specification to simulate localization.
References

[1] Clark, David D. Proceedings of the 24th IETF, p 551, July 1992.
[2] Shoch, John F. Inter-Network Naming, Addressing, and Routing. 17th COMPCON, IEEE Computer Society, Fall 1978, pp 72-79.
[3] Josey, Andrew. Data Size Neutrality and 64-bit Support. ;login:, December 1997.
[4] Berners-Lee, Tim. Why "www."? In Style Guide for online hypertext, W3C, 1999.
[5] Mashey, John R. The C "volatile" qualifier. comp.arch, 23 May 1997.
[6] Kaplan, Marc A. and Ullman, Jeffrey D. A scheme for the automatic inference of variable types. JACM 27:1, p 128, January 1980.
[7] Lindsey, C.H. A history of Algol 68. In 2nd HOPL, pp 97-132, April 1993.
[8] Kalsow, Bill. A history of Modula-3. Compaq Systems Research Center, 1995.
[9] Koltashev, Andrey. A practical approach to software portability based on strong typing. In Modular Programming Languages, Springer-Verlag, 2003.
[10] Mockapetris, Paul. Domain names -- concepts and facilities. RFC1034, November 1987.
[11] Vixie, Paul et al. Dynamic Updates in the Domain Name System (DNS UPDATE). RFC2136, April 1997.
[12] Vixie, Paul. A Mechanism for Prompt Notification of Zone Changes (DNS NOTIFY). RFC1996, August 1996.
[13] Eastlake, Donald III. Secret Key Establishment for DNS (TKEY RR). RFC2930, September 2000.
[14] Mealling, M. and Daniel, R. The Naming Authority Pointer (NAPTR) Resource Record. RFC2915, September 2000.
[15] Eastlake, Donald. Domain Name System Security Extensions. RFC2535, March 1999.
[16] Page, Larry. Google Technology Overview.
[17] Leighton, Tom. Akamai Edge Platform.
[18] Cooper, A. and Postel, J. The US domain. RFC1480, June 1993.
[19] Rose, Marshall and Malamud, Carl. An experiment in remote printing. RFC1486, July 1993.
[20] AOL Instant Messenger.
[21] Dornfest, Rael. The AOL Protocol. Developer Weblog, O'Reilly Publishing, 2001.
[22] Postel, Jon and Reynolds, Joyce. Domain Requirements. RFC920, October 1984.
Author's Address
Brian Reid
Internet Systems Consortium, Inc.
950 Charter Street
Redwood City, CA 94063
US

Phone: +1 650 423-1327
EMail: Brian_Reid@isc.org