IEN 188

ISSUES IN INTERNETTING
PART 3: ADDRESSING

Eric C. Rosen
Bolt Beranek and Newman Inc.
June 1981

3. Addressing

This is the third in a series of papers that discuss the issues involved in the design of an internet. The initial paper was IEN 184, familiarity with which is presupposed.

In this paper, we will deal with two basic issues. The first has to do with the Network Access Protocol. It is concerned with the sort of addressing information which a source Host has to supply, along with its data, to a source Switch (gateway, in the Catenet context), in order to enable the Switch to get the data delivered to the proper destination Host. The second issue has to do with the question of how the Switches (both source Switch and the intermediate Switches) are to interpret and act upon the addressing information supplied by the source Host.

We begin by stating generally the sort of addressing scheme we envision (which is by no means original), and by comparing it to the very different sort of addressing currently in use in the Catenet. Next we will discuss some of the issues and details that arise in considering how to make such a scheme work reliably. We will then show how this scheme lends itself quite naturally to the solution of certain problems which are very difficult to handle in the current Catenet architecture. Although addressing and routing are rather intimately bound up, we will avoid routing considerations here whenever possible. Routing in the internet will be the topic of a longer paper which will be the next to appear in this series.

3.1 Logical Addressing / Flat Addressing

For maximum flexibility and robustness of operation, a source Host should be able to simply "name" the destination Host it wants to reach, where a "name" is just an arbitrary identifier for a Host.
That is, the source Host should not need to know anything about the physical location of the destination Host, NOT EVEN WHAT NETWORK IT IS ON. In other words, the internet should have logical addressing. The advantages of logical addressing are thoroughly discussed in IEN 183, and that discussion shall not be repeated here.

IEN 183 presents a logical addressing scheme which was designed with the ARPANET in mind. However, since we regard the internet as a Network Structure whose Switches are gateways and whose Hosts are generally multi-homed to the gateways, most of the ideas presented in IEN 183 can be carried over directly to the internet environment. The present IEN will emphasize those aspects of the logical addressing scheme which are specific to the internet environment, but the proposed scheme is basically the same as the one discussed in IEN 183. Anyone with a real interest in these issues will want to become familiar with that document.

The basic idea of logical addressing is that a source Host should name the destination Host, and the Switches should map that name into a physical address that is meaningful within the Network Structure of the Switches. The mapping between names and (physical) addresses will, in general, be many-many. That is, one name may refer indeterminately to several distinct physical addresses, either because some one physical machine is multi-homed, or because the user does not care which of several physical machines he reaches. Similarly, one physical machine may have several names, which may either be synonyms, or may be used for further multiplexing within the destination Host. (This may be particularly important when a Host within one Network Structure is really a Switch, e.g., a port expander or local network, within another.)

Logical addressing tends to result in a flat addressing space, rather than a hierarchical one.
This may seem surprising in the context of the internet, since an internet is a hierarchical structure, and internet routing is almost certainly going to be some form of hierarchical routing. However, it simply does not follow that the addressing space used in the internet Network Access Protocol must be a hierarchical addressing space. In fact, since the form of the addressing space has an effect on the Network Access Protocol, and hence on Host-level software, whereas the routing algorithm is a purely internal matter to the Network Structure, proper protocol layering would seem to require that the form of the addressing and the form of the routing be independent. We would like to be able to change the internal routing algorithm of the Network Structure without requiring corresponding changes in Host software, i.e., without changing the form of the addressing.

What we are proposing is quite different from the way addressing is done in the current Catenet Network Access Protocol, IP. IP uses both physical addressing and hierarchical addressing. (Note that physical addressing within a hierarchical Network Structure will almost certainly be hierarchical addressing, whereas logical addressing allows the internal structure of the Network Structure to be better hidden from the users. This is one of its main advantages.) The first component of the address is a network number, and the second component is a physical address which is meaningful within that network.

In IEN 183, we discuss a number of reasons for the superiority of logical over physical addressing. Other criticisms of the Catenet's current addressing scheme have been voiced by other authors. For example, the way in which hierarchical addressing is incorporated into Catenet addressing mechanisms has recently come under criticism in IEN 177 by Danny Cohen, who focuses his criticism on the particular case of the ARPANET.
His main criticism is that it does not allow enough hierarchical levels. That is, with the presence of local nets or port expanders which appear to the ARPANET as Hosts, there is really another level of hierarchy after the ARPANET. He suggests, therefore, that ARPANET addressing (1822-level) be changed to provide this additional hierarchical level, and that end-users (or at least Host software modules) fill in this additional level.

It is not obvious, though, that a single additional level of addressing will do for all applications. If we are sending data not just to a local net, but to an internet of local nets, maybe several additional levels of hierarchy are needed. We may also need more hierarchy on the "front end" of the address. A protocol which begins the internet address with a field which is supposed to identify the destination network (e.g., IP) assumes that there is no need to establish a hierarchy among the networks themselves. (This is equivalent to assuming that all Switches can "know about" all networks.) As long as we have only a small number of networks, it may be reasonable enough to assume that destination network addresses need not themselves be hierarchical. However, it is not difficult to imagine a very large internet composed of thousands of networks, where before specifying a network, we must first specify, say, a continent. So maybe our protocol for hierarchical addressing needs a "continent address" field before the network address field. It begins to look as if the addressing structure needs to be INFINITELY EXTENSIBLE in both directions. In fact, in IEN 179 Cohen proposes a scheme which seems intended to provide this sort of infinite extensibility. That seems both an inevitable consequence of hierarchical addressing, and a reductio ad absurdum of it.

It is also worth noting that a given number of Hosts can generally be addressed with fewer bits in a flat addressing scheme than in a hierarchical addressing scheme. Given, say, 32 bits of addressing, flat addressing can represent 2**32 Hosts. However, if these 32 bits are broken into four 8-bit fields, hierarchically, fewer Hosts can be represented, since in general, not every one of the four fields will actually take on the full 256 values. Inevitably, one finds that at least one field must take on 257 values, while at least one other turns out to have a smaller number of values than expected. This tends to lead to the feeling that the address field needs "just one more level" of hierarchy. It also tends to lead to the use of funny escape values and multiplexing protocols so that different fields can be divided up in different ways by different applications. The same problems usually reappear, however, in a few years, as the need for "just one more level" is proclaimed yet again. Yet the alternative of making the address fields arbitrarily long, hence infinitely extensible, is rather infeasible, if bandwidth considerations are taken into account.

The need for infinite extensibility at the Host interface can be avoided by using logical addressing (although this is only one of its many advantages). We can then identify a single Host by using a single, structure-less, unique name which is meaningful at each level of internet hierarchy. That is, the Switches at each level of the hierarchy would be able to recognize the name, and to map it into a physical address that is meaningful at that level of hierarchy. Neither the end-user nor the source Host would be responsible for determining the physical addresses at each level of a never-ending hierarchy.

Of course, neither these arguments, nor those of IEN 183, can be regarded as finally settling the "flat vs. hierarchical" issue.
In networking, no one issue can ever be settled in isolation, and attempts to do so result only in endless and unproductive arguments. A network (or internet) is a whole whose performance and functionality result from the combination of its protocols, addressing schemes, routing algorithm, hardware and software architecture, etc. Particular addressing schemes can only be judged when it is seen how they actually fit into particular designs. The only real argument in favor of a particular addressing scheme is that it fits naturally into a network architecture which provides the needed functionality and performance. It is hoped that the addressing scheme we propose will be judged as part of the architecture we are developing in this series of papers, rather than in isolation.

3.2 Model of Operation: An Overview

The model of operation we are proposing is as follows. A source Host submits a packet to a source Switch, naming (not addressing) the destination Host. THE SOURCE SWITCH THEN TRANSLATES (OR MAPS) THAT NAME INTO A PHYSICAL SWITCH ADDRESS WHICH IS MEANINGFUL WITHIN ITS OWN NETWORK STRUCTURE; THAT WILL BE THE ADDRESS OF THE DESTINATION SWITCH WITHIN THAT NETWORK STRUCTURE. The data is then routed through the Network Structure to the destination Switch so addressed. The name (logical address) of the destination Host is also carried through the Network Structure along with the data and the physical address of the destination Switch.

When the destination Switch receives the data, it forwards it to the destination Host over (one of) its Pathway(s) to that Host. If the Pathway is itself a network or internet configuration with logical addressing, the name of the destination Host is passed on via the Pathway Access Protocol.
If logical addresses or names are not unique across all component networks of an internet, translation from the internet logical address to the Pathway logical address would have to be done at this point. If the network or internet underlying the Pathway does not even have logical addressing, the Host name will have to be translated into a Pathway physical address by the destination Switch.

Note that, at any particular hierarchical level (i.e., within any particular Network Structure), the ADDRESSABLE ENTITIES are the Switches at that level (which are physically addressed), and all the Hosts (which are logically addressed, or named). Component networks of the internet are treated as structure-less Pathways, AND NEITHER THE COMPONENT NETWORKS THEMSELVES NOR THE SWITCHES OF THE COMPONENT NETWORKS ARE INDEPENDENTLY ADDRESSABLE. Furthermore, a name (logical address) which adequately identifies the destination Host is present at each level of the hierarchy. Of course, a particular name only needs to be unique at a single level of the internet hierarchy, within a particular Network Structure. The names can change as we travel up and down the hierarchy of Network Structures that make up the internet.

3.3 Some Issues in Address Translation

In order to do the sort of translation from logical to physical address that we have been discussing above, the Switches must have translation tables. Many of the issues involved in the design of a robust translation table mechanism are discussed in IEN 183, and much of that discussion applies without change to the internet. We will confine our discussion here, therefore, to issues which are not considered in that note, or which are more specific to the internet environment.

The main problem with the model of operation we have proposed is a very mundane one, but unfortunately a very important one.
If there may be thousands of Hosts on an internet, each one with an unlimited number of different names, and if a source Switch must be able to map any name to the address of a destination Switch, then each Switch will have to have a very large table of names to drive this translation function. By itself, this is not much of a problem. To be sure, in the past, it has been considered important to keep the gateways as small as possible. It now seems to be more generally accepted that the current Catenet gateways provide inadequate performance, and that building a robust operational internet system requires us to build Switches that are large enough to handle the required functionality at a reasonably high level of performance. We would expect Switches built in the future to be much larger than the current gateways are.

However, it is one thing to require large tables, and quite another thing to require tables which may grow without bound. Since the number of Hosts on the internet may grow without bound, it does not seem feasible to require the Switches to have tables with one or more entries for each and every Host in the internet.

If we cannot fit the complete set of translation tables into each Switch, a natural alternative is to turn the tables into a DISTRIBUTED DATA BASE, with each Switch having only a subset of the complete set of tables. For each Switch, there would be a subset of logical addresses for which the Switch would have complete physical addressing information. These logical addresses would fall into one of two classes:

1) Those logical addresses which refer to Hosts which are homed (in some Network Structure) directly to that Switch.

2) Those logical addresses which refer to distant Hosts which are in FREQUENT communication with the Hosts which are directly homed to that Switch.
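
As a purely illustrative sketch of the two classes above, the resident subset of the translation table at one Switch might be modelled as follows. The class names, field names, and the sample Host and Switch identifiers are all assumptions made for this example; nothing here is a proposed table format.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntryClass(Enum):
    DIRECTLY_HOMED = 1          # class 1: Host is homed directly to this Switch
    FREQUENT_CORRESPONDENT = 2  # class 2: distant Host in frequent communication


@dataclass
class ResidentEntry:
    entry_class: EntryClass
    destination_switches: frozenset  # physical Switch addresses (hypothetical form)


@dataclass
class ResidentTable:
    """The subset of logical addresses for which this Switch holds
    complete physical addressing information."""
    entries: dict = field(default_factory=dict)

    def add(self, name, entry_class, switches):
        self.entries[name] = ResidentEntry(entry_class, frozenset(switches))

    def translate(self, name):
        """Return the set of destination Switches for a name,
        or None if the name is not resident at this Switch."""
        entry = self.entries.get(name)
        return None if entry is None else entry.destination_switches


# Hypothetical contents for one Switch.
table = ResidentTable()
table.add("host-a", EntryClass.DIRECTLY_HOMED, {"switch-1"})
table.add("host-b", EntryClass.FREQUENT_CORRESPONDENT, {"switch-7", "switch-9"})
```

A name that is in neither class simply yields no resident translation, which is the case the rest of this section is concerned with.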

The logical addresses in these two classes are the ones for which the Switch will be most often called upon to do logical-to-physical address translation, and for best efficiency, the information needed to do the translation ought to be present in the Switches. For other logical addresses, which are less often seen, all that is needed is for the Switch to know where the address translation information can be found. Then if a packet with an infrequently-seen logical address is encountered, it can be forwarded to a place where the proper information is known to reside, or else the packet can be held while the information is obtained. (We may want to have a scheme which is a hybrid of these two alternatives. For example, packets with logical addresses that are not contained in the resident tables can be forwarded to a place with more addressing information, and this can in turn cause the needed addressing information to be sent back to the source Switch, so that additional packets with the same address can be handled directly by the source Switch. That is, the source Switch might maintain, in addition to its permanently resident tables, a cache of the most recently needed addressing information.)

It is important to note that the two classes defined above may vary dynamically, and we may want a procedure for altering the members of those classes in some specific Switch depending upon the traffic that the Switch is actually seeing in real time.

Unfortunately, any such scheme would seem to require the inclusion of at least one additional level of hierarchy in the addressing structure, since when a Switch sees a logical address for which it does not have complete information, it must be able to determine how to get that complete information.
The scheme would be self-defeating if it meant that we had to have a table of all the logical addresses, with an indication for each one of which other Switch has the complete information. Rather, we need to be able to group the logical addresses into "areas", of which there will be a bounded number. Then each Switch will be able to keep a table indicating which other Switches contain the complete translation information for each area. This table of areas would then be the only part of the complete set of translation tables that had to be resident at ALL Switches. While this is much more feasible than requiring each Switch to keep a table containing all the logical addresses, it does mean that the destination address provided by the source Host must include not only a destination Host identifier, but also an "area code" for that logical address.

If we are going to organize the logical addresses of all internet Hosts into a relatively small set of "areas", we would like to find some means of organization which is fairly optimal. Unfortunately, there are a number of fairly subtle considerations which make this quite tricky to do. Certain intuitively attractive ways of organizing the internet into these areas will result in various sorts of significant and quite annoying sub-optimalities.

Suppose, for example, we treated "area" as meaning "home network", much as in the present Catenet IP (where network number is part of the address that the Hosts must specify). Then we would require all and only the ARPANET gateways to contain the logical-to-physical addressing information for the ARPANET Hosts, all and only the SATNET gateways to contain the tables for the logical addresses of the SATNET Hosts, etc.
The user, in addressing a particular Host, would not only name it, but also name its "home network", and the source Switch would choose some Switch which interfaces directly to the home network of the destination Host from which to obtain the translation information.

This method of organization, however, has several unsatisfactory consequences. One problem is that if any Host is on two "home networks", we want the Switches, not the Hosts, to choose which "destination network" to use. This is necessary if we want the routing algorithm to be able to choose the "best" path to some destination Host, and is really the only way of ensuring that packets can be delivered to a Host over some path, if one of the Host's home networks is down but the other is up. (This is jumping ahead a bit, since a full discussion of the "partitioned net" problem will not appear until section 3.4. The point, though, is that the choice of "home network" to use when sending traffic to a particular destination Host is a ROUTING PROBLEM, NOT AN ADDRESSING PROBLEM. Therefore it ought to be totally in the province of the Switches, which are responsible for routing, and not at all in the province of the Hosts, which must participate in the addressing, but not the routing.)

Another problem arises as follows. Suppose we have adopted the scheme of sending packets for a certain area to a Switch in that area, depending on that Switch to do the further logical-to-physical translation. It is possible that when this further translation is done, we will find that the route which the packet travels from that Switch takes it back through the source Switch. This could mean a very lengthy and delay-producing "detour" for the packet. It might at first appear that this is not very likely.
If a packet is going to some ARPANET Host, and we send it to some Switch which is directly connected to the ARPANET, surely we have sent it closer to its final destination, not further away. Unfortunately, that just is not necessarily true. Network partition or congestion may force a packet for an ARPANET Host to travel from an ARPANET gateway to a gateway (or series of gateways) outside the ARPANET, back around (through a potentially long route) to another ARPANET gateway. (Consider the partitioned net and the expressway problems.) In such cases, the Network Structure may already be in a condition of stress which is likely to result in below par performance. We do not want to make things even worse by adding any further unnecessary but lengthy detours just because we cannot keep all the addressing information at the source Switch.

One way of helping to avoid these sorts of problems is to separate the notion of "area" from any physical meaning. The purpose of adding the notion of area to the logical addressing scheme is just to enable us to distribute the data base needed to do logical-to-physical address translation. There is no reason to suppose that the addressing information needed for some particular Host ought to be contained only in Switches that are "near" that Host. That would be a mistake. Rather, the addressing information ought to be somewhere which is "near" the SOURCE Host, not somewhere which is near the destination Host. This maximizes the chances that the necessary address translation will be done as soon as possible after the packet enters the Network Structure. The sooner we do the address translation, the more information we have which we can make use of to improve the routing of the packet, and the less likely any unnecessary detours will be.

One might think that at least Hosts which are on the same home network should be grouped into the same area. This will work until the first time a Host is moved from one network to another. Since the area codes are given by the individual Host or user as part of the address in the Network Access Protocol of the internet, changing a Host's area code would involve changing Host-level software or tables, which has to be avoided. (Avoiding the need to make such changes when Hosts move physically is one of the main reasons for using logical addressing.) So we really have to think of "areas" as random collections of Hosts.

What we are proposing is a truly distributed logical address translation table, rather than a scheme where each Switch maintains only local information. To make this more concrete, consider how this might be done in the Catenet. All the information about logical addresses which refer to Hosts on the ARPANET would be contained not only in all the gateways which are directly connected to the ARPANET, but also in a set of additional gateways which are uniformly scattered around the internet. Then, although the addressing information would not be in every potential source Switch, it would be somewhere close to every potential source Switch, and packets would not have to travel a long distance only to find out that they are going in the wrong direction.

3.4 Model of Operation: More Detail

Let's assume that a source Host has given a message to a source Switch, with a logical address and an "area code" indicating the destination Host. If the source Switch does not have the complete address translation information in its tables, it will look in its table of area codes. The given area code will be associated in the latter table with some set of Switches (within the same Network Structure).
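
The lookup just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a proposed implementation: the function name, the dictionary-based tables, the sample names and area code, and the boolean flag (which merely records whether the returned Switches are possible destinations or only holders of the translation information) are all hypothetical.

```python
def candidate_switches(name, area_code, resident, area_table):
    """Return (switches, needs_translation) for a message at the source Switch.

    resident   -- maps logical address -> set of possible destination Switches
                  (the complete translation information held at this Switch)
    area_table -- maps area code -> set of Switches holding the complete
                  translation information for that area
    """
    switches = resident.get(name)
    if switches:
        # Complete translation information is resident at this Switch.
        return set(switches), False

    holders = area_table.get(area_code)
    if not holders:
        # Assumption: an unknown area code is treated as an error here.
        raise LookupError(f"unknown area code: {area_code!r}")
    # These Switches are not destinations; they can complete the translation.
    return set(holders), True


# Hypothetical tables at one source Switch.
resident = {"host-a": {"switch-1"}}
area_table = {17: {"switch-4", "switch-8"}}
```

For example, `candidate_switches("host-a", 17, resident, area_table)` is answered from the resident table, while a lookup for an unfamiliar name in area 17 falls back to the Switches listed for that area.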
The sequence of operations that we envisage is the following:

1) The source Switch picks one of these Switches, and sends the message to it. There must be enough protocol between these two Switches so that the chosen Switch knows that it is not the final destination Switch, but only an intermediate Switch, and that it is expected to complete the address translation and then to forward the message further.

2) The chosen Switch must be able to recognize the logical address of the destination Host, and associate it with one or more possible destination Switches. The message will be forwarded to one of these Switches. Furthermore, the addressing information can be sent back to the source Switch where it can be held in a cache in case the message is followed by a flood of additional messages for the same logical address.

In the case where the source Switch does contain complete address translation information for the destination logical address, that logical address will be associated with some set of potential destination Switches. The source Switch will choose one, and send the message directly to it.

Logical-to-physical address translation should be done by only one Switch: either the source Switch or the Switch chosen by the source Switch on the basis of the area code. There is no need to allow intermediate Switches to do any logical-to-physical address translation. (There is only one exception to this, namely the case where a message arrives at an intermediate Switch only to discover that the destination Switch chosen by the source Switch is no longer accessible. In this case, re-translation is the alternative to dropping the message entirely.) Remember that many Hosts will be multi-homed (in the internet, virtually every Host is multi-homed, since most networks will have at least two internet gateways connected to them), so that there will in general be more than one possible destination Switch.
By prohibiting re-translation at intermediate Switches, we avoid the problems of looping that might arise if different intermediate Switches make different choices of destination Switch. As we shall see, this also simplifies our approach to the partitioned net problem, and at any rate, there is no great advantage to allowing intermediate Switch translation (cf. IEN 183).

We suggested above that if a source Switch does not recognize a particular logical address, and hence must send a message to another Switch (as determined by the area code), the latter Switch should send the addressing information back to the source Switch, to be kept temporarily in a cache. We have to emphasize "temporarily." The source Switch should time out the addressing information which it keeps in the cache, and then discard it. If it later receives from any of its source Hosts any subsequent messages for the same destination logical address, it will have to reobtain the information. The reason for this is that it will be necessary, from time to time, to change the translation tables. It is not that hard to develop an updating procedure which ensures consistent updating of all Switches where the information about a logical address normally resides. But it might be more difficult to develop a procedure which ensures consistent updating of all the temporary (cached) copies of that information. Timing out the temporary copies of the addressing information will prevent out-of-date information from being preserved in inappropriate places. (Though the use of an out-of-date translation is not so terrible, since it would elicit a DNA message, rather than causing mis-delivery of data. See IEN 183 for details. In this sense, out-of-date information is self-correcting.)

When either a destination Host name (logical address) or an area code maps into several Switches, the source Switch must apply some criterion to choose one from among them, since in general we will want to send only one copy of the message to its destination. (Though there may indeed be cases in which we want to send a copy of the message to each possible destination Switch, in order to increase the reliability of the system, or to be sure that we get the message to its destination Host as fast as possible.) There are several possible criteria that we might consider using:

a) We might always choose the "closest" Switch, according to some particular distance metric (which might or might not be the same distance metric used by the routing algorithm).

b) The list of potential destination Switches might have a "built-in" ordering, so that the first one is always used unless it is down, in which case the second one is always used, unless it is down, in which case the third one is used, etc.

c) If the set of potential destination Switches has the right sort of topological distribution, we might try to round-robin them in order to achieve some sort of load-splitting.

d) If we can obtain some information about the relative loadings of the various Switches, we can try to choose the one with the smallest load (to try to avoid causing congestion within the destination Switches), or we might try to trade off the increase in load that we will cause at the destination Switch with the distance we have to travel to get there.

e) Certain possible destination Switches might be favored for certain classes of traffic (as determined by the "type of service" field, or by access control considerations). That is, certain destination Switches might be favored for interactive traffic, and certain others (with more capacity?) for bulk traffic. Or there might be administrative access control restrictions which prohibit certain classes of traffic from being sent to certain Switches.
(This may be particularly applicable in an internet context where different Switches are under the control of different administrations. It is possible, though, to imagine applications of this sort of access control even in a single-administration Network Structure. For example, we might want to prohibit military traffic from entering certain Switches, in order to preserve capacity for important university traffic.)

f) It is possible to combine some of the above criteria, e.g., choose the closest (i.e., shortest delay) Switch for interactive traffic and the most lightly loaded one for bulk traffic.

Remember that in the internet case, all the Hosts on some network are considered to be homed to all the gateways on that network, so that in general most Hosts will be multi-homed, and the way we select the destination Switch could have a significant effect on internet performance.

Of course, a destination Switch might itself have two or more Pathways to a particular destination Host. Perhaps the Switch is a gateway on two networks, and the Host is also on those two networks. Or perhaps the Switch is multi-homed onto the network of the Host. In such cases, a further choice remains -- the destination Switch must choose which of several possible Pathways to the destination Host it should use for sending some particular packet. Each (destination) Switch, therefore, will have to have a second logical-to-physical address translation table, which it accesses in order to choose the proper Pathway to a destination Host. This second translation table, however, contains information which is only useful locally. In addition to containing information needed to map the logical address onto one of the Switch's access lines, it must also contain any information needed in order to specify the address of the destination Host in the Pathway Access Protocol.
In some cases, the logical address of the Host in its "home network" may be the same as its logical address in the internet, in which case no additional information is needed. If this is not the case, or if the "home network" does not have logical addressing, the local translation tables must contain information for mapping the internet logical address to an address (logical or physical) which is meaningful in the "home network." The issues of choosing one from among a set of possible Pathways according to some criteria are basically the same as those we have been discussing from the perspective of the source Switch, however.

An interesting little issue: suppose that traffic for Host H can be sent to either Switch A or B, but that the route to Switch B contains Switch A as an intermediate Switch. Does this mean that the traffic should always be sent to A, rather than B? Not necessarily. Perhaps A has plenty of bandwidth available for forwarding traffic to other Switches, but only a little available for sending traffic directly to a Host. Or the Pathway from Switch A to Host H may itself have such a long delay that it is quicker to send the traffic through A to B and then on B's Pathway to H. While it may turn out to be very difficult to take account of such factors, we ought not to rule them out by a priori considerations, and we ought not to design a system in which such factors cannot be considered.

A variant on this issue can arise as follows. Suppose Host H1 wants to send some data to Host H2, and H1 puts this data into the internet by submitting it to source Switch S. Now S will look in its address translation table to find the possible destination Switches for H2. Let's suppose that there are two such possible destination Switches, one of which is D, and the other of which is S itself.
That is, S has a choice of sending the data directly to H2 (over a Pathway with no intermediate Switches), or of sending it to D, so D can transmit it directly to H2. Nothing in the proposed scheme constrains S to choose itself as the destination Switch. If we want, we can have S make the choice of destination Switch without taking any special cognizance of the fact that it itself is a possible destination Switch. Or we might even require that S not choose itself as the destination Switch. That is, when a gateway on the ARPANET, for example, gets some data from an ARPANET Host which is destined for another ARPANET Host, maybe we want the data to be sent through another gateway, rather than just sending it right back into the ARPANET. This possibility might be crucial to solving the "expressway" problem. While we are not at present making any proposals for allowing the internet to be used as an "expressway" between two Hosts on a common, but very slow, network, we are trying to ensure that nothing in our proposed addressing scheme will make this impossible. This is a very important difference between our proposed scheme and the scheme presently implemented in the Catenet, where a source Switch which is also a potential destination Switch is highly constrained to pick itself as the actual destination Switch. Of course, for this to work, there must be enough protocol so that a Switch which receives some data can know whether it is getting it directly from a source Host, or whether it is getting it from another Switch.

When we say that a particular Host name maps onto a set of possible Switches, what we are really saying is that each member of that set of Switches has a Pathway to the Host.
Remember the definition of "Pathway" -- a Pathway in Network Structure N between two Switches of Network Structure N or between a Switch and a Host of Network Structure N is a communications path between the two entities which does not contain any Switches of Network Structure N. The logical-to-physical address translation tables will not map a Host name to a particular set of destination Switches unless each of those Switches has a Pathway to that Host. But we must remember that at any particular time, one or more of these Pathways may be down. Before we apply the above criteria (or others) to the set of possible destination Switches in order to choose a particular one, we must first eliminate from the set any Switches whose Pathway to the destination Host is down. This is a non-trivial task which breaks down naturally into two sub-tasks. First, the destination Switch must be able to determine which of the Hosts that are normally homed to it is reachable at some particular time. Second, this information must be fed back to the source Switch. Each of these sub-tasks raises a number of interesting issues.

In IEN 187, we discussed the importance of having a Pathway up/down protocol run between each Host and each Switch to which it is homed, so that a source Host can know which source Switches it has a currently operational Pathway to. Now we see the other side of the coin -- each destination Switch must be able to determine which Hosts it currently has an operational Pathway to. Many of the considerations discussed in IEN 187 apply here too, and need not be mentioned again.
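The "eliminate, then choose" step described above might be sketched as follows. This is a minimal Python illustration, not a specification: the `Candidate` record, its fields, and the traffic-class strings are assumptions introduced here, and the selection rule shown is just one combination of the criteria listed earlier (closest for interactive traffic, most lightly loaded for bulk traffic, built-in ordering otherwise).

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    distance: float   # value under some distance metric, e.g. delay
    load: float       # relative loading, if such information is available
    pathway_up: bool  # False once the Pathway to the Host is known to be down


def usable(candidates):
    """Drop every Switch whose Pathway to the destination Host is down."""
    return [c for c in candidates if c.pathway_up]


def choose(candidates, traffic_class):
    """Filter first, then apply a selection criterion to what remains."""
    live = usable(candidates)
    if not live:
        return None  # no Switch currently has an operational Pathway
    if traffic_class == "interactive":
        return min(live, key=lambda c: c.distance)  # closest Switch
    if traffic_class == "bulk":
        return min(live, key=lambda c: c.load)      # most lightly loaded Switch
    return live[0]  # fall back on the built-in ordering of the list
```

Note that the filtering happens before any criterion is applied, so a Switch that is closest or least loaded is still never chosen while its Pathway is marked down.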
Basically, the Switch will have to run a low-level up/down protocol which relies on the network which underlies the Pathway to tell it whether a particular Host is reachable (e.g., the ARPANET returns an 1822 DEAD Reply to any ARPANET source Host which attempts to send a non-datagram message to an unreachable destination Host), and the Switch will also have to run a higher-level up/down protocol whereby it queries the Host and infers that the Host is unreachable if no replies to the queries are received. Of course, if some Pathway consists of a simple datagram-oriented network that provides no feedback to the source, then a higher-level protocol will have to be used alone.

Assuming that the Switches have some way of determining whether their Pathways to particular Hosts are operational, we have the following subsidiary issue -- should these determinations be made on a regular basis, for all Hosts that might be reachable, or should they be made on an exception basis, with the information obtained only as needed? Let's consider the analogous operation in the ARPANET. In the ARPANET, the up/down status of each Host is maintained continuously, as a matter of course, by the IMP to which that Host is homed. This information, however, is not generally maintained at other IMPs. If a packet for a dead Host (on a live IMP) is submitted to some source IMP, the packet will always be sent to the destination IMP, which will (unless the packet is a datagram) return an 1822 DEAD reply. The source IMP receives the DEAD reply, signals it to the source Host, and then discards the information. IMPs do not maintain status information about remote Hosts, but the information is available to them as they need it (i.e., on an exception basis). On the other hand, each IMP always maintains complete, accurate, and up-to-date information about the reachability of each other IMP.
Whenever any IMP goes down or comes up, this information is broadcast to all other IMPs in an extremely quick and reliable manner. If a source Host attempts to send a packet to a Host on an unreachable IMP, no data is sent across the network at all; the source IMP already knows that the destination IMP cannot be reached, and tells the source Host immediately.

Why don't IMPs maintain regular status information about all ARPANET Hosts? It's not as if this is against the law, and under certain conditions, it might be advantageous to do so. However, the more entities about which regular status information is maintained, the more bandwidth (trunk and CPU) and memory must be devoted to handling the information. With a potentially unbounded number of Hosts being able to connect to the ARPANET, it does not seem feasible for all IMPs to maintain this status information for every Host. Fortunately, it just is not as important to maintain status information for Hosts as it is for IMPs. Status information about the IMPs is necessary in order to do routing, so failure to maintain this information regularly would degrade the routing capability, with a consequent global degradation in network service. Since Hosts, on the other hand, are not used for storing-and-forwarding packets, routing does not have to be so aware of Host status, and global degradations due to incorrect assumptions about Host status are less likely.

If we can't expect ARPANET IMPs to maintain regular status information for each Host, we certainly can't expect internet gateways to maintain regular status information for each and every Host in the internet. In fact, in the internet, the situation is even worse. In the ARPANET, each IMP at least maintains regular status information about the few Hosts to which it is directly connected.
This is simple enough to do, since the number of Hosts on an IMP is bounded (barring the introduction of local nets or port expanders) and there are machine instructions to detect the state of the Ready Line. However, we can hardly expect a gateway to maintain regular status information about all the Hosts on all the networks to which the gateway is directly connected. So we will suppose that in general, status information about the Hosts which are homed to a particular Switch will be obtained by that Switch on an exception basis, as needed.

Of course, saying that this will be true in general does not mean that it must be universally true. If there are a few Hosts somewhere that are major servers with many important users scattered around the internet, there is no reason why the Switches to which those servers are homed cannot maintain regular status information about those few Hosts. If the number of such special Hosts is kept small, this would not be prohibitively expensive, and if these Hosts really do handle a large portion of the internet traffic, this might be an important efficiency savings.

If a source Switch knows that a particular destination Host logical address can be mapped to any of a number of destination Switches, then, as we have pointed out, it must be able to tell when, due to some sort of failure or network partition, the destination Host is (temporarily) unreachable via some particular Switch. It must have that information in order to be able to avoid choosing a destination Switch whose Pathway to the Host is non-operational. If we agree that the Pathway up/down status between a particular destination Switch and a particular destination Host which is ordinarily homed to it can only be obtained, on an exception basis, by that destination Switch itself, it follows that this information can also only be obtained by the source Switch on an exception basis.
That is, the only way for a source Switch to find out that a particular Host can temporarily not be reached through a particular destination Switch is to send a message for that Host to that Switch. The destination Switch must then determine that it has no operational Pathway to that Host, and it must send back a control message to the source Switch informing it of this fact. (In IEN 183, we christened these messages "DNA messages", for "Destination Not Accessible.") The source Switch will store this information in its address translation tables, so that from then on it does not choose a destination Switch whose Pathway to the Host is down. (Of course, in addition to sending this control information back to the source Switch, the putative destination Switch should also try to forward the message it received to one of the other Switches to which the destination Host is homed.)

This should work well, unless the Pathway between the original destination Switch and the destination Host comes back up. We must develop some way of informing the source Switch that that destination Switch is now once again usable as a destination Switch for that Host. A simple and robust way to handle this is as follows. When a source Switch is informed, according to the mechanism of the previous paragraph, that a particular destination Switch cannot reach a particular destination Host (without forwarding traffic through additional intermediate Switches), it marks (in its address translation tables) that Switch as UNUSABLE as a destination for that Host. However, this information is reset periodically, say, every few minutes. In effect, this approach would cause a source Switch which is handling traffic for that destination Host to query the destination Switch periodically to see if it has become usable again. Note that no special control message is needed for the querying.
The querying is done simply by sending data addressed to the destination Host to the destination Switch. If the destination Switch is still unusable, no data is lost, since the data can be readdressed by the destination Switch and sent to some other destination Switch which does have an operational Pathway to that destination Host.

Note also that with this scheme, not all source Switches will be in agreement as to which destination Switches can be used to reach which destination Hosts at some particular time. But this is not much of a problem, as long as address translation is done only once, and not re-done at each intermediate Switch. Further, any source Switch which tries to use the wrong destination Switch will be told, via a DNA message, to use another one.

Lest there be any misunderstanding, we should emphasize that we are not proposing this as a general mechanism for determining which Hosts are homed to which Switches. That information is not to be obtained dynamically at all, but rather is to be installed in the translation tables at each Switch by the Network Control Center (or whatever equivalent of the Network Control Center we devise for the internet). This mechanism is only used to determine that a Pathway which ORDINARILY exists between some Switch and some Host is TEMPORARILY out of operation.

If a destination Host happens to be unreachable from EACH potential destination Switch (which will happen if the Host is down), this procedure will eventually result in the source Switch marking all potential destination Switches unusable. Once this happens, the source Switch should discard any data it receives which is destined for that destination Host, and should return some sort of negative acknowledgment to the source Host. The source Host can then try again, every few minutes, to send more data to the destination Host.
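The source Switch's handling of DNA messages, as described above, might be sketched as follows. This is a hedged Python illustration under stated assumptions: the class name `TranslationEntry`, its methods, and the 180-second default are hypothetical, and the reset is implemented here as a per-Switch expiry of the UNUSABLE mark, which is only one way to realize "reset periodically, say, every few minutes."

```python
import time


class TranslationEntry:
    """Source-Switch table entry for one destination Host (assumed layout)."""

    def __init__(self, switches, reset_interval=180.0, clock=time.monotonic):
        self.switches = list(switches)    # installed statically, not learned
        self.reset_interval = reset_interval
        self.clock = clock                # injectable so the sketch can be tested
        self.unusable_since = {}          # Switch -> time its DNA message arrived

    def on_dna(self, switch):
        """A DNA message arrived: mark that Switch UNUSABLE for this Host."""
        self.unusable_since[switch] = self.clock()

    def _expire_marks(self):
        """Clear UNUSABLE marks older than the reset interval."""
        now = self.clock()
        for switch, marked_at in list(self.unusable_since.items()):
            if now - marked_at >= self.reset_interval:
                del self.unusable_since[switch]

    def pick(self):
        """Return the first usable destination Switch, or None if all are
        marked UNUSABLE (discard the data and return a negative
        acknowledgment to the source Host)."""
        self._expire_marks()
        for switch in self.switches:
            if switch not in self.unusable_since:
                return switch
        return None
```

Once a mark expires, the next data packet for the Host is simply sent to that Switch again, which acts as the query; no separate probe message appears anywhere in the sketch.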
Since the information marking a destination Switch as unusable (for a particular destination Host) is reset every few minutes, the source Host will be able to establish communication with the destination Host soon after it becomes reachable again.

Strictly speaking, a negative acknowledgment from the source Switch is not required, and the current IP makes no provision for such a thing. Yet the information contained in the negative acknowledgment might well help the source Host to choose a suitable retransmission interval. If a destination Host is unreachable, it makes sense for a TCP to retransmit more infrequently than if the TCP has no information at all about why it is not getting any acknowledgments from the destination Host. Also, this information would be useful to the end-user (if the various protocol layers in his Host succeed in passing it back to him). A user who is not getting any response from the system may want to take a different action if he knows his destination cannot be reached than if he thinks that the network (or internet) is just slow.

This procedure, which is basically the same as the one we recommended (in IEN 183) for use with logically addressed multi-homed Hosts on the ARPANET, should resolve the partitioned net problem. Our approach is not dissimilar to one proposed by Sunshine and Postel in IEN 135. To quote them:

     A simpler solution to the partitioning problem follows the spirit of querying a database when things go wrong. Suppose there were another database listing networks and all the gateways attached to each net (whether up or down). This database would change slowly only as new equipment was added to the internet system. Further suppose that the gateways and internet routing are totally unaware of network partitions, except that gateways to partitioned nets find out when they cannot reach some Host on their own net. In this case, the gateway would return a Host Unreachable (through me) advisory message to the source. The source could then query the global database to get a list of all gateways to the destination net, and construct explicit source routes to the destination going through each of these gateways, trying each one in turn until it succeeded.

Note, however, that our proposal does not require any source routing, because it is Switches (i.e., gateways) themselves which are the addressable entities in our scheme, rather than networks (though the authors quoted above were considering how to handle the problem in the current Catenet environment, rather than how to design a new environment). The database they propose can be identified with the translation tables we have spoken of. Also, our proposal handles the situation where a Pathway that was down becomes usable again, a case they don't seem to mention.

It is sometimes claimed that hierarchical addressing requires less table space than flat addressing, since there is no need to have an entry in a translation table for each address. We can see now that this is not true. If we wish to be able to handle multi-homing, and in particular to handle the "partitioned net" problem, we need to maintain table space for the Hosts with which we are in communication. This is true no matter what kind of addressing scheme we adopt.

Let's look now at how our scheme would handle the problem of mobile Hosts, i.e., Hosts which move from one network to another. We distinguish the case of "rapidly mobile" Hosts from the case of "slowly mobile" Hosts. A Host is slowly mobile if its move from one net to another can be made with enough lead time to allow manual intervention to update the logical-to-physical address translation tables. This case is handled simply by the presence of the logical addressing.
When the Host moves to another network, it can still be addressed by the same name, but the translation tables are changed so that the logical address is now mapped to a different set of Switches. This creates some work for the internet administration and control center, but is completely transparent to higher level protocols, since the logical address does not change. On the other hand, we consider a Host to be rapidly mobile if it moves from one net to another too quickly or too frequently to allow the procedure of modifying the address translation tables to be feasible. If we can know in advance that there is some limited set of networks to which that Host might connect, we can map the logical address of that Host onto the set of all gateways which connect to any of those networks. Our procedure for choosing one gateway to use as the destination gateway might be as follows. Try the first gateway on the list. If a DNA message is received, try the second, etc., etc. Once a source gateway begins sending traffic for a mobile Host to a particular destination gateway, it should always continue to use that gateway, until it receives a DNA message, in which case it should try the next one. You will note that this procedure is very similar to that used for non-mobile Hosts. In fact, it might be entirely identical. The only possible difference is that we might want to be much more reluctant to switch from one destination gateway to another in the case of mobile Hosts than in the case of non-mobile Hosts, since we expect that a mobile Host will not generally be reachable through all of the potential destination gateways at every time.
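The "stick with the current gateway until a DNA message arrives" rule for rapidly mobile Hosts might be sketched as follows. This Python fragment is an illustration only: the class `MobileHostRoute` and its method names are hypothetical, and wrapping around to the first gateway after the last one has been tried is an assumption, since the text does not say what happens when the list is exhausted.

```python
class MobileHostRoute:
    """Source-gateway state for one rapidly mobile Host (assumed layout)."""

    def __init__(self, gateways):
        if not gateways:
            raise ValueError("at least one candidate gateway is required")
        self.gateways = list(gateways)  # all gateways on nets the Host may join
        self.index = 0                  # start with the first gateway on the list

    def current(self):
        """The destination gateway currently in use; it never changes
        unless a DNA message is received."""
        return self.gateways[self.index]

    def on_dna(self, gateway):
        """Move to the next gateway only if the DNA message came from the
        gateway in use. Wrap-around at the end of the list is an assumption."""
        if gateway == self.current():
            self.index = (self.index + 1) % len(self.gateways)
        return self.current()
```

For example, with gateways `["g1", "g2", "g3"]`, traffic keeps going to `g1` until `g1` returns a DNA message, after which `g2` is used. Ignoring a DNA message from a gateway that is not currently in use is one way to express the reluctance to switch that the text suggests for mobile Hosts.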