我是靠谱客的博主 听话飞机,最近开发中收集的这篇文章主要介绍4-The Network Layer,觉得挺不错的,现在分享给大家,希望可以做个参考。

概述

Please indicate the source: http://blog.csdn.net/gaoxiangnumber1

Welcome to my github: https://github.com/gaoxiangnumber1

4.1 Introduction

  • Figure 4.1 shows a network with two hosts, H1 and H2, and several routers on the path between H1 and H2. Suppose H1 is sending information to H2.
  • The network layer in H1 takes segments from the transport layer in H1, encapsulates each segment into a datagram(a network-layer packet), and then sends the datagrams to its nearby router, R1.
  • At the receiving host, H2, the network layer receives the datagrams from its nearby router R2, extracts the transport-layer segments, and delivers the segments up to the transport layer at H2.
  • The primary role of the routers is to forward datagrams from input links to output links. Except for control purposes, routers do not run application and transport-layer protocols.

4.1.1 Forwarding and Routing

  • The role of the network layer is to move packets from a sending host to a receiving host. Two important network-layer functions can be identified:
  • Forwarding refers to the router-local action of transferring a packet from an input link interface to the appropriate output link interface.
  • Routing refers to the network-wide process that determines the end-to-end paths that packets take from source to destination. The algorithms that calculate these paths are referred to as routing algorithms.
  • Every router has a forwarding table. A router forwards a packet by examining the value of a field in the arriving packet’s header, and then using this header value to index into the router’s forwarding table. The value stored in the forwarding table entry for that header indicates the router’s outgoing link interface to which that packet is to be forwarded.
  • Depending on the network-layer protocol, the header value could be the destination address or an indication of the connection to which the packet belongs. Figure 4.2 provides an example.

  • The routing algorithm determines the values that are inserted into the routers’ forwarding tables. It may be centralized(e.g., with an algorithm executing on a central site and downloading routing information to each of the routers) or decentralized(i.e., with a piece of the distributed routing algorithm running in each router). In either case, a router receives routing protocol messages to configure its forwarding table.
  • We use term “packet switch” to mean a general packet-switching device that transfers a packet from input link interface to output link interface, according to the value in a field in the header of the packet. Some packet switches that base their forwarding decision on values in the fields of the link layer frame are called link-layer switches(Chapter 5). Other packet switches that base their forwarding decision on the value in the network layer field are called routers. Routers are network-layer(layer 3) devices, but must also implement layer 2 protocols as well, since layer 3 devices require the services of layer 2 to implement their(layer 3) functionality.

Connection Setup

  • Except two functions(forwarding and routing), in some computer networks there is a third network-layer function, connection setup.
  • Some network-layer architectures(ATM, frame relay, and MPLS) require the routers along the chosen path from source to destination to handshake with each other in order to set up state before network-layer data packets within a given source-to-destination connection can begin to flow. This process is called connection setup in the network layer.

4.1.2 Network Service Models

  • The network service model defines the characteristics of end-to-end transport of packets between sending and receiving end systems.
  • In the sending host, when the transport layer passes a packet to the network layer, specific services that could be provided by the network layer include:
  • The following services could be provided to a flow of packets between a given source and destination:
  • The Internet’s network layer only provides best-effort service. Table 4.1:

  • The ATM network architecture provides for multiple service models, meaning that different connections can be provided with different classes of service within the same network.
  • Two important ATM service models are constant bit rate and available bit rate service:
  • Constant bit rate(CBR) ATM network service.
  • Available bit rate(ABR) ATM network service.

4.2 Virtual Circuit and Datagram Networks

  • A network layer can provide connectionless service or connection service between two hosts. In all major computer network architectures, the network layer provides either connectionless or connection service, but not both. Providing only a connection service is called virtual-circuit(VC) networks; only a connectionless service is called datagram networks. Virtual-circuit and datagram networks are two fundamental classes of computer networks.
  • Differences between network-layer connection/connectionless services and transport-layer connection/connectionless services:
  • Network layer services are host-to-host services provided by the network layer for the transport layer; transport layer services are process-to-process services provided by the transport layer for the application layer.
  • The implementations of connection service in the transport layer and in the network layer are different. The transport-layer connection-oriented service is implemented at the edge of the network in the end systems; the network-layer connection service is implemented in the routers in the network core as well as in the end systems.

4.2.1 Virtual-Circuit Networks

  • Network-layer connections are called virtual circuits(VCs). A VC consists of
  • A packet belonging to a virtual circuit will carry a VC number in its header. Because a virtual circuit may have a different VC number on each link, each intervening router must replace the VC number of each traversing packet with a new VC number that is obtained from the forwarding table.

  • Consider the network shown in Figure 4.3. The numbers next to the links of R1 in are the link interface numbers. Suppose that Host A requests that the network establish a VC between itself and Host B, also assume the network chooses the path A-R1-R2-B and assigns VC numbers 12, 22, and 32 to the three links in this path. When a packet leaves Host A, the value in the VC number field in the packet header is 12; when it leaves R1, the value is 22; and when it leaves R2, the value is 32.
  • For a VC network, each router’s forwarding table includes VC number translation; the forwarding table in R1 might look like this:

  • In a VC network, the network’s routers must maintain connection state information for the ongoing connections.
  • Whenever a new connection is established across a router, a new connection entry must be added to the router’s forwarding table;
  • Whenever a connection is released, an entry must be removed from the table.
  • Why a packet doesn’t keep the same VC number on each of the links along its route?
  • Replacing the number from link to link reduces the length of the VC field in the packet header.
  • VC setup is simplified by permitting a different VC number at each link along the path. With multiple VC numbers, each link in the path can choose a VC number independently of the VC numbers chosen at other links along the path. If a common VC number were required for all links along the path, the routers would have to exchange and process a number of messages to agree on a common VC number to be used for a connection.

  • Three phases in a virtual circuit:
  • VC setup. The sending transport layer contacts the network layer, specifies the receiver’s address, and waits for the network to set up the VC. The network layer:
  • Data transfer. Once the VC has been established, packets can begin to flow along the VC.
  • VC teardown. This is initiated when the sender(or receiver) informs the network layer of its desire to terminate the VC. The network layer will then inform the end system on the other side of the network of the call termination and update the forwarding tables in each of the packet routers on the path to indicate that the VC no longer exists.
  • Distinction between VC setup at the network layer and connection setup at the transport layer(E.g., the TCP three-way handshake).
  • Connection setup at the transport layer involves only the two end systems. During transport-layer connection setup, the two end systems alone determine the parameters(E.g., initial sequence number and flow-control window size) of their transport-layer connection. The routers within the network are completely oblivious to it.
  • For a VC network layer, routers along the path between the two end systems are involved in VC setup, and each router is aware of all the VCs passing through it.
  • The messages that the end systems send into the network to initiate or terminate a VC, and the messages passed between the routers to set up the VC(i.e., to modify connection state in router tables) are known as signaling messages, and the protocols used to exchange these messages are referred to as signaling protocols.

4.2.2 Datagram Networks

  • In a datagram network, each time an end system wants to send a packet, it stamps the packet with the address of the destination end system and then pops the packet into the network. Figure 4-5.

  • As a packet is transmitted from source to destination, it passes through a series of routers. Each of these routers has a forwarding table that maps destination addresses to link interfaces; when a packet arrives at the router, the router uses the packet’s destination address to look up the appropriate output link interface in the forwarding table. The router then intentionally forwards the packet to that output link interface.
  • Suppose all destination addresses are 32 bits and our router has four links, numbered 0 through 3, and packets are to be forwarded to the link interfaces as follows:

  • We could have the following forwarding table with just four entries:

  • The router matches a prefix of the packet’s destination address with the entries in the table; if there’s a match, the router forwards the packet to a link associated with the match. When there are multiple matches, the router finds the longest matching entry in the table and forwards the packet to the link interface associated with the longest prefix match.
  • In a datagram network the forwarding tables are modified by the routing algorithms, which update a forwarding table every one-to-five minutes or so. In a VC network, a forwarding table in a router is modified whenever a new connection is set up through the router or whenever an existing connection through the router is torn down.
  • Because forwarding tables in datagram networks can be modified at any time, a series of packets sent from one end system to another may follow different paths through the network and may arrive out of order.

4.2.3 Origins of VC and Datagram Networks

4.3 What’s Inside a Router?

  • A high-level view of a generic router architecture is shown in Figure 4.6. Four router components can be identified:
  • Input ports. An input port performs several key functions.
  • Switching fabric. The switching fabric connects the router’s input ports to its output ports and it is completely contained within the router.
  • Output ports. An output port stores packets received from the switching fabric and transmits these packets on the outgoing link by performing the necessary link-layer and physical-layer functions. When a link is bidirectional, an output port will usually be paired with the input port for that link on the same line card(a printed circuit board containing one or more input ports, which is connected to the switching fabric).
  • Routing processor. The routing processor executes the routing protocols(Section 4.6), maintains routing tables and attached link state information, and computes the forwarding table for the router. It also performs the network management functions(Chapter 9).
  • A router’s input ports, output ports, and switching fabric together implement the forwarding function and are almost implemented in hardware. These forwarding functions are sometimes collectively referred to as the router forwarding plane.
  • While the forwarding plane operates at the nanosecond time scale, a router’s control functions(executing the routing protocols, responding to attached links that go up or down, and performing management functions) operate at the millisecond or second timescale. These router control plane functions are usually implemented in software and execute on the routing processor.

4.3.1 Input Processing

  • A detailed view of input processing is given in Figure 4.7.
  • The input port’s line termination function and link-layer processing implement the physical and link layers for that individual input link.
  • The lookup action is that the router uses the forwarding table to look up the output port to which an arriving packet will be forwarded via the switching fabric. The forwarding table is computed and updated by the routing processor, with a shadow copy stored at each input port. The forwarding table is copied from the routing processor to the line cards over a separate bus(the dashed line from the routing processor to the input line cards in Figure 4.6). With a shadow copy, forwarding decisions can be made locally at each input port, without invoking the centralized routing processor on a per-packet basis and So avoiding a centralized processing bottleneck.
  • Given the existence of a forwarding table, lookup is simple: we just search through the forwarding table looking for the longest prefix match. But at Gigabit transmission rates, this lookup must be performed in nanoseconds. So, techniques beyond a linear search through a large table are needed.
  • Once a packet’s output port has been determined via the lookup, the packet can be sent into the switching fabric. In some designs, a packet may be temporarily blocked from entering the switching fabric if packets from other input ports are currently using the fabric. A blocked packet will be queued at the input port and then scheduled to cross the fabric at a later point in time.
  • Beyond lookup action in input port processing, other actions must be taken:

4.3.2 Switching

  • The switching fabric is the heart of a router because packets are switched(i.e., forwarded) from an input port to an output port through this fabric. Switching can be accomplished in following ways(Figure 4.8).

Switching via memory.

  • The simplest, earliest routers were computers, with switching between input and output ports being done under direct control of the CPU(routing processor). Input and output ports functioned as I/O devices in operating system. An input port with an arriving packet first signaled the routing processor via an interrupt. The packet was then copied from the input port into processor memory. The routing processor then extracted the destination address from the header, looked up the appropriate output port in the forwarding table, and copied the packet to the output port’s buffers.
  • If the memory bandwidth is B packets per second that can be written into, or read from memory, then the overall forwarding throughput(the total rate at which packets are transferred from input ports to output ports) must be less than B/2. Two packets cannot be forwarded at the same time, even if they have different destination ports, since only one memory read/write over the shared system bus can be done at a time.
  • Many modern routers switch via memory. Difference from early routers is that the lookup of the destination address and the storing of the packet into the appropriate memory location are performed by processing on the input line cards. In some ways, routers that switch via memory are similar to shared-memory multiprocessors, with the processing on a line card switching(writing) packets into the memory of the appropriate output port.

Switching via a bus.

  • An input port transfers a packet directly to the output port over a shared bus, without intervention by the routing processor. This is done by having the input port pre-pend a switch-internal label(header) to the packet indicating the local output port to which this packet is being transferred and transmitting the packet onto the bus. The packet is received by all output ports, but only the port that matches the label will keep the packet. The label is then removed at the output port, as this label is only used within the switch to cross the bus.
  • If multiple packets arrive to the router at the same time, each at a different input port, all but one must wait since only one packet can cross the bus at a time. Because every packet must cross the single bus, the switching speed of the router is limited to the bus speed. Switching via a bus is often sufficient for routers that operate in small local area and enterprise networks.

Switching via an interconnection network.

  • One way to overcome the bandwidth limitation of a single, shared bus is to use interconnection network.
  • A crossbar switch is an interconnection network consisting of 2N buses that connect N input ports to N output ports(Figure 4.8). Each vertical bus intersects each horizontal bus at a crosspoint, which can be opened or closed at any time by the switch fabric controller that is part of the switching fabric itself. When a packet arrives from port A and needs to be forwarded to port Y, the switch controller closes the crosspoint at the intersection of buses A and Y, and port A then sends the packet onto its bus, which is picked up(only) by bus Y.
  • Note that a packet from port B can be forwarded to port X at the same time, since A-to-Y and B-to-X packets use different input and output buses. So, crossbar networks are capable of forwarding multiple packets in parallel. But if two packets from two different input ports are destined to the same output port, then one will have to wait at the input, since only one packet can be sent over any given bus at a time.
  • More sophisticated interconnection networks use multiple stages of switching elements to allow packets from different input ports to proceed towards the same output port at the same time through the switching fabric.

4.3.3 Output Processing

  • Output port processing(Figure 4.9) takes packets that have been stored in the output port’s memory and transmits them over the output link. This includes selecting and de-queueing packets for transmission, and performing the needed link-layer and physical-layer transmission functions.

4.3.4 Where Does Queueing Occur?

  • The location and extent of queueing(either at input port queues or output port queues) will depend on the traffic load, the relative speed of the switching fabric, and the line speed. As these queues grow large, the router’s memory can eventually be exhausted and packet loss will occur when no memory is available to store arriving packets. It is at these queues in router where packets are dropped and lost in the network.
  • Suppose that the input and output line speeds(transmission rates) all have a transmission rate of R-line packets per second, and that there are N input ports and N output ports. Assume that all packets have the same fixed length, and the packets arrive to input ports in a synchronous manner(i.e., the time to send a packet on any link is equal to the time to receive a packet on any link and during such an interval of time, either zero or one packet can arrive on an input link). Define the switching fabric transfer rate R-switch as the rate at which packets can be moved from input port to output port.
  • If R-switch is N times faster than R-line, then only negligible queuing will occur at the input ports. Because even in the worst case, where all N input lines are receiving packets, and all packets are to be forwarded to the same output port, each batch of N packets(one packet per input port) can be cleared through the switch fabric before the next batch arrives.
  • What can happen at the output ports? Suppose that R-switch is N times faster than R-line and packets arriving at each of the N input ports are destined to the same output port. So, in the time it takes to send a single packet onto the outgoing link, N new packets will arrive at this output port. Since the output port can transmit only a single packet in a unit of time(the packet transmission time), the N arriving packets will have to queue(wait) for transmission over the outgoing link. Then N more packets can arrive in the time it takes to transmit just one of the N packets that had just previously been queued. Eventually, the number of queued packets can grow large enough to exhaust available memory at the output port, in which case packets are dropped.

  • Figure 4.10. At time t, a packet has arrived at each of the incoming input ports, each destined for the uppermost outgoing port. Assuming identical line speeds and a switch operating at three times the line speed, one time unit later(i.e., in the time needed to receive or send a packet), all three original packets have been transferred to the outgoing port and are queued awaiting transmission. In the next time unit, one of these three packets will have been transmitted over the outgoing link. In our example, two new packets have arrived at the incoming side of the switch; one of these packets is destined for this uppermost output port.
  • Given that router buffers are needed to absorb the fluctuations in traffic load, then how much buffering is required. The rule of thumb [RFC 3439] for buffer sizing was that the amount of buffering(B) should be equal to an average round-trip time(RTT) times the link capacity(C).
  • With output port queuing, a packet scheduler must choose one packet among those queued for transmission(Chapter 7).
  • If there is not enough memory to buffer an incoming packet, a decision must be made to either drop the arriving packet(a policy known as drop-tail) or remove one or more already-queued packets to make room for the newly arrived packet. In some cases, it is advantageous to drop(or mark the header of) a packet before the buffer is full in order to provide a congestion signal to the sender.
  • Under RED, a weighted average is maintained for the length of the output queue.
  • If the average queue length is less than a minimum threshold(minth), when a packet arrives, the packet is admitted to the queue.
  • If the queue is full or the average queue length is greater than a maximum threshold(maxth), when a packet arrives, the packet is marked or dropped.
  • If the packet arrives to find an average queue length in the interval [minth, maxth], the packet is marked or dropped with a probability that is a function of the average queue length, minth, and maxth.
  • If the switch fabric is not fast enough to transfer all arriving packets through the fabric without delay, then packet queuing can occur at the input ports. Consider a crossbar switching fabric and suppose

  • Figure 4.11 shows an example in which two packets(darkly shaded) at the front of their input queues are destined for the same upper-right output port. Suppose that the switch fabric chooses to transfer the packet from the front of the upper-left queue. In this case, the darkly shaded packet in the lower-left queue must wait. This leads to the lightly shaded packet wait even though there is no contention for the middle-right output port. This phenomenon is known as head-of-the-line(HOL) blocking in an input-queued switch: a queued packet in an input queue must wait for transfer through the fabric(even though its output port is free) because it is blocked by another packet at the head of the line.

4.3.5 The Routing Control Plane

  • Figure 4.6. The network-wide routing control plane is decentralized: with different pieces(e.g., of a routing algorithm) executing at different routers and interacting by sending control messages to each other. Today’s Internet routers and the routing algorithms in Section 4.6 operate in this manner.
  • Recently, some researchers have begun exploring new router control plane architectures in which part of the control plane is implemented in the routers along with the data plane, and part of the control plane can be implemented externally to the router. A well-defined API dictates how these two parts interact and communicate with each other.

4.4 The Internet Protocol(IP): Forwarding and Addressing in the Internet

  • Figure 4.12: Internet’s network layer has three major components.
  • IP protocol.
  • Routing component, which determines the path a datagram follows from source to destination. Routing protocols(Section 4.6) compute the forwarding tables that are used to forward packets through the network.
  • The Internet’s network-layer error and information-reporting protocol(the Internet Control Message Protocol(ICMP)): a facility to report errors in datagrams and respond to requests for certain network-layer information.

4.4.1 Datagram Format

  • Version number(4 bits): specify the IP protocol version of the datagram. By looking at the version number, the router can determine how to interpret the remainder of the IP datagram.
  • Header length(4 bits). Because an IPv4 datagram can contain a variable number of options(which are included in the IPv4 datagram header), these 4 bits are needed to determine where in the IP datagram the data actually begins. Without options, the IP datagram has a 20-byte header.
  • Type of service(8 bits). The type of service(TOS) bits allow different types of IP datagrams(E.g., datagrams particularly requiring low delay, high throughput, or reliability) to be distinguished from each other.
  • Datagram length(16 bits). This is the total length of the IP datagram(header + data) in bytes.
  • Identifier(16 bits), flags(3 bits), fragmentation offset(13 bits). These three fields relate with IP fragmentation. IPv6 does not allow for fragmentation at routers.
  • Time-to-live(8 bits). The time-to-live(TTL) field ensure that datagrams do not circulate forever in the network. This field is decremented by one each time the datagram is processed by a router. If the TTL field reaches 0, the datagram must be dropped.
  • Protocol(8 bits). This field is used only when an IP datagram reaches its final destination. The value of this field indicates the specific transport-layer protocol to which the data portion of this IP datagram should be passed. 6 indicates that the data portion is passed to TCP, 17 indicates UDP.
  • Header checksum(16 bits). The header checksum is computed by treating each 2 bytes in the header as a number and summing these numbers using 1s complement arithmetic. Section 3.3: the 1s complement of this sum, known as the Internet checksum, is stored in the checksum field. A router computes the header checksum for each received IP datagram and detects an error condition if the checksum carried in the datagram header does not equal the computed check sum. Routers typically discard datagrams for which an error has been detected.
  • Only the IP header is checksummed at the IP layer, while the TCP/UDP checksum is computed over the entire TCP/UDP segment.
  • TCP/UDP and IP do not necessarily both have to belong to the same protocol stack.
  • Source and destination IP addresses(32 bits). When a source creates a datagram, it inserts its IP address into the source IP address field and inserts the address of the destination into the destination IP address field. The source host determines the destination address via a DNS lookup(Chapter 2).
  • Options. The options fields allow an IP header to be extended. But the existence of options does complicate matters: since datagram headers can be of variable length, one cannot determine a priori where the data field will start. Since some datagrams may require options processing and others may not, the amount of time needed to process an IP datagram at a router can vary greatly. IP options were dropped in the IPv6 header(Section 4.4.4).
  • Data(payload). The data field contains the transport-layer segment(TCP or UDP), ICMP messages(Section 4.4.3) or others. If the datagram carries a TCP segment, then each nonfragmented datagram carries a total of 40 bytes of header(20 bytes of IP header + 20 bytes of TCP header) along with the application-layer message.

IP Datagram Fragmentation

  • Chapter 5: Not all link-layer protocols can carry network-layer packets of the same size. Ethernet frames can carry up to 1500 bytes of data. The maximum amount of data that a link-layer frame can carry is called the maximum transmission unit(MTU). The MTU of the link-layer protocol places a hard limit on the length of an IP datagram. Problem is that each of the links along the route between sender and destination can use different link-layer protocols, and each of these protocols can have different MTUs.
  • How to squeeze over-sized IP datagram into the payload field of the link-layer frame? Solution is to fragment the data in the IP datagram into two or more smaller IP datagrams(referred as fragment), encapsulate each of these smaller IP datagrams in a separate link-layer frame; and send these frames over the outgoing link.
  • Fragments need to be reassembled before they reach the transport layer at the destination. IPv4 put the job of datagram reassembly in the end systems not in network routers.
  • When a destination host receives a series of datagrams from the same source, it needs to determine whether any of these datagrams are fragments of some original datagram. If some datagrams are fragments, it must determine when it has received the last fragment and how the fragments it has received should be pieced back together to form the original datagram.
  • IPv4 put identification, flag, and fragmentation offset fields in the IP datagram header. 1. All fragments of same original datagram have same identification number. The sending host increments the identification number for each datagram it sends.
  • The last fragment has a flag bit = 0, all other fragments have flag bit = 1.
  • The offset field specifies where the fragment fits within the original IP datagram. The amount of original payload data in all but the last fragment be a multiple of 8 bytes, and that the offset value be specified in units of 8-byte chunks.

  • At the destination, the payload of the datagram is passed to the transport layer only after the IP layer has fully reconstructed the original IP datagram. If one or more of the fragments does not arrive at the destination, the incomplete datagram is discarded and not passed to the transport layer. If TCP is being used at the transport layer, TCP will recover from this loss by having the source retransmit the data in the original datagram.
  • Fragmentation’s costs.
  • Complicates routers and end systems, which need to accommodate datagram fragmentation and reassembly.
  • Fragmentation can be used to create lethal DoS attacks, whereby the attacker sends a series of bizarre and unexpected fragments. Example is the Jolt2 attack, where the attacker sends a stream of small fragments to the target host, none of which has an offset of zero. The target can collapse as it attempts to rebuild datagrams out of the degenerate packets.

4.4.2 IPv4 Addressing

  • A host typically has only a single link into the network; when IP in the host wants to send a datagram, it does so over this link. The boundary between the host and the physical link is called an interface. A router has two or more links to which it is connected and the boundary between the router and any one of its links is also called an interface. IP address is associated with an interface, rather than with the host or router containing that interface.
  • Each IP address is 32 bits long and it is written in dotted-decimal notation, in which each byte of the address is written in its decimal form and is separated by a period(.) from other bytes in the address.

  • Figure 4.15: The three hosts in the upper-left portion, and the router interface to which they are connected, all have an IP address of the form 223.1.1.xxx. This network interconnecting three host interfaces and one router interface forms a subnet. IP addressing assigns an address to this subnet: 223.1.1.0/24, where the /24 notation known as a subnet mask, indicates that the leftmost 24 bits define the subnet address. Figure 4.16 illustrates the three IP subnets present in Figure 4.15.

  • Figure 4.17: 6 subnets: 223.1.1.0/24, 223.1.2.0/24, 223.1.3.0/24, 223.1.9.0/24, 223.1.8.0/24, 223.1.7.0/24. To determine the subnets for an interconnected system of routers and hosts, detach each interface from its host or router, creating islands of isolated networks, with interfaces terminating the end points of the isolated networks. Each of these isolated networks is called a subnet.
  • The Internet’s address assignment strategy is Classless Inter-Domain Routing(CIDR). The 32-bit IP address is divided into two parts and has the dotted-decimal form a.b.c.d/x.
  • The x leftmost bits constitute the network portion, and are referred to as the prefix or network prefix of the address. The Internet’s BGP routing protocol in Section 4.6: only x prefix bits are considered by routers outside the organization’s network. This reduces the size of the forwarding table in routers.
  • The remaining 32-x bits are different within the organization and they will be considered when forwarding packets at routers within the organization. These bits may have an additional subnetting structure.

  • Figure 4.18: Suppose that the Fly-By-Night-ISP advertises to the internet that it should be sent any datagrams whose first 20 address bits match 200.23.16.0/20. The internet need not know that there are eight other organizations within the address block 200.23.16.0/20. This ability to use a single prefix to advertise multiple networks is referred to as address aggregation or route aggregation or route summarization.
  • What happens when addresses are not allocated in a hierarchical manner: if Fly-By-Night-ISP acquires ISPs-R-Us and then has Organization 1 connect to the Internet through its subsidiary ISPs-R-Us?

  • Figure 4.19: Solution is that ISPs-R-Us also advertises the block of addresses for Organization 1, 200.23.18.0/23. When other routers in the Internet see the address blocks 200.23.16.0/20(from Fly-By-Night-ISP) and 200.23.18.0/23(from ISPs-R-Us) and want to route to an address in the block 200.23.18.0/23, they will use longest prefix matching and route toward ISPs-R-Us.
  • Classful addressing was used before CIDR: the network portions of an IP address were constrained to be 8, 16, or 24 bits in length and subnets with 8-, 16-, and 24-bit subnet addresses were known as class A, B, and C networks, respectively.
  • The IP broadcast address 255.255.255.255. When a host sends a datagram with destination address 255.255.255.255, the message is delivered to all hosts on the same subnet. Routers optionally forward the message into neighboring subnets (though usually don’t).
  • How an organization gets a block of addresses for its devices? How a device is assigned an address from within the organization’s block of addresses?

Obtaining a Block of Addresses

  • In order to obtain a block of IP addresses for use within an organization’s subnet, a network administrator contact its ISP, which provides addresses from a larger block of addresses that had already been allocated to the ISP. E.g., the ISP may has the address block 200.23.16.0/20.
  • The ISP could divide its address block into eight equal-sized contiguous address blocks and give one of these address blocks out to each of up to eight organizations that are supported by this ISP, as shown below.

  • IP addresses are managed under the authority of the Internet Corporation for Assigned Names and Numbers(ICANN) based on guidelines set forth in [RFC 2050]. This organization also manages the DNS root servers that assigning domain names and resolving domain name disputes.

Obtaining a Host Address: the Dynamic Host Configuration Protocol

  • Once an organization has obtained a block of addresses, it can assign individual IP addresses to the host and router interfaces in its organization. A system administrator will typically manually configure the IP addresses into the router. Host addresses can be configured manually, but often this task is done using the Dynamic Host Configuration Protocol(DHCP) [RFC 2131].
  • DHCP allows a host to obtain(be allocated) an IP address automatically. A network administrator can configure DHCP so that a given host receives the same or different IP address each time it connects to the network. DHCP also allows a host to learn additional information: its subnet mask, the address of its first-hop router(called the default gateway), and the address of its local DNS server.
  • DHCP is often referred to as a plug-and-play protocol. It is useful for the network administrator, in residential Internet access networks and in wireless LANs, where hosts join and leave the network frequently.
  • DHCP is a client-server protocol. A client is a newly arriving host wanting to obtain network configuration information(including an IP address). In the simplest case, each subnet will have a DHCP server. If no server is present on the subnet, a DHCP relay agent(typically a router) that knows the address of a DHCP server for that network is needed.

  • Figure 4.20 shows a DHCP server attached to subnet 223.1.2/24, with the router serving as the relay agent for arriving clients attached to subnets 223.1.1/24 and 223.1.3/24.
  • The DHCP protocol is a four-step process for a newly arriving host, as shown in Figure 4.21 for the network setting shown in Figure 4.20. yiaddr(your Internet address) indicates the address being allocated to the newly arriving client.

  1. DHCP server discovery.
  2. DHCP server offer(s).
  3. DHCP request.
  4. DHCP ACK.
  5. Once the client receives the DHCP ACK, the interaction is complete and the client can use the DHCP-allocated IP address for the lease duration. Since a client may want to use its address beyond the lease’s expiration, DHCP also provides a mechanism that allows a client to renew its lease on an IP address.

Network Address Translation(NAT)

  • Figure 4.22 shows the operation of a NAT-enabled router.
  • When generating a new source port number, the NAT router can select any source port number that is not currently in the NAT translation table. Because a port number field is 16 bits long, the NAT protocol can support over 60000 simultaneous connections with a single WAN-side IP address for the router.
  • NAT in the router adds an entry to its NAT translation table.
  • When datagram arrives at the NAT router, the router indexes the NAT translation table using the destination IP address and destination port number to obtain the real IP address and destination port number for user.
  • The router then rewrites the datagram’s destination address and destination port number, and forwards the datagram into the home network.
  • The address space 10.0.0.0/8 is one of three portions of the IP address space that is reserved in RFC 1918 for a private network or a realm with private addresses(i.e., a network whose addresses only have meaning to devices within that network).
  • The router gets its address from the ISP’s DHCP server, and the router runs a DHCP server to provide addresses to computers within the NAT-DHCP-router-controlled home network’s address space.
  • One problem with NAT is that it interferes with P2P applications(P2P file-sharing applications and P2P Voice-over-IP applications). Chapter 2: In a P2P application, any participating Peer A should be able to initiate a TCP connection to any other participating Peer B. The essence of the problem is that if Peer B is behind a NAT, it cannot act as a server and accept TCP connections.

UPnP

  • UPnP allows external hosts to initiate communication sessions to NATed hosts, using either TCP or UDP.
  • UPnP requires that both the host and the NAT be UPnP compatible. With UPnP, an application running in a host can request a NAT mapping between (private IP address, private port number) and (public IP address, public port number) for some requested public port number. If the NAT accepts the request and creates the mapping, then nodes from the outside can initiate TCP connections to (public IP address, public port number), i.e., to (private IP address, private port number) after NAT translation. And UPnP lets the application know the value of (public IP address, public port number), so that the application can advertise it to the outside world.

4.4.3 Internet Control Message Protocol(ICMP)

  • ICMP, specified in [RFC 792], is used by hosts and routers to communicate network-layer information to each other. The typical use of ICMP is for error reporting.
  • Architecturally ICMP lies just above IP, as ICMP messages are carried as IP payload. ICMP messages have a type and a code field, and contain the header and the first 8 bytes of the IP datagram that caused the ICMP message to be generated in the first place(so that the sender can determine the datagram that caused the error).
  • Selected ICMP message types are shown in Figure 4.23.

  • The ping program sends an ICMP type 8 code 0 message to the specified host. The destination host, seeing the echo request, sends back a type 0 code 0 ICMP echo reply.
  • The traceroute program allows us to trace a route from a host to any other host in the world.
  • Traceroute in the source sends a series of IP datagrams to the destination. Each of these datagrams carries a UDP segment with an unlikely UDP port number. The first of these datagrams has a TTL of 1, the second of 2, the third of 3, and so on. The source also starts timers for each of the datagrams.(The standard traceroute program sends sets of three packets with the same TTL, so the traceroute output provides three results for each TTL.)
  • When the nth datagram arrives at the nth router, the nth router observes that the TTL of the datagram has just expired, so the router discards the datagram and sends an ICMP warning message that includes the name of the router and its IP address to the source (type 11 code 0).
  • When the ICMP warning message arrives back at the source, the source obtains the round-trip time from the timer and the name and IP address of the nth router from the ICMP message.
  • When one of the datagrams eventually make it all the way to the destination host, source stops sending UDP segments. Because this datagram contains a UDP segment with an unlikely port number, the destination host sends a port unreachable ICMP message(type 3 code 3) back to the source.
  • When the source host receives this message, it does not need to send additional probe packets.
  • Two popular defense mechanisms to malicious packet attacks are firewalls and intrusion detection systems(IDSs).
  • Firewalls inspect the datagram and segment header fields, denying suspicious datagrams entry into the internal network. E.g., a firewall may be configured to block all ICMP echo request packets, thereby preventing an attacker from doing a ping sweep across your IP address range. Firewalls can also block packets based on source and destination IP addresses and port numbers. Additionally, firewalls can be configured to track TCP connections, granting entry only to datagrams that belong to approved connections.
  • An IDS, typically situated at the network boundary, performs “deep packet inspection”, examining not only header fields but also the payloads in the datagram (including application-layer data). An IDS has a database of packet signatures that are known to be part of attacks. This database is automatically updated as new attacks are discovered. As packets pass through the IDS, the IDS attempts to match header fields and payloads to the signatures in its signature database. If such a match is found, an alert is created. An intrusion prevention system(IPS) is similar to an IDS, except that it actually blocks packets in addition to creating alerts(Chapter 8).

4.4.4 IPv6

IPv6 Datagram Format

  • The format of the IPv6 datagram is shown in Figure 4.24. The most important changes introduced in IPv6 are:
  • Expanded addressing capabilities. IPv6 address is 128 bits. In addition to unicast and multicast addresses, IPv6 introduce a new type of address, called an anycast address, which allows a datagram to be delivered to any one of a group of hosts. This feature could be used, e.g., to send an HTTP GET to the nearest of a number of mirror sites that contain a given document.
  • A streamlined 40-byte header. A number of IPv4 fields have been dropped or optional. The resulting 40-byte fixed-length header allows for faster processing of the IP datagram. A new encoding of options allows for more flexible options processing.
  • Flow labeling and priority. IPv6 has an elusive definition of a flow. RFC 1752 and RFC 2460 state that this allows “labeling of packets belonging to particular flows for which the sender requests special handling, such as a nondefault quality of service or real-time service”. E.g., audio and video transmission might be treated as a flow; the traditional applications, such as file transfer and e-mail, might not be treated as flows. The IPv6 header has an 8-bit traffic class field that can be used to give priority to certain datagrams within a flow, or it can be used to give priority to datagrams from certain applications(e.g., ICMP) over datagrams from other applications(e.g., network news).
  • Version(4-bit): identifies the IP version number. IPv6 carries a value of 6 in this field. Putting 4 in this field does not create a valid IPv4 datagram.
  • Traffic class(8-bit): similar in spirit to the TOS field in IPv4.
  • Flow label(20-bit): used to identify a flow of datagrams.
  • Payload length(16-bit): treated as an unsigned integer giving the number of bytes in the IPv6 datagram following the fixed-length, 40-byte datagram header.
  • Next header(8-bit): identifies the protocol to which the contents(data field) of this datagram will be delivered(e.g., to TCP or UDP). The field uses the same values as the protocol field in the IPv4 header.
  • Hop limit(8-bit): decremented by one by each router that forwards the datagram. If the hop limit count reaches zero, the datagram is discarded.
  • Source and destination addresses(128-bit): described in RFC 4291.
  • Data: the payload portion of the IPv6 datagram. When the datagram reaches its destination, the payload will be removed from the IP datagram and passed on to the protocol specified in the next header field.
  • Several fields in the IPv4 datagram are no longer present in the IPv6 datagram:
  • Fragmentation/Reassembly.
  • Header checksum.
  • Options.
  • New version of ICMP has been defined for IPv6 in RFC 4443. In addition to reorganizing the existing ICMP type and code definitions, ICMPv6 added new types and codes required by the new IPv6 functionality. These include the “Packet Too Big” type, and an “unrecognized IPv6 options” error code. ICMPv6 also subsumes the functionality of the Internet Group Management Protocol(IGMP, which is used to manage a host’s joining and leaving of multicast groups, Section 4.7).

How will the public Internet based on IPv4 be transitioned to IPv6?

  • One way is to introduce IPv6-capable nodes is a dual-stack approach, where IPv6 nodes also have an IPv4 implementation. Such a node, referred to as an IPv6/IPv4 node in RFC 4213, has the ability to send and receive both IPv4 and IPv6 datagrams. When interoperating with an IPv4 node, it can use IPv4 datagrams; when interoperating with an IPv6 node, it can speak IPv6.
  • IPv6/IPv4 nodes must have both IPv6 and IPv4 addresses. They must be able to determine whether another node is IPv6-capable or IPv4-only. This problem can be solved using the DNS, which can return an IPv6 address if the node name being resolved is IPv6-capable, or otherwise return an IPv4 address. If the node issuing the DNS request is only IPv4-capable, the DNS returns only an IPv4 address.
  • In the dual-stack approach, if either the sender or the receiver is only IPv4 capable, an IPv4 datagram must be used. It is possible that two IPv6 capable nodes can end up sending IPv4 datagrams to each other. This is illustrated in Figure 4.25.

  • The data field of the IPv6 datagram can be copied into the data field of the IPv4 datagram and appropriate address mapping can be done. But there will be IPv6-specific fields in the IPv6 datagram(e.g., the flow identifier field) that have no counterpart in IPv4 and their information will be lost.
  • Another way is tunneling. Suppose two IPv6 nodes want to interoperate using IPv6 datagrams but are connected to each other by intervening IPv4 routers. We refer to the intervening set of IPv4 routers between two IPv6 routers as a tunnel, as illustrated in Figure 4.26.

    1. The IPv6 node on the sending side takes the entire IPv6 datagram and puts it in the data field of an IPv4 datagram. This IPv4 datagram is then addressed to the IPv6 node on the receiving side of the tunnel and sent to the first node in the tunnel.
    2. The intervening IPv4 routers in the tunnel route this IPv4 datagram among themselves, unaware that the IPv4 datagram itself contains a complete IPv6 datagram.
    3. The IPv6 node on the receiving side of the tunnel eventually receives the IPv4 datagram, determines that the IPv4 datagram contains an IPv6 datagram, extracts the IPv6 datagram, and then routes the IPv6 datagram exactly as it would if it had received the IPv6 datagram from a directly connected IPv6 neighbor.

4.4.5 A Brief Foray into IP Security

  • One of the network-layer protocols that provide a variety of security services is IPsec (detail in Chapter 8).
  • IPsec is compatible with IPv4 and IPv6. In order to use IPsec, we don’t need to replace the protocol stacks in all the routers and hosts in the Internet. E.g., using the transport mode(one of two IPsec “modes”), if two hosts want to securely communicate, IPsec needs to be available only in those two hosts. All other routers and hosts can continue to run vanilla IPv4.
  • In IPsec’s transport mode, two hosts first establish an IPsec session between themselves(IPsec is connection oriented) and then all TCP and UDP segments sent between the two hosts enjoy the security services provided by IPsec.
  • The services provided by an IPsec session include:
  • Cryptographic agreement. Mechanisms that allow the two communicating hosts to agree on cryptographic algorithms and keys.
  • Encryption of IP datagram payloads. When the sending host receives a segment from the transport layer, IPsec encrypts the payload. The payload can only be decrypted by IPsec in the receiving host.
  • Data integrity. IPsec allows the receiving host to verify that the datagram’s header fields and encrypted payload were not modified while the datagram was en route from source to destination.
  • Origin authentication. When a host receives an IPsec datagram from a trusted source with a trusted key, the host is assured that the source IP address in the datagram is the actual source of the datagram.
  • When two hosts have an IPsec session established between them, all TCP and UDP segments sent between them will be encrypted and authenticated. IPsec therefore provides blanket coverage, securing all communication between the two hosts for all network applications.

4.5 Routing Algorithms

  • Typically a host is attached directly to the default router for the host(also called the first-hop router). Whenever a host sends a packet, the packet is transferred to its default router. We refer to the default router of the source host as the source router and the default router of the destination host as the destination router. The problem of routing a packet from source host to destination host is the problem of routing the packet from source router to destination router.
  • The purpose of a routing algorithm is that given a set of routers, with links connecting the routers, a routing algorithm finds a good path(one that has the least cost) from source router to destination router.
  • A graph G = (N, E) is a set N of nodes(routers) and a collection E of edges(the physical links between these routers). Figure 4.27.

  • For any edge(x,y) in E, c(x,y) is the cost of the edge between nodes x and y. If the pair(x,y) does not belong to E, c(x,y) = ∞.
  • Classify routing algorithms according to whether they are global or decentralized.
  • A global routing algorithm computes the least-cost path between a source and destination using complete, global knowledge about the network. This requires that the algorithm obtain this information before performing the calculation. Algorithms with global state information are referred to as link-state(LS) algorithms.
  • In a decentralized routing algorithm, the calculation of the least-cost path is carried out in an iterative, distributed manner. Each node begins with only the knowledge of the costs of its own directly attached links. Through an iterative process of calculation and exchange of information with its neighboring nodes, a node gradually calculates the least-cost path to destination. The decentralized routing algorithm is called a distance-vector(DV) algorithm because each node maintains a vector of estimates of the costs(distances) to all other nodes in the network.
  • A second way to classify routing algorithms is according to whether they are static or dynamic.
  • In static routing algorithms, routes change very slowly over time, often as a result of human intervention(e.g., a human manually editing a router’s forwarding table).
  • Dynamic routing algorithms change the routing paths as the network traffic loads or topology change. A dynamic algorithm can be run either periodically or in direct response to topology or link cost changes. While dynamic algorithms are more responsive to network changes, they are also more susceptible to problems such as routing loops and oscillation in routes.
  • A third way to classify routing algorithms is according to whether they are load-sensitive or load-insensitive.
  • In a load-sensitive algorithm, link costs vary dynamically to reflect the current level of congestion in the underlying link. If a high cost is associated with a link that is currently congested, a routing algorithm will tend to choose routes around such a congested link.
  • For load-insensitive algorithm, a link’s cost does not explicitly reflect its current(or recent past) level of congestion.
  • In a link-state algorithm, the network topology and all link costs are available as input to the LS algorithm. In practice this is accomplished by having each node broadcast link-state packets to all other nodes in the network, with each link-state packet containing the identities and costs of its attached links(a link-state broadcast algorithm, Section 4.7). The result of the nodes’ broadcast is that all nodes have an identical and complete view of the network. Each node can then run the LS algorithm and compute the same set of least-cost paths as every other node.
  • Dijkstra’s algorithm computes the least-cost path from one node(the source u) to all other nodes in the network. It is iterative and has the property that after the kth iteration of the algorithm, the least-cost paths are known to k destination nodes, and these k paths will have the k smallest costs among the least-cost paths to all destination nodes. Define the following notation:
  • D(v): cost of the least-cost path from the source node to destination v as of this iteration of the algorithm.
  • p(v): previous node(neighbor of v) along the current least-cost path from the source to v.
  • NЈ : subset of nodes; v is in NЈ if the least-cost path from the source to v is definitively known.
  • The global routing algorithm consists of an initialization step followed by a loop. The number of times the loop is executed is equal to the number of nodes in the network. Upon termination, the algorithm will have calculated the shortest paths from the source node u to every other node in the network.

  • Consider the network in Figure 4.27 and compute the least-cost paths from u to all possible destinations. A summary of the Dijkstra’s algorithm’s computation is shown in Table 4.3, where each line in the table gives discussion and example the values of the algorithm’s variables at the end of the iteration.

  • Figure 4.28 shows the resulting least-cost paths and forwarding table in u for the network in Figure 4.27.

  • Given n nodes, how much computation must be done in the worst case to find the least-cost paths from the source to all destinations?

  • Consider a pathology that can arise. Figure 4.29 shows a network topology where link costs are equal to the load carried on the link. Link costs are not symmetric: c(u,v) equals c(v,u) only if the load carried on both directions on the link(u,v) is the same.
  • In this example, node z originates a unit of traffic destined for w, node x also originates a unit of traffic destined for w, and node y injects an amount of traffic equal to e, also destined for w. The initial routing is shown in Figure 4.29(a) with the link costs corresponding to the amount of traffic carried.
  • When the LS algorithm is next run, node y determines that the clockwise path to w has a cost of 1, while the counterclockwise path to w(current using) has a cost of 1+e. Hence y’s least-cost path to w is now clockwise. Similarly, x determines that its new least-cost path to w is also clockwise, resulting in costs shown in Figure 4.29(b).
  • When the LS algorithm is run next, nodes x, y, and z all detect a zero-cost path to w in the counterclockwise direction, and all route their traffic to the counterclockwise routes. The next time the LS algorithm is run, x, y, and z all then route their traffic to the clockwise routes.
  • One solution is to ensure that not all routers run the LS algorithm at the same time. Researchers have found that routers in the Internet can self-synchronize among themselves: even though they initially execute the algorithm with the same period but at different instants of time, the algorithm execution instance can eventually become and remain synchronized at the routers. One way to avoid such self synchronization is for each router to randomize the time it sends out a link advertisement.

4.5.2 The Distance-Vector(DV) Routing Algorithm

  • The distance vector(DV) algorithm is:
  • distributed: each node receives some information from one or more of its directly attached neighbors, performs a calculation, and then distributes the results of its calculation back to its neighbors.
  • iterative: this process continues on until no more information is exchanged between neighbors.
  • asynchronous: it doesn’t require all nodes to operate in lockstep with each other.
  • Let dx(y) be the cost of the least-cost path from node x to node y. Then the least costs are related by the celebrated Bellman-Ford equation:
  • The solution to the Bellman-Ford equation provides the entries in node x’s forwarding table. Let v* be any neighboring node that achieves the minimum in Equation 4.1. Then, if node x wants to send a packet to node y along a least-cost path, it should first forward the packet to node v*. So, node x’s forwarding table would specify node v* as the next-hop router for the ultimate destination y.
  • The Bellman-Ford equation also suggests the form of the neighbor-to-neighbor communication that will take place in the DV algorithm. The idea is as follows.
  • Each node x begins with Dx(y), an estimate of the cost of the least-cost path from itself to node y, for all nodes in N. Let Dx = [Dx(y): y in N] be node x’s distance vector, which is the vector of cost estimates from x to all other nodes, y, in N. Each node x maintains the following routing information:
  • For each neighbor v, the cost c(x,v) from x to directly attached neighbor, v;
  • Node x’s distance vector, Dx = [Dx(y): y in N], containing x’s estimate of its cost to all destinations, y, in N;
  • The distance vectors of each of its neighbors, i.e., Dv = [Dv(y): y in N] for each neighbor v of x.
  • In the distributed, asynchronous algorithm, each node sends a copy of its distance vector to each of its neighbors from time to time. When a node x receives a new distance vector from any of its neighbors v, it saves v’s distance vector, and then uses the Bellman-Ford equation to update its own distance vector as follows:
  • If node x’s distance vector has changed as a result of this update, node x will then send its updated distance vector to each of its neighbors, which can in turn update their own distance vectors. As long as all the nodes continue to exchange their distance vectors in an asynchronous fashion, each cost estimate Dx(y) converges to dx(y), the actual cost of the least-cost path from node x to node y.

  • In the DV algorithm, a node x updates its distance-vector estimate when it either sees a cost change in one of its directly attached links or receives a distance vector update from some neighbor. To update its own forwarding table for a given destination y, what node x really needs to know is the neighboring node v*(y) that is the next-hop router along the shortest path to y, i.e., the neighbor v that achieves the minimum in Line 14 of the DV algorithm.(If there are multiple neighbors v that achieve the minimum, then v*(y) can be any of the minimizing neighbors.) So, in Lines 13-14, for each destination y, node x also determines v*(y) and updates its forwarding table for destination y.
  • The only information a node will have is the costs of the links to its directly attached neighbors and information it receives from these neighbors. Each node waits for an update from any neighbor(Lines 10-11), calculates its new distance vector when receiving an update(Line 14), and distributes its new distance vector to its neighbors(Lines 16-17). DV-like algorithms are used in the Internet’s RIP and BGP.

  • Figure 4.30 illustrates the operation of the DV algorithm for the network shown at the top of the figure.
  • The process of receiving updated distance vectors from neighbors, recomputing routing table entries, and informing neighbors of changed costs of the least-cost path to a destination continues until no update messages are sent. At this point, the algorithm will enter a quiescent state(Lines 10-11). The algorithm remains in the quiescent state until a link cost changes.

Distance-Vector Algorithm: Link-Cost Changes and Link Failure

  • When a node running the DV algorithm detects a change in the link cost from itself to a neighbor(Lines 10-11), it updates its distance vector(Lines 13-14) and, if there’s a change in the cost of the least-cost path, informs its neighbors(Lines 16-17) of its new distance vector.

  • Figure 4.31(a): the link cost from y to x changes from 4 to 1. We focus only on y’ and z’s distance table entries to destination x. The DV algorithm causes the following events to occur:
  • At time t0, y detects the link-cost change from 4 to 1, updates its distance vector, and informs its neighbors of this change since its distance vector has changed.
  • At time t1, z receives the update from y and updates its table. It computes a new least cost to x(decrease from 5 to 2) and sends its new distance vector to its neighbors.
  • At time t2, y receives z’s update and updates its distance table. y’s least costs do not change and hence y does not send any message to z. So, only two iterations are required for the DV algorithm to reach a quiescent state.
  • The good news about the decreased cost has propagated quickly through the network.
  • Figure 4.31(b): the link cost between x and y increases from 4 to 60. Before the link cost changes, Dy(x) = 4, Dy(z) = 1, Dz(y) = 1, and Dz(x) = 5.
  • At time t0, y detects the link-cost change from 4 to 60. y computes its new minimum-cost path to x to have a cost of
  • At time t1, we have a routing loop: y routes through z, and z routes through y. A routing loop is like a black hole: a packet destined for x arriving at y or z as of t1 will bounce back and forth between these two nodes forever(or until the forwarding tables are changed). Since node y has computed a new minimum cost to x, it informs z of its new distance vector at time t1.
  • At time t2, z receives y’s new distance vector, which indicates that y’s minimum cost to x is 6. z knows it can get to y with a cost of 1 and computes a new least cost to x of
  • After receiving z’s new distance vector, y determines Dy(x) = 8 and sends z its distance vector. z then determines Dz(x) = 9 and sends y its distance vector, and so on. The loop will persist until z eventually computes the cost of its path via y to be greater than 50. At this point, z will determine that its least-cost path to x is via its direct connection to x. y will then route to x via z.
  • The result of the bad news about the increase in link cost traveled slowly. This problem is referred to as the count-to-infinity problem.

Distance-Vector Algorithm: Adding Poisoned Reverse

  • The looping scenario can be avoided using “poisoned reverse”: if z routes through y to get to x, then z will advertise to y that its distance to x is infinity, i.e., z will advertise to y that Dz(x) = ∞(even though z knows Dz(x) = 5 in truth). z will continue telling this to y as long as it routes to x via y. Since y believes that z has no path to x, y will never attempt to route to x via z as long as z continues to route to x via y.

  • For Figure 4.31(b):
  • Because y’s distance table indicates Dz(x) = ∞, when the cost of (x, y) link changes from 4 to 60 at time t0, y updates its table and continues to route directly to x, and informs z of its new cost to x, i.e., Dy(x) = 60.
  • After receiving the update at t1, z shifts its route to x to be via the (z, x) link at a cost of 50. Since this is a new least-cost path to x, and since the path no longer passes through y, z now informs y that Dz(x) = 50 at t2.
  • After receiving the update from z, y updates its distance table with Dy(x) = 51. Since z is now on y’s least-cost path to x, y poisons the reverse path from z to x by informing z that Dy(x) = ∞(even though y knows Dy(x) = 51 in truth).
  • However, loops involving three or more nodes(rather than two immediately neighboring nodes) will not be detected by the poisoned reverse technique.

A Comparison of LS and DV Routing Algorithms

  • The DV and LS algorithms take complementary approaches towards computing routing.
  • DV algorithm: each node talks to only its directly connected neighbors, but it provides its neighbors with least-cost estimates from itself to all the nodes(that it knows about) in the network.
  • LS algorithm: each node talks with all other nodes via broadcast, but it tells them only the costs of its directly connected links.
  • Comparison some of their attributes(N is the set of routers and E is the set of links):
  • Message complexity.
  • Speed of convergence.
  • Robustness. What can happen if a router fails?

Other Routing Algorithms

  • The LS and DV algorithms are the only routing algorithms used in practice today in the Internet.
  • A broad class of routing algorithms is based on viewing packet traffic as flows between sources and destinations in a network. In this approach, the routing problem can be formulated mathematically as a constrained optimization problem known as a network flow problem.
  • Another set of routing algorithms are derived from the telephony world. These circuit-switched routing algorithms are of interest to packet-switched data networking in cases where per-link resources(buffers…) are to be reserved for each connection that is routed over the link.

4.5.3 Hierarchical Routing

  • In LS and DV algorithms, we view the network as a collection of interconnected routers. One router was same as another in the sense that all routers executed the same routing algorithm to compute routing paths through the entire network. This model is simplistic for at least two reasons:
  • Scale. As the number of routers becomes large, the overhead involved in computing, storing, and communicating routing information becomes prohibitive.
  • Administrative autonomy. An organization should be able to run and administer its network as it wishes, while still being able to connect its network to other outside networks.
  • Both problems can be solved by organizing routers into autonomous systems(ASs), with each AS consisting of a group of routers that are typically under the same administrative control(e.g., operated by the same ISP or belonging to the same company network).
  • Routers within the same AS all run the same routing algorithm and have information about each other. The routing algorithm running within an AS is called an intra-autonomous system routing protocol that is necessary to connect ASs to each other. Some routers in an AS that have the added task of forwarding packets to destinations outside the AS are called gateway routers.

  • Figure 4.32: The heavy lines represent direct link connections between pairs of routers. The thinner lines hanging from the routers represent subnets that are directly connected to the routers.
  • AS1 has four routers(1a, 1b, 1c, and 1d) which run the intra-AS routing protocol used within AS1. Each of these four routers knows how to forward packets along the optimal path to any destination within AS1. The intra-AS routing protocols running in AS1, AS2, and AS3 need not be the same. The routers 1b, 1c, 2a and 3a are gateway routers.
  • How does a router in some AS know how to route a packet to a destination that is outside the AS?
  • If the AS has only one gateway router that connects to only one other AS.
  • If the source AS has two or more links that lead outside the AS.
  • Suppose that AS2 and AS3 connect to other ASs and AS1 learns from the inter-AS routing protocol that subnet x is reachable both from AS2(via gateway 1b) and from AS3(via gateway 1c). AS1 would then propagate this information to all its routers. In order to configure forwarding table, router 1d would have to determine to which gateway router it should direct packets that are destined for subnet x.
  • One approach which is often employed in practice is to use hot-potato routing. In hot-potato routing, the AS gets rid of the packet(the hot potato) as inexpensively as possible. This is done by having a router send the packet to the gateway router that has the smallest router-to-gateway cost among all gateways with a path to the destination. In this example, hot-potato routing running in 1d would use information from the intra-AS routing protocol to determine the path costs to 1b and 1c, and then choose the path with the least cost. Once this path is chosen, router 1d adds an entry for subnet x in its forwarding table. Figure 4.33 summarizes the actions taken at router 1d for adding the new entry for x to the forwarding table.

  • When an AS learns about a destination from a neighboring AS, the AS can advertise this routing information to some of its other neighboring ASs. E.g., suppose AS1 learns from AS2 that subnet x is reachable via AS2. AS1 could then tell AS3 that x is reachable via AS1. So, if AS3 needs to route a packet destined to x, AS3 would forward the packet to AS1, which would in turn forward the packet to AS2.
  • The Internet consists of a hierarchy of interconnected ISPs, many ISPs partition their network into multiple ASs. E.g., some tier-1 ISPs use one AS for their entire network; others break up their ISP into tens of interconnected ASs.
  • Summary: the problems of scale and administrative authority are solved by defining autonomous systems. Within an AS, all routers run the same intra-AS routing protocol; among themselves, the ASs run the same inter-AS routing protocol.
  • Scale problem is solved because an intra-AS router need only know about routers within its AS.
  • Administrative authority problem is solved since an organization can run whatever intra-AS routing protocol it chooses; but each pair of connected ASs needs to run the same inter-AS routing protocol to exchange reachability information.
  • In the following section, we’ll examine two intra-AS routing protocols(RIP and OSPF) and the inter-AS routing protocol(BGP) that are used in today’s Internet.

4.6 Routing in the Internet

4.6.1 Intra-AS Routing in the Internet: RIP

  • An intra-AS routing protocol(also known as interior gateway protocols) is used to determine how routing is performed within an autonomous system(AS). Two intra-AS routing protocol are used in the Internet: Routing Information Protocol(RIP) and Open Shortest Path First(OSPF).
  • RIP is a distance-vector protocol that operates like the idealized DV protocol in Section 4.5.2. The version of RIP specified in RFC 1058 uses hop count as a cost metric(i.e., each link has a cost of 1). In RIP and OSPF, costs are from source router to a destination subnet. RIP uses the term hop, which is the number of subnets traversed along the shortest path from source router to destination subnet, including the destination subnet. Figure 4.34 illustrates an AS with six leaf subnets. The table indicates the number of hops from the source A to each of the leaf subnets.

  • The maximum cost of a path is limited to 15, So limiting the use of RIP to autonomous systems that are fewer than 15 hops in diameter. The distance vector for any one router is the current estimate of the shortest path distances from that router to the subnets in the AS. In RIP, routing updates are exchanged between neighbors about every 30 seconds using a RIP response message(also known as RIP advertisements). The response message sent by a router or host contains a list of up to 25 destination subnets within the AS and the sender’s distance to each of those subnets.

  • Figure 4.35: Lines connecting the routers denote subnets; dotted lines indicate that the AS continues on.
  • Each router maintains a RIP table known as a routing table that includes both the router’s distance vector and the router’s forwarding table. Figure 4.36 shows the routing table for router D.

  • Routing table has three columns:
  • The first column is for the destination subnet.
  • The second column indicates the identity of the next router along the shortest path to the destination subnet.
  • The third column indicates the number of hops(i.e., the number of subnets that have to be traversed, including the destination subnet) to get to the destination subnet along the shortest path.
  • In principle, a routing table will have one row for each subnet in the AS, although RIP version 2 allows subnet entries to be aggregated using route aggregation techniques similar to those examined in Section 4.4.

  • Suppose that 30 seconds later, router D receives from router A the advertisement shown in Figure 4.37. Router D then merges this advertisement with the old routing table, as shown in Figure 4.38.

  • Recall that RIP routers exchange advertisements approximately every 30 seconds, if a router does not hear from its neighbor at least once every 180 seconds, that neighbor is considered to be no longer reachable. When this happens, RIP modifies the local routing table and then sends this information to its neighboring routers that are still reachable. A router can also request information about its neighbor’s cost to a given destination using RIP’s request message.
  • Routers send RIP request and response messages to each other over UDP using port number 520. The UDP segment is carried between routers in a standard IP datagram.

  • Figure 4.39 sketches how RIP is typically implemented in a UNIX system. A process called routed executes RIP, maintains routing information and exchanges messages with routed processes running in neighboring routers. Because RIP is implemented as an application-layer process, it can send and receive messages over a standard socket and use a standard transport protocol. RIP is implemented as an application-layer protocol running over UDP.

4.6.2 Intra-AS Routing in the Internet: OSPF(Open Shortest Path First)

  • OSPF is typically deployed in upper-tier ISPs whereas RIP is deployed in lower-tier ISPs and enterprise networks.
  • OSPF is a link-state protocol that uses flooding of link-state information and a Dijkstra least-cost path algorithm. With OSPF, a router constructs a complete topological map of the entire autonomous system. The router then locally runs Dijkstra’s shortest-path algorithm to determine a shortest-path tree to all subnets, with itself as the root node. Individual link costs are configured by the network administrator.
  • With OSPF, a router broadcasts link-state information to all other routers in the autonomous system whenever there is a change in a link’s state. It also broadcasts a link’s state periodically(at least once every 30 minutes) even if the link’s state has not changed.
  • OSPF advertisements are contained in OSPF messages that are carried directly by IP, with an upper-layer protocol of 89 for OSPF. So, the OSPF protocol must itself implement functionality such as reliable message transfer and link-state broadcast. It also checks that links are operational(via a HELLO message sent to an attached neighbor) and allows an OSPF router to obtain a neighboring router’s database of network-wide link state.
  • Some advances in OSPF:
  • Security. Exchanges between OSPF routers(e.g., link-state updates) can be authenticated. With authentication, only trusted routers can participate in the OSPF protocol within an AS, so preventing malicious intruders from injecting incorrect information into router tables.
  • Multiple same-cost paths. When multiple paths to a destination have the same cost, OSPF allows multiple paths to be used(i.e., a single path need not be chosen for carrying all traffic when multiple equal-cost paths exist).
  • Integrated support for unicast and multicast routing. Multicast OSPF(MOSPF) [RFC 1584] provides extensions to OSPF to provide for multicast routing(Section 4.7.2). MOSPF uses the existing OSPF link database and adds a new type of link-state advertisement to the existing OSPF link-state broadcast mechanism.
  • Support for hierarchy within a single routing domain. OSPF can structure an autonomous system hierarchically(advantages of hierarchical routing structures in Section 4.5.3).
  • An OSPF autonomous system can be configured hierarchically into areas.
  • Each area runs its own OSPF link-state routing algorithm, with each router in an area broadcasting its link state to all other routers in that area.
  • Within each area, one or more area border routers are responsible for routing packets outside the area.
  • Exactly one OSPF area in the AS is configured to be the backbone area that is to route traffic between the other areas in the AS. The backbone always contains all area border routers in the AS and may contain non-border routers as well. Inter-area routing within the AS requires that the packet be first routed to an area border router(intra-area routing), then routed through the backbone to the area border router in the destination area, and then routed to the final destination.

4.6.3 Inter-AS Routing: BGP

  • The Border Gateway Protocol version 4(referred to as BGP4 or BGP), specified in RFC 4271(also [RFC 4274), is the standard inter-AS routing protocol in today’s Internet. BGP provides each AS a means to
  • Obtain subnet reachability information from neighboring ASs.
  • Propagate the reachability information to all routers internal to the AS.
  • Determine good routes to subnets based on the reachability information and on AS policy.
  • BGP allows each subnet to advertise its existence to the rest of the Internet. A subnet says “I exist and I am here” and BGP makes sure that all the ASs in the Internet know about the subnet and how to get there.

BGP Basics

  • In BGP, pairs of routers exchange routing information over semi-permanent TCP connections using port 179. The semi-permanent TCP connections for the network in Figure 4.32 are shown in Figure 4.40.

  • For each TCP connection, the two routers at the end of the connection are called BGP peers, and the TCP connection along with all the BGP messages sent over the connection is called a BGP session. A BGP session that spans two ASs is called an external BGP(eBGP) session, and a BGP session between routers in the same AS is called an internal BGP(iBGP) session.
  • BGP allows each AS to learn which destinations are reachable via its neighboring ASs. In BGP, destinations are CIDRized prefixes, with each prefix representing a subnet or a collection of subnets. E.g., suppose there are four subnets attached to AS2: 138.16.64/24, 138.16.65/24, 138.16.66/24, and 138.16.67/24, then AS2 could aggregate the prefixes for these four subnets and use BGP to advertise the single prefix to 138.16.64/22 to AS1.
    1. Using the eBGP session between the gateway routers, AS-x sends AS-y the list of prefixes that are reachable from AS-x; and AS-y sends AS-x the list of prefixes that are reachable from AS-y.
    2. When a gateway router receives eBGP-learned prefixes, the gateway router uses its iBGP sessions to distribute the prefixes to the other routers in the AS.

Path Attributes and BGP Routes

  • In BGP, an autonomous system is identified by its globally unique autonomous system number(ASN) that are assigned by ICANN regional registries.(Technically, not every AS has an ASN. A stub AS that carries only traffic for which it is a source or destination will not have an ASN; we ignore this in our discussion.)
  • When a router advertises a prefix across a BGP session, it includes with the prefix a number of BGP attributes. In BGP jargon, a prefix along with its attributes is called a route. Two of important attributes are AS-PATH and NEXT-HOP:
  • AS-PATH. This attribute contains the ASs through which the advertisement for the prefix has passed. When a prefix is passed into an AS, the AS adds its ASN to the AS-PATH attribute.
  • The NEXT-HOP is the router interface that begins the AS-PATH.

  • Figure 4.41 illustrates another situation where the NEXT-HOP is needed. In this figure, AS1 and AS2 are connected by two peering links. A router in AS1 could learn about two different routes to the same prefix x. These two routes could have the same AS-PATH to x, but could have different NEXT-HOP values corresponding to the different peering links. Using the NEXT-HOP values and the intra-AS routing algorithm, the router can determine the cost of the path to each peering link, and then apply hot-potato routing(Section 4.5.3) to determine the appropriate interface.
  • When a gateway router receives a route advertisement, it uses its import policy to decide whether to accept or filter the route and whether to set certain attributes. The import policy may filter a route because the AS may not want to send traffic over one of the ASs in the route’s AS-PATH or because it already knows of a preferable route to the same prefix.

BGP Route Selection

  • BGP uses eBGP and iBGP to distribute routes to all the routers within ASs. From this distribution, a router may learn about more than one route to any one prefix, so the router must select one of the possible routes.
  • The input into this route selection process is the set of all routes that have been learned and accepted by the router. If there are two or more routes to the same prefix, then BGP sequentially invokes the following rules until one route remains:
  • Routes are assigned a local preference value as one of their attributes. The local preference of a route could have been set by the router or could have been learned by another router in the same AS. This is a policy decision left up to the AS’s network administrator. The routes with the highest local preference values are selected.
  • From the remaining routes(all with the same local preference value), the route with the shortest AS-PATH is selected. If this rule were the only rule for route selection, then BGP would be using a DV algorithm for path determination, where the distance metric uses the number of AS hops rather than the number of router hops.
  • From the remaining routes(all with the same local preference value and the same AS-PATH length), the route with the closest NEXT-HOP router is selected. Closest means the router for which the cost of the least-cost path, determined by the intra-AS algorithm, is the smallest. In Section 4.5.3, this process is called hot-potato routing.
  • If more than one route still remains, the router uses BGP identifiers to select the route.

How does an entry get into a router’s forwarding table?

  • An entry in a router’s forwarding table consists of a prefix and a corresponding router output port. When a packet arrives to the router, the packet’s destination IP address is compared with the prefixes in the forwarding table to find the one with the longest prefix match. The packet is then forwarded to the router port associated with that prefix.
  • Let’s summarize how a routing entry(prefix, associated port) gets entered into a forwarding table. Assume that the prefix does not belong to the router’s AS but to some other AS.
  • In order for a prefix to get entered into the router’s forwarding table, the router has to first become aware of the prefix(corresponding to a subnet or an aggregation of subnets) via a BGP route advertisement. Such an advertisement may be sent to it over an eBGP session(from a router in another AS) or over an iBGP session(from a router in the same AS).
  • After the router becomes aware of the prefix, it needs to determine the output port to which datagrams destined to that prefix will be forwarded. If the router receives more than one route advertisement for this prefix, the router uses the BGP route selection process to find the route for the prefix. Such a route has been selected and it includes a NEXT-HOP attribute, which is the IP address of the first router outside the router’s AS along this best route.
  • The router then uses its intra-AS routing protocol to determine the shortest path to the NEXT-HOP router.
  • The router finally determines the port number to associate with the prefix by identifying the first link along that shortest path. The router can then enter the prefix-port pair into its forwarding table.
  • The forwarding table computed by the routing processor(Figure 4.6) is then pushed to the router’s input port line cards.

Routing Policy

  • Figure 4.42 shows six interconnected ASs: A, B, C, W, X, and Y. Assume that W, X and Y are stub(customer) networks and A, B and C are provider networks. Also assume that A, B, and C, all peer with each other, and provide full BGP information to their customer networks.(A stub AS carries only traffic for which it is a source or destination.)
  • All traffic entering a stub network must be destined for that network, and all traffic leaving a stub network must have originated in that network. X is a multihomed stub network since it is connected to the rest network via two different providers. Like W and Y, X itself must be the source/destination of all traffic leaving/entering X.
  • How will this stub network behavior be implemented and enforced(prevent X from forwarding traffic between B and C)?
  • Suppose that B has learned from A that A has a path AW to W. B can install the route BAW into its routing information base. B also wants to advertise the path BAW to its customer, X, so that X knows that it can route to W via B. But should B advertise the path BAW to C? If it does so, then C could route traffic to W via CBAW. If A, B, and C are all backbone providers, then B should not have to shoulder the burden of carrying transit traffic between A and C.
  • There are currently no official standards that govern how backbone ISPs route among themselves. A rule of thumb is that any traffic flowing across an ISP’s backbone network must have either a source or a destination(or both) in a network that is a customer of that ISP; otherwise the traffic would be getting a free ride on the ISP’s network.

Why are different inter-AS and intra-AS routing protocols used?

  • The answer is the heart of the differences between the goals of routing within an AS and among ASs.
  • Policy.
  • Scale.
  • Performance.

4.7 Broadcast and Multicast Routing

  • Broadcast routing: the network layer delivers a packet sent from a source node to all other nodes in the network.

4.7.1 Broadcast Routing Algorithms

  • Figure 4.43(a): the sending node sends a separate copy of the packet to each destination.
  • Inefficiency. If the source node is connected to the rest of the network via a single link, then N separate copies of the same packet will traverse this single link. It is more efficient for the network nodes themselves rather than just the source node to create duplicate copies of a packet(Figure 4.43(b)).
  • How are the broadcast recipients’ addresses known to the sender? Most likely, additional protocol mechanisms(such as a destination-registration protocol) would be required. This would add complexity to a protocol that initially simple
  • A final drawback relates to the purposes for which broadcast is to be used. Link-state routing protocols(Section 4.5) use broadcast to disseminate the link-state information that is used to compute unicast routes. So, in situations where broadcast is used to create and update unicast routes, it would be unwise to rely on the unicast routing infrastructure to achieve broadcast.

Uncontrolled Flooding

  • The source node sends a copy of the packet to all of its neighbors. When a node receives a broadcast packet, it duplicates the packet and forwards it to all of its neighbors except the neighbor from which it received the packet. If the graph is connected, this scheme will eventually deliver a copy of the broadcast packet to all nodes in the graph.
  • Fatal flaw: if the graph has cycles, then one or more copies of each broadcast packet will cycle indefinitely.
  • What’s worse, when a node is connected to more than two other nodes, it will create and forward multiple copies of the broadcast packet, each of which will create multiple copies of itself at other nodes with more than two neighbors, and so on. This broadcast storm, resulting from the endless multiplication of broadcast packets, would eventually result in so many broadcast packets being created that the network would be rendered useless.

Controlled Flooding

  • The key to avoiding a broadcast storm is for a node to choose when to flood a packet and when not to flood a packet. In practice, this can be done in one of several ways.
  • Sequence-number-controlled flooding:
  • A source node puts its address and a broadcast sequence number into a broadcast packet, then sends the packet to all of its neighbors.
  • Each node maintains a list of the source address and sequence number of each broadcast packet it has already received, duplicated, and forwarded. When a node receives a broadcast packet, it first checks whether the packet is in this list. If so, the packet is dropped; if not, the packet is duplicated and forwarded to all the node’s neighbors(except the node from which the packet has just been received).
  • Reverse path forwarding(RPF) or Reverse path broadcast(RPB).

  • Figure 4.44 illustrates RPF. Suppose that the links drawn with thick lines represent the least-cost paths from the receivers to the source(A).

Spanning-Tree Broadcast

  • While sequence-number-controlled flooding and RPF avoid broadcast storms, they do not avoid the transmission of redundant broadcast packets. Ideally, every node should receive only one copy of the broadcast packet.

  • A spanning tree of a graph G =(N,E) is a graph GЈ =(N,EЈ) such that EЈ is a subset of E, GЈ is connected, GЈ contains no cycles, and GЈ contains all the original nodes in G. If each link has an associated cost and the cost of a tree is the sum of the link costs, then a spanning tree whose cost is the minimum of all of the graph’s spanning trees is called a minimum spanning tree.
    1. Construct a spanning tree.
    2. A source node send the broadcast packet out on all of the incident links that belong to the spanning tree.
    3. A node receiving a broadcast packet then forwards the packet to all its neighbors in the spanning tree(except the neighbor from which it received the packet).
  • Advantages:
  • Eliminate redundant broadcast packets.
  • Can be used by any node to begin a broadcast(Figures 4.45(a) and (b)).
  • In the center-based approach to building a spanning tree, a center node(also known as a rendezvous point or a core) is defined. Nodes then unicast tree-join messages addressed to the center node. A tree-join message is forwarded using unicast routing toward the center until it either arrives at a node that already belongs to the spanning tree or arrives at the center. In either case, the path that the tree-join message has followed defines the branch of the spanning tree between the edge node that initiated the tree-join message and the center. One can think of this new path as being grafted onto the existing spanning tree.

  • Figure 4.46 illustrates the construction of a center-based spanning tree. Suppose that node E is selected as the center of the tree. Suppose that node F first joins the tree and forwards a tree-join message to E. The single link EF becomes the initial spanning tree. Node B then joins the spanning tree by sending its tree-join message to E. Suppose that the unicast path route to E from B is via D. In this case, the tree-join message results in the path BDE being grafted onto the spanning tree. Node A next joins the spanning group by forwarding its tree-join message towards E. If A’s unicast path to E is through B, then since B has already joined the spanning tree, the arrival of A’s tree-join message at B will result in the AB link being immediately grafted onto the spanning tree. Node C joins the spanning tree next by forwarding its tree-join message directly to E. Finally, because the unicast routing from G to E must be via node D, when G sends its tree-join message to E, the GD link is grafted onto the spanning tree at node D.

4.7.2 Multicast

  • In multicast service, a multicast packet is delivered to a subset of network nodes. Some network applications require the delivery of packets from one or more senders to a group of receivers: bulk data transfer(e.g., the transfer of a software upgrade from the software developer to users needing the upgrade), shared data applications (e.g., a whiteboard or teleconferencing application that is shared among many distributed participants).
  • In multicast communication, we faced with two problems: how to identify the receivers of a multicast packet and how to address a packet sent to these receivers.
  • In the Internet, a multicast packet is addressed using address indirection. That is, a single identifier is used for the group of receivers, and a copy of the packet that is addressed to the group using this single identifier is delivered to all of the multicast receivers associated with that group.
  • The single identifier that represents a group of receivers is a class D multicast IP address. The group of receivers associated with a class D address is referred to as a multicast group. Figure 4.47.

  • Four hosts(shown in shaded color) are associated with the multicast group address of 226.17.30.197 and will receive all datagrams addressed to that multicast address. The difficulty is that each host has a unique IP unicast address that is independent of the address of the multicast group in which it is participating. For the Internet, the solution involve the Internet Group Management Protocol [RFC 3376].

Internet Group Management Protocol

  • The IGMP protocol version 3 [RFC 3376] operates between a host and its directly attached router(the first-hop router that a host would see on a path to any other host outside its own local network, or the last-hop router on any path to that host), shown in Figure 4.48.

  • Figure 4.48 shows three first-hop multicast routers, each connected to its attached hosts via one outgoing local interface. This local interface is attached to a LAN in this example, and while each LAN has multiple attached hosts, at most a few of these hosts will typically belong to a given multicast group at any given time.
  • IGMP provides the means for a host to inform its attached router that an application running on the host wants to join a specific multicast group. Given that the IGMP interaction is limited to a host and its attached router, another protocol is required to coordinate the multicast routers(including the attached routers) throughout the Internet, so that multicast datagrams are routed to their final destinations. This latter functionality is accomplished by network-layer multicast routing algorithms. So, Network-layer multicast in the Internet consists of two complementary components: IGMP and multicast routing protocols.
  • IGMP has only three message types. Like ICMP, IGMP messages are encapsulated within an IP datagram, with an IP protocol number of 2.
  • The membership_query message is sent by a router to all hosts on an attached interface(e.g., to all hosts on a local area network) to determine the set of all multicast groups that have been joined by the hosts on that interface.
  • Hosts respond to membership_query message with an IGMP membership_report message. This messages can also be generated by a host when an application first joins a multicast group without waiting for a membership_query message from the router.
  • The final type of IGMP message is the leave_group message. This message is optional. But if it is optional, how does a router detect when a host leaves the multicast group? The answer is that the router infers that a host is no longer in the multicast group if it no longer responds to a membership_query message with the given group address.
  • This is an example of what is called soft state in an Internet protocol. In a soft state protocol, the state(in this case of IGMP, the fact that there are hosts joined to a given multicast group) is removed via a timeout event(in this case, via a periodic membership_query message from the router) if it is not explicitly refreshed(in this case, by a membership_report message from an attached host).
  • Soft-state protocols result in simpler control than hard-state protocols, which not only require state to be explicitly added and removed, but also require mechanisms to recover from the situation where the entity responsible for removing state has terminated prematurely or failed.

Multicast Routing Algorithms

  • The multicast routing problem is illustrated in Figure 4.49. Hosts joined to the multicast group and their immediately attached router are shaded in color. Only routers A, B, E, and F need to receive the multicast traffic.
  • The goal of multicast routing is to find a tree of links that connects all of the routers that have attached hosts belonging to the multicast group. Multicast packets will then be routed along this tree from the sender to all of the hosts belonging to the multicast tree. The tree may contain routers that do not have attached hosts belonging to the multicast group.
  • Two approaches are adopted for determining the multicast routing tree. They differ according to whether a single group-shared tree is used to distribute the traffic for all senders in the group, or whether a source-specific routing tree is constructed for each individual sender.
  • Multicast routing using a group-shared tree. As in spanning-tree broadcast, multicast routing over a group-shared tree is based on building a tree that includes all edge routers with attached hosts belonging to the multicast group.
  • Multicast routing using a source-based tree. This approach constructs a multicast routing tree for each source in the multicast group. An RPF algorithm(with source node x) is used to construct a multicast forwarding tree for multicast datagrams originating at source x.

  • The RPF broadcast algorithm we studied earlier requires tweaking for use in multicast. Consider router D in Figure 4.50. Under broadcast RPF, it would forward packets to router G, even though router G has no attached hosts that are joined to the multicast group. While this is not so bad for this case where D has only a single downstream router, G, imagine what would happen if there were thousands of routers downstream from D. Each of these thousands of routers would receive unwanted multicast packets.
  • The solution to the problem of receiving unwanted multicast packets under RPF is known as pruning. A multicast router that receives multicast packets and has no attached hosts joined to that group will send a prune message to its upstream router. If a router receives prune messages from each of its downstream routers, then it can forward a prune message upstream.

4.8 Summary

Please indicate the source: http://blog.csdn.net/gaoxiangnumber1

Welcome to my github: https://github.com/gaoxiangnumber1

最后

以上就是听话飞机为你收集整理的4-The Network Layer的全部内容,希望文章能够帮你解决4-The Network Layer所遇到的程序开发问题。

如果觉得靠谱客网站的内容还不错,欢迎将靠谱客网站推荐给程序员好友。

本图文内容来源于网友提供,作为学习参考使用,或来自网络收集整理,版权属于原作者所有。
点赞(56)

评论列表共有 0 条评论

立即
投稿
返回
顶部