## ICMP version 4

Sometimes a router or the destination host has to inform the sender of the packet of a problem that occurred while processing that packet. In the TCP/IP protocol suite, this reporting is done by the Internet Control Message Protocol (ICMP). How are these messages generated by the ICMP?

##### 5.2.2 ICMP version 4

It is sometimes necessary for intermediate routers or the destination host to inform the sender of the packet of a problem that occurred while processing a packet. In the TCP/IP protocol suite, this reporting is done by the Internet Control Message Protocol (ICMP). ICMP is defined in RFC 792. ICMP messages are carried as the payload of IP packets (the protocol value reserved for ICMP is 1). An ICMP message is composed of an 8 byte header and a variable length payload that usually contains the first bytes of the packet that triggered the transmission of the ICMP message.

Figure 5.31: ICMP version 4 ( RFC 792)

In the ICMP header, the Type and Code fields indicate the type of problem that was detected by the sender of the ICMP message. The Checksum protects the entire ICMP message against transmission errors and the Data field contains additional information for some ICMP messages.

The main types of ICMP messages are:

• Destination unreachable: a Destination unreachable ICMP message is sent when a packet cannot be delivered to its destination due to routing problems. Different types of unreachability are distinguished:
• Network unreachable: this ICMP message is sent by a router that does not have a route for the subnet containing the destination address of the packet
• Host unreachable: this ICMP message is sent by a router that is attached to the subnet that contains the destination address of the packet, but this destination address cannot be reached at this time
• Protocol unreachable: this ICMP message is sent by a destination host that has received a packet, but does not support the transport protocol indicated in the packet’s Protocol field
• Port unreachable: this ICMP message is sent by a destination host that has received a packet destined to a port number, but no server process is bound to this port
• Fragmentation needed: this ICMP message is sent by a router that receives a packet with the Don’t Fragment flag set that is larger than the MTU of the outgoing interface
• Redirect: this ICMP message can be sent when there are two routers on the same LAN. Consider a LAN with one host and two routers: R1 and R2. Assume that R1 is also connected to subnet 130.104.0.0/16 while R2 is connected to subnet 138.48.0.0/16. If a host on the LAN sends a packet towards 130.104.1.1 to R2, R2 needs to forward the packet again on the LAN to reach R1. This is not optimal as the packet is sent twice on the same LAN. In this case, R2 could send an ICMP Redirect message to the host to inform it that it should have sent the packet directly to R1. This allows the host to send the other packets to 130.104.1.1 directly via R1.
• Parameter problem: this ICMP message is sent when a router or a host receives an IP packet containing an error (e.g. an invalid option)

Figure 5.32: ICMP redirect

• Source quench: a router was supposed to send this message when it had to discard packets due to congestion. However, sending ICMP messages in case of congestion was not the best way to reduce congestion and since the inclusion of a congestion control scheme in TCP, this ICMP message has been deprecated.
• Time Exceeded: there are two types of Time Exceeded ICMP messages
• TTL exceeded: a TTL exceeded message is sent by a router when it discards an IPv4 packet because its TTL reached 0.
• Reassembly time exceeded: this ICMP message is sent when a destination has been unable to reassemble all the fragments of a packet before the expiration of its reassembly timer.
• Echo request and Echo reply: these ICMP messages are used by the ping(8) network debug- ging software.

Note: Redirection attacks

ICMP redirect messages are useful when several routers are attached to the same LAN as hosts. However, they should be used with care as they also create an important security risk. One of the most annoying attacks in an IP network is called the man in the middle attack. Such an attack occurs if an attacker is able to receive, process, possibly modify and forward all the packets exchanged between a source and a destination. As the attacker receives all the packets it can easily collect passwords or credit card numbers or even inject fake information in an established TCP connection. ICMP redirects unfortunately enable an attacker to easily perform such an attack. In the figure above, consider host H that is attached to the same LAN as A and R1. If H sends to A an ICMP redirect for prefix 138.48.0.0/16, A forwards to H all the packets that it wants to send to this prefix. H can then forward them to R2. To avoid these attacks, hosts should ignore the ICMP redirect messages that they receive.

ping(8) is often used by network operators to verify that a given IP address is reachable. Each host is supposed 10 to reply with an ICMP Echo reply message when its receives an ICMP Echo request message. A sample usage of ping(8) is shown below.

ping 130.104.1.1PING 130.104.1.1 (130.104.1.1): 56 data bytes64 bytes from 130.104.1.1: icmp_seq=0 ttl=243 time=19.961 ms64 bytes from 130.104.1.1: icmp_seq=1 ttl=243 time=22.072 ms64 bytes from 130.104.1.1: icmp_seq=2 ttl=243 time=23.064 ms64 bytes from 130.104.1.1: icmp_seq=3 ttl=243 time=20.026 ms64 bytes from 130.104.1.1: icmp_seq=4 ttl=243 time=25.099 ms--- 130.104.1.1 ping statistics ---5 packets transmitted, 5 packets received, 0% packet lossround-trip min/avg/max/stddev = 19.961/22.044/25.099/1.938 ms

Another very useful debugging tool is traceroute(8). The traceroute man page describes this tool as “print the route packets take to network host”. traceroute uses the TTL exceeded ICMP messages to discover the inter- mediate routers on the path towards a destination. The principle behind traceroute is very simple. When a router receives an IP packet whose TTL is set to 1 it decrements the TTL and is forced to return to the sending host a TTL exceeded ICMP message containing the header and the first bytes of the discarded IP packet. To discover all routers on a network path, a simple solution is to first send a packet whose TTL is set to 1, then a packet whose TTL is set to 2, etc. A sample traceroute output is shown below.

traceroute www.ietf.orgtraceroute to www.ietf.org (64.170.98.32), 64 hops max, 40 byte packets1 CsHalles3.sri.ucl.ac.be (192.168.251.230) 5.376 ms 1.217 ms 1.137 ms2 CtHalles.sri.ucl.ac.be (192.168.251.229) 1.444 ms 1.669 ms 1.301 ms3 CtPythagore.sri.ucl.ac.be (130.104.254.230) 1.950 ms 4.688 ms 1.319 ms4 fe.m20.access.lln.belnet.net (193.191.11.9) 1.578 ms 1.272 ms 1.259 ms5 10ge.cr2.brueve.belnet.net (193.191.16.22) 5.461 ms 4.241 ms 4.162 ms6 212.3.237.13 (212.3.237.13) 5.347 ms 4.544 ms 4.285 ms7 ae-11-11.car1.Brussels1.Level3.net (4.69.136.249) 5.195 ms 4.304 ms 4.329 ms8 ae-6-6.ebr1.London1.Level3.net (4.69.136.246) 8.892 ms 8.980 ms 8.830 ms9 ae-100-100.ebr2.London1.Level3.net (4.69.141.166) 8.925 ms 8.950 ms 9.006 ms10 ae-41-41.ebr1.NewYork1.Level3.net (4.69.137.66) 79.590 ms            ae-43-43.ebr1.NewYork1.Level3.net (4.69.137.74) 78.140 ms            ae-42-42.ebr1.NewYork1.Level3.net (4.69.137.70) 77.663 ms11 ae-2-2.ebr1.Newark1.Level3.net (4.69.132.98) 78.290 ms 83.765 ms 90.006 ms12 ae-14-51.car4.Newark1.Level3.net (4.68.99.8) 78.309 ms 78.257 ms 79.709 ms13 ex1-tg2-0.eqnwnj.sbcglobal.net (151.164.89.249) 78.460 ms 78.452 ms 78.292 ms14 151.164.95.190 (151.164.95.190) 157.198 ms 160.767 ms 159.898 ms15 ded-p10-0.pltn13.sbcglobal.net (151.164.191.243) 161.872 ms 156.996 ms 159.425 ms16 AMS-1152322.cust-rtr.swbell.net (75.61.192.10) 158.735 ms 158.485 ms 158.588 ms17 mail.ietf.org (64.170.98.32) 158.427 ms 158.502 ms 158.567 ms

The above traceroute(8) output shows a 17 hops path between a host at UCLouvain and one of the main IETF servers. For each hop, traceroute provides the IPv4 address of the router that sent the ICMP message and the measured round-trip-time between the source and this router. traceroute sends three probes with each TTL value. In some cases, such as at the 10th hop above, the ICMP messages may be received from different addresses. This is usually because different packets from the same source have followed different paths 11 in the network.

Another important utilisation of ICMP messages is to discover the maximum MTU that can be used to reach a destination without fragmentation. As explained earlier, when an IPv4 router receives a packet that is larger than the MTU of the outgoing link, it must fragment the packet. Unfortunately, fragmentation is a complex operation and routers cannot perform it at line rate [KM1995]. Furthermore, when a TCP segment is transported in an IP packet that is fragmented in the network, the loss of a single fragment forces TCP to retransmit the entire segment (and thus all the fragments). If TCP was able to send only packets that do not require fragmentation in the network, it could retransmit only the information that was lost in the network. In addition, IP reassembly causes several challenges at high speed as discussed in RFC 4963. Using IP fragmentation to allow UDP applications to exchange large messages raises several security issues [KPS2003].

ICMP, combined with the Don’t fragment (DF) IPv4 flag, is used by TCP implementations to discover the largest MTU size that is allowed to reach a destination host without causing network fragmentation. This is the Path MTU discovery mechanism defined in RFC 1191. A TCP implementation that includes Path MTU discovery (most do) requests the IPv4 layer to send all segments inside IPv4 packets having the DF flag set. This prohibits intermediate routers from fragmenting these packets. If a router needs to forward an unfragmentable packet over a link with a smaller MTU, it returns a Fragmentation needed ICMP message to the source, indicating the MTU of its outgoing link. This ICMP message contains in the MTU of the router’s outgoing link in its Data field. Upon reception of this ICMP message, the source TCP implementation adjusts its Maximum Segment Size (MSS) so that the packets containing the segments that it sends can be forwarded by this router without requiring fragmentation.

Interactions between IPv4 and the datalink layer

As mentioned in the first section of this chapter, there are three main types of datalink layers: point-to-point links, LANs supporting broadcast and multicast and NBMA networks. There are two important issues to be addressed when using IPv4 in these types of networks. The first issue is how an IPv4 device obtains its IPv4 address. The second issue is how IPv4 packets are exchanged over the datalink layer service.

On a point-to-point link, the IPv4 addresses of the communicating devices can be configured manually or by using a simple protocol. IPv4 addresses are often configured manually on point-to-point links between routers. When point-to-point links are used to attach hosts to the network, automatic configuration is often preferred in order to avoid problems with incorrect IPv4 addresses. For example, the PPP, specified in RFC 1661, includes an IP network control protocol that can be used by the router in the figure below to send the IPv4 address that the attached host must configure for its interface. The transmission of IPv4 packets on a point-to-point link will be discussed in chapter chap:lan.

Figure 5.33: IPv4 on point-to-point links

Figure 5.34: A simple LAN

In a simple network such as the one shown above, it could be possible to manually configure the mapping between the IPv4 addresses of the hosts and the corresponding datalink layer addresses. However, in a larger LAN this is impossible. To ease the utilisation of LANs, IPv4 hosts must be able to automatically obtain the datalink layer address corresponding to any IPv4 address on the same LAN. This is the objective of the Address Resolution Protocol (ARP) defined in RFC 826. ARP is a datalink layer protocol that is used by IPv4. It relies on the ability of the datalink layer service to easily deliver a broadcast frame to all devices attached to the same LAN.

Note: Security issues with the Address Resolution Protocol

ARP is an old and widely used protocol that was unfortunately designed when security issues were not a concern. ARP is almost insecure by design. Hosts using ARP can be subject to several types of attack. First, a malicious host could create a denial of service attack on a LAN by sending random replies to the received ARP queries. This would pollute the ARP cache of the other hosts on the same LAN. On a fixed network, such attacks can be detected by the system administrator who can physically remove the malicious hosts from the LAN. On a wireless network, removing a malicious host is much more difficult.

A second type of attack are the man-in-the-middle attacks. This name is used for network attacks where the attacker is able to read and possibly modify all the messages sent by the attacked devices. Such an attack is possible in a LAN. Assume, in the figure above, that host 10.0.1.9 is malicious and would like to receive and modify all the packets sent by host 10.0.1.22 to host 10.0.1.8. This can be achieved easily if host 10.0.1.9 manages, by sending fake ARP replies, to convince host 10.0.1.22 (resp. 10.0.1.8) that its own datalink layer address must be used to reach 10.0.1.8 (resp. 10.0.1.22).

ARP is used by all devices that are connected to a LAN and implement IPv4. Both routers and endhosts implement ARP. When a host needs to send an IPv4 packet to a destination outside of its local subnet, it must first send the packet to one of the routers that reside on this subnet. Consider for example the network shown in the figure below. Each host is configured with an IPv4 address in the 10.0.1.0/24 subnet and uses 10.0.1.1 as its default router. To send a packet to address 1.2.3.4, host 10.0.1.8 will first need to know the datalink layer of the default router. It will thus send an ARP request for 10.0.1.1. Upon reception of the ARP reply, host 10.0.1.8 updates its ARP table and sends its packet in a frame to its default router. The router will then forward the packet towards its final destination.

Figure 5.35: A simple LAN with a router

In the early days of the Internet, IP addresses were manually configured on both hosts and routers and almost never changed. However, this manual configuration can be complex 14 and often causes errors that are sometimes diffi- cult to debug. Recent TCP/IP implementations are able to detect some of these misconfigurations. For example, if two hosts are attached to the same subnet with the same IPv4 address they will be unable to communicate. To detect this problem hosts send an ARP request for their configured address each time their addressed is changed RFC 5227. If they receive an answer to this ARP request, they trigger an alarm or inform the system administrator.

In an NBMA network, the interactions between IPv4 and the datalink layer are more complex as the ARP protocol cannot be used as in a LAN. Such NBMA networks use special servers that store the mappings between IP ad- dresses and the corresponding datalink layer address. Asynchronous Transfer Mode (ATM) networks for example can use either the ATMARP protocol defined in RFC 2225 or the NextHop Resolution Protocol (NHRP) defined in RFC 2332. ATM networks are less frequently used today and we will not describe the detailed operation of these servers.

Operation of IPv4 devices

At this point of the description of IPv4, it is useful to have a detailed look at how an IPv4 implementation sends, receives and forwards IPv4 packets. The simplest case is when a host needs to send a segment in an IPv4 packet. The host performs two operations. First, it must decide on which interface the packet will be sent. Second it must create the corresponding IP packet(s).

To simplify the discussion in this section, we ignore the utilisation of IPv4 options. This is not a severe limitation as today IPv4 packets rarely contain options. Details about the processing of the IPv4 options may be found in the relevant RFCs, such as RFC 791. Furthermore, we also assume that only point-to-point links are used. We defer the explanation of the operation of IPv4 over Local Area Networks until the next chapter.

An IPv4 host having n datalink layer interfaces manages n + 1 IPv4 addresses:

• the 127.0.0.1/32 IPv4 address assigned by convention to its loopback address
• one A.B.C.D/p IPv4 address assigned to each of its n datalink layer interfaces

Such a host maintains a routing table containing one entry for its loopback address and one entry for each subnet identifier assigned to its interfaces. Furthermore, the host usually uses one of its interfaces as the default interface when sending packets that are not addressed to a directly connected destination. This is represented by the default route: 0.0.0.0/0 that is associated to one interface.

When a transport protocol running on the host requests the transmission of a segment, it usually provides the IPv4 destination address to the IPv4 layer in addition to the segment 16. The IPv4 implementation first performs a longest prefix match with the destination address in its routing table. The lookup returns the identification of the interface that must be used to send the packet. The host can then create the IPv4 packet containing the segment. The source IPv4 address of the packet is the IPv4 address of the host on the interface returned by the longest prefix match. The Protocol field of the packet is set to the identification of the local transport protocol which created the segment. The TTL field of the packet is set to the default TTL used by the host. The host must now choose the packet’s Identification. This Identification is important if the packet becomes fragmented in the network, as it ensures that the destination is able to reassemble the received fragments. Ideally, a sending host should never send a packet twice with the same Identification to the same destination host, in order to ensure that all fragments are correctly reassembled by the destination. Unfortunately, with a 16 bits Identification field and an expected MSL of 2 minutes, this implies that the maximum bandwidth to a given destination is limited to roughly 286 Mbps. With a more realistic 1500 bytes MTU, that bandwidth drops to 6.4 Mbps RFC 4963 if fragmentation must be possible 17. This is very low and is another reason why hosts are highly encouraged to avoid fragmentation. If; despite all of this, the MTU of the outgoing interface is smaller than the packet’s length, the packet is fragmented. Finally, the packet’s checksum is computed before transmission.

When a host receives an IPv4 packet destined to itself, there are several operations that it must perform. First, it must check the packet’s checksum. If the checksum is incorrect, the packet is discarded. Then, it must check whether the packet has been fragmented. If yes, the packet is passed to the reassembly algorithm described earlier. Otherwise, the packet must be passed to the upper layer. This is done by looking at the Protocol field (6 for TCP, 17 for UDP). If the host does not implement the transport layer protocol corresponding to the received Protocol field, it sends a Protocol unreachable ICMP message to the sending host. If the received packet contains an ICMP message (Protocol field set to 1), the processing is more complex. An Echo-request ICMP message triggers the transmission of an ICMP Echo-reply message. The other types of ICMP messages indicate an error that was caused by a previously transmitted packet. These ICMP messages are usually forwarded to the transport protocol that sent the erroneous packet. This can be done by inspecting the contents of the ICMP message that includes the header and the first 64 bits of the erroneous packet. If the IP packet did not contain options, which is the case for most IPv4 packets, the transport protocol can find in the first 32 bits of the transport header the source and destination ports to determine the affected transport flow. This is important for Path MTU discovery for example.

When a router receives an IPv4 packet, it must first check the packet’s checksum. If the checksum is invalid, it is discarded. Otherwise, the router must check whether the destination address is one of the IPv4 addresses assigned to the router. If so, the router must behave as a host and process the packet as described above. Although routers mainly forward IPv4 packets, they sometimes need to be accessed as hosts by network operators or network management software. If the packet is not addressed to the router, it must be forwarded on an outgoing interface according to the router’s routing table. The router first decrements the packet’s TTL. If the TTL reaches 0, a TTL Exceeded ICMP message is sent back to the source. As the packet header has been modified, the checksum must be recomputed. Fortunately, as IPv4 uses an arithmetic checksum, a router can incrementally update the packet’s checksum as described in RFC 1624. Then, the router performs a longest prefix match for the packet’s destination address in its forwarding table. If no match is found, the router must return a Destination unreachable ICMP message to the source. Otherwise, the lookup returns the interface over which the packet must be forwarded. Before forwarding the packet over this interface, the router must first compare the length of the packet with the MTU of the outgoing interface. If the packet is smaller than the MTU, it is forwarded. Otherwise, a Fragmentation needed ICMP message is sent if the DF flag was sent or the packet is fragmented if the DF was not set.

Note: Longest prefix match in IP routers

Performing the longest prefix match at line rate on routers requires highly tuned data structures and algorithms. Consider for example an implementation of the longest match based on a Radix tree on a router with a 10 Gbps link. On such a link, a router can receive 31,250,000 40 bytes IPv4 packets every second. To forward the packets at line rate, the router must process one IPv4 packet every 32 nanoseconds. This cannot be achieved by a software implementation. For a hardware implementation, the main difficulty lies in the number of memory accesses that are necessary to perform the longest prefix match. 32 nanoseconds is very small compared to the memory accesses that are required by a naive longest prefix match implement. Additional information about faster longest prefix match algorithms may be found in [Varghese2005].