IP traceback is any method for reliably determining the origin of a packet on the Internet. The IP protocol does not provide for the authentication of the source IP address of an IP packet, enabling the source address to be falsified in a strategy called IP address spoofing, and creating potential internet security and stability problems.

Use of false source IP addresses allows denial-of-service attacks (DoS) or one-way attacks (where the response from the victim host is so well known that return packets need not be received to continue the attack). IP traceback is critical for identifying sources of attacks and instituting protection measures for the Internet. Most existing approaches to this problem have been tailored toward DoS attack detection. Such solutions require high numbers of packets to converge on the attack path(s).

Probabilistic packet marking

Savage et al. suggested probabilistically marking packets as they traverse routers through the Internet. They propose that the router mark the packet with either the router’s IP address or the edges of the path that the packet traversed to reach the router.

For the first alternative, marking packets with the router's IP address, analysis shows that in order to gain the correct attack path with 95% accuracy as many as 294,000 packets are required. The second approach, edge marking, requires that the two nodes that make up an edge mark the path with their IP addresses along with the distance between them. This approach would require more state information in each packet than simple node marking but would converge much faster. They suggest three ways to reduce the state information of these approaches into something more manageable.

Accordingly, Song and Perrig propose the following traceback scheme: instead of encoding the IP address interleaved with a hash, they suggest encoding the IP address into an 11-bit hash and maintain a 5-bit hop count, both stored in the 16-bit fragment ID field. This is based on the observation that a 5-bit hop count (32 max hops) is sufficient for almost all Internet routes. Further, they suggest that two different hashing functions be used so that the order of the routers in the markings can be determined. Next, if any given hop decides to mark it first checks the distance field for a 0, which implies that a previous router has already marked it. If this is the case, it generates an 11-bit hash of its own IP address and then XORs it with the previous hop. If it finds a non-zero hop count it inserts its IP hash, sets the hop count to zero and forwards the packet on. If a router decides not to mark the packet it merely increments the hop count in the overloaded fragment id field.

These bounds were improved by Adler, Edmonds, and Matoušek. For a single attack path, they proved that any memoryless PPM scheme requires at least <math>\Omega\!\big(2^{b}\cdot 2^{\,2n/(2^{b}-1)}\big)</math> packets. They also gave a near-matching (“quasi-optimal”) upper bound: in a model with <math>m</math> routers where each router chooses one of <math>s</math> possible messages (<math>n = m\cdot \log_{2}(s)</math>), they show that <math>O\big(m\cdot 2^{2m\log_{2}(s)/(2^{b}-1)}\big)</math> packets suffice. Thus, the true bound is roughly the square root of the upper bound originally provided by Adler, and roughly the square of his lower bound. The paper also presents a new protocol and analysis for multi-path attacks, generalizing the scenarios in which PPM remains effective.

Deterministic packet marking

Belenky and Ansari, outline a deterministic packet marking scheme. They describe a more realistic topology for the Internet – that is composed of LANs and ASs with a connective boundary – and attempt to put a single mark on inbound packets at the point of network ingress. Their idea is to put, with random probability of .5, the upper or lower half of the IP address of the ingress interface into the fragment id field of the packet, and then set a reserve bit indicating which portion of the address is contained in the fragment field. By using this approach they claim to be able to obtain 0 false positives with .99 probability after only 7 packets.

Rayanchu and Barua provide another spin on this approach (called DERM). Their approach is similar in that they wish to use and encoded IP address of the input interface in the fragment id field of the packet. Where they differ from Belenky and Ansari is that they wish to encode the IP address as a 16-bit hash of that IP address. Initially they choose a known hashing function. They state that there would be some collisions if there were greater than 2^16 edge routers doing the marking.

They attempt to mitigate the collision problem by introducing a random distributed selection of a hash function from the universal set, and then applying it to the IP address. In either hashing scenario, the source address and the hash are mapped together in a table for later look-up along with a bit indicating which portion of the address they have received. Through a complicated procedure and a random hash selection, they are capable of reducing address collision. By using a deterministic approach they reduce the time for their reconstruction procedure for their mark (the 16-bit hash). However, by encoding that mark through hashing they introduce the probability of collisions, and thus false-positives.

S. Majumdar, D. Kulkarni and C. Ravishankar proposes a new method to traceback the origin of DHCP packets in ICDCN 2011. Their method adds a new DHCP option that contains the MAC address and the ingress port of the edge switch which had received the DHCP packet. This new option will be added to the DHCP packet by the edge switch. This solution follows DHCP RFCs. Previous IP traceback mechanisms have overloaded IP header fields with traceback information and thus are violating IP RFCs. Like other mechanisms, this paper also assumes that the network is trusted. The paper presents various performance issues in routers/switches that were considered while designing this practical approach. However, this approach is not applicable to any general IP packet.

Router-based approach

With router-based approaches, the router is charged with maintaining information regarding packets that pass through it. For example, Sager proposes to log packets and then data mine them later. This has the benefit of being out of band and thus not hindering the fast path.

Snoeren et al. propose marking within the router. The idea proposed in their paper is to generate a fingerprint of the packet, based upon the invariant portions of the packet (source, destination, etc.) and the first 8 bytes of payload (which is unique enough to have a low probability of collision). More specifically, m independent simple hash functions each generate an output in the range of 2n-1. A bit is then set at the index generated to create a fingerprint when combined with the output of all other hash functions. All fingerprints are stored in a 2n bit table for later retrieval. The paper shows a simple family of hash functions suitable for this purpose and present a hardware implementation of it.

The space needed at each router is limited and controllable (2n bits). A small n makes the probability of collision of packet hashes (and false identification) higher. When a packet is to be traced back, it is forwarded to originating routers where fingerprint matches are checked. As time passes, the fingerprint information is “clobbered” by hashes generated by other packets. Thus, the selectivity of this approach degrades with the time that has passed between the passage of the packet and the traceback interrogation.

To help mitigate the problem of storage limitations they use Snoeren’s hashing approach and implementation (SPIE) – modifying it to accept their information for hashing. They admit their algorithm is slow (O(N2)) and with only 3.3 million packet hashes being stored the approximate time before the digest tables are invalid is 1 minute. This dictates that any attack response must be real-time – a possibility only on single-administrative LAN domains.

The traceback problem is complicated because of spoofed packets. Thus, a related effort is targeted towards preventing spoofed packets, known as ingress filtering. Ingress Filtering restricts spoofed packets at ingress points to the network by tracking the set of legitimate source networks that can use this router.

Park and Lee present an extension of Ingress Filtering at layer 3. They present a means of detecting false packets, at least to the subnet, by essentially making use of existing OSPF routing state to have routers make intelligent decisions about whether or not a packet should be routed.

References