Sysresccd-Networking-EN-Destination-port-routing

About destination-port routing

TCP/IP routing is normally based on the destination IP address. This article explains how to make routing decisions based on the destination port instead, which lets you split the traffic between several links. This is very useful on a heavily loaded network. For instance, you may want to route the SSH traffic over one ADSL link and the web traffic over another. It also helps to keep interactive sessions responsive: if all the outgoing packets are routed to a single ADSL link with no traffic control, your ssh/telnet/vnc session may become unresponsive as soon as someone on your network downloads a large file. With this technique you can route all the interactive traffic to a dedicated line so that its packets are still transmitted quickly.

You don't need anything special to get advanced routing to work. It should work on all flavours of Linux that come with a 2.6 kernel (we are not sure whether it works with linux-2.4). All the features used here are in the mainline kernel, so you don't need a specific kernel patch. You only have to be careful with the networking options if you compile your own kernel. You will also need the iproute2 package and the basic networking tools, which are installed on all of the major Linux distributions.

Example of a networking environment

To explain how we do destination-port routing, we will use the following network environment:

  • There are three machines in the network: saturn, jupiter, neptune.
  • Jupiter is the main router of this network. All the networking setup is done on that machine.
  • There are two links between Jupiter and Neptune. In this example we only use simple ethernet links for testing. These links could be external links (ADSL, cable, ...) if we used public IP addresses at their ends. The routing setup on Jupiter would be the same.
  • Saturn and Neptune are located at the ends of the network. We want Saturn to send packets to Neptune using the two links.
  • The purpose of this article is to do all the network configuration on Jupiter. All the routing decisions are made on that machine. It will not be necessary to configure advanced routing on either Saturn or Neptune.
  • We want one kind of packet to Neptune to be sent through the first link (the ssh traffic for instance, so all the TCP packets with dport=22), and another kind of packets to Neptune to be sent through the second link (let's say the web traffic, so all the TCP packets with dport=80).
  • We want the routing to be symmetric. This means that, once a TCP connection is established between Saturn and Neptune, we want all of the packets that belong to that TCP stream to go through the same link. We don't want to use asymmetric routing. In asymmetric routing the packets from Saturn to Neptune could use link1, and the replies to these packets could be sent through the second link. It could work if the Reverse Path Filtering is turned off, but it would mean that the packets from Neptune to Saturn do not respect the routing that the administrator wants them to have.
  • All of the tests are made with Saturn being the client and Neptune acting as a server. That is, all of the TCP connections are established by Saturn.
  • To make this network environment simpler, we configured a dummy0 interface on Neptune with 172.16.1.100. The purpose is to split the traffic from Saturn (source address is 192.168.157.3) to Neptune (destination address is 172.16.1.100) and to spread it between the two links. In real life you will probably have a lot of machines behind Neptune. The dummy0 interface allows us to have an IP address on Neptune that is not related to the two links. It gives the illusion that 172.16.1.100 is an address of a remote computer behind Neptune and behind the two links.
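The article does not show how the dummy0 interface was created on Neptune. A minimal sketch, assuming the dummy network driver is available on Neptune, could look like the following. The /32 prefix length is our assumption (the original setup is not documented), and the IP variable is a hypothetical override that only exists so the commands can be printed instead of executed.

```shell
# Sketch: create the dummy0 interface on Neptune (must be run as root).
# Assumption: the /32 prefix length is our choice; the article does not give it.
# IP is a hypothetical override: IP=echo prints the arguments instead of running ip.
setup_dummy0() {
    ip_cmd="${IP:-ip}"
    addr="${1:-172.16.1.100}"
    "$ip_cmd" link add dummy0 type dummy &&
    "$ip_cmd" addr add "${addr}/32" dev dummy0 &&
    "$ip_cmd" link set dummy0 up
}
```

Calling setup_dummy0 as root on Neptune applies the three commands, while IP=echo setup_dummy0 only prints their arguments for review.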

dport-routing-01.png


Overview of the routing

Now, let's see how we can implement destination port routing. Neither the old route command nor the more recent iproute2 tools provide an option to select packets by destination port. However, ip rule comes with an interesting option that uses an attribute of the packet to decide how to route it.

We will use the fwmark attribute to do that. This attribute is not part of the IP packet header: it is only stored in the memory of the local machine that is processing the packet, so it is lost as soon as the packet leaves the router. That is all we need, since this attribute can be used by both netfilter and iproute2. The first thing to do is to mark the packets using the advanced packet matching options provided by iptables and netfilter. Once a packet is marked, iproute2 can use this attribute for policy routing.

Obviously the packet must have been marked by netfilter before it reaches the routing code. That's why it's important to remember when netfilter works on the packets. Netfilter has five hooks in the kernel network stack. This means that there are five places where the netfilter functions can work on the packets. These are the kinds of packets that can be seen by each of the five hooks:

  • PREROUTING: all the incoming packets whatever the destination address is
  • POSTROUTING: all the outgoing packets whatever the source address is
  • FORWARD: all the packets that are routed
  • INPUT: all the packets that are sent to the local machine
  • OUTPUT: all the packets that are sent by the local machine

So if we mark the packets at POSTROUTING, the routing code will not see the mark and the advanced routing will have no effect. That's why we must work in the PREROUTING hook for incoming routed packets, and in the OUTPUT hook if we want to route the packets sent by the router itself.


dport-routing-02.png


Marking the packets with iptables

Netfilter and iptables mainly work with three tables:

  • filter: the most popular table, it's mostly used for firewalling, to accept or reject packets
  • nat: it's used for Network Address Translation
  • mangle: it's mostly used to modify network packets

We will work with the mangle table since we want to change an attribute of a packet. We want to split the traffic between the two links. Let's consider we want to route the ssh traffic through the first link and the web traffic through the second link. We will have to mark the TCP packets having dport=22 (destination port) with mark=1 and the TCP packets having dport=80 with mark=2:

iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 22 -j MARK --set-mark 1
iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j MARK --set-mark 2

Here is the complete code that cleans the table first, and that does logging:

iptables -t mangle -F
iptables -t mangle -X
iptables -t mangle -N LOG_FWMARK1
iptables -t mangle -A LOG_FWMARK1 -j LOG --log-prefix 'iptables-mark1: ' --log-level info
iptables -t mangle -A LOG_FWMARK1 -j MARK --set-mark 1
iptables -t mangle -N LOG_FWMARK2
iptables -t mangle -A LOG_FWMARK2 -j LOG --log-prefix 'iptables-mark2: ' --log-level info
iptables -t mangle -A LOG_FWMARK2 -j MARK --set-mark 2
iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 22 -j LOG_FWMARK1
iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j LOG_FWMARK2
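The rules above only mark routed packets that arrive on eth0. For packets generated by Jupiter itself, the overview explains that the OUTPUT hook has to be used instead. A minimal sketch of the equivalent rules follows. The -i match is dropped because a locally generated packet has no incoming interface. This case is not tested in the article, and IPTABLES is a hypothetical override used only to print the rules instead of applying them.

```shell
# Sketch: mark the packets sent by the router itself (OUTPUT hook).
# Assumption: same ports and marks as the PREROUTING rules of the article.
# IPTABLES is a hypothetical override: IPTABLES=echo prints the arguments.
mark_local_traffic() {
    ipt="${IPTABLES:-iptables}"
    "$ipt" -t mangle -A OUTPUT -p tcp -m tcp --dport 22 -j MARK --set-mark 1 &&
    "$ipt" -t mangle -A OUTPUT -p tcp -m tcp --dport 80 -j MARK --set-mark 2
}
```

Run mark_local_traffic as root on Jupiter to add the rules, or IPTABLES=echo mark_local_traffic to review them first.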

Routing the marked packets

To route the packets according to the mark attribute, we have to use the ip rule command. This is called policy routing. We have to create secondary routing tables that are used when the mark attribute of a packet matches a rule.

Create new routing tables

First, we have to create these two routing tables by editing /etc/iproute2/rt_tables. Here is the code that automatically creates two tables called rt_link1 and rt_link2.

if ! grep -q '^251' /etc/iproute2/rt_tables
then
        echo '251     rt_link1' >> /etc/iproute2/rt_tables
fi
if ! grep -q '^252' /etc/iproute2/rt_tables
then
        echo '252     rt_link2' >> /etc/iproute2/rt_tables
fi

Here is the list of the routing tables you should have on Jupiter:

# -----------/etc/iproute2/rt_tables------------
# reserved values
255     local
254     main
253     default
0       unspec
# custom routes
252     rt_link2
251     rt_link1
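The two if blocks used to register the tables can be folded into one small helper. This is only a sketch: add_rt_table is a hypothetical name, and the optional third argument (the file to edit) is there so the function can be tried on a scratch copy before touching /etc/iproute2/rt_tables.

```shell
# Sketch: register a routing table only if its number is not already listed.
# add_rt_table is a hypothetical helper, not part of iproute2.
# Usage: add_rt_table NUMBER NAME [FILE]
add_rt_table() {
    id="$1"
    name="$2"
    file="${3:-/etc/iproute2/rt_tables}"
    # Match the number at the start of a line, followed by whitespace
    if ! grep -Eq "^${id}[[:space:]]" "$file" 2>/dev/null
    then
        printf '%s\t%s\n' "$id" "$name" >> "$file"
    fi
}
```

With this helper the setup becomes add_rt_table 251 rt_link1 followed by add_rt_table 252 rt_link2, and running it several times does not create duplicate entries.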

Now we must populate these two routing tables. The best thing to do is just to add one default route in each table. Each default route drives the packet to the ethernet card where the link to use is connected. That way, when a packet with dport=22 follows the default route written in rt_link1, it will be sent to Neptune through device eth1 (Link1). We also use ip route flush to be sure that the table is empty.

ip route flush table rt_link1
ip route add table rt_link1 default dev eth1
ip route flush table rt_link2
ip route add table rt_link2 default dev eth2

Use the new tables with policy routing

Now we have to use the ip rule command to say what to do with the marked packets. The following lines say that the packets having the mark fwmark=1 must follow the routing instructions of the routing table named rt_link1, and the packets with the second mark must use rt_link2. At the end we flush the routing cache to be sure that the new rules are taken into account.

ip rule del from all fwmark 2 2>/dev/null
ip rule del from all fwmark 1 2>/dev/null
ip rule add fwmark 1 table rt_link1
ip rule add fwmark 2 table rt_link2
ip route flush cache

Here is the list of all rules after these commands are executed:

# ip rule show
0:      from all lookup local
32764:  from all fwmark 0x2 lookup rt_link2
32765:  from all fwmark 0x1 lookup rt_link1
32766:  from all lookup main
32767:  from all lookup default
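To check the rules from a script rather than by eye, the output of ip rule show can be filtered. The following is a minimal sketch: has_fwmark_rule is a hypothetical helper that reads the rule list on standard input and succeeds if the given mark is sent to the given table.

```shell
# Sketch: succeed if "ip rule show" output (on stdin) contains a rule that
# sends packets with the given fwmark to the given routing table.
# has_fwmark_rule is a hypothetical helper name.
# Usage: ip rule show | has_fwmark_rule MARK TABLE
has_fwmark_rule() {
    # ip rule prints the mark in hexadecimal (0x1, 0x2, ...)
    mark_hex=$(printf '0x%x' "$1")
    grep -Eq "fwmark ${mark_hex}(/0x[0-9a-f]+)? lookup $2([[:space:]]|\$)"
}
```

For example, ip rule show | has_fwmark_rule 1 rt_link1 && echo "rule for mark 1 is present". To see which route a marked packet would take, ip route get 172.16.1.100 from 192.168.157.3 iif eth0 mark 1 should report eth1 as the outgoing device if the setup is correct.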

Linux network parameters

There are two network parameters that have to be checked if you want your router to behave as expected. First we want to be sure that the kernel running on Jupiter is configured to route the packets. To enable routing on IPv4 you must set ip_forward to 1 (1 means enabled, 0 means disabled).

echo 1 >| /proc/sys/net/ipv4/ip_forward

You must also disable Reverse Path Filtering. This option, which is often enabled by default, increases security and prevents IP spoofing by checking that the source address of each incoming packet matches the routing table of the local machine. With a complex setup such as this one, it would cause our packets to be dropped, so it must be disabled.

echo 0 >| /proc/sys/net/ipv4/conf/all/rp_filter

These changes will be lost when you reboot your server. You can either make sure that these commands are executed by a script at boot time, or edit your network configuration files so that the settings are kept after a reboot. On Gentoo and Red Hat you have to edit /etc/sysctl.conf:

# /etc/sysctl.conf
# 
# Enables packet forwarding
net.ipv4.ip_forward = 1
# Disable reverse path filtering
net.ipv4.conf.all.rp_filter = 0
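A caveat about reverse path filtering: on recent kernels the value applied to an interface is the larger of conf/all/rp_filter and conf/<interface>/rp_filter, so setting only the "all" entry to 0 may not be enough if your distribution enables the per-interface values. The sketch below lists the current values so you can check. show_rp_filter is a hypothetical helper, and its optional argument (the directory to scan) exists only to make it easy to try on a test directory.

```shell
# Sketch: print the rp_filter value of every entry (all, default, eth0, ...).
# show_rp_filter is a hypothetical helper name.
# Usage: show_rp_filter [CONF_DIR]
show_rp_filter() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for f in "$conf_dir"/*/rp_filter
    do
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        echo "$iface $(cat "$f")"
    done
}
```

If eth1 or eth2 still shows a non-zero value on Jupiter, it can be cleared the same way as the global one, for instance with echo 0 >| /proc/sys/net/ipv4/conf/eth1/rp_filter, or with a net.ipv4.conf.eth1.rp_filter = 0 line in /etc/sysctl.conf.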

Source Network-Address-Translation (SNAT)

Now the packets from Saturn to Neptune should be routed as expected. But there is still one problem to solve. The replies sent by Neptune to Saturn will ignore the advanced routing and will always be sent through the same link, the one that matches the route to 192.168.157.3 that is configured on Neptune. When Neptune receives packets from Saturn, the source address is 192.168.157.3. Since there is no advanced routing configured on Neptune, the packets to Saturn just follow the normal route.

This is a case of asymmetric routing. The packets from Saturn to Neptune having dport=80 are routed through the second link because of the advanced routing on Jupiter, and the replies to these packets are sent through the first link simply because Neptune uses normal routing. One solution to this problem would be to configure the advanced routing on Neptune as well as on Jupiter. But we wanted to keep the configuration as simple as possible, so we only want to configure advanced routing on Jupiter.

The best thing to do is to configure SNAT (Source Network-Address-Translation) on Jupiter so that all packets sent through link1 or link2 come with a rewritten source address. We want the source address of the packets from link1 to be 10.37.1.253 and the source address of the packets from link2 will be 10.37.2.253. That way Neptune will receive packets with a source address that matches the link from which they come. When Neptune replies to the requests coming from link1 or link2 it will just use the source address seen in these packets as the new destination address.

You will also see that the SNAT involves an implicit DNAT (Destination Network-Address-Translation). When Jupiter receives a packet on eth2 (the interface where the second link is connected), the destination address is 10.37.2.253, which is Jupiter's own address on that link. This packet is actually a reply to a packet from Saturn (192.168.157.3), so we want Jupiter to change the destination address back and to forward the packet to Saturn. This is done by the implicit DNAT: netfilter's connection tracking remembers the translation and reverses it for the replies.

It's important to notice that the Source address NAT is executed in POSTROUTING. That way it's executed after the routing, which is the place where we drive each packet to the correct device (either eth1 or eth2 on Jupiter). The SNAT iptables rule uses the "outgoing device" match to determine what source address must be written in the packet header.

If you are using ADSL links between Jupiter and Neptune, you will be forced to use public IP addresses outside of your local network. Most modems can do NAT for you, in which case you don't have to configure the SNAT on Jupiter yourself.

Here is the code to configure SNAT on Jupiter:

iptables -t nat -F
iptables -t nat -X
iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to-source 10.37.1.253
iptables -t nat -A POSTROUTING -o eth2 -j SNAT --to-source 10.37.2.253
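The SNAT rules hard-code the addresses of eth1 and eth2. As a sketch of an alternative, the address can be read from the interface itself. first_ipv4 is a hypothetical helper that parses the one-line output of ip -o -4 addr show, and it assumes the interface has at least one IPv4 address.

```shell
# Sketch: print the first IPv4 address found in the output of
# "ip -o -4 addr show dev IFACE" (read on stdin), without the prefix length.
# first_ipv4 is a hypothetical helper name.
first_ipv4() {
    awk '$3 == "inet" { sub(/\/.*/, "", $4); print $4; exit }'
}
```

The first rule could then be written as iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to-source "$(ip -o -4 addr show dev eth1 | first_ipv4)", and the same for eth2.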

Troubleshooting

Here is what you can do in case it does not work:

Check your firewall

This article assumes that packet filtering is not enabled on your router or on your network. If you are already using iptables, check that your existing rules are consistent with the new iptables rules used for destination port routing.

Logging packets with iptables

You can use iptables to log the interesting packets using -j LOG in your rules. Don't forget to install syslog on your machine. Here is an example of what you can get:

iptables-mark1: IN= OUT=eth1 SRC=10.37.1.100 DST=10.37.3.101 LEN=60 TOS=0x00 PREC=0x00 TTL=64 
                ID=53236 DF PROTO=TCP SPT=44443 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0
iptables-mark1: IN= OUT=eth1 SRC=10.37.1.100 DST=10.37.3.101 LEN=60 TOS=0x00 PREC=0x00 TTL=64 
                ID=53237 DF PROTO=TCP SPT=44443 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0

To enable logging, you can replace a simple iptables action (such as MARK) with a custom chain (such as LOG_FWMARK2). Every time a packet is marked, a message is also written to your logs. For instance, you can replace this simple iptables command:

iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j MARK --set-mark 2

With the following chain:

iptables -t mangle -N LOG_FWMARK2
iptables -t mangle -A LOG_FWMARK2 -j LOG --log-prefix 'iptables-mark2: ' --log-level info
iptables -t mangle -A LOG_FWMARK2 -j MARK --set-mark 2
iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j LOG_FWMARK2
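Once logging is enabled, the log lines can be summarized to check that each destination port gets the expected mark. The following is a minimal sketch: summarize_marks is a hypothetical helper that reads kernel log lines on standard input and assumes the 'iptables-markN: ' prefixes used above. Where the lines end up (dmesg, /var/log/messages, the journal) depends on your syslog setup.

```shell
# Sketch: count logged packets per log prefix and destination port.
# summarize_marks is a hypothetical helper name; it expects one log
# entry per line, as written by the LOG target.
summarize_marks() {
    awk '
        /iptables-mark[0-9]+: / {
            prefix = ""; dpt = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^iptables-mark[0-9]+:$/) prefix = $i
                else if ($i ~ /^DPT=/) dpt = substr($i, 5)
            }
            if (prefix != "" && dpt != "") count[prefix " dport=" dpt]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -k2
}
```

For example, dmesg | summarize_marks prints one line per prefix and port with the number of packets seen. A line showing dport=80 with the iptables-mark1 prefix would point to a mistake in the marking rules.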

Use a network sniffer

You can use a sniffer such as tcpdump (console) or wireshark (graphical mode) to check what packets are transmitted and with which attributes.
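As a sketch of how the sniffer can confirm the split, the helper below watches one TCP port on one interface. watch_link is a hypothetical name, and TCPDUMP is a hypothetical override that only exists so the command line can be printed instead of executed.

```shell
# Sketch: show the packets for one TCP port on one interface of Jupiter.
# watch_link is a hypothetical helper name.
# Usage: watch_link INTERFACE PORT   (run as root)
watch_link() {
    # -n disables name resolution so addresses and ports are shown as numbers
    "${TCPDUMP:-tcpdump}" -i "$1" -n "tcp port $2"
}
```

Running watch_link eth1 22 and watch_link eth2 80 in two terminals on Jupiter while connecting from Saturn should show the ssh packets only on eth1 and the web packets only on eth2.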

Routing configuration on Saturn

Even if 95% of the networking configuration is done on the router (Jupiter), don't forget to set a route to Neptune on Saturn. This is necessary if Jupiter is not the default gateway on Saturn. Here is what to do on Saturn:

ip route add 172.16.1.100 via 192.168.157.253

  #           (dummy0)         (Jupiter eth0)

Detailed journey of a routed packet

To better understand how this advanced routing configuration works, let's follow a network packet sent from Saturn to Neptune. Suppose a user on Saturn wants to connect to Neptune via ssh. In our example the ssh packets are routed through the first link on Jupiter (Link1).

  • On Saturn, a user runs ssh 172.16.1.100 to connect to Neptune.
  • Saturn finds that the packets to 172.16.1.100 must be routed via 192.168.157.253 so the link layer on Saturn sends the packet to Jupiter. This packet contains ip.src=192.168.157.3, ip.dst=172.16.1.100, tcp.dport=22.
  • On Jupiter, the packet is first processed in PREROUTING. The instruction we added in mangle with "-i eth0 -p tcp -m tcp --dport 22 -j MARK --set-mark 1" is executed and the mark=1 attribute is written in the packet information in the memory of Jupiter.
  • On Jupiter the routing code finds that the packet is not sent to its local address so the packet is routed. It follows the rules involved in policy routing, and the packet hits the rule that says "packets having the attribute mark=1 must use the routing table named rt_link1". Since the default route in this routing table is "default dev eth1" the packet is sent to the device named eth1.
  • On Jupiter, in POSTROUTING, the packet matches "-o eth1" so "-j SNAT --to-source 10.37.1.253" is executed. The SNAT code rewrites the source address in the packet:
    • BEFORE SNAT: ip.src=192.168.157.3, ip.dst=172.16.1.100, tcp.dport=22
    • AFTER SNAT: ip.src=10.37.1.253, ip.dst=172.16.1.100, tcp.dport=22
  • The packet is sent from Jupiter to Neptune with ip.src=10.37.1.253, ip.dst=172.16.1.100, tcp.dport=22
  • On Neptune the packet is delivered to the local ssh service because the destination ip is 172.16.1.100 and it is a local ip address. The ssh server accepts the connection and replies with another packet. The reply is sent to the address that was the source address in the query packet, so the new packet is sent to 10.37.1.253. The reply packet is sent with the following attributes: ip.src=172.16.1.100, ip.dst=10.37.1.253, tcp.sport=22
  • On Neptune, the routing code finds that 10.37.1.253 belongs to the subnet of eth1, so the packet is sent to Jupiter through eth1 (Link1).
  • Jupiter receives the reply from Neptune on eth1. The packet hits the implicit DNAT code that is executed when there is a SNAT. The Destination NAT replaces the destination address in the reply packet with the address that was in the header before it was rewritten by the SNAT code:
    • BEFORE DNAT: ip.src=172.16.1.100, ip.dst=10.37.1.253, tcp.sport=22
    • AFTER DNAT: ip.src=172.16.1.100, ip.dst=192.168.157.3, tcp.sport=22
  • On Jupiter the routing code knows that packets to 192.168.157.0/24 must be delivered through eth0
  • Jupiter sends the reply from Neptune to Saturn through eth0 with ip.src=172.16.1.100, ip.dst=192.168.157.3, tcp.sport=22
  • Saturn receives the reply from Neptune and Saturn sends a new TCP packet, ...
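The address rewriting described in this journey can be observed in the connection tracking table on Jupiter, where each entry lists the original tuple followed by the expected reply tuple. The sketch below is a hypothetical helper (show_snat_mappings) that reads such entries on standard input and prints the connections whose reply destination differs from the original source, which is what SNAT produces. It assumes the usual src=/dst= layout printed by conntrack -L (from conntrack-tools) or found in /proc/net/nf_conntrack, which may not be available on every system.

```shell
# Sketch: list the connections whose source address was rewritten by SNAT.
# show_snat_mappings is a hypothetical helper name; it expects conntrack
# entries on stdin, each with an original tuple then a reply tuple.
show_snat_mappings() {
    awk '
        {
            n = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^src=/) { n++; src[n] = substr($i, 5) }
                else if ($i ~ /^dst=/) { dst[n] = substr($i, 5) }
            }
            # Without NAT the reply goes back to the original source
            if (n >= 2 && src[1] != dst[2])
                print src[1] " -> " dst[1] " via " dst[2]
        }
    '
}
```

On Jupiter, conntrack -L 2>/dev/null | show_snat_mappings (or show_snat_mappings < /proc/net/nf_conntrack) should list the ssh connection from 192.168.157.3 to 172.16.1.100 with 10.37.1.253 as the translated address.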

Complete script that configures the router

Here is the script to execute on Jupiter to configure the destination port routing:

#!/bin/bash

# Enable IPv4 forwarding and disable reverse path filtering
echo 1 >| /proc/sys/net/ipv4/ip_forward
echo 0 >| /proc/sys/net/ipv4/conf/all/rp_filter

# Mark the packets coming from eth0: ssh with mark 1, web with mark 2
iptables -t mangle -F
iptables -t mangle -X
iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 22 -j MARK --set-mark 1
iptables -t mangle -A PREROUTING -i eth0 -p tcp -m tcp --dport 80 -j MARK --set-mark 2

# Rewrite the source address according to the outgoing link
iptables -t nat -F
iptables -t nat -X
iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to-source 10.37.1.253
iptables -t nat -A POSTROUTING -o eth2 -j SNAT --to-source 10.37.2.253

# Register the two routing tables if they do not exist yet
if ! grep -q '^251' /etc/iproute2/rt_tables
then
        echo '251     rt_link1' >> /etc/iproute2/rt_tables
fi
if ! grep -q '^252' /etc/iproute2/rt_tables
then
        echo '252     rt_link2' >> /etc/iproute2/rt_tables
fi

# One default route per table, each through its own link
ip route flush table rt_link1
ip route add table rt_link1 default dev eth1
ip route flush table rt_link2
ip route add table rt_link2 default dev eth2

# Send the marked packets to the matching routing table
ip rule del from all fwmark 2 2>/dev/null
ip rule del from all fwmark 1 2>/dev/null
ip rule add fwmark 1 table rt_link1
ip rule add fwmark 2 table rt_link2
ip route flush cache
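The article does not describe how to undo this configuration. A minimal sketch of a matching cleanup follows. Like the setup script, it flushes the whole mangle and nat tables, so it also removes any unrelated rules stored there. teardown_dport_routing is a hypothetical name, and IP and IPTABLES are hypothetical overrides used only to print the commands instead of running them.

```shell
# Sketch: remove the policy routing rules, the two routing tables' routes,
# and the iptables rules added by the setup script (must be run as root).
# Warning: flushes the entire mangle and nat tables.
# IP and IPTABLES are hypothetical overrides (for example IP=echo).
teardown_dport_routing() {
    ipt="${IPTABLES:-iptables}"
    ip_cmd="${IP:-ip}"
    "$ip_cmd" rule del fwmark 1 table rt_link1
    "$ip_cmd" rule del fwmark 2 table rt_link2
    "$ip_cmd" route flush table rt_link1
    "$ip_cmd" route flush table rt_link2
    "$ipt" -t mangle -F
    "$ipt" -t nat -F
}
```

The entries in /etc/iproute2/rt_tables are left in place on purpose, since an empty table is harmless and the setup script reuses them.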