Using IPv6 Dynamic GU Addresses in Nested Subnets

Abstract

NAT shall not be used with IPv6, but providers continue to supply customers with global unicast (GU) addresses that change dynamically. While this may work for a single endpoint or even a single flat LAN, as soon as our site is structured into different subnets connected by various routers, changing all the addresses dynamically becomes rather difficult.

We sketch a possible solution that adjusts all the dynamic GU IPv6 addresses throughout a site network with hierarchical subnetting in less than 25 seconds, using the router advertisement daemon and the Roy-Marples-implementation of the DHCP client.

All configurations are based on contemporary FreeBSD (Release 12.3 and 13.1 at the time of writing).

Intended audience: engineers designing an IPv6 layout for multi-hop structured LANs.

1. PPP

Our provider supplies a PPPoE connection that can be handled by ppp from the base system.

ppp can handle IPv6 when enable ipv6cp is set (this is the default). After creating the tun device, ppp puts a link-local address on it. As the tun device does not have a MAC address, the one from the first interface on the system is used to compute the EUI-64 for SLAAC. This is done by code within ppp, so there is no need to set AUTO_LINKLOCAL.

Next, a globally routable address should be obtained via router advertisement from upstream. Therefore ACCEPT_RTADV is needed on the tun interface. But ppp creates the interface ad hoc, and even if we pre-created it to set that option, ppp insists on destroying it on termination, so at the next start the option would be gone.
We can, however, set that option from the ppp.linkup script, which is run before the final stage of IPv6 configuration. (We could also set it at a later time, but then we might have to wait some ten minutes until the next router advertisement comes in.)

provider:
  shell /sbin/ifconfig INTERFACE inet6 accept_rtadv

INTERFACE is a keyword here - ppp will substitute the device name used.

We will also need a default route; this can be set in ppp.conf alongside the IPv4 default route:

provider:
  add default HISADDR
  add default HISADDR6

With this we have a functioning IPv6 configuration on our router. We could now, for instance, remove the -4 option from a named installation on the router, and named would happily start to do queries via IPv6. (It seems that this already fixes some occasional resolver failures with one of my cloud hosts.)

The Detail Problem

What happens if our uplink occasionally fails and ppp needs to reconnect? Or if we send SIGINT to make ppp disconnect and reconnect? In that case the tun interface does not get destroyed, and the IPv6 addresses continue to exist. After the reconnect the upstream router advertisement will provide a new routable address - but nobody will remove the old one! It would be the task of the upstream router advertiser to keep track and invalidate the former routes, but it doesn't work that way with large-scale providers.
So the old address(es) will linger on, indistinguishable from the current one, and applications will pick any one of them, whether it works or not, and services will suffer.

So we need to clean up the mess. Since ppp.linkup runs before the IPv6 address is assigned, we can do it from there. (But don't delete the linklocal address. ppp will be angry when it is missing!)

provider:
  shell /sbin/ifconfig INTERFACE inet6 accept_rtadv
  shell /etc/ppp/ipv6unconfig INTERFACE

The script /etc/ppp/ipv6unconfig:

#!/bin/sh
#
# Delete old IPv6 addresses, if any (the linklocal address is kept)
INTERFACE=$1
ifconfig "$INTERFACE" inet6 | \
    awk '$1 == "inet6" && ! match($2, /^fe[8abc]/) { print $2 }' | \
    while read ADDR; do
        ifconfig "$INTERFACE" inet6 "$ADDR" delete
    done

2. Prefix Delegation

In the following discussion the descriptions will be based on this sketch of the site layout:

                            --------------------      ------------------------
  other subnets         ----|             tap0 |------| vtnet0          tun0 |---- ISP
  with hosts (servers       | backbone         |      |        outbound      |
  and clients) and      ----|  router          |      |         router       |
  potentially other         |                  |      |                      |
  routers               ----|                  |      ------------------------
                            --------------------

The IPv6 addresses configured on tun0 have a netmask (called prefixlen in IPv6) of /64. This is standard: IPv6 addresses normally have a 64-bit host part. This gives room for lots of trillions of machines per network, and we could certainly split this up and supply our whole environment with subnets made from it. But that wouldn't be clean, it would be against the default, and it would need explicit configuration in a lot of tools.
Furthermore, since we get dynamic addresses, we would then have three parts of an address: the highest 64 bits, which change dynamically after a new dialout; a second part of, say, 48 bits that is our internal network and should stay constant; and the final 16 bits for the host. To make the routing between subnets work we would therefore need a prefixlen of /112 (64 + 48) on the interface, while at the same time we would need to dynamically distribute new prefixes with a prefixlen of /64 - and this just does not work with rtadvd: when it distributes a /64 prefix, it also sets the prefixlen on the interface to 64.
So there would be a lot of pain and hackery involved.

Instead, the provider supplies us with another prefix of /56 (independent of the GU address received for tun0), which is also routed to us (and which also changes dynamically). This can be split into 256 subnets of /64 for our needs, as is commonly done.

This additional prefix can be obtained with DHCP. (I do not know if there is an alternative way, like some JSON API or similar, to obtain it.) Usually a DHCP client would request that prefix, obtain it just like it obtains a lease, then automatically split it into subnets (according to some configuration), and configure these subnets onto the local interfaces.
And that would get you going: the subnets then have a network address according to the current prefix delegation, and can talk to each other and the outside.

Only, in our layout the subnets are not on the outbound router, they are on the nexthop, the backbone router! There is only one subnet to configure on the outbound router, vtnet0, that connects the nexthop.

Site layouts may be different or more complex, but the general problem is always the same: how do we move these prefixes onward to the place where we need them?

But, first things first, let's see how the vtnet0 interface gets configured.

3. DHCPv6

There are a couple of DHCPv6 clients available in the ports tree:

  • net/dhcp6
    This is the KAME Implementation, and it can be configured according to this tutorial, and that does work, but with flaws:
    • When stopping the client daemon, it takes a long time and apparently hits some timeout.
    • When you send a SIGINT to ppp (and so make it disconnect and reconnect), and subsequently send SIGHUP to dhcp6c (to make it renew its lease), then a new (and different) /56 prefix will be obtained, and the IPv6 address on vtnet0 will be changed accordingly - so far this is as expected - but the provider does not route that address! Data is sent into nirvana.
      When you now entirely stop and start the dhcp6c daemon, the very same prefix will be obtained again, but now the provider does route it.
      Something seems to go wrong here, probably in the dialog with the provider. I didn't bother to find out what exactly, and tried the next option.
  • net/isc-dhcp44-client
    This is the ISC implementation. It seems to be the most widely used, and has an abundance of options for whatever one might need. It is a combined DHCPv4/v6 client.
    When starting, it just reports Unsupported device type 23 for "tun0". This is understandable - tun0 is a point-to-point device, and DHCP (v4, that is) does make sense only on broadcast devices. So, end of story - next one please.
  • net/dhcpcd
    This is the Roy-Marples-implementation, and it works, starting from this template. But be careful! If you just start that daemon, without specific config and without precisely limiting what it is allowed to do, it will do as much as possible, and try to modify interfaces for IPv4, the resolv.conf, and who knows what else.

Let's look at the config:

ipv6only
nohook resolv.conf, hostname, ntp.conf, test
duid
require dhcp_server_identifier
persistent
slaac hwaddr
allowinterfaces tun0 vtnet0
noipv6rs
interface tun0
  option rapid_commit
  ia_pd 1 vtnet0/255/64/1

ipv6only - by default interfaces are supplied with IPv4 addresses, too. Set this if you don't want strange 169.254.x.x (IPv4 link-local) addresses to appear.

nohook - a bunch of scripts is run by default, which do tamper with hostnames, DNS configs and similar things. Better to switch them off.

duid - that is the unique identifier for our host on the server. It should be derived from /etc/hostid.

require - dhcp_server_identifier is a mandatory component of DHCP messages. Ignore messages without it.

persistent - not required, and it is a difficult decision:

  • without persistent, when dhcpcd is stopped, the assigned addresses on the interfaces will be removed. So we will lose connectivity when only restarting dhcpcd.
  • with persistent the assigned addresses do stay. So when we restart dhcpcd and ppp, we get new addresses, but the old ones will still linger around (and make connectivity to the new actual owner of these addresses fail).

Neither is optimal - but then there is no way to milk the cow and eat it.

allowinterfaces - by default all interfaces are processed. Limit this here to the one that is connecting to the provider, plus those that shall get configured.

noipv6rs - by default router advertisements are solicited on all allowed interfaces, and the interfaces are configured accordingly. (Since this conflicts with the kernel already doing the same when ACCEPT_RTADV is set on the interface, dhcpcd will also remove that option from the interface.) This config statement disables that behaviour and can be set per interface and/or globally.

interface - this is our outbound interface where we query for prefix delegation

option rapid_commit - tells the server to reply immediately

ia_pd 1 vtnet0/255/64/1 - request prefix delegation. 1 is the IAID, a number that must be different for each requested prefix. The pattern vtnet0/255/64/1 means: configure subnet number 255 with prefixlen 64 onto vtnet0 and use 1 as the host suffix. Since we receive a /56 prefix, which holds 256 subnets of /64 numbered 0 to 255, subnet 255 is the highest one: if the received prefix reads 1234:5678:90ab:cd00::/56, the address becomes 1234:5678:90ab:cdff::1. Using 0 as the host part would enable SLAAC/EUI-64. Further patterns like this one can follow if there are more interfaces present which should get an address.
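
The address arithmetic behind such a pattern can be reproduced with a few lines of shell. This is only an illustrative sketch, not part of dhcpcd: the function name pd_addr is hypothetical, and it assumes a /56 delegation written with all four leading groups spelled out (no :: inside the first 64 bits) and a subnet number between 0 and 255.

```sh
#!/bin/sh
# pd_addr PREFIX SLA SUFFIX  (hypothetical helper, for illustration only)
# Assumes a /56 prefix with four explicit leading groups,
# e.g. 1234:5678:90ab:cd00::/56, and a subnet number (SLA) of 0..255.
pd_addr() {
    net=${1%%::*}                  # 1234:5678:90ab:cd00
    head=${net%:*}                 # 1234:5678:90ab
    group4=${net##*:}              # cd00
    # the low 8 bits of the 4th group select one of the 256 /64 subnets
    group4=$(( 0x$group4 | $2 ))
    # the host suffix is inserted exactly as given
    printf '%s:%x::%s\n' "$head" "$group4" "$3"
}

pd_addr 1234:5678:90ab:cd00::/56 255 1    # prints 1234:5678:90ab:cdff::1
```

Calling it with subnet number 0 instead of 255 would yield the lowest subnet of the delegation.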

With this config the DHCP client will only contact the upstream DHCP server, obtain the delegated prefix, configure an address from the highest subnet onto vtnet0, then store the obtained prefix as a lease and renew that from time to time. And when the prefix changes, it will change the configured address accordingly. It will not do any other things, specifically not configure any routes.

Signaling

To become active, the DHCP client needs to be signaled from ppp when a new connection is established. This can be done from ppp.linkup, but ppp runs this before completing IPV6CP. Therefore we run a shell script with bg (background) to wait until an IPv6 address actually appears on the tun interface:

provider:
  ...
  bg /etc/ppp/ipconfig INTERFACE MYADDR

The script /etc/ppp/ipconfig:

#!/bin/sh
#
# Wait for new IPv6, then inform involved parties

INTERFACE=$1
MYADDR4=$2

while test -z "$MYADDR6"; do
    # check for GU addresses only (old ones have been removed by ipv6unconfig)
    MYADDR6=`ifconfig $INTERFACE inet6 | \
             awk '$1 == "inet6" && ! match($2, /^fe[8abc]/) { print $2 }'`
    sleep 0.5
done

if test -f /var/run/dhcpcd/pid; then
    /usr/local/sbin/dhcpcd -N
fi

# do other things here, like reconfigure firewall, DNS etc.

Used Ports

While the router advertisements (that provide our address on tun0) work with the ICMPv6 protocol, DHCPv6 uses UDP, ports 546 (client) and 547 (server), over the linklocal/multicast addresses. As this is specific outbound/inbound traffic, it may need to be enabled in firewall rules.
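
As a rough sketch of what such rules could look like with ipfw: the helper below only prints the commands so they can be reviewed before use. The function name dhcpv6_client_rules is hypothetical, the interface name is passed in, and ff02::1:2 is the standard multicast group for DHCPv6 servers and relay agents. Rule numbering and placement within an existing ruleset are left open.

```sh
#!/bin/sh
# dhcpv6_client_rules IFACE  (hypothetical helper)
# Prints ipfw commands instead of running them.
dhcpv6_client_rules() {
    ifc=$1
    # our client (port 546) asks the DHCPv6 server multicast group, port 547
    echo "ipfw add allow udp from fe80::/10 546 to ff02::1:2 547 out xmit $ifc"
    # replies come from a linklocal address to our own port 546
    echo "ipfw add allow udp from fe80::/10 to me6 546 in recv $ifc"
}

dhcpv6_client_rules tun0
```

The printed lines could be pasted into a ruleset script once they have been adapted to the local rule numbering.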

4. Delegating Further On

After having assigned the first network to vtnet0, there remain 255 other networks available from the delegated prefix. But these are of no use on the outbound router and must be moved further into the LAN. The tasks are:

  • communicate the remaining available subnet prefixes to the backbone router
  • have the backbone router distribute and configure these subnets onto its attached network interfaces
  • install routes onto vtnet0 for those subnets to route them inbound
  • do this reliably every time the uplink is established, and
  • keep up proper housekeeping, i.e. make sure the old resources are properly disposed of.

DHCP Server

While there are probably various means to achieve this, we considered it most straightforward to stay with the already established way and use DHCP again. Using net/dhcp6 to install a DHCP server on the outbound router was, while not bug-free, sufficiently successful this time.
dhcp6s reads a simple config file to know about the prefixes to distribute; this file has to be rewritten and dhcp6s reloaded when a new prefix is received from upstream.

A possible downside is that dhcp6s can be given only one interface to work on. If there were more interfaces attached to the outbound router, it might be possible to run multiple instances. These would then need to use different ports for their control interface, so this should be configurable in our scripts.

The Detail Problem

The dhcp6s config file /usr/local/etc/dhcp6s.conf supports include statements. So we could include our dynamically rewritten config into the main config file and would not need to change data within /usr/local/etc (which is not a good thing to do). But sadly this does not work: when sending a dhcp6ctl -S reload command, the config does not get properly reloaded; instead, when hitting the first include statement, dhcp6s dumps core.

We therefore need to write the entire config file into /var/db, and configure dhcp6s accordingly in /etc/rc.conf:

dhcp6s_enable="YES"
dhcp6s_interface="vtnet0"
dhcp6s_config="/var/db/dhcp.provider-prefixe"
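
To give an idea of what gets written into that file, here is a minimal sketch that emits one host stanza in the style of the sample dhcp6s.conf shipped with the KAME package. The helper name dhcp6s_host is hypothetical, the prefix is made up from the example delegation used earlier, and the exact set of statements needed should be checked against dhcp6s.conf(5).

```sh
#!/bin/sh
# dhcp6s_host NAME DUID PREFIX LIFETIME  (hypothetical helper)
# Prints one host stanza; NAME is for reference only.
dhcp6s_host() {
    printf 'host %s {\n' "$1"
    printf '\tduid %s;\n' "$2"
    printf '\tprefix %s %s;\n' "$3" "$4"
    printf '};\n'
}

# In real use the output would be redirected into the file named by
# dhcp6s_config; here it is just printed.
dhcp6s_host backbone \
    00:04:e4:79:62:c7:2e:bc:81:de:49:7a:00:e0:cd:f4:15:4b \
    1234:5678:90ab:cdc0::/60 14400
```

Since the prefix changes with every new delegation, the whole file has to be regenerated and dhcp6s told to reload it each time.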

Splitting the prefix

The obtained /56 prefix can be split into smaller prefixes in any desired way, just like usual subnetting, e.g. 2x /57 or 256x /64. While the latter may seem the most generic way, it has a downside: DHCP uses a pool concept (first come, first served), and so the same internal subnet may not always get the same SLA number in subsequent runs, depending on subtle timing differences. And when the subnets are laid out in some structure (e.g. into different security zones), then having bigger chunks and letting the subsequent routers distribute them further is easier to administer.
Since one subnet is already consumed for vtnet0, we chose this simple algorithm (which could also be used for a /48 or /52 prefix):

  • 3x /64
  • 3x /62
  • 3x /60
  • 3x /58
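
The counts add up: 3x1 + 3x4 + 3x16 + 3x64 = 255 subnets of /64, which together with the one used on vtnet0 gives exactly the 256 of a /56. The following sketch enumerates one possible layout. The ordering (largest chunks at the bottom, so that the leftover /64 is subnet 255) is an assumption chosen to match the vtnet0 address above; the function name split_pd and its arguments are hypothetical.

```sh
#!/bin/sh
# split_pd HEAD BASE  (hypothetical helper)
#   HEAD: first three groups of the delegated /56, e.g. 1234:5678:90ab
#   BASE: fourth group of the delegated /56 in hex, e.g. cd00
split_pd() {
    head=$1
    base=$(( 0x$2 ))
    offset=0
    for len in 58 60 62 64; do
        size=$(( 1 << (64 - $len) ))   # number of /64s in one chunk
        n=0
        while [ "$n" -lt 3 ]; do
            printf '%s:%x::/%d\n' "$head" $(( base + offset )) "$len"
            offset=$(( offset + size ))
            n=$(( n + 1 ))
        done
    done
    # offset has now reached 255: the single /64 left for vtnet0
}

split_pd 1234:5678:90ab cd00
```

For the example prefix this lists twelve chunks, from 1234:5678:90ab:cd00::/58 up to 1234:5678:90ab:cdfe::/64, each aligned on its own prefix length.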

5. The Hook Script 

The DHCP client dhcpcd does run external hook scripts at every event. We have disabled the default hook scripts that are included in the distribution, but we can write a custom hook script and place it into /usr/local/etc/dhcpcd.enter-hook, and that will also be run.
There we must select the proper kind of event (prefix delegation), detect when our delegated prefix has changed, and then

  • split it up
  • write a new dhcp6s config file from that
  • signal dhcp6s to reload its config
  • change the extra routes on vtnet0

In order to remove the old routes on vtnet0, we need to know the old prefix. dhcpcd should provide to hook scripts the old and new prefix in the environment - but see below.

Detail Problem 1

It is not enough to make dhcp6s reload the config file: after dhcp6s has given out a prefix to a client, it stores that binding internally in memory, and even after reloading the config, the old binding will continue to be confirmed to the client, i.e. the client will receive the old prefix. It is necessary to remove those bindings as well, and to identify them we need the client's duid (which is also written into the config file) and the correct iaid number used by the client, which is configured in the client's config file. The configuration of the client and the information used in our hook script must therefore match.
Reconfiguring is done with the dhcp6ctl command (see manpage).

Detail Problem 2

After ppp has established (or re-established) the uplink, the DHCP client dhcpcd gets signaled to fetch a new delegated prefix from upstream. There are two possible options to do that, -N or -n (there is a third option -g which appears to do nothing at all).
Sadly, neither works properly. With -N a BOUND6 event occurs and the environment contains only the new prefix - but without knowing the old prefix, we cannot remove the old routes (nor detect whether it has changed at all). With -n, we get an EXPIRE6 and a subsequent REPLY6 event and can properly remove and recreate things accordingly. But in this case dhcpcd fails to remove the old address from vtnet0 (for whatever reason, maybe a flaw), and these old addresses will continue to linger on. (This could be worked around by sending -N first, and then -n a second later.)

We therefore chose a different approach: instead of relying on dhcpcd to properly provide old and new prefixes, we store the current prefix into the firewall.
The ipfw firewall has a concept called TABLES. These are storage space designed to store things like temporary IP addresses. They are easily accessed, they are not persisted to disk, we do not have to bother with possible file read or disk full issues, they can even be accessed atomically - and they are supposed to be used with firewall rules, where we might need exactly that prefix information anyway!
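
A minimal sketch of this remember-and-compare idea follows. It is not the actual hook code: the function names are hypothetical, the table name is passed as an argument, and the parsing assumes that ipfw table NAME list prints one entry per line with the prefix in the first column. Note that the flush followed by add shown here is not itself atomic. The IPFW variable exists only so that the commands can be pointed at a stub for a dry run.

```sh
#!/bin/sh
# Sketch only. IPFW may be overridden to test without touching the firewall.
: "${IPFW:=/sbin/ipfw}"

# stored_prefix TABLE: print the first entry of TABLE (nothing if empty)
stored_prefix() {
    "$IPFW" table "$1" list 2>/dev/null |
        awk '$1 ~ /:/ { print $1; exit }'
}

# remember_prefix TABLE PREFIX: make PREFIX the only entry of TABLE.
# Returns 0 if the stored value changed, 1 if it was already current.
remember_prefix() {
    old=$(stored_prefix "$1")
    [ "$old" = "$2" ] && return 1
    "$IPFW" -q table "$1" create 2>/dev/null   # harmless if it exists
    "$IPFW" -q table "$1" flush
    "$IPFW" -q table "$1" add "$2"
    return 0
}
```

A hook script could then wrap its route and dhcp6s updates in something like if remember_prefix TABLE "$new_prefix"; then ... fi, reading the old value with stored_prefix beforehand when the old routes must be removed.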

Routing

When we delegate sub-prefixes further into our LAN, we also need to add routes on vtnet0 pointing these addresses to the inside. To create these routes, the nexthop gateway address is required.

In IPv4 when you configure a network route, you tell the route command the nexthop gateway to which packets should be sent. This is just the same with IPv6, but with a twist:
With IPv6 every interface has a linklocal address that starts with fe80:. These serve a similar purpose as broadcasts on Ethernet: packets sent via them can only reach the directly connected hosts; they are never routed onwards. Since IPv6 does not use ARP, it uses these addresses to figure out the MAC addresses of neighboring hosts, and for similar tasks.
And these can also be used for routing: When you know that your nexthop gateway has a linklocal address of fe80::2, it is perfectly valid to configure a route like this:

route -6 add -net default -gateway fe80::2%vtnet0

The %vtnet0 is necessary for linklocal addresses: since they are linklocal, the same address could exist on different directly attached networks, and we need to specify to which network this should go. (It is also visible in the output from netstat -rn.)
This construct is specifically of advantage when using dynamic addresses: The routeable address on the interface may change, but the linklocal one will not, and when configuring it that way we do not need to change the route alongside.

There is a possible issue with this: when you have a GU address configured only on one side of a link, that side might send solicitations from that address - but the other side cannot know that GU address or its network, and will therefore just drop the request (for security). There is a sysctl option net.inet6.icmp6.nd6_onlink_ns_rfc4861 to change this and always accept the solicitations.

So we can use the linklocal address of tap0 on the backbone router for the nexthop of our routes. Instead of figuring out what EUI-64 number that interface might get (as it is a tap interface, it has an artificial MAC not in hardware - but if it were in hardware, it would change when you replace the hardware), we use a simple number 2 as the host part. This can be done in the normal way with ifconfig tap0 inet6 fe80::2%tap0, from rc.conf (or wherever the interface gets created).

Scripting

Since some configuration information is required for the prefixes to be distributed, we split this into a configuration part and a code part. The configuration part is installed as the hook script /usr/local/etc/dhcpcd.enter-hook, while the code part is then included from there.

The code part can be retrieved here, and the config part looks like this:

# dhcpcd.enter-hook script

# BEWARE: multiple ia_pd lines (with different numbers) can be given in
#         dhcpcd.conf in the same interface block, and they will be separately
#         requested from the server, but here in the environment all the
#         prefixes will be collected under the LAST one of the numbers!
#         (Maybe a bug - consequentially the separation via the "from_iaid"
#         below does currently not work.)

IAIDs="1"             # ia_pd IDs requested from server (but see comment above)
DO_ROUTES=1           # create routes for delegated prefixes
DO_DHCP6S=1           # create/update config for dhcp6s
RENEW_ON_RA=1         # fetch new PDs when our address changes

# Filename of the generated config for dhcp6s
DHCPSCFG=/var/db/dhcp.provider-prefixe

# content of the dhcp6s configfile to write (leave empty if not required)
#  - host: name of our client, for reference only
#  - interface: our interface where the client connects us
#  - route: our nexthop gateway to the client (needed to create a route)
#  - controlport: where we can reach dhcp6s (different for multiple interfaces)
#  - from_iaid: ia_pd number we use against our server (but see comment above)
#  - iaid: ia_pd number our client uses against us
#  - prefixlen: the length of this prefix
#  - expiry in seconds
#  - duid: duid that our client uses to identify (usually their /etc/hostid
#          prefixed with 00:04:)
# 
DIST='
default 
    interface vtnet0
    controlport 5547
    route fe80::2%vtnet0
    from_iaid 1
    duid 00:04:e4:79:62:c7:2e:bc:81:de:49:7a:00:e0:cd:f4:15:4b
    prefixlen 60
    time 14400
host backbone
    iaid 1
host backbone
    iaid 2
host wlanr
    duid 00:04:a4:79:42:d1:2e:ab:81:de:33:7a:00:e0:a1:f4:42:51
    prefixlen 64
    iaid 1
    time 7200        
'

DO_IPFWTBL=1           # fill extra tables for ipfw
SLA_MAX=8              # bit-offset for iaid in ipfw-table values
TBL_DGET=ip6dgd        # name of ipfw table used for storage
TBL_DPUT=ip6dgg        # name of ipfw table used for storage
TBL_UPLK=ip6dup        # name of ipfw table used for storage
TBL_BASE=baseifs       # name of ipfw table used for storage

# NPTv6 and its sentinel-interface (iaid/sla/prefixlen/suffix)
DO_NPTV6="vtnet1"
DELEG_vtnet1="1/1/64/1"

# actual code gets included from that file (adjust path as appropriate)
. /ext/libexec/dhcpcd.enter-hook

This will be invoked automatically when placed into /usr/local/etc/dhcpcd.enter-hook.

6. The Subordinate Router

Having the outbound router configured, we can now install and run dhcpcd on the backbone router in the same way as we did before.

But here the core problem of the whole endeavour appears: how does this downstream router get notified when the prefix has been changed upstream?

With DHCP, while occasionally asked for, this seems not to be a supported function. Usually it is suggested that one should make the lease time very short to have the client frequently renew the delegation. But then this would still be in the range of minutes. There is also RFC 3203 that describes a FORCERENEW method to realize the function, but, according to this discussion, no implementation of that is known. The bottom line is: DHCP is pull-only.

However, there is another option. After a new prefix has been received by the outbound router, the DHCP client there will also put a new IP address onto its vtnet0 interface. When rtadvd is running on the outbound router, it will automatically pick up that change, and will announce the new subnet, as a router advertisement, to the machines on the local link - which here is just the backbone router. And in consequence the IP address on the tap0 interface of the backbone router will also change automatically.

This acting upon reception of router advertisements is normally done by the kernel, according to the ACCEPT_RTADV option on the interface. But it can as well be done by an extra program. Specifically, the Roy-Marples-implementation of the DHCP client does already contain code to do that - the only thing required is to remove the ACCEPT_RTADV option from the interface and instead activate the ipv6rs option in the interface section of dhcpcd.

Then, dhcpcd does invoke hook scripts on events, and this includes the ROUTERADVERT event. So the only thing we need to do is add handling for the ROUTERADVERT event to our hook script, detect there when the address has actually changed, and then trigger dhcpcd to renew the lease. (Renewing the lease will make dhcpcd send a router solicitation and consequently receive another router advertisement, so care must be taken to trigger this only when the IP address has actually changed. Otherwise an endless loop is created.)
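
The loop guard can be sketched as a fragment for the hook script. This is an illustration under assumptions, not the actual code: dhcpcd-run-hooks provides $reason and $interface, but the helper names and the state file are hypothetical, and a plain file is used for the remembered address only to keep the sketch short.

```sh
#!/bin/sh
# Fragment for dhcpcd.enter-hook (sketch). $reason and $interface are set
# by dhcpcd-run-hooks; all other names here are hypothetical.

# addr_changed OLD NEW: true only if NEW is non-empty and differs from OLD
addr_changed() {
    [ -n "$2" ] && [ "$1" != "$2" ]
}

# current_gu IFACE: first non-linklocal IPv6 address on IFACE
current_gu() {
    ifconfig "$1" inet6 2>/dev/null |
        awk '$1 == "inet6" && ! match($2, /^fe[8abc]/) { print $2; exit }'
}

# assumption: remembered address kept in a file for this sketch
STATE=${STATE:-/var/run/dhcpcd.last-ra-addr}

case "${reason:-}" in
ROUTERADVERT)
    new=$(current_gu "$interface")
    old=$(cat "$STATE" 2>/dev/null)
    if addr_changed "$old" "$new"; then
        echo "$new" > "$STATE"
        # renew the lease; the advertisement this provokes will find
        # old == new and therefore not trigger another renewal
        /usr/local/sbin/dhcpcd -N
    fi
    ;;
esac
```

Storing the new address before triggering the renewal is what breaks the cycle: the follow-up advertisement compares equal and is ignored.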

For this to work, rtadvd must be enabled to run on the outbound router in /etc/rc.conf:

rtadvd_enable="YES"
rtadvd_interfaces="vtnet0"

On the backbone router, our hook script does again store the received address into an ipfw table, since on invocation of the hook script the address on the interface may already have changed and the old value no longer available for comparison.

dhcpcd.conf is the same as before, with the only change that, while still having noipv6rs globally, we set ipv6rs explicitly in the receiving interface section.

Routing Back

There is one more issue to be solved: how does the backbone router get a proper default route pointing back to the outbound router?

On the outbound router the default route gets configured from the ppp configuration. We could configure it here manually, but that does not even seem necessary: it seems that the router advertisements received from the outbound router do already contain that route, supplied by rtadvd! It just magically appears and there seems to be no issue.

Subordinate Subordinate Routers

Extending the setup to a third or further level of nexthop routers is then all the same again, but you probably do not need to set up another DHCP server. The KAME implementation contains a program dhcp6relay, which should be able to pass requests on to the next subnet. So all requests could be sent to and handled by the first DHCP server on the outbound router.

7. The Clients

Finally activating IPv6 on the clients is the simplest part, because there is almost nothing to do.

If the option AUTO_LINKLOCAL is set on the interface (the sysctl net.inet6.ip6.auto_linklocal controls this and is usually enabled), and if rtadvd runs on the nearest router, the client will automatically get a GU address. You should, however, enable rtsold in /etc/rc.conf to make this more reliable:

rtsold_enable="YES"

To make IPv6 the default, ip6addrctl is used. It should detect a configured IPv6 automatically and switch to that, but that does not always work, especially if you do not create your interface in the usual way from /etc/rc.conf. Then the preference can be set explicitly in /etc/rc.conf:

ip6addrctl_policy=ipv6_prefer

This seems to change the order of replies produced by getaddrinfo, and therefore should work for all programs that use this. (It is not related to the sequence of queries to the nameserver or the output of the host command.)

Local Firewalls

If you have a firewall on the client, there is an option firewall_client_net_ipv6 for /etc/rc.conf, and this should get the network address of your LAN, so that traffic to/from the LAN does not get filtered. This is now a bit difficult as this is a dynamic address. There are two ways to solve this:

  • Usually for traffic within the LAN you will not want to use the dynamic addresses, but instead supply the LAN with additional private static addresses from the fd00::/8 range (unique local addresses). These stay constant and can be configured here (as they can also be configured into a LAN DNS or anywhere in the LAN where an address is needed). See below for a few gotchas with these.
  • Alternatively, if you want to have LAN internal traffic via the dynamic addresses, you can just install dhcpcd and the hook script on the client also. It will then put the dynamic address as received from router advertisement into an ipfw table - and you can refer to that table in the option.

 

8. Reboot

Finally we must ensure that all the daemons will start up nicely when rebooting. In general this works, but it may spit out a bunch of error messages.

  • dhcpcd will automatically set linklocal addresses on all the interfaces it manages. This works when the interface is present when dhcpcd starts. It does not work if the interface appears later on (e.g. jail or guest interfaces that are created ad hoc); the error message is error adding slaac to prefix_len 64. The solution is to have AUTO_LINKLOCAL enabled for these interfaces - then the kernel will create the linklocal address.
  • rtadvd expects a configured list of interfaces on which it should work. If these do not (yet) exist it will complain. Since jail is started in rc.d only after LOGIN and rtadvd before LOGIN, it is probably best to pre-create the interfaces to be used by jails. ngbridge is an rc.d script that does this (for netgraph users).
  • During startup of the jails rtadvd tends to complain - probably because interfaces are disappearing into jails via the ifconfig vnet function. This is probably not avoidable.
  • The startup position of dhcpcd is not defined in the rc.d script, and so rcorder may put it anywhere. We want to start this before ntpdate, so that ntpdate can utilize IPv6.
    Since it is not a good idea to change the rc.d script itself (the change will be deleted when updating the pkg), we instead create two markers to control the startup sequence: IPV6READY and IPV6UP.

9. Static Addresses

Having dynamic IPv6 addresses all over the LAN is good for the systems to reach any IPv6 (-only) site on the Internet. It is not good for communication within the LAN, because the dynamic addresses are subject to change at any time, and cannot be put into config files.
So, in addition to dynamic GU addresses, we may want to also configure static private IPv6 addresses for communication within the LAN. But this is not mandatory as the internal communication can as well be done in IPv4. And there are some issues.

Disappearing routes and 'no buffer space available'

A problem with these private addresses is that their routes may strangely disappear when dhcpcd is started.

Normally when we configure an address on an interface (with ifconfig or from rc.conf), we can see the address on the interface, and also two entries in the routing table:

tap0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1492
        inet6 fd00::101 prefixlen 120

fd00::100/120                     link#10                       U          tap0
fd00::101                         link#10                       UHS         lo0

The first line is in accordance with the prefixlen; it tells the routing to send addresses from that subnet out through the specific interface - and that one might disappear when dhcpcd starts up. (The second line is the address itself, and is connected to lo0. It tells the routing that this address is local to the system and shall be sent through lo0.)

The problem appears for all interfaces, even for those that are not configured in dhcpcd at all, and it happens as soon as any interface has the ipv6rs option configured (like we did above). Consequently nothing will be routed to that subnet anymore, and so nothing works. Worse, when there is traffic on the system, the system may now not have any means to get rid of the packets, and the buffers will fill up. The system will then soon run into error 55 ("no buffer space available"), and then nothing at all works anymore. Still worse, this condition will not resolve itself, even after the required route gets configured again. It will only resolve after the concerned addresses on the interfaces get removed and reinstalled (or when the interfaces are taken down and up).

It may be difficult to notice and pinpoint the issue - usually one only perceives that traffic does not get answered although the required services are running. One can usually check it with ping6:

$ ping6 test
PING6(56=40+8+8 bytes) fd00::102 --> fd00::101
ping6: sendmsg: No buffer space available
ping6: wrote test.daemon.contact 16 chars, ret=-1

As a workaround, I first tried to have dhcpcd maintain these static addresses as well. This can be done in the dhcpcd.conf file with the static option, like so:

interface tap0
	static ip6_address=fd00::101/120

This does usually work - and watching closely, one can see that dhcpcd still deletes the routes after start, and then, shortly after, reinserts them. But then I experienced the static IPv6 traffic stalling when the outbound (nexthop) router (bhyve) is rebooted. dhcpcd will notice that the outbound interface has disappeared and unconfigure that interface, and then reinitialize it when it reappears. During that reinitialization the routes are removed again, but it may then take 10 seconds to obtain (or not obtain) a new lease, and only after that timeout are the routes reinserted. This is enough time for the internal traffic to stall entirely and render the LAN nonfunctional.

The matter is, dhcpcd does remove the routes intentionally, with the SIOCSPFXFLUSH_IN6 ioctl, in if-bsd.c:if_setup_inet6() - only on FreeBSD the ioctl is interpreted as removing the routes from all interfaces instead of only the one given in the call (this is already discussed in https://githubmemory.com/repo/rsmarples/dhcpcd/issues/59). That doesn't make matters worse, because dhcpcd would remove the routes from the interfaces it delegates prefixes to anyway, and it is these interfaces we want to put static addresses on.

The argument in the code comment is that the routes should be flushed because the kernel might otherwise expire those that dhcpcd tries to manage. I don't understand this; I could not find any mention that the kernel would even be able to do so.
There is indeed an expire metric in the route entries, which is poorly documented, and there was a statement by Kevin Oberman in 2008 that it is no longer used. Besides that, there appear to be the pltime and vltime values as provided by the upstream dhcps. These appear in the address when delegated onto an interface (visible with ifconfig -L). When these values do expire, the address gets removed from the interface, and then the associated routing entries will disappear alongside, as is to be expected. And anyway, rtadvd ignores these values and propagates the prefix onwards with the default lifetimes (7 and 30 days, respectively), unless otherwise configured.

All this should be independent of flushing any routes beforehand. So, from my understanding, under usual circumstances that flushing is problematic, for the given reason. It may well be useful for general cleanliness and housekeeping, to remove stale entries left over from undefined operations. But it definitely does more harm than good here, and so I recommend patching it out of the code, and then having the static routes configured in conventional fashion.

Adjusting ip6addrctl

When configuring the static private addresses (from the fc00::/7 prefix), one may notice that they are still not used. This is because of the default ip6addrctl policy, in which these private IPv6 addresses have a lower precedence (3) than IPv4 (35):

$ ip6addrctl
Prefix                          Prec Label      Use
::1/128                           50     0        0
::/0                              40     1        0
::ffff:0.0.0.0/96                 35     4        0
2002::/16                         30     2        0
2001::/32                          5     5        0
fc00::/7                           3    13        0
::/96                              1     3        0
fec0::/10                          1    11        0
3ffe::/16                          1    12        0

We can change this by providing a custom table with some lines added that give private IPv4 addresses (in their IPv4-mapped form) an even lower precedence:

::ffff:10.0.0.0/104                2     4        0
::ffff:172.16.0.0/108              2     4        0
::ffff:192.168.0.0/112             2     4        0
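The effect of the added lines can be checked with a small model of the policy table lookup. This is a minimal sketch, assuming the lookup picks the longest matching prefix and that an IPv4 destination is looked up in its IPv4-mapped form; the function name precedence is hypothetical, and only the prefix and precedence columns of the table are modelled.

```python
import ipaddress

# Prefix and precedence columns of the default table shown above.
DEFAULT_POLICY = [
    ("::1/128", 50), ("::/0", 40), ("::ffff:0.0.0.0/96", 35),
    ("2002::/16", 30), ("2001::/32", 5), ("fc00::/7", 3),
    ("::/96", 1), ("fec0::/10", 1), ("3ffe::/16", 1),
]

# The three added lines for the private IPv4 ranges.
CUSTOM_POLICY = DEFAULT_POLICY + [
    ("::ffff:10.0.0.0/104", 2),
    ("::ffff:172.16.0.0/108", 2),
    ("::ffff:192.168.0.0/112", 2),
]

def precedence(address, policy):
    """Return the precedence of the longest prefix that covers the address."""
    addr = ipaddress.IPv6Address(address)
    matches = []
    for prefix, prec in policy:
        net = ipaddress.IPv6Network(prefix)
        if addr in net:
            matches.append((net.prefixlen, prec))
    return max(matches)[1]

for name, policy in (("default", DEFAULT_POLICY), ("custom", CUSTOM_POLICY)):
    print(name,
          precedence("fd00::101", policy),         # static private IPv6
          precedence("::ffff:10.0.0.1", policy),   # private IPv4
          precedence("::ffff:8.8.8.8", policy))    # public IPv4
```

With the default table the private IPv4 address wins (35 against 3); with the custom table it drops to 2, below the private IPv6 address, while public IPv4 destinations keep their precedence of 35.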

10. NPTv6

With static private IPv6 addresses configured, NPTv6 is an alternative way of connecting to the Internet. Its disadvantage is that it has to be configured into the firewall. Its advantage is that such connections can persist even while the dynamic prefix changes. For instance, a UDP tunnel (like openvpn) can continue to run with one endpoint having changed its IP address, and the traffic inside the tunnel will not even notice the change.

NPTv6 can be enabled as a component of ipfw. It works similarly to NAT, but only changes the address prefix (usually from a private one to the current dynamic GU prefix), not the individual host suffix and not the port.

When using a dynamic prefix, it obviously cannot be placed literally into the configuration. Instead, some interface must be configured which carries an address with the desired prefix, and NPTv6 will monitor that interface for changes. The simple approach would then be to have dhcpcd provide a prefix delegation onto that interface. Usually we will also need to allow that prefix in some firewall rules, and therefore put it into an ipfw table via the hook scripts. But a few subtleties have to be considered.

dhcpcd runs the hook-script after changing the interface

This induces a race condition: when the interface gets changed, NPTv6 will use the new address, but the ipfw rules will still react on the old address from the table, and traffic might get rejected. So to do this correctly, we need to disable the prefix delegation for that interface in dhcpcd, and then in the hook script

  • add the new address to the interface as an alias (not yet to be used) and to the ipfw table
  • then remove the old address from the interface
  • and then remove the old prefix from the ipfw table.
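The ordering of these three steps can be sketched as follows. This is only an illustration of the sequence, not the actual hook script: the function name prefix_switch_plan is hypothetical, the commands are merely built as strings and not executed, and the interface, table name and addresses are sample values borrowed from the appendices.

```python
def prefix_switch_plan(ifname, table, old_addr, old_prefix,
                       new_addr, new_prefix, plen=64):
    """Return the commands for a prefix change in a race-free order."""
    return [
        # 1. new address as an alias on the interface, new prefix into the table
        f"ifconfig {ifname} inet6 {new_addr} prefixlen {plen} alias",
        f"ipfw table {table} add {new_prefix}",
        # 2. only then drop the old address from the interface
        f"ifconfig {ifname} inet6 {old_addr} -alias",
        # 3. and finally drop the old prefix from the table
        f"ipfw table {table} delete {old_prefix}",
    ]

plan = prefix_switch_plan(
    "tap0", "ip6pref",
    old_addr="2003:e7:1744:80ff::2", old_prefix="2003:e7:1744:80ff::/64",
    new_addr="2003:e7:1744:c3ff::2", new_prefix="2003:e7:1744:c3ff::/64",
)
for command in plan:
    print(command)
```

The point of the ordering is that at every moment the ipfw table covers whatever address is present on the interface, so the rules never see a prefix they do not yet know.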

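NPTv6 does not simply change the prefix

Only changing the prefix would also change the checksums of the transport protocols (TCP and UDP include the IP addresses in their checksum via the pseudo-header; IPv6 itself has no header checksum) and thereby invalidate the packet. Therefore NPTv6 uses an algorithm that also changes a 16-bit word of the address beyond the prefix, in such a way that the checksum stays the same, as documented in RFC 6296.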

In the firewall rules it may be necessary to match a packet before NPTv6 (to decide that it should go to NPTv6) and after NPTv6 (to do forwarding or similar). The IP addresses are then different, and stateful rules must also be separate ones. To make this work with dynamic prefixes, the following approach works:

  • when loading the firewall rules, collect all the internal IP addresses that are used with NPTv6 and put them into a table with sequential numbering.
  • whenever the dynamic prefix changes, compute the external representation of these addresses with the new prefix, according to the algorithm from RFC 6296. Put these in another table with the same sequential numbering.
  • In the rules after NPTv6 has changed the addresses, instead of the IP addresses do lookups into that second table for the respective sequential numbers and retrieve the correct external representation of the addresses.

This approach does not work when the respective addresses are already managed in tables.

Appendix A: Helpful Commands

ifconfig and netstat -rn

These work as with IPv4, showing the interface configurations and route tables.

ndp

This is the replacement for the arp command. Since IPv6 uses neighbour discovery instead of ARP, this command can be used to show and manipulate the current cache of MAC address translations. The cache is shown with -a, and an individual entry can be stored with -s.

ifmcstat

This command shows the configured IPv6 multicast groups.

ip6addrctl

Steers the preference with which various types of addresses may be used for a connection. The preference is described in a so-called "policy table". This also controls whether the machine will prefer IPv4 or IPv6.

Appendix B: ipfw Tables

baseifs  -> IPs on local interfaces (only host addresses)
  10    ip6 private             (manual)
  11    ip6 dynamic             (per hook script)
  12    ip6 linklocal           (per hook script)
ip6pref  -> prefixes for this router (only networks)
  0     distributed downstream
  IA_ID received from upstream

Appendix C: Timeline

Timestamp System Event
22:59:00 outbound kill -INT `cat /var/run/tun0.pid`
22:59:00 outbound ppp[424]: tun0: Phase: Caught signal 2, abort connection(s)
22:59:00 outbound ppp[424]: tun0: IPV6CP: deflink: LayerDown: fe80::41d:92ff:fe41:cd12
22:59:00 outbound ppp[424]: tun0: Phase: deflink: Enter pause (3) for redialing.
22:59:03 outbound ppp[424]: tun0: Phase: deflink: Connected!
22:59:04 outbound ppp[424]: tun0: Command: dsl: shell /sbin/ifconfig INTERFACE inet6 accept_rtadv
22:59:04 outbound ppp[424]: tun0: IPV6CP: myaddr fe80::41d:92ff:fe41:cd12 hisaddr = fe80::ee13:dbff:fe17:2b3d
22:59:05 outbound dhcpcd[21126]: received SIGUSR1, renewing
22:59:05 backbone dhcpcd[20079]: tap0: deleting address 2003:e7:1744:80ff::2/64
22:59:08 outbound dhcpcd[21126]: tun0: delegated prefix 2003:e7:1744:c300::/56
22:59:09 backbone dhcpcd[20079]: tap0: adding address 2003:e7:1744:c3ff::2/64
22:59:09 outbound dhcpcd-run-hooks[21320]: tun0: dhcpcd-pd write dhcps-config
22:59:10 backbone dhcpcd-run-hooks[59289]: tap0: dhcpcd-ra ipfw add 2003:e7:1744:c3ff::/64 0
22:59:11 backbone dhcpcd[20079]: received SIGUSR1, renewing
22:59:12 backbone dhcpcd[20079]: tap0: delegated prefix 2003:e7:1744:c3e0::/60
22:59:13 backbone dhcpcd-run-hooks[59329]: tap0: dhcpcd-pd ipfw add 2003:e7:1744:c3e0::/60 1
22:59:13 client inet6 2003:e7:1744:80e0::23 prefixlen 64 deprecated autoconf
22:59:15 client inet6 2003:e7:1744:c3e0::23 prefixlen 64 tentative autoconf
22:59:16 client inet6 2003:e7:1744:c3e0::23 prefixlen 64 autoconf