NAT shall not be used with IPv6, but providers continue to supply customers with global unicast (GU) addresses that change dynamically. While this may work for a single endpoint or even a single flat LAN, as soon as a site is structured into different subnets connected by various routers, changing all the addresses dynamically becomes rather difficult.
We sketch a possible solution that adjusts all the dynamic GU IPv6 addresses throughout a site network with hierarchical subnetting in less than 25 seconds, using the router advertisement daemon and the Roy Marples implementation of the DHCP client.
All configurations are based on contemporary FreeBSD (Release 12.3 and 13.1 at the time of writing).
Intended audience: engineers designing an IPv6 layout for multi-hop structured LANs.
Our provider supplies a PPPoE connection that can be handled with ppp from base.
ppp can handle IPv6 when enable ipv6cp is set (this is the default). After creating the tun device, ppp puts a link-local address on it. As the tun device does not have a MAC address, the one from the first interface on the system is used to compute the EUI-64 for SLAAC. This is done by code within ppp, so there is no need to set AUTO_LINKLOCAL.
Next, a globally routable address should be obtained via router advertisement from upstream. Therefore ACCEPT_RTADV is needed on the tun interface. But ppp creates the interface ad hoc, and even if we pre-created it to set that option, ppp insists on destroying it on termination, so at the next start the option would be gone.
We can, however, set that option from the ppp.linkup script, which is run before the final stage of IPv6 configuration. (We could also set it at a later time, but then we might have to wait some ten minutes until the next router advertisement comes in.)
provider:
 shell /sbin/ifconfig INTERFACE inet6 accept_rtadv
INTERFACE is a keyword here - ppp will substitute the device name used.
We will also need a default route; this can be set in ppp.conf alongside the IPv4 default route:
provider:
 add default HISADDR
 add default HISADDR6
With this we have a functioning IPv6 configuration on our router. We could now, for instance, remove the -4 option from a named installation on the router, and named would happily start to do queries via IPv6. (It seems that this already fixes some occasional resolver failures with one of my cloud hosts.)
What happens if our uplink occasionally fails and ppp needs to reconnect? Or if we send SIGINT to make ppp disconnect and reconnect? In that case the tun interface does not get destroyed, and the IPv6 addresses continue to exist. After the reconnect the upstream router advertisement will provide a new routable address - but nobody will remove the old one! It would be the task of the upstream router advertiser to keep track and invalidate the former routes, but it doesn't work that way with large-scale providers.
So the old addresses will linger on, indistinguishable from the current one. Applications will pick any of them, whether it works or not, and services will suffer.
So we need to clean up the mess. Since ppp.linkup runs before the IPv6 address is assigned, we can do it from there. (But don't delete the link-local address. ppp will be angry when it is missing!)
provider:
 shell /sbin/ifconfig INTERFACE inet6 accept_rtadv
 shell /etc/ppp/ipv6unconfig INTERFACE
#! /bin/sh
#
# remove old IPv6 addresses, if any (link-local addresses are kept)
INTERFACE=$1
ifconfig "$INTERFACE" inet6 | \
  awk '$1 == "inet6" && ! match($2, /^fe[8abc]/) { print $2 }' | \
  while read ADDR; do
    ifconfig "$INTERFACE" inet6 "$ADDR" delete
  done
In the following discussion the descriptions will be based on this sketch of the site layout:
                     --------------------      ------------------------
other subnets    ----|             tap0 |------| vtnet0          tun0 |---- ISP
with hosts (servers  |     backbone     |      |       outbound       |
and clients) and ----|      router      |      |        router        |
potentially other    |                  |      |                      |
routers          ----|                  |      ------------------------
                     --------------------
The IPv6 addresses configured on tun0 have a netmask (named prefixlen in IPv6) of /64. This is standard: in IPv6, addresses normally have a 64-bit host part. This gives room for an enormous number of machines per network (2^64 addresses), and we could certainly split this up and supply our whole environment with subnets made from it. But that wouldn't be clean, it would be against the default, and it would need explicit configuration in a lot of tools.
Furthermore, since we get dynamic addresses, we would then have three parts of an address: the highest 64 bits, which change dynamically after each new dial-out; a second part of, say, 48 bits that is our internal network and should stay constant; and the final 16 bits for the host. To make the routing between subnets work we would therefore need a prefixlen of /112 (64 + 48) on the interface, while at the same time we would need to dynamically distribute new prefixes with a prefixlen of /64 - and this just does not work with rtadvd: when it distributes a /64 prefix, it also sets the prefixlen on the interface to 64.
So there would be a lot of pain and hackery involved.
Instead, the provider supplies us with another prefix of /56 (independent of the GU address received for tun0), which is also routed to us (and which also changes dynamically). This can be split into 256 subnets for our needs, in the way this is commonly done.
This additional prefix can be obtained with DHCP. (I do not know if there is an alternative way, like some JSON API or similar, to obtain it.) Usually a DHCP client would request that prefix, obtain it just like it obtains a lease, then automatically split it into subnets (according to some configuration), and configure these subnets onto the local interfaces.
And that would get you going: the subnets then have a network address according to the current prefix delegation, and can talk to each other and the outside.
Only, in our layout the subnets are not on the outbound router, they are on the next hop, the backbone router! There is only one subnet to configure on the outbound router, vtnet0, which connects to the next hop.
Site layouts may be different or more complex, but the general problem is always the same: how do we move these prefixes onward to the place where we need them?
But first things first: let's see how the vtnet0 interface gets configured.
There are a couple of DHCPv6 clients available in the ports tree:
- net/dhcp6: if we send SIGINT to ppp (and so make it disconnect and reconnect), and subsequently send SIGHUP to dhcp6c (to make it renew its lease), then a new (and different) /56 prefix will be obtained, and the IPv6 address on vtnet0 will be changed accordingly - so far this is as expected - but: the provider does not route that address! Data is sent into nirvana. After restarting the dhcp6c daemon, the very same prefix will be obtained again, but now the provider does route it.
- net/isc-dhcp44-client: this one refuses to work on the uplink with the message Unsupported device type 23 for "tun0". This is understandable - tun0 is a point-to-point device, and DHCP (v4, that is) makes sense only on broadcast devices. So, end of story - next one please.
- net/dhcpcd: the Roy Marples implementation, which is the one used in the following.
Let's look at the config:
ipv6only
nohook resolv.conf, hostname, ntp.conf, test
duid
require dhcp_server_identifier
persistent
slaac hwaddr
allowinterfaces tun0 vtnet0
noipv6rs
interface tun0
option rapid_commit
ia_pd 1 vtnet0/255/64/1
ipv6only - by default interfaces are supplied with IPv4 addresses, too. Set this if you don't want strange 169.x.x.x addresses to appear.
nohook - a bunch of scripts is run by default, which tamper with hostnames, DNS configs and similar things. Better to switch them off.
duid - the unique identifier for our host on the server. It should be derived from /etc/hostid.
require - dhcp_server_identifier is a mandatory component of DHCP messages. Ignore messages without it.
persistent - not required, and it is a difficult decision:
- without persistent, when dhcpcd is stopped, the assigned addresses on the interfaces will be removed. So we will lose connectivity when merely restarting dhcpcd.
- with persistent the assigned addresses stay. So when we restart dhcpcd and ppp, we get new addresses, but the old ones will still linger around (and make connectivity to the new actual owner of these addresses fail).
Neither is optimal - but then there is no way to milk the cow and eat it.
allowinterfaces - by default all interfaces are processed. Limit this here to the one that is connecting to the provider, plus those that shall get configured.
noipv6rs - by default router advertisements are solicited on all allowed interfaces, and the interfaces are configured accordingly. (Since this conflicts with the kernel already doing the same when ACCEPT_RTADV is set on the interface, that option will also be removed.) This config stanza disables it and can be set per interface and/or globally.
interface - this is our outbound interface where we query for prefix delegation.
option rapid_commit - tells the server to reply immediately.
ia_pd 1 vtnet0/255/64/1 - request prefix delegation. 1 is a number that must be different for other requested prefixes. The third field means: configure the 255th subnet with prefixlen 64 onto vtnet0 and use 1 as the host suffix. Since we receive a /56 prefix, the 255th subnet with /64 means the highest subnet, i.e. when the received prefix reads 1234:5678:90ab:cd00::/56, the address becomes 1234:5678:90ab:cdff::1. Using 0 as the host part would enable SLAAC/EUI-64. Further patterns like this one can follow if there are more interfaces present which should get an address.
With this config the DHCP client will only contact the upstream DHCP server, obtain the delegated prefix, configure an address from the highest subnet onto vtnet0, then store the obtained prefix as a lease and renew that from time to time. And when the prefix changes, it will change the configured address accordingly. It will not do anything else, and specifically will not configure any routes.
To become active, the DHCP client needs to be signaled from ppp when a new connection is established. This can be done from ppp.linkup, but ppp runs this before completing IPV6CP. Therefore we run a shell script with bg (background) that waits until an IPv6 address actually appears on the tun interface:
provider:
 ...
 bg /etc/ppp/ipconfig INTERFACE MYADDR
#! /bin/sh
#
# Wait for new IPv6, then inform involved parties
INTERFACE=$1
MYADDR4=$2
MYADDR6=
while test -z "$MYADDR6"; do
  # check for GU addresses only (old ones have been removed by ipv6unconfig)
  MYADDR6=`ifconfig "$INTERFACE" inet6 | \
    awk '$1 == "inet6" && ! match($2, /^fe[8abc]/) { print $2 }'`
  sleep 0.5
done
# tell a running dhcpcd to renew its leases
if test -f /var/run/dhcpcd/pid; then
  /usr/local/sbin/dhcpcd -N
fi
# do other things here, like reconfigure firewall, DNS etc.
While the router advertisements (that provide our address on tun0) use the ICMPv6 protocol, DHCPv6 uses UDP, ports 546 and 547, over the link-local/multicast addresses. As this is specific outbound/inbound traffic, it may need to be enabled in firewall rules.
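For ipfw, such rules might look like the following sketch. The rule numbers 3000 and 3010 are arbitrary placeholders, and the rules have to be fitted into whatever ruleset is already in place. By default the script only prints the commands, so they can be reviewed first.

```sh
#!/bin/sh
# Dry run by default: the commands are only printed.
# Set IPFW=/sbin/ipfw to actually install the rules.
IPFW=${IPFW:-echo ipfw}
WAN=${WAN:-tun0}          # uplink interface from the layout above

dhcp6_client_rules() {
    # requests: our client port 546 to the DHCPv6 multicast group, port 547
    $IPFW add 3000 allow udp from me6 546 to ff02::1:2 547 out xmit "$WAN"
    # replies: from the server's link-local address to our client port 546
    $IPFW add 3010 allow udp from fe80::/10 to me6 546 in recv "$WAN"
}

dhcp6_client_rules
```

ff02::1:2 is the well-known multicast group for DHCPv6 servers and relay agents; me6 stands for any IPv6 address configured on the system.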
After having assigned the first network to vtnet0, there remain 255 other networks available from the delegated prefix. But these are of no use on the outbound router and must be moved further into the LAN. The tasks are:
- delegate sub-prefixes to the routers further inside the LAN
- create routes on vtnet0 for those subnets to route them inbound
While there are probably various means to achieve this, we considered it most straightforward to hold on to the already established way and use DHCP again. Using net/dhcp6 to install a DHCP server on the outbound router was, while not bug-free, sufficiently successful this time.
dhcp6s reads a simple config file to know about the prefixes to distribute; this file has to be rewritten and dhcp6s reloaded when a new prefix is received from upstream.
A possible downside is that dhcp6s can be given only one interface to work on. If there were more interfaces attached to the outbound router, it might be possible to run multiple instances. These would then need to use different ports for their control interface, so this should be configurable in our scripts.
The dhcp6s config file /usr/local/etc/dhcp6s.conf supports include statements. So we could include our dynamically rewritten config into the main config file and would not need to change data within /usr/local/etc (which is not a good thing). But sadly this does not work: when sending a dhcp6ctl -S reload command, the config does not get properly reloaded; instead, on hitting the first include statement, dhcp6s produces a core dump.
We therefore need to write the entire config file into /var/db, and configure dhcp6s accordingly in /etc/rc.conf:
dhcp6s_enable="YES"
dhcp6s_interface="vtnet0"
dhcp6s_config="/var/db/dhcp.provider-prefixe"
The obtained /56 prefix can be split into smaller prefixes in any desired way, just like usual subnetting, e.g. 2x /57 or 256x /64. While the latter may seem the most generic way, it has a downside: DHCP uses a pool concept (first come, first served), and so the same internal subnet may not always get the same SLA number in subsequent runs, depending on subtle timing differences. And when the subnets are laid out in some structure (e.g. into different security zones), then having bigger chunks and letting the subsequent routers distribute them further is easier to administer.
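To make the subnetting arithmetic concrete, here is a small shell sketch that computes the n-th sub-prefix of a given length inside a /56. It is only an illustration: the function name sub_prefix is made up, and it assumes the /56 is written with four explicit groups and that the requested length lies between 57 and 64.

```sh
#!/bin/sh
# sub_prefix PREFIX LEN INDEX  (hypothetical helper, for illustration only)
# Assumes PREFIX is a /56 written as "a:b:c:d::/56" and 57 <= LEN <= 64.
sub_prefix() {
    base=${1%%::*}      # strip "::/56"
    head=${base%:*}     # first three groups
    last=${base##*:}    # fourth group, holds the 8 subnet bits
    # each sub-prefix of length LEN spans 2^(64-LEN) consecutive /64 networks
    printf '%s:%x::/%d\n' "$head" $(( 0x$last + ($3 << (64 - $2)) )) "$2"
}

sub_prefix 1234:5678:90ab:cd00::/56 60 1     # prints 1234:5678:90ab:cd10::/60
sub_prefix 1234:5678:90ab:cd00::/56 64 255   # prints 1234:5678:90ab:cdff::/64
```

A /60 chunk, for example, hands sixteen /64 networks to the next router in one piece, which that router can then distribute further.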
Since one subnet is already consumed for vtnet0, we chose this simple algorithm (which could also be used for a /48 or /52 prefix):
- /64
- /62
- /60
- /58
The DHCP client dhcpcd runs external hook scripts at every event. We have disabled the default hook scripts that are included in the distribution, but we can write a custom hook script and place it into /usr/local/etc/dhcpcd.enter-hook, and that will also be run.
There we must select the proper kind of event (prefix delegation), detect when our delegated prefix has changed, and then
- generate the dhcp6s config file from that
- tell dhcp6s to reload its config
- update the routes on vtnet0
In order to remove the old routes on vtnet0, we need to know the old prefix. dhcpcd should provide the old and new prefix to hook scripts in the environment - but see below.
It is not enough to make dhcp6s reload the config file: after dhcp6s has given out a prefix to a client, it stores that binding internally in memory, and even after reloading the config, the old binding will continue to be confirmed to the client, i.e. the client will receive the old prefix. It is necessary to remove those bindings as well, and to identify them we need the client's duid (which is also written into the config file) and the correct iaid number used by the client, which is configured in the client's config file. The configuration of the client and the information used in our hook script must therefore match.
Reconfiguring is done with the dhcp6ctl command (see manpage).
After ppp has established (or re-established) the uplink, the DHCP client dhcpcd gets signaled to fetch a new delegated prefix from upstream. There are two possible options to do that, -N or -n (there is a third option -g which appears to do nothing at all).
Sadly, neither works properly. With -N a BOUND6 event occurs and the environment contains only the new prefix - but without knowing the old prefix, we cannot remove the old routes (nor detect whether it has changed at all). With -n we get an EXPIRE6 and a subsequent REPLY6 event and can properly remove and recreate things accordingly. But in this case dhcpcd fails to remove the old address from vtnet0 (for whatever reason, maybe a flaw), and these old addresses will continue to linger on. (This could be worked around by sending -N first, and then -n a second later.)
We therefore chose a different approach: instead of relying on dhcpcd to properly provide old and new prefixes, we store the current prefix in the firewall.
The ipfw firewall has a concept called tables. These are storage areas designed to hold things like temporary IP addresses. They are easily accessed, they are not persisted to disk, we do not have to bother with possible file read or disk full issues, they can even be accessed atomically - and they are supposed to be used with firewall rules, where we might need exactly that prefix information anyway!
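A minimal sketch of this idea, not the actual hook code: keep the last seen prefix as the only entry of an ipfw table and compare against it on every event. The function names are made up for illustration, the table name ip6dgd is borrowed from the hook configuration shown further down, and the IPFW variable exists only so that the command can be replaced for testing.

```sh
#!/bin/sh
# Sketch: remember the delegated prefix in an ipfw table and detect changes.
IPFW=${IPFW:-/sbin/ipfw}
TBL=${TBL:-ip6dgd}

# Print the prefix currently stored in the table (empty if there is none).
stored_prefix() {
    $IPFW table "$TBL" list 2>/dev/null |
        awk '$1 !~ /^---/ { print $1; exit }'
}

# Store a new prefix. Returns 0 if it differs from the stored one
# (and prints "old -> new"), 1 if nothing has changed.
update_prefix() {
    new=$1
    old=$(stored_prefix)
    [ "$old" = "$new" ] && return 1
    $IPFW -q table "$TBL" create type addr 2>/dev/null  # may already exist
    $IPFW -q table "$TBL" flush
    $IPFW -q table "$TBL" add "$new"
    echo "${old:-none} -> $new"
}
```

A hook script could then call something like `update_prefix "$prefix"` and touch the routes and the dhcp6s config only when it returns success; where exactly the new prefix comes from in the environment is left open here.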
When we delegate sub-prefixes further into our LAN, we also need to add routes on vtnet0 pointing these addresses to the inside. To create these routes, the next-hop gateway address is required.
In IPv4 when you configure a network route, you tell the route command the next-hop gateway to which packets should be sent. This is just the same with IPv6, but with a twist:
With IPv6 every interface has a link-local address that starts with fe80:. These serve a similar purpose as broadcasts on Ethernet: packets sent via them can only reach the directly connected hosts, they are never routed onwards. Since IPv6 does not use ARP, it uses these addresses to figure out the MAC addresses of neighboring hosts, and for similar tasks.
And these can also be used for routing: when you know that your next-hop gateway has a link-local address of fe80::2, it is perfectly valid to configure a route like this:
route -6 add -net default -gateway fe80::2%vtnet0
The %vtnet0 is necessary for link-local addresses: since they are link-local, the same address could exist on different directly attached networks, and we need to specify to which network this should go. (It is also visible in the output from netstat -rn.)
This construct is of particular advantage when using dynamic addresses: the routable address on the interface may change, but the link-local one will not, and when configuring it that way we do not need to change the route alongside.
There is a possible issue with this: when you have a GU address configured only on one side of a link, that side might send solicitations from that address - but the other side cannot know that GU address or its network, and will therefore just drop the request (for security). There is a sysctl option net.inet6.icmp6.nd6_onlink_ns_rfc4861 to change this and always accept the solicitations.
So we can use the link-local address of tap0 on the backbone router as the next hop of our routes. Instead of figuring out what EUI-64 number that interface might get (as it is a tap interface, it has an artificial MAC not in hardware - but if it were in hardware, it would change when you replace the hardware), we use a simple number 2 as the host part. This can be done in the normal way with ifconfig tap0 inet6 fe80::2%tap0, from rc.conf (or wherever the interface gets created).
Since some configuration information is required for the prefixes to be distributed, we split this into a configuration part and a code part. The configuration part is installed as the hook script /usr/local/etc/dhcpcd.enter-hook, while the code part is then included from there.
The code part can be retrieved here, and the config part looks like this:
# dhcpcd.enter-hook script
# BEWARE: multiple ia_pd lines (with different numbers) can be given in
# dhcpcd.conf in the same interface block, and they will be separately
# requested from the server, but here in the environment all the
# prefixes will be collected under the LAST one of the numbers!
# (Maybe a bug - consequentially the separation via the "from_iaid"
# below does currently not work.)
IAIDs="1" # ia_pd IDs requested from server (but see comment above)
DO_ROUTES=1 # create routes for delegated prefixes
DO_DHCP6S=1 # create/update config for dhcp6s
RENEW_ON_RA=1 # fetch new PDs when our address changes
# Filename of the generated config for dhcp6s
DHCPSCFG=/var/db/dhcp.provider-prefixe
# content of the dhcp6s configfile to write (leave empty if not required)
# - host: name of our client, for reference only
# - interface: our interface where the client connects us
# - route: our nexthop gateway to the client (needed to create a route)
# - controlport: where we can reach dhcp6s (different for multiple interfaces)
# - from_iaid: ia_pd number we use against our server (but see comment above)
# - iaid: ia_pd number our client uses against us
# - prefixlen: the length of this prefix
# - expiry in seconds
# - duid: duid that our client uses to identify (usually their /etc/hostid
# prefixed with 00:04:)
#
DIST='
default
interface vtnet0
controlport 5547
route fe80::2%vtnet0
from_iaid 1
duid 00:04:e4:79:62:c7:2e:bc:81:de:49:7a:00:e0:cd:f4:15:4b
prefixlen 60
time 14400
host backbone
iaid 1
host backbone
iaid 2
host wlanr
duid 00:04:a4:79:42:d1:2e:ab:81:de:33:7a:00:e0:a1:f4:42:51
prefixlen 64
iaid 1
time 7200
'
DO_IPFWTBL=1 # fill extra tables for ipfw
SLA_MAX=8 # bit-offset for iaid in ipfw-table values
TBL_DGET=ip6dgd # name of ipfw table used for storage
TBL_DPUT=ip6dgg # name of ipfw table used for storage
TBL_UPLK=ip6dup # name of ipfw table used for storage
TBL_BASE=baseifs # name of ipfw table used for storage
# NPTv6 and its sentinel-interface (iaid/sla/prefixlen/suffix)
DO_NPTV6="vtnet1"
DELEG_vtnet1="1/1/64/1"
# actual code gets included from that file (adjust path as appropriate)
. /ext/libexec/dhcpcd.enter-hook
This will be invoked automatically when placed into /usr/local/etc/dhcpcd.enter-hook.
Having the outbound router configured, we can now install and run dhcpcd on the backbone router in the same way as we did before.
But here the core problem of the whole endeavour appears: how does this downstream router get notified when the prefix has changed upstream?
With DHCP, although occasionally asked for, this does not seem to be a supported function. Usually it is suggested to make the lease time very short so that the client frequently renews the delegation. But this would still be in the range of minutes. There is also RFC 3203, which describes a FORCERENEW method to realize the function, but, according to this discussion, no implementation of it is known. The bottom line is: DHCP is pull-only.
However, there is another option. After a new prefix has been received by the outbound router, the DHCP client there will also put a new IP address onto its vtnet0 interface. When rtadvd is running on the outbound router, it will automatically pick up that change and announce the new subnet, as a router advertisement, to the machines on the local link - which here is just the backbone router. And in consequence the IP address on the tap0 interface of the backbone router will also change automatically.
Acting upon the reception of router advertisements is normally done by the kernel, according to the ACCEPT_RTADV option on the interface. But it can as well be done by an extra program. Specifically, the Roy Marples implementation of the DHCP client already contains code to do that - the only thing required is to remove the ACCEPT_RTADV option from the interface and instead activate the ipv6rs option in the interface section of dhcpcd.
Then, dhcpcd invokes hook scripts on events, and this includes the ROUTERADVERT event. So the only thing we need to do is add handling for the ROUTERADVERT event to our hook script, detect there when the address has actually changed, and then trigger dhcpcd to renew the lease. (Renewing the lease will make dhcpcd send a router solicitation and consequently receive another router advertisement, so care must be taken to trigger this only when the IP address has actually changed. Otherwise an endless loop is created.)
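The loop guard can be sketched as follows. This is an illustration rather than the real hook code: it remembers the last address in a plain file instead of an ipfw table, the file path and the function names are made up, and it simply takes the first non-link-local address on the interface (so a static private address configured there would confuse it). The variables reason and interface are set by dhcpcd for hook scripts.

```sh
#!/bin/sh
# Sketch of a ROUTERADVERT handler with a guard against renew loops.
STATE=${STATE:-/var/run/ipv6-last-addr}    # hypothetical state file

# First non-link-local inet6 address on an interface.
current_gu_addr() {
    ifconfig "$1" inet6 2>/dev/null |
        awk '$1 == "inet6" && $2 !~ /^fe[89ab]/ { print $2; exit }'
}

# Return 0 only if the address is non-empty and differs from the stored
# one; remember it, so the advertisement caused by the renew is ignored.
addr_changed() {
    new=$1
    old=$(cat "$STATE" 2>/dev/null)
    [ -n "$new" ] && [ "$new" != "$old" ] || return 1
    printf '%s\n' "$new" > "$STATE"
}

if [ "${reason:-}" = "ROUTERADVERT" ]; then
    if addr_changed "$(current_gu_addr "$interface")"; then
        /usr/local/sbin/dhcpcd -N    # ask the running dhcpcd to renew
    fi
fi
```

The second advertisement that follows the renew carries the same address, so addr_changed returns failure and the chain stops there.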
For this to work, rtadvd must be enabled to run on the outbound router in /etc/rc.conf:
rtadvd_enable="YES"
rtadvd_interfaces="vtnet0"
On the backbone router, our hook script again stores the received address in an ipfw table, since on invocation of the hook script the address on the interface may already have changed, and the old value would no longer be available for comparison.
dhcpcd.conf is the same as before, with the only change that, while still having noipv6rs globally, we set ipv6rs explicitly in the receiving interface section.
There is one more issue to be solved: how does the backbone router get a proper default route pointing back to the outbound router?
On the outbound router the default route gets configured from the ppp configuration. We could configure it here manually, but that does not even seem necessary: it seems that the router advertisements received from the outbound router already contain that route, supplied by rtadvd! It just magically appears and there seems to be no issue.
Extending the setup to a third or further level of next-hop routers is then all the same again, but you probably do not need to set up another DHCP server. The KAME implementation contains a program dhcp6relay, which should be able to pass requests on to the next subnet. So all requests could be sent to and handled by the first DHCP server on the outbound router.
Finally activating IPv6 on the clients is the simplest part, because there is almost nothing to do.
If you have the option AUTO_LINKLOCAL configured on the interface (which is usually done by the sysctl option net.inet6.ip6.auto_linklocal, which is usually enabled), and if rtadvd runs on the nearest router, the client will automatically get a GU address. You should, however, enable rtsold in /etc/rc.conf to make this more reliable:
rtsold_enable="YES"
To make IPv6 the default, ip6addrctl is used. It should detect a configured IPv6 automatically and switch to that, but that does not always work, especially if you do not create your interface in the usual way from /etc/rc.conf. Then the preference can be set explicitly in /etc/rc.conf:
ip6addrctl_policy=ipv6_prefer
This seems to change the order of replies produced by getaddrinfo, and therefore should work for all programs that use this. (It is not related to the sequence of queries to the nameserver or the output of the host command.)
If you have a firewall on the client, there is an option firewall_client_net_ipv6 for /etc/rc.conf, and this should get the network address of your LAN, so that traffic to/from the LAN does not get filtered. This is now a bit difficult as this is a dynamic address. There are two ways to solve this:
- Use static private addresses from the fc00::/7 range. These stay constant and can be configured here (as they can also be configured into a LAN DNS or anywhere in the LAN where an address is needed). See below for a few gotchas with these.
- Run dhcpcd and the hook script on the client also. It will then put the dynamic address as received from router advertisement into an ipfw table - and you can refer to that table in the option.
Finally we must ensure that all the daemons will start up nicely when rebooting. In general this works, but it may spit out a bunch of error messages.
- dhcpcd will automatically set link-local addresses on all the interfaces it manages. This works when the interface is present when dhcpcd starts. It does not work in case the interface appears later on (e.g. jail or guest interfaces that are created ad hoc); the error message is error adding slaac to prefix_len 64. The solution is to have AUTO_LINKLOCAL enabled for these interfaces - then the kernel will create the link-local address.
- rtadvd expects a configured list of interfaces it should work on. If these do not (yet) exist it will complain. Since jail is started in rc.d only after LOGIN and rtadvd before LOGIN, it is probably best to pre-create the interfaces to be used by jails. ngbridge is an rc.d script that does this (for netgraph users).
- rtadvd tends to complain - probably because interfaces are disappearing into jails via the ifconfig vnet function. This is probably not avoidable.
- The startup order of dhcpcd is not defined in the rc.d script, and so rcorder may put it anywhere. We want to start this before ntpdate, so that ntpdate can utilize IPv6.
- Instead of changing the rc.d script itself (the change will be deleted when updating the pkg), we create two markers to control the startup sequence: IPV6READY and IPV6UP.
Having dynamic IPv6 addresses all over the LAN is good for the systems to reach any IPv6 (-only) site on the Internet. It is not good for communication within the LAN, because the dynamic addresses are subject to change at any time, and cannot be put into config files.
So, in addition to dynamic GU addresses, we may want to also configure static private IPv6 addresses for communication within the LAN. But this is not mandatory as the internal communication can as well be done in IPv4. And there are some issues.
A problem with these private addresses is that their routes may strangely disappear when dhcpcd is started.
Normally when we configure an address on an interface (with ifconfig or from rc.conf), we can see the address on the interface, and also two entries in the routing table:
tap0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1492
inet6 fd00::101 prefixlen 120
fd00::100/120 link#10 U tap0
fd00::101 link#10 UHS lo0
The first routing entry corresponds to the prefixlen; it tells the routing to send packets for that subnet out through the specific interface - and that one might disappear when dhcpcd starts up. (The second entry is the address itself, and is connected to lo0. It tells the routing that this address is local to the system and shall be sent through lo0.)
The problem appears for all interfaces, even for those that are not configured in dhcpcd at all, and it happens as soon as any interface has the ipv6rs option configured (like we did above). Consequently nothing will be routed to that subnet, and so nothing works. Worse, when there is traffic on the system, the system may now not have any means to get rid of the packets, and the buffers will fill up. The system will then soon run into error 55 ("no buffer space available"), and then nothing at all works anymore. Still worse, this condition will not resolve itself, even after the required route gets configured again. It will only resolve after the concerned addresses on the interfaces get removed and reinstalled (or when the interfaces are taken down and up).
It may be difficult to notice and pinpoint the issue - usually one only perceives that traffic does not get answered although the required services are running. One can usually check it with ping6:
$ ping6 test
PING6(56=40+8+8 bytes) fd00::102 --> fd00::101
ping6: sendmsg: No buffer space available
ping6: wrote test.daemon.contact 16 chars, ret=-1
As a workaround, at first I tried to have dhcpcd maintain these static addresses as well. This can be done in the dhcpcd.conf file with the static option, like so:
interface tap0
static ip6_address=fd00::101/120
This usually works - and watching closely, one can see that dhcpcd still deletes the routes after start, and then, shortly after, reinserts them. But then I experienced the static IPv6 traffic stalling when the outbound (next-hop) router (bhyve) is rebooted. dhcpcd will notice that the outbound interface has disappeared and unconfigure that interface, and then reinitialize it when it reappears. During that reinitialization the routes are removed again, but it may then take 10 seconds to obtain (or not obtain) a new lease, and only after that timeout are the routes reinserted. This is enough time for the internal traffic to stall entirely and render the LAN nonfunctional.
The matter is, dhcpcd removes the routes intentionally, with the SIOCSPFXFLUSH_IN6 ioctl, in if-bsd.c:if_setup_inet6() - only on FreeBSD the call is misinterpreted to remove the routes from all interfaces instead of only the one given in the call (this is already discussed in https://githubmemory.com/repo/rsmarples/dhcpcd/issues/59). That doesn't make matters worse, because it would remove the routes from the interfaces it delegates prefixes to anyway, and it is these interfaces we want to put static addresses on.
The argument in the code comment is that the routes should be flushed because the kernel might otherwise expire those that dhcpcd tries to manage. I don't understand this; I could not find any mention that the kernel would even be able to do that:
There is indeed an expire
metric in the route entries, which is poorly documented, and there was a statement by Kevin Oberman in 2008 that it is no longer used. Besides that, there appear to be the pltime
and vltime
values as provided by the upstream dhcps
. These appear in the address when delegated onto an interface (visible with ifconfig -L
). When these values do expire, the address gets removed from the interface, and then the associated routing entries will disappear alongside, as is to be expected. And anyway, rtadvd
ignores these values and propagates the prefix onwards with the default lifetimes (7 days preferred, 30 days valid), unless configured otherwise.
All this should be independent of flushing any routes beforehand. So, from my understanding, under usual circumstances that flushing is problematic, for the reason given. It may well be useful for general cleanliness and housekeeping, to remove stale entries left over from undefined operations. But it definitely does more harm than good here, so I recommend patching it out of the code and then configuring the static routes in the conventional fashion.
ip6addrctl
When configuring the static private addresses (from the fc00::/7
prefix), one may notice that they are still not used. This is because of the default ip6addrctl
setting for IPv6, where site-local IPv6 has a lower precedence (3) than IPv4 (35):
$ ip6addrctl
Prefix                 Prec  Label  Use
::1/128                  50      0    0
::/0                     40      1    0
::ffff:0.0.0.0/96        35      4    0
2002::/16                30      2    0
2001::/32                 5      5    0
fc00::/7                  3     13    0
::/96                     1      3    0
fec0::/10                 1     11    0
3ffe::/16                 1     12    0
We can change this by providing a custom table with some lines added to give site-local IPv4 a lower precedence:
::ffff:10.0.0.0/104 2 4 0
::ffff:172.16.0.0/108 2 4 0
::ffff:192.168.0.0/112 2 4 0
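The effect of these extra lines can be checked with a small Python sketch. It is not part of FreeBSD, and the function names `mapped_prefix` and `precedence` are made up for this illustration. It assumes that the policy table is consulted by longest-prefix match, as described in RFC 6724, and that a higher precedence wins. It also shows where the prefix lengths 104, 108 and 112 come from: the 96 bits of the `::ffff:0:0/96` mapping plus the IPv4 prefix length.

```python
import ipaddress

# Default policy table as printed by ip6addrctl: (prefix, precedence, label).
DEFAULT_POLICY = [
    ("::1/128", 50, 0),
    ("::/0", 40, 1),
    ("::ffff:0.0.0.0/96", 35, 4),
    ("2002::/16", 30, 2),
    ("2001::/32", 5, 5),
    ("fc00::/7", 3, 13),
    ("::/96", 1, 3),
    ("fec0::/10", 1, 11),
    ("3ffe::/16", 1, 12),
]


def mapped_prefix(ipv4_cidr):
    """Return the IPv4-mapped IPv6 prefix for an IPv4 network (hypothetical helper)."""
    net4 = ipaddress.IPv4Network(ipv4_cidr)
    # ::ffff:a.b.c.d is 0xffff in bits 80..95 followed by the 32-bit IPv4 address.
    mapped = ipaddress.IPv6Address((0xFFFF << 32) | int(net4.network_address))
    return ipaddress.IPv6Network((mapped, 96 + net4.prefixlen))


def precedence(addr, policy):
    """Precedence of addr: the entry with the longest matching prefix wins (assumed)."""
    a = ipaddress.IPv6Address(addr)
    matches = [
        (ipaddress.IPv6Network(prefix).prefixlen, prec)
        for prefix, prec, _label in policy
        if a in ipaddress.IPv6Network(prefix)
    ]
    return max(matches)[1]  # ::/0 always matches, so the list is never empty


# Custom table: the defaults plus the three private IPv4 ranges at precedence 2.
CUSTOM_POLICY = DEFAULT_POLICY + [
    (str(mapped_prefix(cidr)), 2, 4)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

if __name__ == "__main__":
    for name, table in (("default", DEFAULT_POLICY), ("custom", CUSTOM_POLICY)):
        print(name,
              "fd00::101 ->", precedence("fd00::101", table),
              "| 10.0.0.1 ->", precedence("::ffff:10.0.0.1", table),
              "| 8.8.8.8 ->", precedence("::ffff:8.8.8.8", table))
```

With the default table the private IPv4 address outranks the fd00:: address; with the custom table it drops below it, while public IPv4 addresses keep their precedence of 35.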
With site-local static IPv6 addresses configured, NPTv6 is an alternative way of connecting to the Internet. Its disadvantage is that it has to be configured in the firewall. Its advantage is that connections can persist even while the dynamic prefix changes. For instance, a UDP tunnel (like openvpn
) can continue to run after one endpoint has changed its IP address, and the traffic inside the tunnel will not even notice the change.
NPTv6 can be enabled as a component of ipfw
. It works similarly to NAT, but changes only the address prefix (usually from a site-local one to the current dynamic GU prefix), not the individual host suffix and not the port.
When using a dynamic prefix, it obviously cannot be placed literally into the configuration. Instead, some interface must be configured which carries an address with the desired prefix, and NPTv6 will monitor that interface for changes. The simple approach would then be to have dhcpcd
provide a prefix delegation onto that interface. Usually we will also need to allow that prefix in some firewall rules, and therefore put it into an ipfw table via the hook scripts. But a few subtleties have to be considered then.
This induces a race condition: when the interface gets changed, NPTv6 will use the new address, but the ipfw
rules will still match the old address from the table, and traffic might get rejected. So to do this correctly, we need to disable the prefix delegation for that interface in dhcpcd
, and then in the hook script
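Changing only the prefix would also change the transport-layer (TCP/UDP) checksum, which covers the addresses through the pseudo-header, and would thus probably invalidate the packet (the IPv6 header itself carries no checksum). Therefore NPTv6 uses an algorithm that also changes the part of the address behind the prefix in a way that keeps the checksum the same, as documented in RFC 6296.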
In the firewall rules it may be necessary to match a packet before NPTv6 (to decide that it should go to NPTv6) and after NPTv6 (to do forwarding or similar). The IP addresses are then different, and stateful rules must also be separate ones. To make this work with dynamic prefixes, this approach does work:
This approach does not work when the respective addresses are already managed in tables.
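The checksum-neutral mapping mentioned above can be illustrated with a minimal Python sketch. It follows my reading of RFC 6296 and is not taken from the ipfw source. It handles only the case where both prefixes are /48, in which the adjustment is applied to the 16-bit subnet word (bits 48 to 63). The function names `npt_outbound` and `address_sum` are made up for this illustration, and the prefixes are the example values from the RFC, not the ones used in this network.

```python
import ipaddress


def words(value, nbits):
    """Split an nbits-wide integer into 16-bit words, most significant first."""
    return [(value >> shift) & 0xFFFF for shift in range(nbits - 16, -1, -16)]


def oc_sum(ws):
    """16-bit one's-complement sum with end-around carry."""
    total = sum(ws)
    while total > 0xFFFF:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def address_sum(addr):
    """One's-complement sum over all eight words of an IPv6 address."""
    return oc_sum(words(int(ipaddress.IPv6Address(addr)), 128))


def npt_outbound(addr, internal, external):
    """Translate addr from the internal /48 to the external /48 (sketch only)."""
    addr = ipaddress.IPv6Address(addr)
    internal = ipaddress.IPv6Network(internal)
    external = ipaddress.IPv6Network(external)
    if internal.prefixlen != 48 or external.prefixlen != 48:
        raise ValueError("this sketch handles /48 prefixes only")
    if addr not in internal:
        raise ValueError("address is not inside the internal prefix")

    in_sum = oc_sum(words(int(internal.network_address) >> 80, 48))
    ex_sum = oc_sum(words(int(external.network_address) >> 80, 48))
    # adjustment = in_sum - ex_sum in one's-complement arithmetic
    adjustment = oc_sum([in_sum, ~ex_sum & 0xFFFF])

    value = int(addr)
    subnet = (value >> 64) & 0xFFFF
    new_subnet = oc_sum([subnet, adjustment])
    if new_subnet == 0xFFFF:
        # As I read RFC 6296, 0xFFFF is rewritten as 0x0000
        # (the same value in one's-complement terms).
        new_subnet = 0x0000

    interface_id = value & ((1 << 64) - 1)  # lower 64 bits stay untouched
    translated = int(external.network_address) | (new_subnet << 64) | interface_id
    return ipaddress.IPv6Address(translated)


if __name__ == "__main__":
    inside = "fd01:203:405:1::1234"
    outside = npt_outbound(inside, "fd01:203:405::/48", "2001:db8:1::/48")
    print(outside)
    print(hex(address_sum(inside)), hex(address_sum(outside)))
```

For the RFC's example prefixes this yields 2001:db8:1:d550::1234, and both addresses have the same one's-complement sum, which is why TCP and UDP checksums remain valid without being recomputed.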
ifconfig
and netstat -rn
These work as with IPv4, showing the interface configurations and route tables.
ndp
This is the replacement for ARP. Since IPv6 uses neighbour discovery instead of ARP, this command is used to show and manipulate the current cache of MAC address translations. The cache is shown with -a
, or an individual address can be stored with -s
.
ifmcstat
This command shows the configured IPv6 multicast groups.
ip6addrctl
Steers the preference with which various types of addresses may be used for a connection. The preference is described in a so-called "policy table". This also controls whether the machine will prefer IPv4 or IPv6.
baseifs -> IPs on local interfaces (only host addresses)
10 ip6 private (manual)
11 ip6 dynamic (per hook script)
12 ip6 linklocal (per hook script)
ip6pref -> prefixes for this router (only networks)
0 distributed downstream
IA_ID received from upstream
Timestamp | System | Event |
22:59:00 | outbound | kill -INT `cat /var/run/tun0.pid` |
22:59:00 | outbound | ppp[424]: tun0: Phase: Caught signal 2, abort connection(s) |
22:59:00 | outbound | ppp[424]: tun0: IPV6CP: deflink: LayerDown: fe80::41d:92ff:fe41:cd12 |
22:59:00 | outbound | ppp[424]: tun0: Phase: deflink: Enter pause (3) for redialing. |
22:59:03 | outbound | ppp[424]: tun0: Phase: deflink: Connected! |
22:59:04 | outbound | ppp[424]: tun0: Command: dsl: shell /sbin/ifconfig INTERFACE inet6 accept_rtadv |
22:59:04 | outbound | ppp[424]: tun0: IPV6CP: myaddr fe80::41d:92ff:fe41:cd12 hisaddr = fe80::ee13:dbff:fe17:2b3d |
22:59:05 | outbound | dhcpcd[21126]: received SIGUSR1, renewing |
22:59:05 | backbone | dhcpcd[20079]: tap0: deleting address 2003:e7:1744:80ff::2/64 |
22:59:08 | outbound | dhcpcd[21126]: tun0: delegated prefix 2003:e7:1744:c300::/56 |
22:59:09 | backbone | dhcpcd[20079]: tap0: adding address 2003:e7:1744:c3ff::2/64 |
22:59:09 | outbound | dhcpcd-run-hooks[21320]: tun0: dhcpcd-pd write dhcps-config |
22:59:10 | backbone | dhcpcd-run-hooks[59289]: tap0: dhcpcd-ra ipfw add 2003:e7:1744:c3ff::/64 0 |
22:59:11 | backbone | dhcpcd[20079]: received SIGUSR1, renewing |
22:59:12 | backbone | dhcpcd[20079]: tap0: delegated prefix 2003:e7:1744:c3e0::/60 |
22:59:13 | backbone | dhcpcd-run-hooks[59329]: tap0: dhcpcd-pd ipfw add 2003:e7:1744:c3e0::/60 1 |
22:59:13 | client | inet6 2003:e7:1744:80e0::23 prefixlen 64 deprecated autoconf |
22:59:15 | client | inet6 2003:e7:1744:c3e0::23 prefixlen 64 tentative autoconf |
22:59:16 | client | inet6 2003:e7:1744:c3e0::23 prefixlen 64 autoconf |