Recently, I transitioned from using Zerotier as a VPN gateway on a Mikrotik router to setting up a Tailscale subnet router on an LXC within my LAN. The process was relatively straightforward, with one exception: since the VPN gateway was no longer on the router itself, I had to add a static route to direct traffic for the destination network (192.168.21.0/24) to the Tailscale LXC (192.168.20.10).
While this setup worked, I encountered a significant problem: every TCP connection took 8 seconds to initiate.
Using Wireshark, I observed that the TCP SYN and SYN-ACK exchange appeared normal on the target network, except that it was repeated three times due to SYN retransmissions. However, on the source side (192.168.20.9/24), the first two handshake attempts didn’t receive the SYN-ACK.
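To put a number on the delay without opening Wireshark, the handshake time can be measured from the source host. The following is a minimal sketch, not the method used in this post; the function name `time_tcp_connect` is made up for illustration, and the target address and port in the usage comment are hypothetical.

```python
import socket
import time


def time_tcp_connect(host: str, port: int, timeout: float = 30.0) -> float:
    """Return the seconds needed to complete a TCP handshake with host:port.

    socket.create_connection() returns once the three-way handshake is done,
    so the elapsed time includes any SYN retransmissions made by the kernel.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection is closed immediately; only the handshake matters
    return time.perf_counter() - start


# Hypothetical usage from the source host (address and port are assumptions):
#   elapsed = time_tcp_connect("192.168.21.5", 22)
#   print(f"handshake took {elapsed:.2f} s")
```

Running this before and after a firewall change gives a quick check of whether the multi-second handshake delay is gone.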
After extensive searching, I came across a forum post that explained the likely root cause:
I’m pretty sure you’re victim of “routing triangle”: when 192.168.20.9/24 host initiates connection towards 192.168.21.0/24, it sends packet to its default gateway (192.168.20.1). That MT [Mikrotik] takes a note in its connection tracking state and forwards packet to next hop router (WG concentrator [Tailscale subnet router] at 192.168.20.10). Then the packet proceeds to the destination. Destination replies, packet arrives at WG concentrator which notices that destination address is in directly connected subnet and delivers it directly. Reply packet thus bypasses main router and its connection tracking machine can’t update connection state properly. Next packet, sent from 192.168.20.9/24 host, is then out of perceived [perceived to be invalid given the] connection state and is dropped due to being invalid.
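The sequence described in the forum post can be sketched as a toy model. This is a deliberate simplification and an assumption on my part: real connection tracking on RouterOS is far more involved, and the class `ToyConntrack`, the function names, and the remote host address used below are hypothetical. The model only captures one idea: the router can classify just the packets that actually pass through it.

```python
from ipaddress import ip_address, ip_network

LAN = ip_network("192.168.20.0/24")  # subnet shared by client, router and LXC


def delivered_directly(dst: str) -> bool:
    """A host on the LAN reaches on-link destinations without any router."""
    return ip_address(dst) in LAN


class ToyConntrack:
    """Toy TCP handshake tracker (assumption: not how RouterOS really works)."""

    def __init__(self) -> None:
        self.last_seen = {}  # flow -> last handshake packet the router saw

    def classify(self, flow, flags: str) -> str:
        seen = self.last_seen.get(flow)
        if flags == "SYN":
            self.last_seen[flow] = "SYN"
            return "new"
        # A packet is only accepted if it is the next expected handshake step.
        if (seen, flags) in {("SYN", "SYN-ACK"), ("SYN-ACK", "ACK")}:
            self.last_seen[flow] = flags
            return "established"
        return "invalid"


def simulate_handshake(client: str, server: str):
    """Trace a handshake where the server sits behind the subnet router."""
    conntrack = ToyConntrack()
    flow = (client, server)
    trace = []
    # Step 1: the server is off-link, so the SYN goes via the main router.
    trace.append(("SYN", conntrack.classify(flow, "SYN")))
    # Step 2: the SYN-ACK reaches the subnet router, which delivers it
    # directly if the client is on its own subnet, skipping the main router.
    if delivered_directly(client):
        trace.append(("SYN-ACK", "bypassed router"))
    else:
        trace.append(("SYN-ACK", conntrack.classify(flow, "SYN-ACK")))
    # Step 3: the client's ACK goes via the main router again.
    trace.append(("ACK", conntrack.classify(flow, "ACK")))
    return trace
```

Calling `simulate_handshake("192.168.20.9", "192.168.21.5")` returns `[("SYN", "new"), ("SYN-ACK", "bypassed router"), ("ACK", "invalid")]`: the router never saw the SYN-ACK, so in this model the client's ACK does not match the state it recorded.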
There are several ways to resolve this issue. The easiest is to accept packets that the router classifies as invalid when they travel between trusted subnets and interfaces:
/ip firewall filter add \
action=accept \
chain=forward \
src-address=192.168.20.0/24 \
dst-address=192.168.21.0/24 \
in-interface=bridge \
connection-state=invalid,new \
comment="Tailscale - static routing triangle fix, forum 171177"
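To make the scope of this rule explicit, here is a small sketch of its match conditions in Python. It only mirrors the matchers written in the command above and is not how RouterOS evaluates rules internally; the function name `rule_accepts` and the remote host address in the example are assumptions for illustration.

```python
from ipaddress import ip_address, ip_network

SRC_NET = ip_network("192.168.20.0/24")  # src-address
DST_NET = ip_network("192.168.21.0/24")  # dst-address
IN_INTERFACE = "bridge"                  # in-interface
ACCEPTED_STATES = {"invalid", "new"}     # connection-state


def rule_accepts(src: str, dst: str, in_interface: str, state: str) -> bool:
    """Return True if a forwarded packet matches every matcher of the rule."""
    return (
        ip_address(src) in SRC_NET
        and ip_address(dst) in DST_NET
        and in_interface == IN_INTERFACE
        and state in ACCEPTED_STATES
    )
```

For example, `rule_accepts("192.168.20.9", "192.168.21.5", "bridge", "invalid")` is true, while the same packet with a source outside 192.168.20.0/24 is not matched and falls through to the remaining firewall rules. The exception therefore covers only LAN-to-remote-subnet traffic entering on the bridge.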
This leaves one question unanswered: why did it work at all (if we ignore the 8-second delay)? Sadly, I don’t know the answer. If you do, let me know!