Network Layer binding

Greetings FS community. Need some help understanding how Sofia profiles choose to bind to a network interface address.
This is on FS 1.10.7-5 on OWRT.
We have a main profile (1) and a “failover” profile (2). Each profile sets its sip-ip param, like:
<param name="sip-ip" value="$${xyz_ip_v4}"/>
(Sofia Configuration Files | FreeSWITCH Documentation)
where “xyz_ip_v4” is different per profile: for (1) we use the wan IP addr and for (2) we use the wan2 (failover) IP addr.
This works in the sense that the SIP headers (Contact, Via, etc) are properly set according to the “xyz_ip_v4” IPs, i.e. it works at the Application layer.
However, at the Network layer the IP packet always uses the Src IP of the wan. For the 2nd profile we want the IP packet to bind to wan2 regardless of whether wan is connected or disconnected (failover case), so that the REGISTER msg (and anything else) goes out on wan2 all the time.
The only thing that seems to influence the choice of Src IP for the IP packet in which the SIP message is encapsulated is the main routing table. Sofia seems to choose whichever route the main routing table defines, and since wan has a lower metric, that is the one chosen.
We proved this by, for instance, temporarily deleting the route for the wan interface and then restarting the 2nd Sofia profile; the SIP registrations then do go via wan2. But this is a bad hack of a solution, of course.
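
For anyone who wants to observe the route-table dependence described above outside of FreeSWITCH, here is a minimal sketch in C. It asks the kernel which IPv4 source address it would pick for a given destination on a socket that has not been bound to anything. The function name routed_source_ip and the port 5060 are illustrative assumptions, not anything taken from Sofia, and this is not how Sofia itself opens its transports.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: report the IPv4 source address the kernel would use
 * to reach dst_ip from an unbound socket. Writes a dotted-quad string into
 * out and returns 0 on success, -1 on error.
 * No packet is sent: connect() on a UDP socket only triggers the route lookup. */
int routed_source_ip(const char *dst_ip, char *out, socklen_t out_len)
{
    assert(dst_ip != NULL && out != NULL);

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(5060); /* arbitrary port, chosen for illustration */
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1)
        return -1; /* not a valid IPv4 literal */

    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    struct sockaddr_in src;
    socklen_t src_len = sizeof(src);
    int rc = -1;
    if (connect(s, (struct sockaddr *)&dst, sizeof(dst)) == 0 &&
        getsockname(s, (struct sockaddr *)&src, &src_len) == 0 &&
        inet_ntop(AF_INET, &src.sin_addr, out, out_len) != NULL)
        rc = 0;

    close(s);
    return rc;
}
```

Calling it with the registrar's IP address and comparing the result against the wan and wan2 addresses shows which one the routing tables currently favor for that destination.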
The question is whether there is any parameter that can influence the decision to use one or another route.
Note: We use mwan3 for failover, so we could specify one of its route tables if that would be possible.
BTW, we also tried <param name="sip-ip" value="interface:[auto|ipv4|ipv6|]/eth0"/> to no avail.
Many thanks,
Pablo

You have to either set it, or it will default to the one that has internet access. Those variables are auto set from the core, if you want something else you’ll need to specify them in vars.xml or in the profile params directly.

What does ‘global_getvar’ show in its output?

/b

Thanks Brian…
What variables would those be?
here’s the getvar (I blanked some stuff)

hostname=<blanked>
local_ip_v4=75.99.<blanked>
local_mask_v4=255.255.<blanked>
local_ip_v6=::1
base_dir=/usr
recordings_dir=/tmp/freeswitch/recordings
sound_prefix=/mnt/kd/fs/sounds
sounds_dir=/mnt/kd/fs/sounds
conf_dir=/etc/freeswitch
log_dir=/tmp/freeswitch/log
run_dir=/tmp/freeswitch/log
db_dir=/tmp/freeswitch/db
mod_dir=/usr/lib/freeswitch/mod
htdocs_dir=/usr/share/freeswitch/htdocs
script_dir=/mnt/kd/fs/scripts
temp_dir=/tmp
grammar_dir=/usr/share/freeswitch/grammar
fonts_dir=/usr/share/freeswitch/fonts
images_dir=/usr/share/freeswitch/images
certs_dir=/etc/freeswitch/tls
storage_dir=/tmp/freeswitch/storage
cache_dir=/tmp/freeswitch/cache
data_dir=/usr/share/freeswitch
localstate_dir=/var/lib/freeswitch
us-ring=%(2000,4000,440.0,480.0)
sounds_base_dir=/mnt/kd/fs/sounds/
recording_base_dir=%{sounds_base_dir}cprompts/
socket_password=<blanked>
domain=<blanked>
gt=>
lt=<
s2s_ua=<blanked>
s2s_mac=<blanked>
s2s1_ip_v4=75.99.<blanked>
s2s2_ip_v4=146.152.<blanked>
zrtp_enabled=false
core_uuid=59b5f22d-7bb7-435d-9d6f-fb4ce5fd509c

Needless to say we want to use s2s2_ip_v4 as the network-layer IP socket for this failover profile.

This may be due to default routing on the system. I would check at the network layer to ensure that your routes are set properly, that the wan and failover profiles have their IPs set as expected, and that netstat -na shows them listening as intended.

/b

Thanks again @BrianWest-SW
This seems to confirm that there is no Sofia configuration to change this behavior and that it must be handled via layer 3 manipulation (routing tables, etc.)
Or patch the FS code :roll_eyes:
Pablo

It’s more than just that… you’d have to dig in and see what exactly you have at the OS level when the failover takes place, then compare.

/b

@BrianWest-SW I got a breakthrough!
One of the things I noticed is that one of the profiles uses an FQDN for “proxy” in its gateway configuration; that profile actually worked, in the sense that its REGISTER messages are bound to the failover interface (wan2) when the main interface fails.
We have two more profiles, but those didn’t work: again, the Src IP is always bound to the primary (wan).
I was reviewing that with a colleague who’s very knowledgeable in FS matters, and he seemed to recall a bug where the wrong Src binding would happen when an IP address is passed instead of a name.
Sure enough those two “bad” profiles have configured “proxy” as IP addresses!
So I went ahead and created bogus DNS names for those in /etc/hosts, restarted the Sofia profiles, then caused a failover and voilà! Now the REGISTER messages for all of them take the proper route, i.e. the Src address of the REGISTER IP packet is that of wan2.
Have you heard of such a bug?
Many thanks,
Pablo

It’s probably more to do with the DNS cache. You may want to contact sales@signalwire.com for FreeSWITCH Enterprise support.

/b

Apparently the trick was passing the “correct” IP address (in this case of the secondary interface) as the GW “from-domain” entry. If you look in (mod_sofia) mod/endpoints/mod_sofia/sofia.c you’ll see e.g. this:

gateway->register_from = switch_core_sprintf(gateway->pool, "<sip:%s@%s>",
                                 from_user, !zstr(from_domain) ? from_domain : proxy);

If the parameter isn’t set, “register_from” will default to “proxy”, but the proxy is the remote end-point address, and so it appears that the route it chooses is almost random (I’ve seen it sometimes get it right); when passing the proper local IP in the parameter, the IP route is properly chosen.
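
To make the fallback in that line concrete, here is a standalone sketch that mirrors only its logic, using snprintf in place of switch_core_sprintf. The names is_blank and build_register_from are made up for this illustration (is_blank stands in for FreeSWITCH’s zstr()), and the addresses used when calling it would be placeholders, not real config.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for zstr(): true when s is NULL or empty. */
static int is_blank(const char *s)
{
    return s == NULL || *s == '\0';
}

/* Hypothetical helper: build "<sip:user@host>", where host is from_domain
 * when it is set and proxy otherwise. Returns what snprintf returns. */
int build_register_from(char *out, size_t out_len, const char *from_user,
                        const char *from_domain, const char *proxy)
{
    assert(out != NULL && out_len > 0);
    return snprintf(out, out_len, "<sip:%s@%s>", from_user,
                    !is_blank(from_domain) ? from_domain : proxy);
}
```

With from_domain left NULL or empty, the host part is the proxy (the remote end); with from_domain set to the local wan2 address, the host part becomes that local address, which is the difference I was describing.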

Interestingly enough, it uses the proper network-layer route irrespective of whether the primary interface is up or not, which is the behavior we had in the FS 1.6 version (even when we didn’t have the “correct from_domain”).

Best,
Pablo

I’m pretty sure we didn’t change this behavior at all since 1.6.

/b

@BrianWest-SW the code in mod-sofia, as it relates to “from_domain” is really identical between 1.10 and 1.6, so you are right on the $$; however there is clearly a difference in behavior, so I presume the changes are in Sofia itself but I can’t tell what the former sets in the latter.
I mean this is now working fine in 1.10 so I’m happy about that, but it’d be nice to get the full picture, some day :roll_eyes:
Thx,
Pablo

Please file a GitHub issue and do a PR.

/b

Long time no talk @BrianWest-SW
After your last comment as it relates to “from_domain” I went back and sure enough you were right on the $$. No changes that I could find… so this didn’t make sense.

And lo and behold after a handful of restarts, the problem resurfaced. I then realized that all my attempts to correct this via some Sofia configuration were noise, and that if I restarted/rebooted 10 times, I had a 50/50 chance that the bind would work properly.
I could tell if the bind was appropriate because the failover profile would bind to the failover wan (wan2) irrespective of the failover condition (i.e. even if NOT in failover the bind is to wan2 interface) which is the older behavior.

I started looking into possible kernel “issues”; the older system was on 3.x, while this one is on 5.10. It’s been a long while since I last worked on Linux sockets, but I found some people who seemed to have similar issues with socket binding, though not specific to Sofia.

TL;DR this issue does not seem to be Sofia or FS for that matter. So far this has been working: After the socket binding (tport_bind_socket()), make an explicit setsockopt(); snippet:

+--- a/libsofia-sip-ua/tport/tport.c
++++ b/libsofia-sip-ua/tport/tport.c
...
++  struct ifreq ifr;
...
++        if (setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE, (void *)&ifr, sizeof(ifr)) < 0) {
++             SU_DEBUG_3(("socket: %d setsockopt(SO_BINDTODEVICE) error: %s\n",
++                 s, su_strerror(su_errno())));
++        }

We have this working reliably on a couple of systems, for more than a week; not yet declaring victory but I have high confidence.

Pablo

Can you file a GitHub PR for this?

Will definitely do in the next few weeks!
Pablo