Unexpected freeswitch startup failures

I posted this in a subforum and didn’t see any responses, sorry if this is considered a dupe but I thought I’d try in the general forum. If I’m not doing something wrong as to not see the errors in failures like this I’m surprised no one else has noticed. Thanks…

I’m using Debian 12 and running the current freeswitch packages from the signalwire repo.

I’ve got an issue where sometimes freeswitch won’t start at boot, and there’s nothing in the logs that indicates why it failed.

I don’t know why it’s failing at boot, but it’s easy to replicate the issue of freeswitch failing without any indication of why. It’s kind of contrived, but say for example /etc/freeswitch/freeswitch.xml doesn’t exist. freeswitch fails to start:

pbx-01 ~ # systemctl status freeswitch.service
× freeswitch.service - freeswitch
Loaded: loaded (/lib/systemd/system/freeswitch.service; enabled; preset: enabled)
Drop-In: /etc/systemd/system/freeswitch.service.d
└─nochown.conf, order.conf, realtime.conf
Active: failed (Result: exit-code) since Sat 2024-06-29 19:50:22 PDT; 30s ago
Duration: 52.239s
Process: 6000 ExecStart=/usr/bin/freeswitch -u ${USER} -g ${GROUP} -ncwait ${DAEMON_OPTS} (code=exited, s>
CPU: 44ms

Jun 29 19:50:22 pbx-01 systemd[1]: freeswitch.service: Scheduled restart job, restart counter is at 5.
Jun 29 19:50:22 pbx-01 systemd[1]: Stopped freeswitch.service - freeswitch.
Jun 29 19:50:22 pbx-01 systemd[1]: freeswitch.service: Start request repeated too quickly.
Jun 29 19:50:22 pbx-01 systemd[1]: freeswitch.service: Failed with result ‘exit-code’.
Jun 29 19:50:22 pbx-01 systemd[1]: Failed to start freeswitch.service - freeswitch.

There’s nothing in the freeswitch log, the last entry is from being shut down before this test:

2024-06-29 19:50:06.562165 99.27% [NOTICE] switch_loadable_module.c:1573 Deleting Say interface ‘en’

Obviously, in this case, it failed before logging was configured, so nothing in the logs is expected. But in other cases, such as when it randomly fails at boot, there is still nothing in the logs.

There’s also nothing in the journal:

pbx-01 ~ # journalctl -xeu freeswitch.service

A start job for unit freeswitch.service has finished with a failure.

The job identifier is 3988 and the job result is failed.

The only thing that provides any detail as to what failed is running freeswitch by hand rather than via the service and leaving out the -ncwait option:

pbx-01 ~ # /usr/bin/freeswitch -u freeswitch -g freeswitch -nonat
2024-06-29 19:58:00.500758 0.00% [INFO] switch_event.c:714 Activate Eventing Engine.
2024-06-29 19:58:00.511267 0.00% [WARNING] switch_event.c:685 Create additional event dispatch thread 0
2024-06-29 19:58:00.511497 0.00% [ERR] switch_xml.c:1439 Couldn’t open /etc/freeswitch/freeswitch.xml (No such file or directory)
Cannot Initialize [Cannot Open log directory or XML Root!]

If the problem is reproducible (I’ve run into a few other config issues that make this happen other than the sledgehammer of whacking a main config file) this is a solution. But when it fails to start randomly, it’s not.

From what I can tell, when started via systemd:

Jun 29 19:59:59 pbx-01 systemd[1]: Starting freeswitch.service - freeswitch…
Jun 29 19:59:59 pbx-01 freeswitch[6112]: 6113 Backgrounding.
Jun 29 20:00:03 pbx-01 freeswitch[6112]: FreeSWITCH[6112] Waiting for background process pid:6113 to be ready>
Jun 29 20:00:03 pbx-01 freeswitch[6112]: FreeSWITCH[6112] Waiting for background process pid:6113 to be ready>
Jun 29 20:00:03 pbx-01 freeswitch[6112]: FreeSWITCH[6112] System Ready pid:6113
Jun 29 20:00:03 pbx-01 systemd[1]: Started freeswitch.service - freeswitch.

The process systemd runs spawns another process, and when that process is ready, the original process exits with success, and things work. But if that child process fails, the original process fails, but without any reason why.

Am I missing something? Should the initial output as seen by running freeswitch by hand show up somewhere when running as a service? Presumably it’s that child process that knows about it but I can’t find anywhere it might log it.

Thanks much for any insight as to sorting out what’s broken when freeswitch fails to start early and there’s no evidence of why…

Id start by starting FreeSWITCH manually and see if any errors happen. malformed xml can cause FreeSWITCH to not start.

Thanks for the response. The malformed XML was just an example to demonstrate how the freeswitch service can fail to start but not provide any evidence as to why.

My specific problem is that sometimes freeswitch fails to start at boot. When I start it after boot, whether as a service or in the foreground manually, it works fine.

I need a way to make the freeswitch service log why it failed to start, otherwise I have no way to sort out why it occasionally won’t start at boot.

Unless I’m missing a way to do that, this seems to be a deficiency in the systemd service implementation for freeswitch. Ideally if it failed to start before it was in a position to generate a log entry in its own log, it would emit something via stderr that would be captured and displayed by systemctl status.

Thanks…

I agree with Len that starting FS manually (as user ‘freeswitch’ if you do it that way) tells you much of what is going on here. I’m not sure why it’s intermittent, unless it has to do with networking weirdness affecting FS binding to a network interface.

I offer a few semi-related suggestions for your systemd unit file that might help:

[Unit]
Wants=network-online.target
Requires=network.target local-fs.target
After=network.target network-online.target local-fs.target

[Service]
PIDFile=/usr/local/freeswitch/run/freeswitch.pid

TimeoutStartSec=45
TimeoutStopSec=45
; Restart=always will ALWAYS restart FreeSWITCH even when you wish to stop it to debug and configure it
Restart=on-failure

[Install]
WantedBy=multi-user.target

I’m pretty sure the lines under [Unit] are how they appear in the repository, but if not FS might start before those subsystems are ready.

The repo PIDFile= points to /var/run (which is really /run nowadays) which only allows write by root. I compile FS from source which ends up in /usr/local/freeswitch so I just point to the /run directory under there which is writeable by user ‘freeswitch’. This has nothing to do with your problem, but is useful to do.

The timeout lines prevent FS from being killed and restarted so rapidly as to make it useless to troubleshoot. That’s what systemd was complaining about, among other things.

The Restart= line allows you to stop FS without systemd starting it behind your back.

The WantedBy= line is what actually tells systemd to start FS at boot time IIRC.

I have an ancient wiki page that lists a number of steps after I build FS from scratch to set permissions and such that might be worthwhile to you:

Thanks for the thoughts.

I agree with Len that starting FS manually (as user ‘freeswitch’ if you do it that way) tells you much of what is going on here.

I agree with both of you that it seems the only way to find out why freeswitch fails to start sometimes is to run it manually. Which is a problem if it’s failing when you’re not in a position to do so. It really seems that the log output provided by freeswitch when run manually should also be passed to the systemd status when run as a service. That could be accomplished by simply dumping it to stderr on a failure.

I offer a few semi-related suggestions for your systemd unit file that might help:

The dependencies in the official freeswitch package service match those you list. The pid is set to:

PIDFile=/run/freeswitch/freeswitch.pid

which is a directory created with the right permissions to use:

$ cat /usr/lib/tmpfiles.d/freeswitch.conf
d /var/run/freeswitch 0755 freeswitch freeswitch - -

The timeout lines prevent FS from being killed and restarted so rapidly as to make it useless to troubleshoot. That’s what systemd was complaining about, among other things.

No, those lines control how long systemd will wait for a service to start or stop before it’s considered failed, they’ve got nothing to do with how long freeswitch waits after a service fails before it tries to restart it. To control the wait between restart tries you need to use RestartSec or RestartSteps/RestartMaxDelaySec.

The Restart= line allows you to stop FS without systemd starting it behind your back.

What do you mean by behind your back? If you explicitly shut it down via systemctl, it doesn’t randomly restart it, unless it’s activated by a matching socket service or something (in which case systemd will warn you when you shut it down). It only restarts it if it exits on its own. Telling it to only restart on failure just means it won’t auto restart if freeswitch exits unexpectly but with a 0 status code.

I’m sure there’s something simple going on behind the scenes. I guess I could just tweak the dependencies to make it start at the absolute end of the boot process and see if that clears up whatever race condition is happening. It would sure be nice if it could just tell me why it’s not starting though.

Thanks again…

Looking at the source, switch.c calls daemonize which closes stderr in the child process. If -ncwait is passed, the parent creates a pipe before spawning the child and waits for the child to send it a 1 to indicate it started.

In between the call to daemonize and the write to the pipe by the child, various things could go wrong which cause it to just write to stderr and then exit, so the parent knows it failed, but not why.

Given stderr is dropped in the child, these errors are dumped into /dev/null and there’s no way to know what they were.

If -ncwait is passed, couldn’t the child wait to drop stderr until it writes the success to the pipe, so any errors or failures would still be written to stderr and captured by systemd so they could be seen?