Diagnosing unexpected freeswitch startup failures

I’m running Debian 12 with the current freeswitch packages from the SignalWire repo.

I’ve got an issue where sometimes freeswitch won’t start at boot, and there’s nothing in the logs that indicates why it failed.

I don’t know why it fails at boot, but it’s easy to reproduce the general symptom of freeswitch failing with no indication of why. It’s a contrived example, but suppose /etc/freeswitch/freeswitch.xml doesn’t exist. freeswitch then fails to start:

pbx-01 ~ # systemctl status freeswitch.service
× freeswitch.service - freeswitch
     Loaded: loaded (/lib/systemd/system/freeswitch.service; enabled; preset: enabled)
    Drop-In: /etc/systemd/system/freeswitch.service.d
             └─nochown.conf, order.conf, realtime.conf
     Active: failed (Result: exit-code) since Sat 2024-06-29 19:50:22 PDT; 30s ago
   Duration: 52.239s
    Process: 6000 ExecStart=/usr/bin/freeswitch -u ${USER} -g ${GROUP} -ncwait ${DAEMON_OPTS} (code=exited, s>
        CPU: 44ms

Jun 29 19:50:22 pbx-01 systemd[1]: freeswitch.service: Scheduled restart job, restart counter is at 5.
Jun 29 19:50:22 pbx-01 systemd[1]: Stopped freeswitch.service - freeswitch.
Jun 29 19:50:22 pbx-01 systemd[1]: freeswitch.service: Start request repeated too quickly.
Jun 29 19:50:22 pbx-01 systemd[1]: freeswitch.service: Failed with result 'exit-code'.
Jun 29 19:50:22 pbx-01 systemd[1]: Failed to start freeswitch.service - freeswitch.

There’s nothing in the freeswitch log; the last entry is from the shutdown just before this test:

2024-06-29 19:50:06.562165 99.27% [NOTICE] switch_loadable_module.c:1573 Deleting Say interface 'en'

Obviously, in this case, it failed before logging was configured, so nothing in the logs is expected. But in other cases, such as when it randomly fails at boot, there is still nothing in the logs.

There’s also nothing in the journal:

pbx-01 ~ # journalctl -xeu freeswitch.service

A start job for unit freeswitch.service has finished with a failure.

The job identifier is 3988 and the job result is failed.

The only thing that provides any detail as to what failed is running freeswitch by hand rather than via the service and leaving out the -ncwait option:

pbx-01 ~ # /usr/bin/freeswitch -u freeswitch -g freeswitch -nonat
2024-06-29 19:58:00.500758 0.00% [INFO] switch_event.c:714 Activate Eventing Engine.
2024-06-29 19:58:00.511267 0.00% [WARNING] switch_event.c:685 Create additional event dispatch thread 0
2024-06-29 19:58:00.511497 0.00% [ERR] switch_xml.c:1439 Couldn't open /etc/freeswitch/freeswitch.xml (No such file or directory)
Cannot Initialize [Cannot Open log directory or XML Root!]

When the problem is reproducible, running freeswitch by hand like this is a workable way to find the cause (I’ve hit a few other config issues that trigger the same silent failure, besides the sledgehammer of removing a main config file). But when freeswitch fails to start at random, it isn’t.

From what I can tell, this is what a successful start via systemd looks like:

Jun 29 19:59:59 pbx-01 systemd[1]: Starting freeswitch.service - freeswitch...
Jun 29 19:59:59 pbx-01 freeswitch[6112]: 6113 Backgrounding.
Jun 29 20:00:03 pbx-01 freeswitch[6112]: FreeSWITCH[6112] Waiting for background process pid:6113 to be ready>
Jun 29 20:00:03 pbx-01 freeswitch[6112]: FreeSWITCH[6112] Waiting for background process pid:6113 to be ready>
Jun 29 20:00:03 pbx-01 freeswitch[6112]: FreeSWITCH[6112] System Ready pid:6113
Jun 29 20:00:03 pbx-01 systemd[1]: Started freeswitch.service - freeswitch.

The process that systemd runs spawns a background process, and when that background process is ready, the original process exits with success and things work. But if the background process fails, the original process fails too, without reporting any reason why.
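
To illustrate what I think is happening, here is a minimal shell sketch of that parent/child pattern. It is not FreeSWITCH’s actual code: the function names (fake_daemon, launcher) and the error text are made up for the example, and I’m assuming the backgrounded process has its stdout/stderr detached, which I haven’t confirmed. The real launcher may also wait on something other than the child’s exit. The point is only that the caller (systemd, in the real case) sees an exit status and nothing else.

```shell
#!/bin/sh
# Hypothetical stand-in for the backgrounded freeswitch process:
# it prints the reason for the failure to stderr and exits non-zero.
fake_daemon() {
    echo "Cannot open config (illustrative message only)" >&2
    return 1
}

# Hypothetical stand-in for the process systemd starts with -ncwait:
# it backgrounds the child with its output detached (an assumption),
# waits for it, and passes along nothing but the exit status.
launcher() {
    fake_daemon >/dev/null 2>&1 &
    child=$!
    echo "$child Backgrounding."
    wait "$child"
}

# Capture everything the launcher emits, the way the journal would.
output=$(launcher 2>&1)
status=$?

echo "launcher output: $output"
echo "launcher exit status: $status"
```

Running this prints the "Backgrounding." line and a non-zero exit status, but the child’s error message never shows up anywhere, which matches what I see in the journal.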

Am I missing something? Should the initial output as seen by running freeswitch by hand show up somewhere when running as a service? Presumably it’s that child process that knows about it but I can’t find anywhere it might log it.

Thanks much for any insight as to sorting out what’s broken when freeswitch fails to start early and there’s no evidence of why…