Events postponed by timeout due to stream server connectivity

stevendewilde · January 21, 2019, 12:11pm

Yesterday we encountered a strange situation: some hourly events were postponed.
Our situation:

[ul][li]we have 2 hourly events:
[list type=decimal]
[li]xx:55:00 download news from FTP[/li]
[li]xx:57:00 append database playlist for next hour[/li]
[/list][/li]
[li]we use the encoder to stream to our internet stream and to collect the number of listeners[/li][/ul]

There were some issues with the connectivity to the stream server: frequent connection loss, etc. The error log mentioned timeouts while collecting the number of listeners, but this also seems to have affected the news download event (normally that connection has no issues). The playlist event was only executed after the news event failed with a timeout error, around xx:04:00. This resulted in a short underrun (I know I can use an action to handle the underrun) and in the wrong playlist being appended (since the event uses the “next hour” option).

Is there an internal connection between the FTP-download event and the collection of stream info that can cause this situation?
I thought (or it would be great) that:

[ul][li]an event doesn’t depend on the completion of the previous event [/li]
[li]a connection timeout can be configured for the different internet actions[/li][/ul]

Thanks for your replies!

Torben · January 21, 2019, 8:43pm

The actions within the same action list wait for each other, on purpose. Otherwise the “Emergency action list” feature would not work.

Consider putting the download action into a separate event. It will then run independently.

Regarding timeouts: if the server is not reachable at all (connection timeout), the timeout duration is managed by Windows. Read timeouts (when the connection is already established) are a different thing and much more difficult to handle. For HTTP, a default read timeout of 30 seconds is implemented, but I’m not sure about FTP.
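To illustrate the difference between the two kinds of timeout, here is a minimal Python sketch using plain sockets. It is not how mAirList is implemented; the function name `probe` and its return values are made up for this example. The connection timeout only bounds the TCP handshake, while the read timeout bounds each wait for data on a connection that is already open.

```python
import socket

def probe(host, port, connect_timeout, read_timeout):
    """Hypothetical helper: report which phase failed.

    Returns 'connect', 'read', 'closed' or 'ok'.
    """
    try:
        # Phase 1: the connection timeout bounds the TCP handshake only.
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError:
        # Covers refused connections, unreachable hosts and handshake timeouts.
        return "connect"
    with sock:
        # Phase 2: the read timeout bounds each wait for data
        # on the already established connection.
        sock.settimeout(read_timeout)
        try:
            data = sock.recv(1024)
        except socket.timeout:
            return "read"
    # An empty result means the peer closed the connection without sending.
    return "ok" if data else "closed"
```

A server that accepts the connection but never answers produces the `'read'` case, which matches the situation of a stream server that is reachable but stalls.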

stevendewilde · January 22, 2019, 9:53am

Both the download event and the append playlist event are already separate hourly scheduled events, so I expected them to run independently, just as you described. It seemed otherwise.

I checked the logging (and attached it to this post):

[ul][li]in yellow all read timeout warnings from the stream count requests[/li]
[li]in green all logs about the news download event[/li]
[li]in blue all logs about the playlist append event[/li][/ul]

What I found out: the events start at the correct time, but both the download event and the playlist event suffer from timeouts, although only the stream server had connectivity issues. The database runs on localhost, and if you look at the timings of the playlist event you notice a serious delay:
18:57:00 - start of the playlist event
19:00:29 - advertisement containers inspected (part of the append playlist event), just after the socket error of the news download
The playlist of 20:00 was appended instead of the playlist of 19:00 (in the playlist event we request the playlist of the next hour), which indicates that the playlist was requested after 19:00. The same issue occurred around 20:05.
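The effect of the delay on the “next hour” option can be worked through in a few lines of Python. This is only an illustration of the arithmetic, assuming “next hour” means the first full hour after the moment the request is actually made; `next_hour_slot` is a made-up name.

```python
from datetime import datetime, timedelta

def next_hour_slot(now):
    """Return the start of the first full hour after `now` (assumed semantics)."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)

on_time = datetime(2019, 1, 20, 18, 57, 0)   # event start according to the log
delayed = datetime(2019, 1, 20, 19, 0, 29)   # first logged step of the append

print(next_hour_slot(on_time).strftime("%H:%M"))  # 19:00
print(next_hour_slot(delayed).strftime("%H:%M"))  # 20:00
```

A request that slips past the hour boundary by even a few seconds therefore targets the wrong playlist.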

Since a database connection is required to append the playlist (I think), could this rare case have anything to do with running out of available sockets in Windows, maybe caused by the read timeouts of the stream count requests? If that’s the case, maybe these requests could be temporarily suspended after several consecutive errors?
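The idea of pausing the listener requests after repeated errors could look roughly like the following Python sketch. It is a suggestion only, not existing mAirList behaviour; the class `PollBreaker` and the parameters `max_failures` and `cooldown` are hypothetical.

```python
class PollBreaker:
    """Hypothetical guard: pause polling for `cooldown` seconds
    after `max_failures` consecutive failed requests."""

    def __init__(self, max_failures=3, cooldown=300.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0          # consecutive failures seen so far
        self.paused_until = 0.0    # timestamp until which polling is paused

    def allow(self, now):
        """Return True if a request may be sent at time `now` (seconds)."""
        return now >= self.paused_until

    def record(self, ok, now):
        """Record the outcome of a request that finished at time `now`."""
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            # Too many errors in a row: stop polling for a while.
            self.paused_until = now + self.cooldown
            self.failures = 0
```

The poll timer would call `allow()` before each request and `record()` after it, so a dead stream server costs only a few timeouts per cooldown period instead of one every interval.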

Since I’m a developer myself, I’ll try to think along as much as possible to avoid this situation in the future and to make your software better than it already is.

mAirList - 20190120 - log.pdf (265 KB)

shorty.xs · January 23, 2019, 9:42am

Can you post the script that is handling the download?
Is this a mAirList script?

stevendewilde · January 23, 2019, 10:15am

I’m using a regular mAirList action as a scheduled hourly event: download file > direct download using an FTP:// address and providing a username and password.

I’ve changed the interval of the stream count to 60 seconds instead of the default 30 seconds, since the stream connectivity was a known issue at that time. I think the problems have something to do with the stream (e.g. a new request for the number of listeners starting while a previous one is still running and waiting for a timeout).
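If overlapping requests are indeed the cause, one common way to avoid them is to skip a poll while the previous one is still in flight. The Python sketch below shows the pattern under that assumption; it says nothing about how mAirList schedules its requests, and `NonOverlappingPoller` and `fetch` are hypothetical names.

```python
import threading

class NonOverlappingPoller:
    """Hypothetical poller that never runs two requests at the same time."""

    def __init__(self, fetch):
        self._fetch = fetch              # callable that performs one request
        self._busy = threading.Lock()    # held while a request is in flight
        self.skipped = 0                 # number of ticks that were dropped

    def tick(self):
        """Called by the interval timer. Returns False if the tick was skipped."""
        if not self._busy.acquire(blocking=False):
            # The previous request is still waiting (e.g. for a read timeout).
            self.skipped += 1
            return False
        threading.Thread(target=self._run, daemon=True).start()
        return True

    def _run(self):
        try:
            self._fetch()
        finally:
            # Release even if the request raised, so polling can resume.
            self._busy.release()
```

With this guard, a request that hangs for longer than the poll interval blocks at most one thread and one socket, regardless of how short the interval is.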