Failover
Slicer Failover minimizes the impact on your viewers' playback experience when a Slicer's performance is sub-optimal by automatically switching the live stream's source to a different Slicer. This is possible because each Slicer in a Failover Group reports status information to our system at frequent intervals. If the primary Slicer is unhealthy, our system switches the source of the live stream to a different Slicer. Viewers may experience a few seconds of discontinuity during the switch.
This feature is supported for Live Slicers only. Live Events are not included in the Slicer Failover solution at this time.
Slicers can be enabled in a Hot-Hot, Hot-Warm, or Flexible Hot-Warm configuration:
- With Hot-Hot, all Slicers perform full encoding and storage, which allows for near-instantaneous failover.
- With Hot-Warm, only the primary Slicer performs a full encode to save cost, but there may be seconds of slate when failover occurs.
- With Flexible Hot-Warm, you specify how many Hot backup Slicers are desired, with the rest in Warm standby, for the best balance between cost and user experience.
Slicer Failover can run in Prioritized, Flat, or Custom Mode:
- In Prioritized Mode, each Slicer has a priority that controls the failover order, and the system automatically fails back when a higher-priority Slicer becomes healthy again. An override is provided to prevent this failback.
- In Flat Mode, when failover occurs, any healthy Slicer may be used (randomly chosen) and there is no automatic failback.
- In Custom Mode, multiple Slicers can share the same priority, and there is no automatic failback to a Slicer with the same priority as the active Slicer.
Slicer Failover can also detect TR 101 290 errors so that failover can be triggered by MPEG transport stream errors.
Configuration
Prerequisites and Release Information
We recommend Slicer Release 23062600 (June 2023) or later. In general, Uplynk recommends always using the latest official Slicer Release.
If using a Slicer version before Slicer Release 23062600, failover_id must be set in the Slicer configuration file.
If failover based on TR 101 290 errors is desired, Slicer Release 22083100 (August 2022) or later must be used.
If you previously used the OLD Slicer Failover, turn OFF the Failover checkbox on the Channels tab when using the current Slicer Failover. This setting should not affect the current Slicer Failover, but it is safer to clear it to avoid any conflicts. Eventually, this checkbox will be removed.

Old Slicer Failover Checkbox
Step 1: Create the Channel in the CMS
- Go to Channels tab in CMS
- Click + Channel
- Enter a Channel Name, leaving Slicer ID blank
- Deselect Edit after creation
- Click Create
Step 2: Define the Failover Group in the CMS
- Go to Ingest tab in the CMS
- Select Failover Group
- Click + Failover Group
- Enter a Failover Group Name
- Click Create
- Note the Failover Group ID which truly defines the Failover Group, not the Failover Group Name
- If using a Slicer version prior to Slicer Release 23062600, copy this value to paste into Slicer config files
Step 3: Configure the Slicers in the Failover Group
- For each Slicer in the Failover Group, set failover_id to the Failover Group ID shown in the CMS. This is only required when using a Slicer version before Slicer Release 23062600 (June 2023).
- Add enable_remote_config: 1 to indicate that the Slicer should pull configurations from the CMS. With Slicer Release 23062600 or later, this can be anywhere in the file rather than the last line.
- Restart the Slicer(s) for the updates to take effect. A restart is required after any configuration file change.
In order to retrieve TR 101 290 data from the Slicer Status API or to use TR 101 290 for Slicer Failover, the following fields must be added to the configuration file:
###### Set DtApi functionality on or off
dt_api_sdk_on: 1
###### Set mode where the modes are: 0 No tables, 1 ATSC, 2 DVB, 3 DVB_RCS
dt_api_sdk_standard_mode: 1
###### Target bitrate of the stream (the bitrate of the incoming stream,
###### dt_api_sdk uses it to estimate the timestamps)
dt_api_sdk_bitrate: (the bitrate of the incoming stream prior to encoding)
###### Priority is the class of error we want to monitor: 3 - catch all errors
######(P3 & P2 & P1); 2 - catch P1 & P2, 1- catch P1 errors; 0 catch nothing
dt_api_sdk_priority: 2
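The fields above can be sanity-checked before a Slicer is restarted. The following is a minimal sketch, assuming the configuration file holds one `key: value` pair per line with `#` comments. The helpers `parse_slicer_config` and `check_tr101290_config` are hypothetical and not part of any Uplynk tool, and the bitrate in the sample is a placeholder.

```python
REQUIRED_KEYS = ("dt_api_sdk_on", "dt_api_sdk_standard_mode",
                 "dt_api_sdk_bitrate", "dt_api_sdk_priority")


def parse_slicer_config(text):
    """Parse 'key: value' lines, skipping blanks and '#' comments (assumed format)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def check_tr101290_config(config):
    """Return a list of problems with the TR 101 290 fields (empty if none found)."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in config]
    if config.get("dt_api_sdk_on") not in (None, "1"):
        problems.append("dt_api_sdk_on must be 1")
    if config.get("dt_api_sdk_standard_mode") not in (None, "0", "1", "2", "3"):
        problems.append("dt_api_sdk_standard_mode must be 0-3")
    if config.get("dt_api_sdk_priority") not in (None, "0", "1", "2", "3"):
        problems.append("dt_api_sdk_priority must be 0-3")
    return problems


sample = """
# TR 101 290 monitoring (placeholder bitrate)
dt_api_sdk_on: 1
dt_api_sdk_standard_mode: 1
dt_api_sdk_bitrate: 20000000
dt_api_sdk_priority: 2
"""
print(check_tr101290_config(parse_slicer_config(sample)))  # prints []
```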
Step 4: Add the Slicers to the Failover Group in the CMS
- Select Hot-Warm Mode (All Hot shown chosen)
- Select Failover Mode of Prioritized, Flat, or Custom (Prioritized chosen)
- Click Add Slicers to Group
- Click on desired Slicers to add to the Failover Group and they will be checked
- Click Add X Slicers where X is the number of Slicers that will be added
Step 5: Enable the Slicers and Set Priority
- Enable the Slicers by moving the slider to Enabled
- Priority of the Slicers can be changed by dragging the Slicer entries to rearrange
- Click Save
Immediately after enabling, the Slicers are denylisted since they are new to failover and are shown as unhealthy. After several minutes, they will be allowlisted and become healthy.
After becoming allowlisted, Status changes to Hot. You may need to refresh the browser to see the update in status. The Event Log shows denylist to allowlist transitions.
Step 6: Mapping the Failover Group to the Channel
- Make sure Slicers are set to Enabled
- Click Add Channel Mapping
- Click the channel(s) to map the Failover Group
- Click Map X Channel(s) where X is the number of selected channels
- Channel will appear here
- Click Save
- Preview will show thumbnail and Active Slicer for the Failover Group
If the Preview doesn't appear within a minute, you may have to disable and re-enable the Slicers in the Failover Group or restart the Slicers.
- Go to Channels tab and select the channel
- Observe Active Slicer ID
- Observe Failover Group and link to the Failover Group configuration
Enable Hot-Warm Failover
- Change dropdown to Hot-Warm
- Click Save
With Hot-Hot (the default), all Slicers are doing full processing, including encoding and storage. In this case, failover will be basically instantaneous.
With Hot-Warm, only the active Slicer is doing full processing and the backups are not encoding to save cost. In this case, there may be seconds of slate when failover occurs.
In our system, Slicers in Warm Mode are essentially put into an “Ads” state. This is reflected in the UI.
Enable Flat Priority Mode
- Change Priority Mode to Flat
- Click Save
All Slicers will have equal priority, and the replacement Slicer is randomly chosen from the healthy ones.
No automatic failback occurs when the failed Slicer becomes healthy again, since all Slicers are considered equal.
Enable Custom Priority Mode
- Change Priority Mode to Custom
- Edit the priorities; multiple Slicers can have the same priority
- Click Save
In Custom Priority Mode, the system fails over to any of the healthy Slicers that have the same or next highest priority as the failed Slicer.
If the failed Slicer becomes healthy, auto failback only occurs if the newly healthy Slicer has a higher priority than the current one. So, if the current Slicer is Priority 1 and the newly healthy one is too, there will be no failback.
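The selection rules for the three Priority Modes can be sketched roughly as follows. This is an illustration only, not the service's actual logic: `pick_replacement` and the shape of the `slicers` dictionary are hypothetical, and the sketch assumes that a lower number means a higher priority.

```python
import random


def pick_replacement(slicers, failed_id, mode, rng=random):
    """Pick a healthy replacement Slicer ID, or None if no Slicer is healthy.

    slicers maps a Slicer ID to {"priority": int, "healthy": bool}.
    Assumption: a lower number is a higher priority (Priority 1 is the top).
    """
    healthy = {sid: s for sid, s in slicers.items()
               if s["healthy"] and sid != failed_id}
    if not healthy:
        return None
    if mode == "flat":
        # Flat: every healthy Slicer is an equal candidate.
        return rng.choice(sorted(healthy))
    # Prioritized and Custom: take the best priority that is available.
    best = min(s["priority"] for s in healthy.values())
    candidates = sorted(sid for sid, s in healthy.items() if s["priority"] == best)
    # Custom Mode may leave several Slicers tied at the same priority.
    return rng.choice(candidates)


group = {
    "main": {"priority": 1, "healthy": False},
    "backup_a": {"priority": 2, "healthy": True},
    "backup_b": {"priority": 2, "healthy": True},
    "backup_c": {"priority": 3, "healthy": True},
}
print(pick_replacement(group, "main", "custom"))  # backup_a or backup_b
```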
Disable Auto Failback (Prioritized and Custom Priority Modes Only)
- Set Auto Failback to Disabled
- Click Save
With Auto Failback disabled, after the failed Slicer becomes healthy again, the Slicer will not failback even though priorities are set.
With Auto Failback enabled, after the failed Slicer becomes healthy again, the Slicer will failback if the newly healthy Slicer is higher priority.
Flexible Hot-Warm Failover
When there are more than two Slicers in a Failover Group, you can choose how many Hot backups are desired, with the rest being Warm.
At the current time, we recommend using Prioritized Priority Mode only, since Flat picks a random Slicer and may therefore pick a Warm Slicer.
Blue indicates Warm, and Warm Slicers are always in an Ads state.
Failover Groups Summary
- Active Slicer is green and indicated with [A]
- Hot inactive Slicer is green and indicated with [H]
- Unhealthy Slicer is red and indicated with [U]
- Warm Slicer is blue and indicated with [W]
- Disabled Slicer is gray and indicated with [D]
Updating Existing Channels to Use Slicer Failover
In this scenario, there is a single existing channel that is being fed by a single existing Slicer. The desire is to replace the single existing Slicer with a Failover Group on the existing channel and then add the existing Slicer to the new Failover Group.
Steps:
- Observe the existing channel with original “main” Slicer
- Create a new Failover Group
- Add backup Slicer(s) to the new Failover Group
- Add existing channel to the new Failover Group. Since the original “main” Slicer is not in the new Failover Group, the system will switch the channel to a backup Slicer in the new Failover Group.
- In the Uplynk CMS, add the original “main” Slicer to the new Failover Group as top priority
Once enabled and saved, the system would switch the channel back to this original “main” Slicer (this will take several minutes while the original “main” Slicer goes from a denylisted to allowlisted state)
NOTE: The above assumes that Slicer Release 23062600 or later is being used
Failover Thresholds
For each Failover Group, thresholds can be defined to precisely control the failover conditions.
Failover thresholds are defined in the CMS and can be manipulated as desired. Generally, thresholds should only be changed from the defaults upon advisement from Uplynk Support.
All of the thresholds include a hysteresis -- how long a parameter needs to be in the failure condition to be considered failed and how long a parameter needs to be in the working condition to be considered working.
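The hysteresis can be pictured as a small state machine with separate set and clear timers. The sketch below is a generic illustration under that reading, not the Slicer's implementation. The class name `ThresholdState` and its fields are hypothetical.

```python
class ThresholdState:
    """Tracks one threshold with separate set and clear durations (seconds)."""

    def __init__(self, set_duration, clear_duration):
        self.set_duration = set_duration
        self.clear_duration = clear_duration
        self.failed = False
        self._since = None  # time at which the opposing condition started

    def update(self, now, condition_bad):
        """Feed one observation; return True while the threshold counts as failed."""
        if condition_bad == self.failed:
            # The observation agrees with the current state: reset the timer.
            self._since = None
            return self.failed
        if self._since is None:
            self._since = now
        needed = self.set_duration if condition_bad else self.clear_duration
        if now - self._since >= needed:
            self.failed = condition_bad
            self._since = None
        return self.failed


state = ThresholdState(set_duration=3, clear_duration=5)
samples = [(0, True), (1, True), (3, True), (4, False), (9, False)]
print([state.update(t, bad) for t, bad in samples])
# prints [False, False, True, True, False]
```

A set duration of 0 makes the threshold trip on the first bad observation, while a longer clear duration keeps a recovering Slicer marked as failed until it has stayed good for that long.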
Individual thresholds have a “severity”, which is used when more than one Slicer has impairments so that the Slicer with the least severe impairments is used.
For Slicer versions before Slicer Release 23062600:
- The Slicer reads the thresholds when it is restarted, so you must restart the Slicers when the thresholds are changed to ensure the thresholds are applied as expected.
- The Slicer config file must contain enable_remote_config: 1 so that the Slicer can fetch these settings, and this line must come after failover_id, if present.
If failover based on TR 101 290 errors is desired, Slicer Release 22083100 (August 2022) or later must be used.
Failover Thresholds UI
The Thresholds tab is used to set the failover thresholds. The toggle enables/disables the individual threshold.
Each threshold has a severity: if only unhealthy Slicers are available, the one with the least severe issue is used as the active Slicer. Click Save to save thresholds.
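One way to picture severity-based selection is shown below. This is a sketch under stated assumptions rather than the documented algorithm: it assumes a Slicer's overall severity is the highest severity among its tripped thresholds, that 5 is the most severe value, and that a missing severity counts as 5. The function names are hypothetical.

```python
def slicer_severity(tripped, severities, default=5):
    """Worst (highest) severity among tripped thresholds; 0 if none tripped."""
    return max((severities.get(name, default) for name in tripped), default=0)


def least_impaired(tripped_by_slicer, severities):
    """Return the Slicer ID whose worst tripped threshold is least severe."""
    return min(sorted(tripped_by_slicer),
               key=lambda sid: slicer_severity(tripped_by_slicer[sid], severities))


severities = {"failover_static_audio": 3, "failover_video_loss": 5}
tripped = {
    "slicer1": ["failover_video_loss"],
    "slicer2": ["failover_static_audio"],
}
print(least_impaired(tripped, severities))  # prints slicer2
```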
Failover Threshold Definitions
Category | Parameter | Description |
---|---|---|
Audio Loss | Set Duration | Considered failed when missing audio for X seconds |
Audio Loss | Clear Duration | Considered working when not missing audio for X seconds |
Black Screen | Set Duration | Considered failed when black screen for X seconds |
Black Screen | Set Threshold | Luma value for black screen threshold (0-100, with higher being more sensitive) |
Black Screen | Clear Duration | Considered working when no black screen for X seconds |
CC Last Seen | Set Duration | Considered failed when missing closed captions for X seconds |
CC Last Seen | Clear Duration | Considered working when not missing closed captions for X seconds |
Dropped Frames | Set Duration | Considered failed when frames dropped threshold exceeded for X seconds |
Dropped Frames | Clear Duration | Considered working when frames dropped threshold not exceeded for X seconds. Frames may be dropped if Slicer overloaded. |
Dropped Frames | Set Threshold | Number of dropped frames allowed within a five second window |
Input Loss | Set Threshold | Considered failed when no input for X seconds |
Input Loss | Clear Duration | Considered working when input for X seconds |
Involuntary Blackout | Toggle | Will failover if Slicer loses connectivity with a Broker (recommended, although disabled by default) |
Nielsen Last Seen | Set Duration | Considered failed when missing Nielsen tags for X seconds |
Nielsen Last Seen | Clear Duration | Considered working when not missing Nielsen tags for X seconds |
Processing Queue | Set Duration | Considered failed when processing queue threshold exceeded for X seconds |
Processing Queue | Clear Duration | Considered working when processing queue threshold not exceeded for X seconds |
Processing Queue | Set Threshold | Slicer processing queue depth (A/V packets) where exceeding is considered a failure. This should trigger before the Slicer drops frames. |
SCTE Last Seen | Set Threshold | Considered failed when no SCTE cues for X seconds |
SCTE Last Seen | Clear Duration | Considered working when SCTE cues for X seconds |
Static Audio | Set Duration | Considered failed when static audio (volume exactly the same) for X seconds |
Static Audio | Clear Duration | Considered working when no static audio (volume exactly the same) for X seconds |
Static Video | Set Duration | Considered failed when static video (static luma value) for X seconds |
Static Video | Clear Duration | Considered working when no static video (static luma value) for X seconds |
Upload Queue | Set Duration | Considered failed when upload queue threshold exceeded for X seconds |
Upload Queue | Clear Duration | Considered working when upload queue threshold not exceeded for X seconds |
Upload Queue | Set Threshold | Slicer upload queue depth (A/V packets) where exceeding is considered a failure. This indicates that the Slicer is unable to upload packets to the cloud. |
Video Loss | Set Duration | Considered failed when no video feed in input transport stream for X seconds (No video PID in stream) |
Video Loss | Clear Duration | Considered working when video feed in input transport stream for X seconds (Video PID in stream) |
TR 101 290 P1 Errors | Set Duration | Considered failed when TR 101 290 Priority 1 errors occur in input transport stream for X seconds |
TR 101 290 P1 Errors | Clear Duration | Considered working when no TR 101 290 Priority 1 errors occur in input transport stream for X seconds |
TR 101 290 P2 Errors | Set Duration | Considered failed when TR 101 290 Priority 2 errors occur in input transport stream for X seconds |
TR 101 290 P2 Errors | Clear Duration | Considered working when no TR 101 290 Priority 2 errors occur in input transport stream for X seconds |
TR 101 290 Faults
TR 101 290 is a specification used to define errors in an MPEG Transport Stream. There are Priority 1 and Priority 2 faults. TR 101 290 errors are defined in https://www.etsi.org/deliver/etsi_tr/101200_101299/101290/01.04.01_60/tr_101290v010401p.pdf.
These are TR 101 290 Priority 1 faults:
- TS Sync Loss
- Sync Byte Error
- PAT Error
- Continuity Count Error
- PMT Error
- PID Error
These are TR 101 290 Priority 2 faults:
- Transport Error
- CRC Error
- PCR Error
- PCR Accuracy Error
- PTS Error
- CAT Error
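The two lists above, combined with the dt_api_sdk_priority levels from the Slicer configuration, can be expressed as a simple lookup. The helper names below are hypothetical, and only the faults listed in this section are covered.

```python
P1_FAULTS = {"TS Sync Loss", "Sync Byte Error", "PAT Error",
             "Continuity Count Error", "PMT Error", "PID Error"}
P2_FAULTS = {"Transport Error", "CRC Error", "PCR Error",
             "PCR Accuracy Error", "PTS Error", "CAT Error"}


def fault_priority(name):
    """Return 1 or 2 for a fault listed above, or None for anything else."""
    if name in P1_FAULTS:
        return 1
    if name in P2_FAULTS:
        return 2
    return None


def is_monitored(name, dt_api_sdk_priority):
    """True if a fault of this priority is caught at the configured level.

    Level 0 catches nothing, 1 catches P1, 2 catches P1 and P2.
    """
    priority = fault_priority(name)
    return priority is not None and priority <= dt_api_sdk_priority


print(is_monitored("PAT Error", 1))  # prints True
print(is_monitored("PCR Error", 1))  # prints False
```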
Failover Status Monitoring
Change Log displays all user changes to the failover configuration.
Event Log displays all events detected by the failover mechanism. This includes allowlisting and denylisting events.
Slicer Failover API Updates
Slicer Failover API documentation is online at https://api-docs.uplynk.com/#Develop/Live-Slicer-Failover-API.htm.
Since the introduction of Slicer Failover, APIs have been added or updated for the creation of Failover Groups, the deletion of Failover Groups, and the ability to update Slicer Failover thresholds.
These APIs are:
- POST /failover-groups to create a Failover Group (New)
- DELETE /failover-groups/{failover_group_id} to delete a Failover Group (New)
- PATCH /failover-groups/{failover_group_id} to update a Failover Group (Updated)
The code examples in this section use Uplynk’s api_auth module.
Create New Failover Group API
A new API has been created for creating a Failover Group:
POST /failover-groups
Request Body Parameters:
Name | Data Type | Description |
---|---|---|
name | String | Name for the new Failover Group. Note: If the same name is already in use, another Failover Group is created with the same name but a different system-defined ID to differentiate it. |
Request Body Example:
{ "name": "NewGroup" }
Sample Code:
import json
import requests
from api_auth import APICredentials, APIParams

class CreateFailoverGroup:
    def __init__(self):
        self.host = "https://services.uplynk.com"

    def run(self):
        self._create_failover_group()

    def _create_failover_group(self):
        failover_group_name = 'MyFailoverGroup'  # Replace with the desired failover group name.
        url = "{}{}".format(self.host, "/api/v4/failover-groups/")
        payload = {
            'name': failover_group_name
        }
        headers = {'Content-Type': 'application/json'}
        response = requests.post(
            url, params=APIParams(APICredentials()).get_params({}), data=json.dumps(payload), headers=headers
        )
        print(response.status_code)

CreateFailoverGroup().run()
Delete Failover Group API
A new API has been created for deleting a Failover Group:
DELETE /failover-groups/{failover_group_id}
Request URL variable:
Variable | Description |
---|---|
Failover Group ID (Required) | Replace this variable with the system-defined ID assigned to the desired Failover Group Insight: Use the Get All Failover Groups endpoint to retrieve a list of Failover Groups and their system-defined ID |
Sample Code:
import json
import requests
from api_auth import APICredentials, APIParams

class DeleteFailoverGroup:
    def __init__(self):
        self.host = "https://services.uplynk.com"

    def run(self):
        self._delete_failover_group()

    def _delete_failover_group(self):
        failover_group_id = 'd22a96e815f241319677659316d3fb0f'  # Replace with the desired failover group ID.
        url = "{}{}{}".format(self.host, "/api/v4/failover-groups/", failover_group_id)
        headers = {'Content-Type': 'application/json'}
        response = requests.delete(
            url, params=APIParams(APICredentials()).get_params({}), headers=headers
        )
        print(response.status_code)

DeleteFailoverGroup().run()
Update Failover Group API
The Update Failover Group API has been updated to set failover thresholds and more fields than previously allowed:
PATCH /failover-groups/{failover_group_id}
Request URL Variable:
Variable | Description |
---|---|
Failover Group ID (Required) | Replace this variable with the system-defined ID assigned to the desired Failover Group Insight: Use the Get Failover Group endpoint to retrieve a list of parameters set for a given Failover Group |
Request Body Parameters (include only those to be updated):
Name | Data Type | Description |
---|---|---|
auto_failback | Boolean | Auto-failback when issue is resolved |
channels | List | List of channels in the Failover Group |
mode | String | Prioritized, flat, or custom |
name | String | Name of the Failover Group |
slicers | List | List of Slicers, each having a dictionary, in the Failover Group |
thresholds | List | List of Slicer Failover threshold values, including severities, which are in dictionaries |
Request Body Example:
{
    "auto_failback": false,
    "channels": [ "99994a11ead446e7b4d7a4c91c679999", "88884a11ead446e7b4d7a4c91c678888" ],
    "mode": "prioritized",
    "name": "My Failover Group",
    "slicers": {
        "slicer1": { "force_blacklist": false, "priority": 1 },
        "slicer2": { "force_blacklist": false, "priority": 2 }
    },
    "thresholds": {
        "failover_audio_loss": { "fault_duration": 90, "recovery_duration": 60, "enabled": false, "severity": 5 },
        "failover_blackness": { "low": 1, "fault_duration": 120, "recovery_duration": 60, "enabled": false, "severity": 5 },
        "failover_cc_last_seen": { "fault_duration": 60, "recovery_duration": 90, "enabled": false, "severity": 5 },
        "failover_dropped": { "high": 0, "fault_duration": 5, "recovery_duration": 6, "enabled": false, "severity": 5 },
        "failover_input": { "fault_duration": 0, "recovery_duration": 30, "enabled": false, "severity": 5 },
        "failover_nielsen_last_seen": { "fault_duration": 60, "recovery_duration": 90, "enabled": false, "severity": 5 },
        "failover_proc_q": { "high": 8, "fault_duration": 10, "recovery_duration": 30, "enabled": false, "severity": 5 },
        "failover_queue": { "high": 5, "fault_duration": 5, "recovery_duration": 5, "enabled": false, "severity": 5 },
        "failover_scte_last_seen": { "fault_duration": 60, "recovery_duration": 90, "enabled": false, "severity": 5 },
        "failover_static_audio": { "fault_duration": 90, "recovery_duration": 60, "enabled": false, "severity": 5 },
        "failover_static_video": { "fault_duration": 90, "recovery_duration": 60, "enabled": false, "severity": 5 },
        "failover_tr_101_290_stats_P1_errors": { "high": 1, "fault_duration": 0, "recovery_duration": 30, "enabled": false, "severity": 5 },
        "failover_tr_101_290_stats_P2_errors": { "high": 1, "fault_duration": 0, "recovery_duration": 30, "enabled": false, "severity": 5 },
        "failover_video_loss": { "fault_duration": 90, "recovery_duration": 60, "enabled": false },
        "failover_involuntary_blackout": { "enabled": false, "severity": 5 }
    }
}
NOTE 1: When specifying Slicers, please note that force_blacklist (set to false to enable Slicer in Failover Group) and priority are required.
NOTE 2: The Get Failover Group API returns threshold values for failover_audio and failover_video, but those are deprecated.
Dictionary Lists in the Update Failover Group API
The “slicers” list of dictionaries contains the names of the Slicers in the Failover Group as well as the priority of each Slicer and whether the Slicer should be enabled/disabled in the Failover Group.
The “thresholds” list of dictionaries maps to the failover thresholds in the Failover Thresholds UI.
The “thresholds” list also contains a severity value for each threshold, with the default value being 5 (highest). If severity is not returned by the Get Failover Group endpoint, the value is assumed to be 5.
Thresholds Dictionary List in Update Failover Group API
Name | Name on Failover Thresholds UI | Description |
---|---|---|
failover_audio_loss | Audio Loss | Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean |
failover_blackness | Black Screen | Specifies “low” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean |
failover_cc_last_seen | CC Last Seen | Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean |
failover_dropped | Dropped Frames | Specifies “high” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean |
failover_input | Input Loss | Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean |
failover_nielsen_last_seen | Nielsen Last Seen | Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean |
failover_proc_q | Processing Queue | Specifies “high” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean |
failover_queue | Upload Queue | Specifies “high” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean |
failover_scte_last_seen | SCTE Last Seen | Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean |
failover_static_audio | Static Audio | Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean |
failover_static_video | Static Video | Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean |
failover_tr_101_290_stats_P1_errors | TR 101 290 P1 Errors | Specifies “high” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean |
failover_tr_101_290_stats_P2_errors | TR 101 290 P2 Errors | Specifies “high” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean |
failover_video_loss | Video Loss | Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean |
failover_involuntary_blackout | Involuntary Blackout | Specifies “enabled” which is a Boolean |
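Before sending a PATCH request, a thresholds dictionary can be checked locally against the fields described in this table. The sketch below is a hypothetical client-side helper, not an Uplynk API. It uses the threshold names from the request body example and assumes that “severity” is accepted alongside the listed fields, as that example shows.

```python
COMMON = {"fault_duration", "recovery_duration", "enabled", "severity"}
EXTRA = {
    "failover_blackness": {"low"},
    "failover_dropped": {"high"},
    "failover_proc_q": {"high"},
    "failover_queue": {"high"},
    "failover_tr_101_290_stats_P1_errors": {"high"},
    "failover_tr_101_290_stats_P2_errors": {"high"},
}
DURATION_ONLY = {
    "failover_audio_loss", "failover_cc_last_seen", "failover_input",
    "failover_nielsen_last_seen", "failover_scte_last_seen",
    "failover_static_audio", "failover_static_video", "failover_video_loss",
}


def allowed_fields(name):
    """Return the set of fields expected for a threshold, or None if unknown."""
    if name == "failover_involuntary_blackout":
        return {"enabled", "severity"}
    if name in EXTRA:
        return COMMON | EXTRA[name]
    if name in DURATION_ONLY:
        return COMMON
    return None


def validate_thresholds(thresholds):
    """Return a list of problems found in a thresholds dict (empty if none)."""
    errors = []
    for name, fields in thresholds.items():
        allowed = allowed_fields(name)
        if allowed is None:
            errors.append(f"unknown threshold: {name}")
            continue
        for field in sorted(set(fields) - allowed):
            errors.append(f"{name}: unexpected field {field}")
    return errors


print(validate_thresholds({"failover_audio_loss": {"high": 1}}))
# prints ['failover_audio_loss: unexpected field high']
```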
Update Failover Group Sample Code
The example below shows changing the Failover Group name, the priority of Slicers, and thresholds, including severities. You only need to include items that are changing (e.g., ‘name’, ‘slicers’, or ‘thresholds’). You only need to include thresholds that are changing.
When changing Slicers, all Slicers need to be included. The force_blacklist value is the inverse of the enabled state of the Slicer in the Failover Group (set to false to enable the Slicer in the Failover Group).
import json
import requests
from api_auth import APICredentials, APIParams

class UpdateFailoverGroup:
    def __init__(self):
        self.host = "https://services.uplynk.com"

    def run(self):
        self._update_failover_group()

    def _update_failover_group(self):
        failover_group_id = '99f7b514f79440bb940812f2eb8954d7'  # Replace with the desired failover group ID.
        url = "{}{}{}".format(self.host, "/api/v4/failover-groups/", failover_group_id)
        payload = {
            # Only include 'name', 'slicers', or 'thresholds' if changing values.
            'name': 'MyFailoverGroup (Test)',  # Change name of failover group.
            'slicers': {
                "main_slicer": {"force_blacklist": False, "priority": 2},  # Make priority 2.
                "backup_slicer": {"force_blacklist": False, "priority": 1}  # Make priority 1.
            },
            'thresholds': {
                'failover_static_audio': {
                    'enabled': True,  # Enable static audio threshold.
                    'severity': 3  # Lower severity from default
                },
                'failover_tr_101_290_stats_P1_errors': {
                    # Enable P1 error threshold.
                    "high": 3,
                    "fault_duration": 1,
                    "recovery_duration": 60,
                    "enabled": True,
                    "severity": 4
                }
            }
        }
        headers = {'Content-Type': 'application/json'}
        response = requests.patch(
            url, params=APIParams(APICredentials()).get_params({}), data=json.dumps(payload), headers=headers
        )
        data = json.dumps(payload)
        print(data)
        print(response.status_code)

UpdateFailoverGroup().run()
API Authentication Module (api_auth.py)
The api_auth.py module is used to authenticate Uplynk v4 APIs.
import base64
import os
import zlib, hmac, hashlib, time, json

class APICredentials:
    """
    Stores credentials required to request our API.
    """
    @property
    def user_id(self):
        """
        Set your user ID to the one defined on the User Settings page.
        """
        return "1234567890abcdefghijklmnopqrstu"

    @property
    def secret(self):
        """
        Set your API key to a value defined on the Integration Keys page.
        """
        return "1234567890abcdefghijklmnopqrstuvwxyz1234"

class APIParams(object):
    """
    Provides API authentication. Learn more at:
    https://docs.uplynk.com
    """
    def __init__(self, credentials):
        self.credentials = credentials

    def get_params(self, data):
        """
        Encodes and signs <data> into the expected format and returns it.
        """
        return self._get_params(**data)

    def _get_msg(self, msg=None):
        """
        Encodes and returns the 'msg' parameter.
        """
        msg = msg if msg else {}
        msg.update({
            '_owner': self.credentials.user_id,
            '_timestamp': int(time.time())
        })
        msg = json.dumps(msg)
        msg_compressed = zlib.compress(msg.encode(), 9)
        return base64.b64encode(msg_compressed).strip()

    def _get_params(self, **msg):
        """
        Returns the message and its signature.
        """
        msg = self._get_msg(msg)
        sig = hmac.new(
            self.credentials.secret.encode(), msg, hashlib.sha256
        ).hexdigest()
        return {
            'msg': msg,
            'sig': sig
        }
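When debugging authentication locally, it can help to reverse the encoding and see what the msg parameter actually carries. The sketch below mirrors the steps in api_auth.py (JSON, zlib, Base64, HMAC-SHA256) in standalone form. The functions `sign` and `verify_and_decode` are hypothetical helpers for local inspection and do not describe how the service validates requests. The secret shown is a placeholder.

```python
import base64
import hashlib
import hmac
import json
import zlib


def sign(payload, secret):
    """Mirror of APIParams: compress, Base64-encode, and sign a dict."""
    msg = base64.b64encode(zlib.compress(json.dumps(payload).encode(), 9))
    sig = hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()
    return {"msg": msg, "sig": sig}


def verify_and_decode(params, secret):
    """Recompute the signature, then reverse the encoding to recover the dict."""
    expected = hmac.new(secret.encode(), params["msg"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, params["sig"]):
        raise ValueError("signature mismatch")
    return json.loads(zlib.decompress(base64.b64decode(params["msg"])))


params = sign({"_owner": "example-user-id", "_timestamp": 1700000000},
              "placeholder-secret")
print(verify_and_decode(params, "placeholder-secret"))
```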
Miscellaneous
What is Allowlisting and Denylisting?
When a Slicer is detected for the first time, or after it has been fixed following a failover event, the Slicer Failover system denylists that Slicer. While denylisted, a Slicer is not used as an option for Slicer Failover. This prevents the Slicer Failover system from thrashing in case the Slicer is not truly healthy.
Once the Slicer Failover system considers the new Slicer healthy, it is allowlisted. When allowlisted, a Slicer can be used as an option for Slicer Failover. The transition from a denylisted to an allowlisted state could take several minutes.
What Exactly is a Warm State Slicer Doing?
In the Warm state, backup Slicers are still operating normally. A Warm state Slicer that is healthy will indicate that it is “Slicing” (or “Ads”), again, since it is operating normally.
After the Warm state Slicer sends its content to the cloud, it is not encoded, which provides the reduced cost.
In a Warm state, the Slicer is essentially in blackout mode, meaning that its video is being discarded.
In the Event Log, a Warm state Slicer will have a state of 1 from the Broker which is the same state as ad break. The broker is the component that receives the video from the Slicer. A state of 1 means the Broker is discarding the video from the Slicer. A state of 0 means that the Broker is not discarding the Slicer content.
The amount of slate displayed in Hot-Warm failover can range from a few seconds to over 30 seconds and depends on a number of factors, such as the thresholds used and system loading.
Auto Failback
Auto Failback after a failover condition will happen under the following conditions:
- Auto Failback is ENABLED
- The formerly failed Slicer has to be considered “healthy” for 30 seconds
- After being healthy, the system takes 1-2 minutes for the Slicer to become allowlisted
- The priority of the formerly failed Slicer has to be higher than the current one
The system is designed to failover quickly and failback slowly to avoid toggling.
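The conditions above combine into a single check. The following sketch only restates them in code. The function `should_fail_back` and the dictionary fields are hypothetical, and it assumes a lower number means a higher priority.

```python
def should_fail_back(auto_failback, candidate, active):
    """Decide whether to switch back to a recovered Slicer.

    candidate: {"priority": int, "healthy_seconds": number, "allowlisted": bool}
    active:    {"priority": int}
    Assumption: a lower number is a higher priority.
    """
    return bool(
        auto_failback
        and candidate["healthy_seconds"] >= 30
        and candidate["allowlisted"]
        and candidate["priority"] < active["priority"]
    )


recovered = {"priority": 1, "healthy_seconds": 120, "allowlisted": True}
print(should_fail_back(True, recovered, {"priority": 2}))  # prints True
print(should_fail_back(True, recovered, {"priority": 1}))  # prints False
```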
Event Log Info
State for xxxxx changed from 0 to 1:
- Means that Slicer xxxxx is going from Slicing state (0) to ad break or a Warm state (1)
- If going into a Warm state, there is usually an additional log message near this one
A denylisted Slicer means that it is unhealthy while an allowlisted Slicer means that the Slicer is considered healthy.
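State-change messages can be picked out of exported Event Log text with a regular expression. This sketch assumes the message wording matches the example above exactly. The function `parse_state_change` is hypothetical and the real log format may carry additional fields.

```python
import re

STATE_CHANGE = re.compile(
    r"State for (?P<slicer>\S+) changed from (?P<old>[01]) to (?P<new>[01])"
)
STATE_NAMES = {0: "slicing", 1: "ad break or Warm"}


def parse_state_change(line):
    """Return (slicer, old_state, new_state), or None if the line does not match."""
    match = STATE_CHANGE.search(line)
    if match is None:
        return None
    return (
        match["slicer"],
        STATE_NAMES[int(match["old"])],
        STATE_NAMES[int(match["new"])],
    )


print(parse_state_change("State for slicer1 changed from 0 to 1"))
# prints ('slicer1', 'slicing', 'ad break or Warm')
```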
Things to Watch Out For
When changing the contents of the Slicer tab, make sure to click Save so that the changes take effect.
When you enable a Slicer from the Slicer tab, it takes about 1-2 minutes for the Slicer to truly become active in the Failover Group. This is to ensure that the enabled Slicer is truly healthy before being considered for failover.
If you manually kill a Slicer, the Slicer status on the Slicer tab and the Content tab still says it is Slicing. For the Slicer tab, we will try to use Slicer Health instead to provide more accurate information. Slicer Monitoring shows the status of the Slicer accurately, and Slicer Failover itself will work.
If using a Slicer version before Slicer Release 23062600, make sure the failover_id in the Slicer config exactly matches the one in the CMS, especially if adding/deleting Failover Groups in the CMS while testing (and restart the Slicers as well).
If the backup Slicers have unstable/bad input, this could prevent a “failed” Slicer from doing failover since the backup Slicers would not be considered healthy. For instance, if two Slicers in a Failover Group have the same input feed and that feed stops, failover will not occur since both Slicers will be considered unhealthy.
When a Slicer is in the Warm state, its status would still be “Slicing” (or “Ads”) since it is operating properly. In the Warm state, the cloud encoding is disabled, not the Slicer itself.
If you get an error reading Failover Groups in the CMS, you may still have a permissions issue.
If there is no thumbnail for the channel displayed in the CMS after adding a Failover Group to a channel, restart the Slicers in the Failover Group.
Make sure the Slicer config has enable_remote_config: 1 in it to enable the Slicer to get configurations from a centralized database. enable_remote_config: 1 has to be after the failover_id for Slicer versions prior to Slicer Release 23062600.
If using any previous versions of Slicer Failover, they must be disabled before this Slicer Failover can be used.
If failover based on TR 101 290 errors is desired, Slicer Release 22083100 (August 2022) or later must be used.
With Cloud Slicer Live, having enable_remote_config: 1 in the config file may prevent Slicer Failover from working.
Removing a Slicer Permanently from Slicer Failover
To remove a Slicer from Slicer Failover entirely:
- Remove or comment out the failover_id in the configuration file, if present
- Remove or comment out the enable_remote_config: 1 in the configuration file
- Remove the Slicer from the Failover Group in the CMS
- Reboot the Slicer
Recommendations
Slicer Failover reports failover events through Amazon SNS, so it is recommended to set up these notifications.
If the network feeding the Slicer has stability issues, it is recommended to set the Input Loss threshold higher than 0 to reduce sensitivity.
When using Flexible Hot-Warm, it is recommended to use Prioritized or Custom Mode rather than Flat Mode. Flat Mode currently chooses the backup Slicer at random, so it may pick a Warm Slicer instead of a Hot one.
Failover Notifications via Amazon SNS
See Health Notifications via Amazon SNS for additional information.
Publish failover events through the following workflow:
-
Data Push:
- Our service pushes data to Amazon SNS whenever we fail over to another Live Slicer
-
Data Broadcast:
- Amazon SNS broadcasts data to one or more destinations (e.g., mobile device, web server, or Slack)
- Get started with Amazon SNS for free through its SNS free tier.
-
Data Formatting:
- Our service formats data using JSON. This data may then be filtered via custom code.
- This article explains how to strip out additional data generated by Amazon SNS via a custom function in Amazon Lambda
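As a rough illustration of the filtering step, the sketch below shows a Lambda handler that unwraps the SNS envelope and keeps only failover payloads. It assumes the standard SNS-to-Lambda event shape (`Records[].Sns.Message`), that `Message` carries the failover JSON as a string, and that failover payloads can be recognized by `"Service": "failover"`. The helper name `extract_failover_events` is hypothetical.

```python
import json


def extract_failover_events(event: dict) -> list:
    """Return the failover payloads carried by an SNS-triggered Lambda event."""
    payloads = []
    for record in event.get("Records", []):
        sns = record.get("Sns", {})
        try:
            payload = json.loads(sns.get("Message", ""))
        except (json.JSONDecodeError, TypeError):
            continue  # Message is not JSON; ignore it
        # Health and failover notifications share one topic, so keep only
        # payloads that identify themselves as failover events (assumption).
        if isinstance(payload, dict) and payload.get("Service") == "failover":
            payloads.append(payload)
    return payloads


def lambda_handler(event, context):
    # Entry point name follows the usual Lambda convention for Python.
    return extract_failover_events(event)
```

Filtering on the `Service` field is one possible choice. Since health and failover notifications arrive on the same topic, some check of this kind is needed before treating a message as a failover event.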
Get Started with Failover Notifications
Perform the following steps to set up notifications:
-
Set Up an Amazon SNS Topic:
- Our service pushes Live Slicer health and failover notifications to the same Amazon SNS topic
- You may skip this step if you have already created an Amazon SNS topic for Live Slicer health notifications
-
Configure Communication with Amazon SNS:
- Our service pushes Live Slicer health and failover notifications to the same Amazon SNS topic
- Updating the SNS topic for either Live Slicer health or failover notifications will affect both types of notifications
-
Navigate to the Failover Page:
- From the main menu, navigate to Slicers and then select Failover from the side navigation bar
-
Update SNS Topic:
- Click Update SNS Topic from the right-hand pane
- Set the Update your SNS Topic ARN option to the ARN for the topic created above
- Click Save Topic ARN
-
Configure Amazon SNS to Broadcast Notifications:
- Learn how to set up Amazon SNS and Lambda to broadcast notifications to a Slack channel
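For the Slack broadcast step, a Lambda function typically turns the notification into a short text message. The sketch below only builds the message body. It assumes a Slack incoming webhook, which accepts a JSON object with a `text` field, and the function name `slack_payload` and the wording of the message are illustrative choices, not part of the service.

```python
def slack_payload(notification: dict) -> dict:
    """Build a Slack incoming-webhook body from a failover notification."""
    channels = ", ".join(notification.get("Channels", [])) or "no channels"
    text = (
        f"Slicer failover in group {notification.get('FO_Group_Name', '?')}: "
        f"{notification.get('Original_Slicer', '?')} -> "
        f"{notification.get('Slicer', '?')} "
        f"({channels}). Reason: {notification.get('Reason', 'not given')}"
    )
    return {"text": text}
```

The returned dictionary would then be serialized to JSON and posted to your own webhook URL, which is not shown here.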
Failover Notification Fields
Our service sends information that describes a failover event in JSON format. Key parameters in this notification are described here.
Field | Description |
---|---|
Subject | Returns Slicer Failover |
Message | Provides detailed information about the Failover event. Key parameters are described below. |
Service | Returns failover. |
Sender | Returns failover. |
Account | Indicates the user name (e.g., email address) associated with the account for which this failover event occurred. |
OID | Indicates the system-defined ID of the account for which this failover event occurred. |
FO_Group_Name | Indicates a failover group's name. |
FO_Group_ID | Indicates a failover group's system-defined ID. |
Channels | Contains an array of the live channels associated with the failover group defined by the FO_Group_Name property. |
Date_Time | Indicates when the notification was triggered. This timestamp is reported as Unix time in milliseconds. |
Original_Slicer | Indicates the slicerID of the Live Slicer that was the source of the live stream prior to the failover event. |
Slicer | Indicates the slicerID of the Live Slicer that was the source of the live stream after the failover event. |
Reason | Provides additional information about this failover event. For example, this parameter may indicate the reason for failover. |
Slicers_In_Group | Contains a key-value pair for each Live Slicer associated with the failover group defined by the FO_Group_Name property. |
Each key-value pair identifies the name of a Live Slicer and its failover status. Valid failover states are described below.
Valid Failover States
- Active: Indicates that our service is using this Live Slicer's feed to generate the live stream for all live channels associated with this failover group.
- Hot: Indicates that the Live Slicer is encoding and storing content within our system. Our service can quickly fail over to a Live Slicer in this state.
- Warm: Indicates that the Live Slicer is currently slicing content but not uploading it to our system. Failing over to a Live Slicer in this state may cause a few seconds of slate.
- Unhealthy: Indicates that the Live Slicer is considered unhealthy due to at least one metric falling below a custom threshold for a given duration.
- Disabled: Indicates that the failover capability for this Live Slicer has been manually disabled.
Key-Value Pair Syntax: {Live Slicer}: {Failover Status}
Example
{
    "Service": "failover",
    "Sender": "failover",
    "Account": "[email protected]",
    "OID": "1ab0812e54f44b029bcae08685f025cc",
    "FO_Group_Name": "My failover group",
    "FO_Group_ID": "f18b0d3f6393428f9aca3815a17f663e",
    "Channels": ["Basketball", "News"],
    "Date_Time": 1667834461149,
    "Original_Slicer": "bball_slicer_1",
    "Slicer": "bball_slicer_2",
    "Reason": "added to denylist: Not seen since 2022-11-07 15:21:06",
    "Slicers_In_Group": {
        "bball_slicer_1": "Active",
        "bball_slicer_2": "Hot"
    }
}
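A notification like the one above can be reduced to the fields an operator usually needs. The following sketch converts Date_Time from Unix milliseconds to a UTC timestamp and lists the Live Slicers reported as Hot. The function name `summarize` and the keys of the returned dictionary are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def summarize(notification: dict) -> dict:
    """Summarize a failover notification that has already been parsed from JSON."""
    ms = notification["Date_Time"]  # Unix time in milliseconds
    # Integer arithmetic avoids floating-point rounding of the milliseconds.
    when = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
    when += timedelta(milliseconds=ms % 1000)

    states = notification.get("Slicers_In_Group", {})
    return {
        "when_utc": when.isoformat(timespec="milliseconds"),
        "from": notification.get("Original_Slicer"),
        "to": notification.get("Slicer"),
        # Slicers in the Hot state are the ones that allow a quick failover.
        "hot_slicers": sorted(name for name, state in states.items() if state == "Hot"),
    }
```

For the example notification, the Date_Time value 1667834461149 corresponds to 2022-11-07 15:21:01.149 UTC.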