Slicer Failover minimizes the impact on your viewers' playback experience when a Slicer's performance is sub-optimal by automatically switching the live stream's source to a different Slicer. This is possible because each Slicer in a Failover Group provides status information to our system at frequent intervals. If the primary Slicer is unhealthy, our system will switch the source of the live stream to a different Slicer. This may cause viewers to experience a few seconds of discontinuity.

This feature is supported for Live Slicers only. Live Events are not included in the Slicer Failover solution at this time.

Slicers can be enabled in a Hot-Hot, Hot-Warm, or Flexible Hot-Warm configuration:

With Hot-Hot, all Slicers perform a full encode and storage, which allows for near-instantaneous failover.
With Hot-Warm, only the primary Slicer performs a full encode to save cost, but there may be seconds of slate when failover occurs.
With Flexible Hot-Warm, you can specify how many Hot backup Slicers are desired, with the rest in Warm standby, for the best balance between cost and user experience.

Slicer Failover can be run in Prioritized, Flat, or Custom Mode:

In Prioritized Mode, each Slicer has a priority that controls the failover order, and the system automatically fails back when a higher-priority Slicer becomes healthy again. An override is provided to prevent this failback from occurring.
In Flat Mode, when failover occurs, any healthy Slicer may be used (randomly chosen) and there is no automatic failback.
In Custom Mode, multiple Slicers can have the same priority, and there is no automatic failback to a Slicer with the same priority as the active Slicer.
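
The three modes can be pictured with a minimal sketch of how a replacement Slicer might be chosen. This is an illustration only, not Uplynk's implementation: the function name pick_failover_target is hypothetical, it assumes that a lower number means a higher priority (1 being the top), and it breaks ties between equal priorities at random because the actual tie-break rule is not documented here.

```python
import random
from typing import Dict, List, Optional


def pick_failover_target(
    mode: str,
    failed: str,
    priorities: Dict[str, int],
    healthy: List[str],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick a replacement for the failed Slicer among the healthy ones.

    Assumption: a lower number means a higher priority (1 is the top).
    """
    rng = rng or random.Random()
    candidates = [s for s in healthy if s != failed]
    if not candidates:
        return None  # No healthy backup, so failover cannot occur.
    if mode == "flat":
        # Flat Mode: any healthy Slicer, chosen at random.
        return rng.choice(candidates)
    # Prioritized and Custom Modes: prefer the best (lowest-numbered) priority.
    best = min(priorities[s] for s in candidates)
    tied = [s for s in candidates if priorities[s] == best]
    # Custom Mode may have several Slicers at the same priority.
    # Assumption: ties are broken at random.
    return rng.choice(tied)
```

For example, with priorities of 1, 2, and 3 for Slicers a, b, and c, a failure of a in Prioritized Mode selects b when b is healthy and c otherwise.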

Slicer Failover can also detect TR 101 290 errors and fail over based on MPEG transport stream errors.

Configuration

Prerequisites and Release Information

Slicer Release 23062600 (June 2023) or later is recommended, although Uplynk always recommends using the latest official Slicer Release.

If using a Slicer version before Slicer Release 23062600, failover_id must be set in the Slicer configuration file.

If failover based on TR 101 290 errors is desired, Slicer Release 22083100 (August 2022) or later must be used.

On the Channels tab, if you are using the OLD Slicer Failover, the Failover checkbox should be turned OFF when using the current Slicer Failover. This setting should not affect the current Slicer Failover, but it is safer to remove it to avoid any conflicts. Eventually, this checkbox will be removed.

Step 1: Create the Channel in the CMS

Go to Channels tab in CMS
Click + Channel
Enter a Channel Name, leaving Slicer ID blank
Deselect Edit after creation
Click Create

Step 2: Define the Failover Group in the CMS

Go to Ingest tab in the CMS
Select Failover Group
Click + Failover Group

Enter a Failover Group Name
Click Create

Note the Failover Group ID which truly defines the Failover Group, not the Failover Group Name
If using a Slicer version prior to Slicer Release 23062600, copy this value to paste into Slicer config files

Step 3: Configure the Slicers in the Failover Group

For each Slicer in the Failover Group, set failover_id to the Failover Group ID shown in the CMS. This is only required when using a Slicer version before Slicer Release 23062600 (June 2023).
Add enable_remote_config: 1 to indicate that the Slicer should pull configurations from the CMS. With Slicer Release 23062600 or later, this can be anywhere in the file rather than the last line.
Restart the Slicer(s) for the updates to take effect. A restart is required for any configuration file change.

In order to retrieve TR 101 290 data from the Slicer Status API or to use TR 101 290 for Slicer Failover, the following fields must be added to the configuration file:

###### Set DtApi functionality on or off
dt_api_sdk_on: 1

###### Set mode where the modes are: 0 No tables, 1 ATSC, 2 DVB, 3 DVB_RCS
dt_api_sdk_standard_mode: 1

###### Target bitrate of the stream (the bitrate of the incoming stream,
###### dt_api_sdk uses it to estimate the timestamps)
###### This is not needed with any Slicers after the September 2024 Slicer release
dt_api_sdk_bitrate: (the bitrate of the incoming stream prior to encoding)

###### Priority is the class of error we want to monitor: 3 - catch all errors
###### (P3 & P2 & P1); 2 - catch P1 & P2, 1- catch P1 errors; 0 catch nothing
###### It is recommended to use 2 since P3 errors occur too often
dt_api_sdk_priority: 2

It is recommended to only monitor TR 101 290 P1 and P2 errors since P3 errors occur too often in practice.
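
As a rough aid, the settings above can be checked before restarting a Slicer. The sketch below is not an Uplynk tool. It assumes the configuration file consists of "key: value" lines with "#" comments, and the helper names parse_slicer_config and missing_tr101290_settings are hypothetical. dt_api_sdk_bitrate is left out of the required list because it is not needed on newer releases.

```python
REQUIRED_TR101290_KEYS = (
    "dt_api_sdk_on",
    "dt_api_sdk_standard_mode",
    "dt_api_sdk_priority",
)


def parse_slicer_config(text):
    """Parse 'key: value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def missing_tr101290_settings(text):
    """Return a list of problems with the TR 101 290 settings in a config."""
    settings = parse_slicer_config(text)
    problems = [key for key in REQUIRED_TR101290_KEYS if key not in settings]
    # The feature is only active when dt_api_sdk_on is set to 1.
    if settings.get("dt_api_sdk_on") not in (None, "1"):
        problems.append("dt_api_sdk_on must be 1")
    return problems
```

An empty list means the three settings are present and DtApi functionality is switched on.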

Step 4: Add the Slicers to the Failover Group in the CMS

Select Hot-Warm Mode (All Hot shown chosen)
Select Failover Mode of Prioritized, Flat, or Custom (Prioritized chosen)
Click Add Slicers to Group

Click on desired Slicers to add to the Failover Group and they will be checked
Click Add X Slicers where X is the number of Slicers that will be added

Step 5: Enable the Slicers and Set Priority

Enable the Slicers in the Failover Group by moving the slider to Yes
Priority of the Slicers can be changed by dragging the Slicer entries to rearrange
Click Save

Immediately after enabling, the Slicers are denylisted since they are new to failover and are shown as unhealthy. After several minutes, they will be allowlisted and become healthy.

After becoming allowlisted, Status changes to Hot. You may need to refresh the browser to see the update in status. The Event Log shows denylist to allowlist transitions.

Step 6: Mapping the Failover Group to the Channel

Make sure Slicers are set to Yes for Available for Failover
Click Add Channel Mapping

Click the channel(s) to map the Failover Group
Click Map X Channel(s) where X is the number of selected channels

Channel will appear here
Click Save
Preview will show thumbnail and Active Slicer for the Failover Group

If the Preview doesn’t appear within a minute, you may have to disable and re-enable the Slicers in the Failover Group or restart the Slicers.

Go to Channels tab and select the channel
Observe Active Slicer ID
Observe Failover Group and link to the Failover Group configuration

Enable Hot-Warm Failover

Change dropdown to Hot-Warm
Click Save

With Hot-Hot (the default), all Slicers are doing full processing, including encoding and storage. In this case, failover will be basically instantaneous.

With Hot-Warm, only the active Slicer is doing full processing and the backups are not encoding to save cost. In this case, there may be seconds of slate when failover occurs.

In our system, Slicers in Warm Mode are essentially put into an “Ads” state. This is reflected in the UI.

Enable Flat Priority Mode

Change Priority Mode to Flat
Click Save

All Slicers will have equal priority, and the replacement Slicer is randomly chosen from the healthy ones.

No automatic failback will occur when the failed Slicer becomes healthy again, since all Slicers are considered equal.

Enable Custom Priority Mode

Change Priority Mode to Custom
Edit the priorities; multiple Slicers can have the same priority
Click Save

In Custom Priority Mode, the system fails over to any of the healthy Slicers that have the same or the next highest priority as the failed Slicer.

If the failed Slicer becomes healthy, auto failback will only occur if the newly healthy Slicer has a higher priority than the current one. So, if the current Slicer is Priority 1 and the newly healthy one is too, there will be no failback.

Disable Auto Failback (Prioritized and Custom Priority Modes Only)

Set Auto Failback to Disabled
Click Save

With Auto Failback disabled, after the failed Slicer becomes healthy again, the Slicer will not failback even though priorities are set.

With Auto Failback enabled, after the failed Slicer becomes healthy again, the Slicer will failback if the newly healthy Slicer is higher priority.
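
The failback rule described above can be condensed into a small predicate. This is only a sketch of the documented behavior: the function name should_failback is hypothetical, and it assumes that a lower number means a higher priority.

```python
def should_failback(auto_failback, active_priority, recovered_priority):
    """Decide whether to switch back to a Slicer that is healthy again.

    Assumption: a lower number means a higher priority (1 is the top).
    """
    if not auto_failback:
        # With Auto Failback disabled, priorities are ignored for failback.
        return False
    # Failback only happens to a strictly higher-priority Slicer, so two
    # Slicers with the same priority never trigger it.
    return recovered_priority < active_priority
```

With the active Slicer at priority 2 and the recovered Slicer at priority 1, this returns True only when Auto Failback is enabled.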

Flexible Hot-Warm Failover

When there are more than two Slicers in a Failover Group, you can choose how many Hot backups are desired, with the rest being Warm.

At present, it is recommended to use Prioritized Priority Mode only, since Flat picks a random Slicer and may therefore pick a Warm Slicer.

Blue indicates Warm and Warm Slicers are always in an Ads state.

Failover Groups Summary

Healthy Active Slicer is green and indicated with [A]
Healthy Hot inactive Slicer is green and indicated with [H]
Healthy Warm Slicer is blue and indicated with [W]
Unhealthy Slicer is red and overrides the green and blue
Disabled Slicer is gray and indicated with [D]

Updating Existing Channels to Use Slicer Failover

In this scenario, there is a single existing channel that is being fed by a single existing Slicer. The desire is to replace the single existing Slicer with a Failover Group on the existing channel and then add the existing Slicer to the new Failover Group.

Steps:

Observe the existing channel with original “main” Slicer
Create a new Failover Group
Add backup Slicer(s) to the new Failover Group
Add existing channel to the new Failover Group. Since the original “main” Slicer is not in the new Failover Group, the system will switch the channel to a backup Slicer in the new Failover Group.
In the Uplynk CMS, add the original “main” Slicer to the new Failover Group as top priority
Once enabled and saved, the system would switch the channel back to this original “main” Slicer (this will take several minutes while the original “main” Slicer goes from a denylisted to allowlisted state)

NOTE: The above assumes that Slicer Release 23062600 or later is being used

Failover Thresholds

For each Failover Group, thresholds can be defined to precisely control the failover conditions.
Failover thresholds are defined in the CMS and can be manipulated as desired. Generally, thresholds should only be changed from the defaults upon advisement from Uplynk Support.

All of the thresholds include a hysteresis -- how long a parameter needs to be in the failure condition to be considered failed and how long a parameter needs to be in the working condition to be considered working.
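
A minimal sketch of this hysteresis follows. It is not Uplynk's implementation. The class name HysteresisThreshold is hypothetical, the attribute names fault_duration and recovery_duration are borrowed from the Update Failover Group API, and time is passed in explicitly so that the behavior is easy to follow.

```python
class HysteresisThreshold:
    """Tracks failed/working state with separate set and clear durations."""

    def __init__(self, fault_duration, recovery_duration):
        self.fault_duration = fault_duration        # seconds before "failed"
        self.recovery_duration = recovery_duration  # seconds before "working"
        self.failed = False
        self._streak_start = None  # start of the streak that opposes the state

    def update(self, now, in_failure_condition):
        """Feed one sample taken at time `now` (seconds); return failed state."""
        if in_failure_condition == self.failed:
            # The sample agrees with the current state: cancel any pending change.
            self._streak_start = None
            return self.failed
        if self._streak_start is None:
            self._streak_start = now
        needed = self.recovery_duration if self.failed else self.fault_duration
        if now - self._streak_start >= needed:
            # The opposing condition has lasted long enough: flip the state.
            self.failed = in_failure_condition
            self._streak_start = None
        return self.failed
```

With a fault duration of 5 and a recovery duration of 10, the parameter is marked failed after 5 continuous seconds in the failure condition and working again only after 10 continuous seconds without it.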

Individual thresholds have a “severity” which is used when more than one Slicer has impairments, so that the Slicer with the least severe impairments is used.

For Slicer versions before Slicer Release 23062600:

The Slicer reads the thresholds when it is restarted so you must restart the Slicers when the thresholds are changed to ensure the thresholds are applied as expected.
The Slicer config file must have enable_remote_config: 1 in it to enable the Slicer to fetch these settings and this has to be after the failover_id, if present.

If failover based on TR 101 290 errors is desired, Slicer Release 22083100 (August 2022) or later must be used.

Failover Thresholds UI

The Thresholds tab is used to set the failover thresholds. The toggle enables/disables the individual threshold.

Each threshold has a severity where if there are only unhealthy Slicers, the one with the least severe issue will be used as the active Slicer. Click Save to save thresholds.
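
The severity rule can be sketched as follows. This is an illustration under stated assumptions rather than the actual selection logic: the function name pick_least_severe is hypothetical, a Slicer with several active faults is ranked by its worst (highest) severity, and ties go to the first Slicer listed.

```python
def pick_least_severe(active_faults):
    """Choose the unhealthy Slicer whose worst active fault is least severe.

    `active_faults` maps a Slicer ID to the severities of its active faults.
    Assumption: a Slicer is ranked by its highest severity value.
    """
    def worst(slicer_id):
        return max(active_faults[slicer_id], default=0)

    return min(active_faults, key=worst)
```

For instance, given one Slicer with a severity 5 fault, one with faults of severity 3 and 2, and one with a severity 4 fault, the second Slicer is chosen.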

Failover Threshold Definitions

Category	Parameter	Description
Audio Loss	Set Duration	Considered failed when missing audio for X seconds
Audio Loss	Clear Duration	Considered working when not missing audio for X seconds
Black Screen	Set Duration	Considered failed when black screen for X seconds
Black Screen	Set Threshold	Luma value for black screen threshold (0-100, with higher being more sensitive)
Black Screen	Clear Duration	Considered working when no black screen for X seconds
CC Last Seen	Set Duration	Considered failed when missing closed captions for X seconds
CC Last Seen	Clear Duration	Considered working when not missing closed captions for X seconds
Dropped Frames	Set Duration	Considered failed when frames dropped threshold exceeded for X seconds
Dropped Frames	Clear Duration	Considered working when frames dropped threshold not exceeded for X seconds. Frames may be dropped if Slicer overloaded.
Dropped Frames	Set Threshold	Number of dropped frames allowed within a five second window
Input Loss	Set Duration	Considered failed when no input for X seconds
Input Loss	Clear Duration	Considered working when input for X seconds
Involuntary Blackout	Toggle	Will failover if Slicer loses connectivity with a Broker (recommended, although disabled by default)
Nielsen Last Seen	Set Duration	Considered failed when missing Nielsen tags for X seconds
Nielsen Last Seen	Clear Duration	Considered working when not missing Nielsen tags for X seconds
Processing Queue	Set Duration	Considered failed when processing queue threshold exceeded for X seconds
Processing Queue	Clear Duration	Considered working when processing queue threshold not exceeded for X seconds
Processing Queue	Set Threshold	Slicer processing queue depth (A/V packets) where exceeding is considered a failure. This should trigger before the Slicer drops frames.
SCTE Last Seen	Set Duration	Considered failed when no SCTE cues for X seconds
SCTE Last Seen	Clear Duration	Considered working when SCTE cues for X seconds
Static Audio	Set Duration	Considered failed when static audio (volume exactly the same) for X seconds
Static Audio	Clear Duration	Considered working when no static audio (volume exactly the same) for X seconds
Static Video	Set Duration	Considered failed when static video (static luma value) for X seconds
Static Video	Clear Duration	Considered working when no static video (static luma value) for X seconds
Upload Queue	Set Duration	Considered failed when upload queue threshold exceeded for X seconds
Upload Queue	Clear Duration	Considered working when upload queue threshold not exceeded for X seconds
Upload Queue	Set Threshold	Slicer upload queue depth (A/V packets) where exceeding is considered a failure. This indicates that the Slicer is unable to upload packets to the cloud.
Video Loss	Set Duration	Considered failed when no video feed in input transport stream for X seconds (No video PID in stream)
Video Loss	Clear Duration	Considered working when video feed in input transport stream for X seconds (Video PID in stream)
TR 101 290 P1 Errors	Set Duration	Considered failed when TR 101 290 Priority 1 errors occur in input transport stream for X seconds (Recommended to use larger values such as 15 seconds or more to allow for slight glitches)
TR 101 290 P1 Errors	Clear Duration	Considered working when no TR 101 290 Priority 1 errors occur in input transport stream for X seconds (Recommended to use larger values such as 15 seconds or more to allow for slight glitches)
TR 101 290 P2 Errors	Set Duration	Considered failed when TR 101 290 Priority 2 errors occur in input transport stream for X seconds (Recommended to use larger values such as 15 seconds or more to allow for slight glitches)
TR 101 290 P2 Errors	Clear Duration	Considered working when no TR 101 290 Priority 2 errors occur in input transport stream for X seconds (Recommended to use larger values such as 15 seconds or more to allow for slight glitches)

TR 101 290 Faults

TR 101 290 is a specification used to define errors in an MPEG Transport Stream. There are Priority 1 and Priority 2 faults. TR 101 290 errors are defined in https://www.etsi.org/deliver/etsi_tr/101200_101299/101290/01.04.01_60/tr_101290v010401p.pdf.

To use TR 101 290 for Slicer Failover, it must be enabled in the Slicer configuration file.

These are TR 101 290 Priority 1 faults:

TS Sync Loss
Sync Byte Error
PAT Error
Continuity Count Error
PMT Error
PID Error

These are TR 101 290 Priority 2 faults:

Transport Error
CRC Error
PCR Error
PCR Accuracy Error
PTS Error
CAT Error

Failover Status Monitoring

Change Log displays all user changes to the failover configuration.

Event Log displays all events detected by the failover mechanism. This includes allowlisting and denylisting events.

Slicer Failover API Updates

Slicer Failover API documentation is online at https://api-docs.uplynk.com/#Develop/Live-Slicer-Failover-API.htm.

Since the introduction of Slicer Failover, APIs have been added or updated for the creation of Failover Groups, the deletion of Failover Groups, and the ability to update Slicer Failover thresholds.

These APIs are:

POST - /failover-groups to create a Failover Group (New)
DELETE - /failover-groups/{failover_group_id} to delete a Failover Group (New)
PATCH - /failover-groups/{failover_group_id} to update a Failover Group (Updated)

The code examples in this section use Uplynk’s api_auth module.

Create New Failover Group API

A new API has been created for creating a Failover Group:
POST /failover-groups

Request Body Parameters:

Name	Data Type	Description
name	String	Name for the new Failover Group. Note: If the same name is already in use, another Failover Group will be created with the same name but a different system-defined ID to differentiate it.

Request Body Example:

{ "name": "NewGroup" }

Sample Code:

import json
import requests
from api_auth import APICredentials, APIParams

class CreateFailoverGroup:
    def __init__(self):
        self.host = "https://services.uplynk.com"

    def run(self):
        self._create_failover_group()

    def _create_failover_group(self):
        
        failover_group_name = 'MyFailoverGroup' # Replace with the desired failover group name.
            
        url = "{}{}".format(self.host, "/api/v4/failover-groups/")

        payload = {
            'name': failover_group_name
        }

        headers = {'Content-Type': 'application/json'}
  
        response = requests.post(
            url, params=APIParams(APICredentials()).get_params({}), data=json.dumps(payload), headers=headers
        )
  
        print(response.status_code)
  
CreateFailoverGroup().run()

Delete Failover Group API

A new API has been created for deleting a Failover Group:
DELETE /failover-groups/{failover_group_id}

Request URL variable:

Variable	Description
Failover Group ID (Required)	Replace this variable with the system-defined ID assigned to the desired Failover Group. Insight: Use the Get All Failover Groups endpoint to retrieve a list of Failover Groups and their system-defined IDs.

Sample Code:

import json
import requests
from api_auth import APICredentials, APIParams

class DeleteFailoverGroup:
    def __init__(self):
        self.host = "https://services.uplynk.com"

    def run(self):
        self._delete_failover_group()

    def _delete_failover_group(self):
        failover_group_id = 'd22a96e815f241319677659316d3fb0f' # Replace with the desired failover group ID.
        
        url = "{}{}{}".format(self.host, "/api/v4/failover-groups/", failover_group_id)

        headers = {'Content-Type': 'application/json'}
  
        response = requests.delete(
            url, params=APIParams(APICredentials()).get_params({}), headers=headers
        )
  
        print(response.status_code)
  
DeleteFailoverGroup().run()

Update Failover Group API

The Update Failover Group API has been updated to set failover thresholds and more fields than previously allowed:
PATCH /failover-groups/{failover_group_id}

Request URL Variable:

Variable	Description
Failover Group ID (Required)	Replace this variable with the system-defined ID assigned to the desired Failover Group. Insight: Use the Get Failover Group endpoint to retrieve a list of parameters set for a given Failover Group.

Request Body Parameters (include only those to be updated):

Name	Data Type	Description
auto_failback	Boolean	Auto-failback when issue is resolved
channels	List	List of channels in the Failover Group
mode	String	Prioritized, flat, or custom
name	String	Name of the Failover Group
slicers	List	List of Slicers, each having a dictionary, in the Failover Group
thresholds	List	List of Slicer Failover threshold values, including severities, which are in dictionaries

Request Body Example:

{
  "auto_failback": false,
  "channels": [ "99994a11ead446e7b4d7a4c91c679999", "88884a11ead446e7b4d7a4c91c678888" ],
  "mode": "prioritized",
  "name": "My Failover Group",
  "slicers": {
    "slicer1": { "force_blacklist": false, "priority": 1 },
    "slicer2": { "force_blacklist": false, "priority": 2 }
  },
  "thresholds": {
    "failover_audio_loss": { "fault_duration": 90, "recovery_duration": 60, "enabled": false, "severity": 5 },
    "failover_blackness": { "low": 1, "fault_duration": 120, "recovery_duration": 60, "enabled": false, "severity": 5 },
    "failover_cc_last_seen": { "fault_duration": 60, "recovery_duration": 90, "enabled": false, "severity": 5 },
    "failover_dropped": { "high": 0, "fault_duration": 5, "recovery_duration": 6, "enabled": false, "severity": 5 },
    "failover_input": { "fault_duration": 0, "recovery_duration": 30, "enabled": false, "severity": 5 },
    "failover_nielsen_last_seen": { "fault_duration": 60, "recovery_duration": 90, "enabled": false, "severity": 5 },
    "failover_proc_q": { "high": 8, "fault_duration": 10, "recovery_duration": 30, "enabled": false, "severity": 5 },
    "failover_queue": { "high": 5, "fault_duration": 5, "recovery_duration": 5, "enabled": false, "severity": 5 },
    "failover_scte_last_seen": { "fault_duration": 60, "recovery_duration": 90, "enabled": false, "severity": 5 },
    "failover_static_audio": { "fault_duration": 90, "recovery_duration": 60, "enabled": false, "severity": 5 },
    "failover_static_video": { "fault_duration": 90, "recovery_duration": 60, "enabled": false, "severity": 5 },
    "failover_tr_101_290_stats_P1_errors": { "high": 1, "fault_duration": 0, "recovery_duration": 30, "enabled": false, "severity": 5 },
    "failover_tr_101_290_stats_P2_errors": { "high": 1, "fault_duration": 0, "recovery_duration": 30, "enabled": false, "severity": 5 },
    "failover_video_loss": { "fault_duration": 90, "recovery_duration": 60, "enabled": false },
    "failover_involuntary_blackout": { "enabled": false, "severity": 5 }
  }
}

NOTE 1: When specifying Slicers, please note that force_blacklist (set to false to enable Slicer in Failover Group) and priority are required.
NOTE 2: The Get Failover Group API returns threshold values for failover_audio and failover_video, but those are deprecated.
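
A request body can be checked against NOTE 1 before it is sent. The helper below is a hypothetical client-side sketch, not part of the Uplynk API. It also accepts an optional list of the Slicers currently in the group, on the assumption that an update to Slicers must list all of them.

```python
def validate_slicers(slicers, group_members=None):
    """Return a list of problems found in a 'slicers' request-body value."""
    problems = []
    for name, entry in slicers.items():
        # Both fields are required for every Slicer in the request.
        for field in ("force_blacklist", "priority"):
            if field not in entry:
                problems.append(f"{name}: missing '{field}'")
    if group_members is not None:
        # Flag Slicers that belong to the group but are absent from the update.
        for name in sorted(set(group_members) - set(slicers)):
            problems.append(f"{name}: not included in the update")
    return problems
```

An empty list means every Slicer entry carries both required fields.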

Dictionary Lists in the Update Failover Group API

The “slicers” list of dictionaries contains the names of the Slicers in the Failover Group as well as the priority of each Slicer and whether the Slicer should be enabled/disabled in the Failover Group.

The “thresholds” list of dictionaries maps to the failover thresholds in the Failover Thresholds UI.

The “thresholds” list also contains a severity value for each threshold, with the default value being 5 (highest). If severity is not returned by the Get Failover Group endpoint, the value is assumed to be 5.

Thresholds Dictionary List in Update Failover Group API

Name	Name on Failover Thresholds UI	Description
failover_audio_loss	Audio Loss	Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean
failover_blackness	Black Screen	Specifies “low” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean
failover_cc_last_seen	CC Last Seen	Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean
failover_dropped	Dropped Frames	Specifies “high” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean
failover_input	Input Loss	Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean
failover_nielsen_last_seen	Nielsen Last Seen	Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean
failover_proc_q	Processing Queue	Specifies “high” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean
failover_queue	Upload Queue	Specifies “high” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean
failover_scte_last_seen	SCTE Last Seen	Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean
failover_static_audio	Static Audio	Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean
failover_static_video	Static Video	Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean
failover_tr_101_290_stats_P1_errors	TR 101 290 P1 Errors	Specifies “high” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean
failover_tr_101_290_stats_P2_errors	TR 101 290 P2 Errors	Specifies “high” level which is an integer, “fault_duration” and “recovery_duration” in seconds, and “enabled” which is a Boolean
failover_video_loss	Video Loss	Specifies “fault_duration” and “recovery_duration” in seconds and “enabled” which is a Boolean
failover_involuntary_blackout	Involuntary Blackout	Specifies “enabled” which is a Boolean
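
The table above can be turned into a lookup so that a request body is built from the name shown in the Failover Thresholds UI. This is a convenience sketch only. The names UI_TO_API and threshold_patch are hypothetical, and the TR 101 290 keys use the spelling from the request body example.

```python
# Maps the name on the Failover Thresholds UI to the key used by the API.
UI_TO_API = {
    "Audio Loss": "failover_audio_loss",
    "Black Screen": "failover_blackness",
    "CC Last Seen": "failover_cc_last_seen",
    "Dropped Frames": "failover_dropped",
    "Input Loss": "failover_input",
    "Nielsen Last Seen": "failover_nielsen_last_seen",
    "Processing Queue": "failover_proc_q",
    "Upload Queue": "failover_queue",
    "SCTE Last Seen": "failover_scte_last_seen",
    "Static Audio": "failover_static_audio",
    "Static Video": "failover_static_video",
    "TR 101 290 P1 Errors": "failover_tr_101_290_stats_P1_errors",
    "TR 101 290 P2 Errors": "failover_tr_101_290_stats_P2_errors",
    "Video Loss": "failover_video_loss",
    "Involuntary Blackout": "failover_involuntary_blackout",
}


def threshold_patch(ui_name, **fields):
    """Build a PATCH body that changes a single threshold."""
    # A KeyError here means the UI name is not one of the documented thresholds.
    return {"thresholds": {UI_TO_API[ui_name]: fields}}
```

For example, threshold_patch("Static Audio", enabled=True, severity=3) produces a body that touches only the failover_static_audio entry.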

Update Failover Group Sample Code

The example below shows changing the Failover Group name, the priority of Slicers, and thresholds, including severities. You only need to include items that are changing (e.g., ‘name’, ‘slicers’, or ‘thresholds’). You only need to include thresholds that are changing.

When changing Slicers, all Slicers need to be included. The force_blacklist value is the inverse of the enable setting of the Slicer in the Failover Group (set it to false to enable the Slicer in the Failover Group).

import json
import requests
from api_auth import APICredentials, APIParams

class UpdateFailoverGroup:
    def __init__(self):
        self.host = "https://services.uplynk.com"

    def run(self):
        self._update_failover_group()

    def _update_failover_group(self):
        failover_group_id = '99f7b514f79440bb940812f2eb8954d7' # Replace with the desired failover group ID.

        url = "{}{}{}".format(self.host, "/api/v4/failover-groups/", failover_group_id)
        
        payload = {
            # Only include 'name', 'slicers', or 'thresholds' if changing values.
            'name': 'MyFailoverGroup (Test)',       # Change name of failover group.
            'slicers': { 
                            "main_slicer": { "force_blacklist": False , "priority": 2},     # Make priority 2.
                            "backup_slicer": { "force_blacklist": False , "priority": 1}    # Make priority 1.
                       },
            'thresholds': {
                            'failover_static_audio': {
                                                        'enabled': True,       # Enable static audio threshold.
                                                        'severity': 3          # Lower severity from default
                                                     },
                            'failover_tr_101_290_stats_P1_errors': {
                                                                     # Enable P1 error threshold.
                                                                     "high": 3,
                                                                     "fault_duration": 1,
                                                                     "recovery_duration": 60,
                                                                     "enabled": True,
                                                                     "severity": 4
                                                                   }
                          }
        }

        headers = {'Content-Type': 'application/json'}
  
        response = requests.patch(
            url, params=APIParams(APICredentials()).get_params({}), data=json.dumps(payload), headers=headers
        )
        
        data=json.dumps(payload)
        print(data)
  
        print(response.status_code)
  
UpdateFailoverGroup().run()

API Authentication Module (api_auth.py)

The api_auth.py module is used to authenticate Uplynk v4 APIs.

import base64
import os
import zlib, hmac, hashlib, time, json


class APICredentials:
    """
    Stores credentials required to request our API.
    """
    @property
    def user_id(self):
        """
        Set your user ID to the one defined on the User Settings page.
        """
        return "1234567890abcdefghijklmnopqrstu"

    @property
    def secret(self):
        """
        Set your API key to a value defined on the Integration Keys page. 
        """
        return "1234567890abcdefghijklmnopqrstuvwxyz1234"


class APIParams(object):
    """
    Provides API authentication. Learn more at:
        https://docs.uplynk.com   
    """
    def __init__(self, credentials):
        self.credentials = credentials

    def get_params(self, data):
        """
        Encodes and signs <data> into the expected format and returns it.
        """
        # _get_params() already returns the complete {'msg', 'sig'} dictionary.
        return self._get_params(**data)

    def _get_msg(self, msg=None):
        """
        Encodes and returns the 'msg' parameter.
        """
        msg = msg if msg else {}

        msg.update({
            '_owner': self.credentials.user_id,
            '_timestamp': int(time.time())
        })

        msg = json.dumps(msg)
        msg_compressed = zlib.compress(msg.encode(), 9)
        return base64.b64encode(msg_compressed).strip()

    def _get_params(self, **msg):
        """
        Returns the message and its signature.
        """
        msg = self._get_msg(msg)

        sig = hmac.new(
            self.credentials.secret.encode(), msg, hashlib.sha256
        ).hexdigest()

        return {
            'msg': msg,
            'sig': sig
        }
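
When troubleshooting authentication, it can help to look inside the parameters that the module produces. The sketch below simply reverses the steps shown above (base64, then zlib, then JSON) and recomputes the HMAC-SHA256 signature. The helper names decode_msg and signature_matches are hypothetical and are not part of api_auth.py.

```python
import base64
import hashlib
import hmac
import json
import zlib


def decode_msg(msg):
    """Reverse the encoding of the 'msg' parameter back into a dictionary."""
    return json.loads(zlib.decompress(base64.b64decode(msg)).decode())


def signature_matches(msg, sig, secret):
    """Recompute the HMAC-SHA256 signature of 'msg' and compare it to 'sig'."""
    expected = hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Decoding the msg value should show the _owner and _timestamp fields that were added before signing.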

Miscellaneous

What is Allowlisting and Denylisting?

When a Slicer is detected, either for the first time or after being fixed following a failover event, the Slicer Failover system denylists that Slicer. When denylisted, a Slicer will not be used as an option for Slicer Failover. This prevents the Slicer Failover system from thrashing in case the Slicer is not truly healthy.

Once the Slicer Failover system considers the new Slicer healthy, it is allowlisted. When allowlisted, a Slicer is used as an option for Slicer Failover. The transition from a denylisted to an allowlisted state could take several minutes.

What Exactly is a Warm State Slicer Doing?

In the Warm state, backup Slicers are still operating normally. A Warm state Slicer that is healthy will indicate that it is “Slicing” (or “Ads”), again, since it is operating normally.

After the Warm state Slicer sends its content to the cloud, it is not encoded, which provides the reduced cost.
In a Warm state, the Slicer is essentially in blackout mode, meaning that its video is being discarded.

In the Event Log, a Warm state Slicer will have a state of 1 from the Broker which is the same state as ad break. The broker is the component that receives the video from the Slicer. A state of 1 means the Broker is discarding the video from the Slicer. A state of 0 means that the Broker is not discarding the Slicer content.

The amount of slate displayed in Hot-Warm failover can range from a few seconds to over 30 seconds and depends on a number of factors, such as the thresholds used and system loading.

Auto Failback

Auto Failback after a failover condition will happen under the following conditions:

Auto Failback is ENABLED
The formerly failed Slicer has to be considered “healthy” for 30 seconds
After being healthy, the system takes 1-2 minutes for the Slicer to become allowlisted
The priority of the formerly failed Slicer has to be higher than the current one

The system is designed to failover quickly and failback slowly to avoid toggling.

Event Log Info

State for xxxxx changed from 0 to 1:

Means that Slicer xxxxx is going from Slicing state (0) to ad break or a Warm state (1)
If going into a Warm state, there is usually an additional log message near this one
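
If Event Log text is exported for analysis, a state change message of this form can be parsed with a short regular expression. This sketch assumes the message reads exactly as quoted above. The function name parse_state_change and the returned field names are hypothetical.

```python
import re

# State values as described for the Broker: 0 is Slicing, 1 is ad break or Warm.
STATE_NAMES = {0: "Slicing", 1: "Ad break or Warm"}

STATE_CHANGE = re.compile(r"State for (\S+) changed from (\d+) to (\d+)")


def parse_state_change(line):
    """Extract the Slicer ID and the old and new states from a log message."""
    match = STATE_CHANGE.search(line)
    if match is None:
        return None  # Not a state change message.
    slicer_id, old, new = match.group(1), int(match.group(2)), int(match.group(3))
    return {
        "slicer": slicer_id,
        "from": STATE_NAMES.get(old, "unknown"),
        "to": STATE_NAMES.get(new, "unknown"),
    }
```

Lines that do not match the pattern return None, so the function can be applied to every line of an exported log.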

A denylisted Slicer means that it is unhealthy while an allowlisted Slicer means that the Slicer is considered healthy.

Things to Watch Out For

When changing the contents of the Slicer tab, make sure to click Save so that the changes take effect.

When you enable a Slicer from the Slicer tab, it takes about 1-2 minutes for the Slicer to truly become active in the Failover Group. This is to ensure that the enabled Slicer is truly healthy before being considered for failover.

If you manually kill a Slicer, the Slicer status on the Slicer tab and the Content tab still says it is Slicing. For the Slicer tab, we will try to use Slicer Health instead to provide more accurate information. Slicer Monitoring shows the status of the Slicer accurately, and Slicer Failover itself will still work.

If using a Slicer version before Slicer Release 23062600, make sure the failover_id in the Slicer config exactly matches the one in the CMS, especially if you add or delete Failover Groups in the CMS during testing (restart the Slicers as well).

If the backup Slicers have unstable/bad input, this could prevent a “failed” Slicer from doing failover since the backup Slicers would not be considered healthy. For instance, if two Slicers in a Failover Group have the same input feed and that feed stops, failover will not occur since both Slicers will be considered unhealthy.

When a Slicer is in the Warm state, its status would still be “Slicing” (or “Ads”) since it is operating properly. In the Warm state, the cloud encoding is disabled, not the Slicer itself.

If you get an error reading Failover Groups in the CMS, you may still have a permissions issue.

If no thumbnail is displayed for the channel in the CMS after adding a Failover Group to the channel, restart the Slicers in the Failover Group.

Make sure the Slicer config contains enable_remote_config: 1 so that the Slicer can get configurations from a centralized database. For Slicer versions prior to Slicer Release 23062600, enable_remote_config: 1 has to come after the failover_id.
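A quick pre-flight check of the config text can catch both problems before the Slicer is restarted. This is a sketch under stated assumptions: it assumes the config uses `key: value` lines with `#` starting a comment, and `check_failover_config` is a hypothetical helper, not part of any Slicer tooling. It does not apply to Cloud Slicer Live, where this setting is handled differently.

```python
from typing import List


def check_failover_config(config_text: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    entries = []
    for raw in config_text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            entries.append((key.strip(), value.strip()))

    names = [key for key, _ in entries]
    problems = []
    if ("enable_remote_config", "1") not in entries:
        problems.append("enable_remote_config: 1 is missing")
    elif "failover_id" in names and (
        names.index("failover_id") > names.index("enable_remote_config")
    ):
        # Only matters for Slicer versions prior to Slicer Release 23062600.
        problems.append("enable_remote_config: 1 should come after failover_id")
    return problems
```

For example, `check_failover_config(open(path).read())` could be run against the config file, where `path` is wherever your Slicer config is stored.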

If using any previous versions of Slicer Failover, they must be disabled before this Slicer Failover can be used.

If failover based on TR 101 290 errors is desired, Slicer Release 22083100 (August 2022) or later must be used.

With Cloud Slicer Live, having enable_remote_config: 1 in the config file may prevent Slicer Failover from working.

Removing a Slicer Permanently from Slicer Failover

If a Slicer is to be removed from using Slicer Failover entirely:

Remove or comment out the failover_id in the configuration file, if present
Remove or comment out the enable_remote_config: 1 in the configuration file
Remove the Slicer from the Failover Group in the CMS
Reboot the Slicer
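The two configuration file steps above can be scripted. The following is a minimal sketch that assumes `key: value` lines and `#` as the comment marker; `disable_failover_settings` is a hypothetical helper. Removing the Slicer from the Failover Group in the CMS and rebooting it are still manual steps.

```python
def disable_failover_settings(config_text: str) -> str:
    """Comment out failover_id and enable_remote_config lines, leaving the rest as-is."""
    keys_to_disable = ("failover_id", "enable_remote_config")
    output_lines = []
    for raw in config_text.splitlines():
        key = raw.partition(":")[0].strip()
        if key in keys_to_disable:
            # Assumption: '#' marks a comment in the Slicer config.
            output_lines.append("# " + raw)
        else:
            # Lines that are already commented out do not match and pass through.
            output_lines.append(raw)
    trailing_newline = "\n" if config_text.endswith("\n") else ""
    return "\n".join(output_lines) + trailing_newline
```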

Recommendations

Slicer Failover reports events using SNS so it is recommended to use this functionality.

If the network feeding the Slicer has stability issues, it is recommended to set the Input Loss threshold higher than 0 to reduce the sensitivity.

When using Flexible Hot-Warm, it is recommended to use Prioritized or Custom as the Priority Mode. Flat Mode currently chooses the backup Slicer randomly, so it may pick a Warm Slicer instead of a Hot one.
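The effect of random selection can be illustrated with a small simulation. This is only a sketch: it assumes a uniform random choice among healthy backups, which is an assumption about "randomly chosen", and the Slicer names and the `pick_flat` function are hypothetical.

```python
import random
from collections import Counter


def pick_flat(healthy_backups, rng):
    """Flat Mode as described: any healthy Slicer, chosen at random (assumed uniform)."""
    return rng.choice(sorted(healthy_backups))


# One Hot backup and two Warm backups in a Flexible Hot-Warm group.
backups = {"backup_hot": "Hot", "backup_warm_1": "Warm", "backup_warm_2": "Warm"}

rng = random.Random(0)  # seeded so the run is repeatable
tally = Counter(backups[pick_flat(backups, rng)] for _ in range(1000))
```

Under this assumption, `tally` shows Warm backups being selected in a large share of the simulated failovers, and each of those would show slate that a Hot backup would have avoided.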

Failover Notifications via Amazon SNS

📘
See Health Notifications via Amazon SNS for additional information.

Publish failover events through the following workflow:

Data Push:
- Our service pushes data to Amazon SNS whenever we fail over to another Live Slicer
Data Broadcast:
- Amazon SNS broadcasts data to one or more destinations (e.g., mobile device, web server, or Slack)
- Get started with Amazon SNS for free through its SNS free tier.
Data Formatting:
- Our service formats data using JSON. This data may then be filtered via custom code.
- This article explains how to strip out additional data generated by Amazon SNS via a custom function in Amazon Lambda
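As an illustration of that kind of filtering, the sketch below shows a Lambda handler that drops the SNS envelope and keeps only the failover payloads. It relies on the standard event shape that Lambda receives from SNS (`Records`, then `Sns`, then `Subject` and `Message`). It assumes the `Message` value is a JSON-encoded string and that failover notifications carry the Subject `Slicer Failover`; what the handler does with the result is left open.

```python
import json


def lambda_handler(event, context):
    """Return only the failover payloads, without the SNS envelope fields."""
    payloads = []
    for record in event.get("Records", []):
        sns = record.get("Sns", {})
        # Health and failover notifications share one topic, so filter by Subject.
        if sns.get("Subject") != "Slicer Failover":
            continue
        # Assumption: Message holds the failover details as a JSON string.
        payloads.append(json.loads(sns["Message"]))
    return payloads
```

Filtering on Subject keeps Live Slicer health notifications, which arrive on the same topic, out of the failover handling path.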

Get Started with Failover Notifications

Perform the following steps to set up notifications:

Set Up an Amazon SNS Topic:
- Our service pushes Live Slicer health and failover notifications to the same Amazon SNS topic
- You may skip this step if you have already created an Amazon SNS topic for Live Slicer health notifications
Configure Communication with Amazon SNS:
- Updating the SNS topic for either Live Slicer health or failover notifications will affect both types of notifications
Navigate to the Failover Page:
- From the main menu, navigate to Slicers and then select Failover from the side navigation bar
Update SNS Topic:
- Click Update SNS Topic from the right-hand pane
- Set the Update your SNS Topic ARN option to the ARN for the topic created above
- Click Save Topic ARN
Configure Amazon SNS to Broadcast Notifications:
- Learn how to set up Amazon SNS and Lambda to broadcast notifications to a Slack channel

Failover Notification Fields

Our service sends information that describes a failover event in JSON format. Key parameters in this notification are described here.

Field	Description
Subject	Returns Slicer Failover
Message	Provides detailed information about the Failover event. Key parameters are described below.
Service	Returns failover.
Sender	Returns failover.
Account	Indicates the user name (e.g., email address) associated with the account for which this failover event occurred.
OID	Indicates the system-defined ID of the account for which this failover event occurred.
FO_Group_Name	Indicates a failover group's name.
FO_Group_ID	Indicates a failover group's system-defined ID.
Channels	Contains an array of the live channels associated with the failover group defined by the `FO_Group_Name` property.
Date_Time	Indicates when the notification was triggered. This timestamp is reported as Unix time in milliseconds.
Original_Slicer	Indicates the slicerID of the Live Slicer that was the source of the live stream prior to the failover event.
Slicer	Indicates the slicerID of the Live Slicer that was the source of the live stream after the failover event.
Reason	Provides additional information about this failover event. For example, this parameter may indicate the reason for failover.
Slicers_In_Group	Contains a key-value pair for each Live Slicer associated with the failover group defined by the `FO_Group_Name` property.

Each key-value pair identifies the name of a Live Slicer and its failover status. Valid failover states are described below.

Valid Failover States

Active: Indicates that our service is using this Live Slicer's feed to generate the live stream for all live channels associated with this failover group.
Hot: Indicates that the Live Slicer is encoding and storing content within our system. Our service can quickly fail over to a Live Slicer in this state.
Warm: Indicates that the Live Slicer is currently slicing content but not uploading it to our system. Failing over to a Live Slicer in this state may cause a few seconds of slate.
Unhealthy: Indicates that the Live Slicer is considered unhealthy due to at least one metric falling below a custom threshold for a given duration.
Disabled: Indicates that the failover capability for this Live Slicer has been manually disabled.

Key-Value Pair Syntax: {Live Slicer}: {Failover Status}

Example

{
  "Service": "failover",
  "Sender": "failover",
  "Account": "[email protected]",
  "OID": "1ab0812e54f44b029bcae08685f025cc",
  "FO_Group_Name": "My failover group",
  "FO_Group_ID": "f18b0d3f6393428f9aca3815a17f663e",
  "Channels": ["Basketball", "News"],
  "Date_Time": 1667834461149,
  "Original_Slicer": "bball_slicer_1",
  "Slicer": "bball_slicer_2",
  "Reason": "added to denylist: Not seen since 2022-11-07 15:21:06",
  "Slicers_In_Group": {
    "bball_slicer_1": "Active",
    "bball_slicer_2": "Hot"
  }
}
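A consumer of this notification will usually want the timestamp as a date and the Slicers grouped by state. The following sketch uses only the fields documented above; `summarize_failover` and the keys of the dictionary it returns are hypothetical names.

```python
import json
from datetime import datetime, timedelta, timezone


def summarize_failover(message: str) -> dict:
    """Parse a failover notification and return a few derived values."""
    data = json.loads(message)

    # Date_Time is Unix time in milliseconds; split it to avoid float rounding.
    seconds, millis = divmod(data["Date_Time"], 1000)
    when = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)

    # Group Slicer names by their reported failover state.
    by_state = {}
    for slicer, state in data["Slicers_In_Group"].items():
        by_state.setdefault(state, []).append(slicer)

    return {
        "group": data["FO_Group_Name"],
        "switched": (data["Original_Slicer"], data["Slicer"]),
        "when_utc": when,
        "by_state": by_state,
        "reason": data["Reason"],
    }
```

The `when_utc` value is timezone-aware, so it can be converted to a local time zone with `astimezone()` before display.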
