Service Level Expectations (SLE)

Mist Predictive Analytics and Correlation Engine (PACE) is the industry’s first true attempt at applying data science and machine learning to understand the actual end-user experience on the network.

The primary Mist dashboard presents the results from PACE in the form of Service Level Expectation (SLE) metrics.

Mist PACE monitors seven SLE metrics:

  • Time to Connect
  • Throughput
  • Coverage
  • Capacity
  • Roaming
  • Successful Connects
  • AP Uptime

SLE Metric: Time to Connect

This SLE metric tracks the number of connections that took longer than the specified threshold to connect to the internet. The time to connect is measured from the start of the first association packet sent by the mobile client to the point where the client can successfully move data.

time_to_connect = t_connected - t_first_assoc

The classifiers for this metric are fired if time_to_connect exceeds the threshold.

Note:

If the client fails to connect to the internet, the connection does not count toward the Time to Connect metric; failed connections are tracked by a separate SLE metric. In the current implementation, the classifiers divide time_to_connect into buckets, so that the buckets add up to the total:

time_to_connect = sum(all t_classifier)
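As a rough illustration of the two formulas above, the following sketch computes time_to_connect from two timestamps and checks that the per-classifier buckets add up to it. The class name, bucket names, timestamps, and the 2-second threshold are assumptions made up for this example, not values taken from Mist.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ConnectionAttempt:
    t_first_assoc: float  # seconds: first association packet from the client
    t_connected: float    # seconds: client first moves data successfully
    # Hypothetical per-classifier buckets, in seconds.
    buckets: dict = field(default_factory=dict)


def time_to_connect(attempt: ConnectionAttempt) -> float:
    """time_to_connect = t_connected - t_first_assoc."""
    return attempt.t_connected - attempt.t_first_assoc


def exceeds_threshold(attempt: ConnectionAttempt, threshold_s: float) -> bool:
    """True when the connection took longer than the threshold."""
    return time_to_connect(attempt) > threshold_s


attempt = ConnectionAttempt(
    t_first_assoc=10.0,
    t_connected=13.5,
    buckets={"association": 0.5, "authentication": 1.0,
             "dhcp": 1.5, "ip_services": 0.5},
)

total = time_to_connect(attempt)
# The buckets are expected to add up to the overall time to connect.
buckets_match = math.isclose(total, sum(attempt.buckets.values()))
slow = exceeds_threshold(attempt, threshold_s=2.0)
```

Here the attempt takes 3.5 seconds, so it would count against a 2-second threshold.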

Classifiers

  • Association Latency
    • This classifier is assigned if a user’s time to go past the “association” state is more than 2 sigma from the average association latency for this site.
  • Authentication Latency
    • This classifier is assigned if a user’s time to go past the “authentication” state is more than 2 sigma from the average authentication latency for this site.
  • DHCP Latency
    • This classifier is assigned if a user’s DHCP time is more than 2 sigma from the average DHCP time of fully completed, successful connections for this site.
  • IP Services Latency
    • This classifier is triggered if the time between DHCP and the first DNS packet is more than 2 sigma from the moving average for this site.
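The 2-sigma rule shared by these classifiers can be sketched as follows. This is a minimal illustration, not the PACE implementation: it assumes that only latencies slower than the site average count, it uses a plain mean and population standard deviation instead of a moving average, and all function names and sample values are hypothetical.

```python
from statistics import mean, pstdev

N_SIGMA = 2.0  # the "2 sigma" cut-off described above


def is_latency_outlier(sample_s, site_history_s, n_sigma=N_SIGMA):
    """True if sample_s is more than n_sigma deviations above the site average."""
    if len(site_history_s) < 2:
        return False  # not enough history to estimate a spread
    avg = mean(site_history_s)
    sigma = pstdev(site_history_s)
    return sample_s > avg + n_sigma * sigma


def assign_classifiers(stage_times_s, site_history_s):
    """Return the stages whose latency is an outlier for this site."""
    return [
        stage
        for stage, sample in stage_times_s.items()
        if is_latency_outlier(sample, site_history_s.get(stage, []))
    ]


# Hypothetical per-site latency history, in seconds.
site_history = {
    "association": [0.10, 0.12, 0.11, 0.09, 0.13],
    "dhcp": [0.50, 0.55, 0.45, 0.52, 0.48],
}
client = {"association": 0.30, "dhcp": 0.51}
fired = assign_classifiers(client, site_history)
```

With these sample values, only the association stage is flagged; the DHCP time sits close to the site average.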

SLE Metric: Throughput

This SLE metric tracks the amount of time that a client’s estimated throughput is below the threshold configured by the customer in the graph.

A client’s estimated throughput is defined as the probabilistic throughput given the client’s current wireless conditions. The estimator considers many effects, such as AP bandwidth, load, interference events, the type of wireless device (protocol, number of streams), signal strength, and wired bandwidth. It is calculated on a per-client basis for the whole site.
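The estimator itself is not described here, but the bookkeeping on top of it can be sketched. The example below assumes that a per-minute throughput estimate is already available for each client and simply counts the user minutes that fall below the threshold across the site. The data layout, client names, and the 10 Mbps threshold are hypothetical.

```python
def user_minutes_below(estimates_mbps, threshold_mbps):
    """Count client-minutes whose estimated throughput is below the threshold.

    estimates_mbps maps a client id to a list of per-minute estimates (Mbps).
    """
    return sum(
        1
        for samples in estimates_mbps.values()
        for mbps in samples
        if mbps < threshold_mbps
    )


# Hypothetical per-minute estimates for two clients on one site.
site_estimates = {
    "client-a": [50.0, 8.0, 5.0],
    "client-b": [20.0, 9.0],
}
failing_minutes = user_minutes_below(site_estimates, threshold_mbps=10.0)
total_minutes = sum(len(samples) for samples in site_estimates.values())
```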

Four classifiers are defined for low throughput, one for each likely cause.

Classifiers

  • Device Capability
    • This classifier tracks the user minutes that a client’s throughput is below the configured threshold, primarily due to the capacity of the device.
  • Capacity
    • This classifier tracks the user minutes that a client’s throughput is below the configured threshold, primarily due to the load on the associated AP.
  • Coverage
  • Network Issues
    • This classifier tracks the user minutes that a client’s predicted throughput is below the configured threshold, primarily due to the capacity of the wired network. The capacity of the wired network is measured periodically by running iperf to a cloud service.

The Capacity classifier can be expanded to reveal four sub-classifiers, which give a more granular view into the specific reasons for capacity issues in the Throughput metric. The sub-classifiers for Throughput > Capacity are:

  • WiFi Interference
  • Non WiFi Interference
  • High Bandwidth Utilization
  • Excessive Client Load

For each sub-classifier, you can examine the Users and Access Points below the service level goal, the Timeline of failures and system changes, and the distribution of failures and affected items.

SLE Metric: Coverage

This SLE metric tracks the number of user minutes that a client’s RSSI, as measured by the access point, is below the threshold configured by IT. The metric accounts for client activity: if the client is not active, the classifiers are not fired.

The Asymmetry classifiers report bad coverage between the client and the AP, usually displayed as the number of minutes of bad coverage. Asymmetry indicates a power-level mismatch between the client and the AP. Because APs are capable of higher power levels than clients, if power levels are set too high, a low-powered client may not be heard by the AP, resulting in an asymmetry uplink issue. Asymmetry measurements are divided into two classifiers: Asymmetry Uplink and Asymmetry Downlink.
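The activity gating described above can be sketched as a simple filter: a minute counts as bad coverage only if the client was active and its RSSI was below the threshold. The sample structure, the one-sample-per-minute assumption, and the -72 dBm threshold are hypothetical choices for this example.

```python
from typing import NamedTuple


class MinuteSample(NamedTuple):
    rssi_dbm: float  # client RSSI as measured by the access point
    active: bool     # whether the client was active during this minute


def bad_coverage_minutes(samples, rssi_threshold_dbm):
    """Count minutes below the RSSI threshold, ignoring inactive minutes."""
    return sum(
        1 for s in samples
        if s.active and s.rssi_dbm < rssi_threshold_dbm
    )


samples = [
    MinuteSample(-60.0, True),   # good signal
    MinuteSample(-80.0, True),   # weak and active: counted
    MinuteSample(-85.0, False),  # weak but inactive: ignored
    MinuteSample(-75.0, True),   # weak and active: counted
]
bad_minutes = bad_coverage_minutes(samples, rssi_threshold_dbm=-72.0)
```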

Classifiers

  • Asymmetry Uplink
    • This classifier tracks the number of user minutes that a client experiences bad coverage that can be attributed to asymmetric uplink transmit power between the AP and the client device. In other words, the AP hears a weak signal from the client for that number of minutes. This can have various causes, such as the client being too far from the AP. Traffic going from the client to the AP, and then to the internet, is called uplink traffic.
  • Asymmetry Downlink
    • This classifier tracks the number of user minutes that a client experiences bad coverage that can be attributed to asymmetric downlink transmit power between the AP and the client device. In other words, the client hears a weak signal from the AP. Traffic going from the AP to the client is called downlink traffic.
  • Weak Signal
    • This classifier tracks all other user minutes below the RSSI threshold.

In this example, Asymmetry Downlink is zero and Asymmetry Uplink is 29%.

 

In this example, the AP detected weak signal strength from clients for 531 minutes. Of the 94 users, 39 were far away from the AP, which caused the AP to detect low signal strength.

 

SLE Metric: Capacity

This SLE metric tracks the user minutes that a client experiences bad capacity. It tracks the per-user available channel capacity and fires classifiers when the available capacity drops below the specified SLE threshold.

Classifiers

  • WiFi interference
    • This classifier tracks the number of user minutes that a client experiences low capacity that can be attributed to WiFi interference.
  • Non-WiFi interference
    • This classifier tracks the number of user minutes that a client experiences low capacity that can be attributed to non-WiFi interference.
  • Client Count
    • This classifier tracks the number of user minutes that a client experiences low capacity that can be attributed to the number of attached clients.
  • Client Usage
    • This classifier tracks the number of user minutes that a client experiences low capacity that can be attributed to client load.

SLE Metric: Roaming

This SLE metric tracks the percentage of successful roams between two access points that complete within the prescribed threshold. The user defines the threshold as the target time for a client to roam. Fast roaming, as defined by 802.11r and OKC, applies to clients that use RADIUS-based authentication.

  • Slow 11r Roams – 802.11r roams that fail to meet the configured threshold
  • Slow Standard Roams – slow roams by PSK and Open Access clients
  • Slow OKC Roams – opportunistic key caching (OKC) roams that fail to meet the configured threshold
  • Failed to Fast Roam – the 802.1X client did not attempt a fast roam and performed the full 802.1X key exchange
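A minimal sketch of the roaming calculation follows, assuming each roam is recorded with its type, duration, and outcome. The record layout, the type labels ("11r", "okc", "standard"), and the 0.4-second target are hypothetical, and Failed to Fast Roam is left out because the text does not say how such roams are detected.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Roam:
    kind: str          # hypothetical labels: "11r", "okc", or "standard"
    duration_s: float  # time the roam took, in seconds
    succeeded: bool


def roaming_sle_percent(roams, threshold_s):
    """Percentage of roams that succeeded within the target time."""
    if not roams:
        return None  # no roams observed, nothing to report
    good = sum(1 for r in roams if r.succeeded and r.duration_s <= threshold_s)
    return 100.0 * good / len(roams)


def slow_roam_counts(roams, threshold_s):
    """Count successful roams that missed the target time, by roam type."""
    return Counter(
        r.kind for r in roams
        if r.succeeded and r.duration_s > threshold_s
    )


roams = [
    Roam("11r", 0.1, True),
    Roam("11r", 0.6, True),       # slow 11r roam
    Roam("okc", 0.3, True),
    Roam("standard", 1.2, True),  # slow standard roam
]
sle = roaming_sle_percent(roams, threshold_s=0.4)
slow = slow_roam_counts(roams, threshold_s=0.4)
```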

SLE Metric: Successful Connects

This SLE metric tracks the percentage of successful Authorization, Association, and DHCP connections by a client to the network.

SLE Metric: AP Uptime

This is the newest SLE metric. It is calculated using AP Reboot, AP Unreachable, and Site Down events. An AP Unreachable event occurs when an AP loses cloud connectivity, for example because of a WAN issue or because the AP is unplugged from the switch. A Site Down event occurs when all APs on a site are unreachable.
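The relationship between AP Unreachable and Site Down events can be sketched as below. The per-minute set representation and the AP names are hypothetical. The uptime percentage is only an assumed formula (reachable AP-minutes over total AP-minutes), since the exact calculation is not given here, and AP Reboot events are not modeled.

```python
def site_down_minutes(unreachable_by_minute, all_aps):
    """Return the minute indexes in which every AP on the site is unreachable."""
    aps = set(all_aps)
    return [
        minute
        for minute, down in enumerate(unreachable_by_minute)
        if aps and down >= aps  # superset test: all APs are down
    ]


def ap_uptime_percent(unreachable_by_minute, all_aps):
    """Assumed formula: share of AP-minutes in which the AP was reachable."""
    aps = set(all_aps)
    total = len(unreachable_by_minute) * len(aps)
    if total == 0:
        return None
    down = sum(len(minute & aps) for minute in unreachable_by_minute)
    return 100.0 * (total - down) / total


aps = ["ap1", "ap2"]
# One set per minute, listing the APs that were unreachable in that minute.
timeline = [set(), {"ap1"}, {"ap1", "ap2"}, {"ap2"}]

down_minutes = site_down_minutes(timeline, aps)
uptime = ap_uptime_percent(timeline, aps)
```

In this timeline only minute 2 qualifies as a Site Down event, because it is the only minute in which both APs are unreachable.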