About Hangups

Most of the time, calls end in a handful of expected ways that are considered 'normal'. Rarely seen hangup causes are typically not relevant in isolation, but spikes of the same abnormal hangup cause(s) usually point to a problem worth investigating.

Just as most computers show you metrics related to CPU usage over time (one, five and fifteen minutes typically), the hangups app tracks hangup causes and gives you the tools to send alerts when thresholds are exceeded.

For instance, a common sign that an upstream carrier is having issues with their network is an increase in PROGRESS_TIMEOUT hangup causes when sending calls to the carrier. When the INVITE goes to the carrier, Kazoo waits a certain amount of time (8 seconds by default) to hear back a "progress" SIP message (180, 183, or 200 typically). If the carrier fails to respond in time, the leg is hung up with PROGRESS_TIMEOUT as the cause, and the next carrier (if any) is tried. If you set the threshold for PROGRESS_TIMEOUT to 0.5 for the one minute metric, you will receive alerts when the number of calls terminating with PROGRESS_TIMEOUT increases to the point that the threshold is tripped.
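To make the idea of a tripped threshold concrete, here is a minimal sketch of a Unix-style load average applied to hangups. It assumes the metric is an exponentially weighted moving average of hangups per second, updated every 5 seconds; the hangups app may compute its averages differently. `HangupMeter` and `ticks_until_tripped` are hypothetical names used only for illustration.

```python
import math


class HangupMeter:
    """Toy EWMA of hangups per second for one hangup cause (illustrative only)."""

    def __init__(self, window_seconds=60.0, tick_seconds=5.0):
        self.tick_seconds = tick_seconds
        # Smoothing factor of an EWMA over the given window (assumed model).
        self.alpha = 1.0 - math.exp(-tick_seconds / window_seconds)
        self.pending = 0   # hangups recorded since the last tick
        self.rate = 0.0    # smoothed hangups per second

    def mark(self, count=1):
        """Record `count` hangups with this cause."""
        self.pending += count

    def tick(self):
        """Fold the hangups seen since the last tick into the average."""
        instant_rate = self.pending / self.tick_seconds
        self.pending = 0
        self.rate += self.alpha * (instant_rate - self.rate)
        return self.rate


def ticks_until_tripped(meter, hangups_per_tick, threshold, max_ticks=1000):
    """Return how many ticks of steady hangups it takes to exceed `threshold`."""
    for n in range(1, max_ticks + 1):
        meter.mark(hangups_per_tick)
        if meter.tick() > threshold:
            return n
    return None


if __name__ == "__main__":
    one_minute = HangupMeter(window_seconds=60.0)
    # One PROGRESS_TIMEOUT per second, i.e. five per 5-second tick.
    print(ticks_until_tripped(one_minute, hangups_per_tick=5, threshold=0.5))
```

Under these assumptions, a steady one PROGRESS_TIMEOUT per second pushes the one minute average past 0.5 after nine ticks (45 seconds), while a single stray hangup never gets close. That is the behavior you want from an alert: spikes trip it, isolated events do not.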

Other hangup causes can imply other failure scenarios worth investigating. The table below offers some suggestions for interpreting the hangup cause.

Typical Abnormal Hangup Causes

||Hangup State||Possible Causes||
|WRONG_CALL_STATE|We see these sometimes when ACLs are out of sorts.|
|NO_ROUTE_DESTINATION|This could mean there is no callflow defined for the number, or the number is unassigned to a PBX in PBX Connector. You'll have to figure out which side of the dialog is hanging up first though.|
|CALL_REJECTED|The side sending this hangup isn't going to route the call.|
|MANDATORY_IE_MISSING|This might be because the leg was challenged for authentication and was unable to comply. Another cause could be that no codec was negotiated between the two sides. Check the SDP codec listings for both sides.|
|PROGRESS_TIMEOUT|The endpoint (carrier or device) failed to progress to early media, ringing, or answering the call within the allotted time. May be indicative of errors on the endpoint's side. If a carrier, consider removing them from the offnet/account routing until you can discover the issue. These hangups impact PDD (post-dial delay) and are quite noticeable to the caller.|
|RECOVERY_ON_TIMER_EXPIRE|Often seen when NAT is interfering with receiving responses from the endpoint. Check the firewall at the customer's site for SIP ALG (and turn it off), and try port 7000 or TCP as necessary.|

Setting up monitors

Setting the load average

Load averages help tell you when something is occurring regularly. But how do you know which thresholds should trigger alerts? There are many articles that talk about this. One article uses a bridge metaphor that can be helpful. The truth is you'll need to play with these as you go, since as your volume of calls increases, the thresholds at which you reach "hair is on fire" severity will change. In CPU load terms, 1.0 is high when you have a single CPU but is nothing when you have a 24-core server.

{LOAD_AVG} should be the number above which alerting should start (0.0 will disable alerting for that metric).

{METRIC} is one of:

- one
- five
- fifteen
- day
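The rules for {LOAD_AVG} and {METRIC} can be sketched as a small check. This is an illustration of the semantics described here, not the hangups app's own code; `should_alert` and `VALID_METRICS` are hypothetical names.

```python
# The four metric names listed in this document.
VALID_METRICS = ("one", "five", "fifteen", "day")


def should_alert(metric, load_avg, threshold):
    """Decide whether a load average should raise an alert.

    A threshold of 0.0 disables alerting for the metric; otherwise
    alerting starts once the load average rises above the threshold.
    """
    if metric not in VALID_METRICS:
        raise ValueError("unknown metric: %r" % (metric,))
    if threshold == 0.0:
        return False  # 0.0 means "do not alert on this metric"
    return load_avg > threshold


if __name__ == "__main__":
    print(should_alert("one", 0.6, 0.5))    # above the threshold
    print(should_alert("five", 10.0, 0.0))  # alerting disabled
```

Note that the comparison is strict: a load average exactly equal to the threshold does not alert in this sketch, which matches "the number above which alerting should start".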

Set thresholds for a hangup cause / metric


$> sup hangups_maintenance set_threshold {HANGUP_CAUSE} {METRIC} {LOAD_AVG}
set {METRIC} for hangups.{HANGUP_CAUSE} to {LOAD_AVG}


$> sup hangups_maintenance set_threshold {ACCOUNT_ID} {HANGUP_CAUSE} {METRIC} {LOAD_AVG}

Set metric thresholds

A good way to initialize your thresholds is to apply the same threshold across all tracked hangup causes.


$> sup hangups_maintenance set_metric {METRIC} {LOAD_AVG}
set {METRIC} for hangups.WRONG_CALL_STATE to {LOAD_AVG}
set {METRIC} for hangups.CALL_REJECT to {LOAD_AVG}


$> sup hangups_maintenance set_metric {ACCOUNT_ID} {METRIC} {LOAD_AVG}
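If you keep thresholds in configuration management, the set_metric and set_threshold commands above can be generated rather than typed. The sketch below only builds command strings in the argument order shown in this document; it does not run them. The helper names and the example values are hypothetical.

```python
import shlex


def sup_hangups(command, *args, account_id=None):
    """Build a `sup hangups_maintenance` command line as a string."""
    parts = ["sup", "hangups_maintenance", command]
    if account_id is not None:
        # The account-scoped forms take {ACCOUNT_ID} as the first argument.
        parts.append(account_id)
    parts.extend(str(a) for a in args)
    return " ".join(shlex.quote(p) for p in parts)


def baseline_then_overrides(metric, baseline, overrides, account_id=None):
    """Apply one baseline to every cause, then per-cause overrides."""
    cmds = [sup_hangups("set_metric", metric, baseline, account_id=account_id)]
    for cause, load_avg in sorted(overrides.items()):
        cmds.append(
            sup_hangups("set_threshold", cause, metric, load_avg,
                        account_id=account_id)
        )
    return cmds


if __name__ == "__main__":
    # Example values only: a 1.0 baseline with a tighter PROGRESS_TIMEOUT.
    for cmd in baseline_then_overrides("one", 1.0, {"PROGRESS_TIMEOUT": 0.5}):
        print(cmd)
```

Running the baseline first and the overrides second matters, since set_metric applies the same value to every tracked hangup cause and would otherwise overwrite the per-cause settings.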

Deprecated SUP commands

Monitor an account's minute load average

sup hangups_maintenance activate_monitors {ACCOUNT_ID} {LOAD_AVG}

Add a monitor for a specific hangup cause

sup hangups_maintenance activate_monitor {ACCOUNT_ID} {HANGUP_CAUSE}

Set threshold for a specific hangup cause

sup hangups_maintenance set_monitor_threshold {HANGUP_CAUSE} {{per-minute threshold}}

Set a particular threshold for a specific hangup cause

sup hangups_maintenance set_monitor_threshold {HANGUP_CAUSE} {{threshold name}} {{value}}

Where {{threshold name}} is one of: one, five, fifteen, day
