In our previous post, VoIP Performance Over WiFi, we gave an overview of VoIP: the mechanism it uses for data flow, why VoIP needs special consideration, the indicators of VoIP performance, the challenges in understanding its performance over WiFi, and more. In this post, we look at the VoIP test methodology, results and takeaways.
We ran VoIP tests on various Access Points using our WiFi Multi-Client Emulator, SWAT WiCheck. The tests confirmed our expectation that VoIP performance varies significantly across Access Points.
We will use the results from 3 of these access points to demonstrate the generic trends and bottlenecks in performance when it comes to VoIP.
All 3 Access Points are dual band (2.4 GHz + 5 GHz) and support 100+ clients on each band. Since 100+ clients are supported, we could potentially load up to 50 concurrent VoIP calls, i.e. 50 clients calling another 50 clients.
We transferred an audio file with conversational-quality audio from both ends of each call. The throughput requirement for 100 clients together did not exceed 75 Mbps with the chosen audio (48000 Hz, 16 bit). All the APs support enough throughput to handle this load, with sufficient headroom in terms of capacity. Tests were run in the same 5 GHz band for all Access Points.
But as mentioned in the previous blog, VoIP is challenging, and we did not expect every AP to reach the 50-call pass criterion. So we planned to increase the load in a phased manner. First, we checked with 1 call (2 clients connected, with one calling the other), and then we moved up in steps of 16 calls. We noted the performance at 1 call, 16 calls, 32 calls and 48 calls. (We stopped at 48 rather than 50 to maintain the steps of 16.)
We measured two metrics at each load level:
- Call Success Percentage – This measures the percentage of successful calls i.e. the number of calls that got connected and remained connected for the entire duration of the test without any drops.
- MOS (Mean Opinion Score) –
  - For each call, we measured the MOS based on jitter, latency and packet loss, as mentioned in the previous blog.
  - We measured both Rx MOS and Tx MOS and averaged them to arrive at the MOS for the call.
  - We then averaged the MOS over all the calls to arrive at an average MOS for a particular load level. If a call did not go through or was disconnected midway, we set its MOS to 0.
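The two metrics above can be sketched in a few lines of Python. This is only an illustration of the averaging described in the list, not the WiCheck implementation: the `CallResult` type, the function names and the sample values are all made up for the example.

```python
from statistics import mean
from typing import NamedTuple, Optional


class CallResult(NamedTuple):
    """Hypothetical per-call record; field names are assumptions."""
    completed: bool                 # connected and stayed up for the whole test
    rx_mos: Optional[float] = None  # MOS measured on the receive side
    tx_mos: Optional[float] = None  # MOS measured on the transmit side


def call_mos(call: CallResult) -> float:
    """MOS of one call: average of Rx and Tx, or 0 if the call failed."""
    if not call.completed:
        return 0.0
    return (call.rx_mos + call.tx_mos) / 2


def call_success_percentage(calls: list) -> float:
    """Share of calls that connected and stayed connected, in percent."""
    return 100.0 * sum(c.completed for c in calls) / len(calls)


def average_mos(calls: list) -> float:
    """Average MOS for one load level; failed calls count as 0."""
    return mean(call_mos(c) for c in calls)


# Made-up sample: three good calls and one dropped call.
calls = [
    CallResult(True, 4.0, 4.0),
    CallResult(True, 4.5, 3.5),
    CallResult(False),
    CallResult(True, 4.0, 4.0),
]
print(call_success_percentage(calls))  # 75.0
print(average_mos(calls))              # 3.0
```

Note how a single dropped call pulls the average MOS down sharply, because it contributes 0 rather than being excluded.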
Here is the AP performance vs. the number of concurrent calls:
Call Success Percentage
We see that AP1 and AP2 both did well, while AP3 could not hold up the calls. From this data alone, AP1 and AP2 appear very similar in performance, and only AP3 looks poor.
However, when we look at the MOS results, we see the real picture.
Based on our experiments, we found that a MOS of 4.2 or above is excellent, and anything between 3.8 and 4.2 is good and acceptable.
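As a small sketch, these bands can be written as a lookup function. The function name is hypothetical, and the "unacceptable" label for anything below 3.8 is our reading of the bands above rather than a separately measured threshold.

```python
def rate_mos(mos: float) -> str:
    """Map a MOS value to the quality bands used in this post."""
    if mos >= 4.2:
        return "excellent"
    if mos >= 3.8:
        return "good"
    # Assumption: everything below the 'good' band is treated as unacceptable.
    return "unacceptable"


# Illustrative values only, not measurements from the tests.
for value in (4.3, 4.0, 3.1):
    print(value, rate_mos(value))
```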
Mean Opinion Score
AP1 holds a score of 4.2+ at all load levels. This is in line with its Call Success Percentage.
AP3 degrades at the load level of 16 calls. This is also in line with its Call Success Percentage.
AP2 was good up to 16 calls, but at the 32-call level it dropped to an unacceptable quality level, even though it could sustain the calls. That does not help: for a user, it is as good as a call drop, because they will manually disconnect the call. Here we see a massive difference from what the Call Success Percentage data led us to expect. We expected AP2 to be like AP1. Both of them could hold all the calls, but the user experience is very different.
So the takeaway from this test is that VoIP behaves differently from throughput, or for that matter from the performance of any other application, and should be tested independently. Also, just being able to keep a call up is not a good measure of VoIP performance; MOS is the only true measure.
Various factors can be at play, and they can be tuned to achieve better VoIP scores. Many enterprise AP OEMs already do that and are able to make significant improvements.
Maybe the airtime fairness did not work properly, maybe the power control did not kick in, maybe the MCS rates were not shared properly. With access point logs and Wireshark logs at various test complexity levels, it is possible to investigate and find the root cause.
The idea is to start from a working level and slowly increase the complexity of the test until it hits the non-working level.
Complexity can be increased by considering more variables in each test scenario:
- Number of Concurrent Calls
- Amount of call data in each call – no data transfer, only one person talking, or both persons talking
- Amount of WiFi interference/noise
- Amount of background traffic – there is always a significant amount of non-voice traffic in the background, so different levels of non-voice traffic are another variable that can be introduced in the test
Comparing logs captured at each complexity level provides more insight into the influence and degree of impact of each factor we introduce into the test scenario.
Once the trends are identified, Access Point makers can investigate the root cause and, by tuning various parameters, chase the target load and MOS to be supported under various conditions.
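One way to organize this is to enumerate the scenarios from the simplest to the most complex and stop at the first one that fails. The sketch below assumes this approach; the level names for talk mode, interference and background traffic are hypothetical placeholders, and `run_test` stands in for whatever actually runs a scenario and returns its average MOS.

```python
from itertools import product

# Call counts come from the load levels used in this post;
# the other level names are hypothetical placeholders.
CONCURRENT_CALLS = [1, 16, 32, 48]
TALK_MODES = ["silent", "one_way", "two_way"]
INTERFERENCE_LEVELS = ["none", "moderate", "heavy"]
BACKGROUND_TRAFFIC = ["none", "light", "heavy"]


def build_scenarios() -> list:
    """Return every combination, ordered from least to most complex."""
    axes = [CONCURRENT_CALLS, TALK_MODES, INTERFERENCE_LEVELS, BACKGROUND_TRAFFIC]
    scenarios = []
    for idx in product(*(range(len(axis)) for axis in axes)):
        scenarios.append({
            "complexity": sum(idx),  # crude score: sum of the level indices
            "calls": CONCURRENT_CALLS[idx[0]],
            "talk_mode": TALK_MODES[idx[1]],
            "interference": INTERFERENCE_LEVELS[idx[2]],
            "background": BACKGROUND_TRAFFIC[idx[3]],
        })
    scenarios.sort(key=lambda s: s["complexity"])
    return scenarios


def first_failing(scenarios: list, run_test, min_mos: float = 3.8):
    """Run scenarios in order; return the first whose MOS is below min_mos."""
    for scenario in scenarios:
        if run_test(scenario) < min_mos:
            return scenario
    return None


# Stand-in for a real test run: pretend quality collapses at 32 calls.
def fake_run(scenario: dict) -> float:
    return 4.3 if scenario["calls"] < 32 else 3.0


scenarios = build_scenarios()
print(len(scenarios))
print(first_failing(scenarios, fake_run))
```

The logs from the last passing scenario and the first failing one are then the natural pair to compare.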
Alethea SWAT WiCheck
Validating tens to hundreds of simultaneous VoIP calls manually with 100 laptops, phones or iPads is a tedious task. To understand the user experience of hundreds of simultaneous VoIP calls, a more effective approach is to use SWAT WiCheck. SWAT WiCheck can emulate real user behavior and generate objective call quality metrics. WiCheck can help you run enterprise-grade VoIP tests whether you use Skype for Business or standard SIP servers.