Real-life math problem from software quality assurance

dstromberg

Hi folks. First post on this forum.

Consider an automated test suite composed of individual tests, all launched together with a single command. Further consider that some of the tests have "transient errors" - that is, they fail sometimes but not always, even with the same (formal) inputs.

If a test with a transient error fails with probability p on any single invocation, how many times t do we have to run the test to reach probability q of a continued failure going undetected?

In other words, we want some assurance that a specific transient error has been corrected. Before attempting a fix, we run the test suite a bunch of times to get p as the quotient times_failed / times_run, which tells us how often that single test fails on one run. After attempting a fix, how many times t should we run the test without error to have probability q that a problem remains, undetected?
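For concreteness, here is a rough Python sketch of the estimation step. The run_test function is just a hypothetical stand-in for invoking the real flaky test, and the 10% failure rate is made up so the sketch runs on its own:

```python
import random

def run_test():
    """Hypothetical stand-in for one invocation of the flaky test.

    Fails 10% of the time here so the sketch is self-contained;
    in practice this would shell out to the actual test command.
    """
    return random.random() >= 0.1  # True = pass, False = fail

def estimate_failure_rate(times_run=200):
    """Estimate p as times_failed / times_run over repeated invocations."""
    times_failed = sum(1 for _ in range(times_run) if not run_test())
    return times_failed / times_run

if __name__ == "__main__":
    p = estimate_failure_rate()
    print(f"estimated p = {p:.3f}")
```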

Thanks!
 
You'll need to specify a confidence level to approach it mathematically.
 
If it's known to be transient, why do you have to fix it?
 
"If it's known to be transient, why do you have to fix it?"

We want to be able to trust our test suite to be a strong indicator of how well our software is working, without needing a list of tests to ignore. And we don't want to delete the tests that fail sometimes. Our test suite takes about 15 or 20 minutes to run, so if a transient error makes us resubmit the test run, that can really slow us down.
 
q was my feeble attempt at describing that. Shall we say we need a confidence level c instead?

Restating the problem with the above:
If a test with a transient error fails with probability p on any single invocation, how many times t do we have to run the test to have confidence c of a continued failure going undetected?
 

I could have worded that restatement a little better:

If a test with a transient error fails with probability p on any single invocation, how many times t do we have to run the test to have confidence c that there is no longer an undetected, continued failure in that test?
 
I am a little confused by this question. What do you mean by "transient error"? Would "intermittent error" be a better description? I will give an answer based on my best guess at your requirement - hopefully this will help.

Given the probability of a test failing is p, the probability of the test passing is (1 - p).
The probability of t repeated runs ALL passing is (1 - p)^t (assuming the runs are independent events).
Let the probability of having one or more test failures in t repeated runs be q, where q = 1 - (1 - p)^t.

Rearranging to make t the subject, t = log(1 - q)/log(1 - p)

So, if p = 0.1 and you require the probability of having at least one test failure in t repeated runs to be greater than 0.9 (or 90%), then
t > log(1 - 0.9)/log(1 - 0.1)
t > 21.85
Since t is an integer, choose t ≥ 22 (t = 22 corresponds to q = 0.9015).
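If it helps, here is a small Python sketch of that calculation, using only the formula above (the function names are mine, not from any library):

```python
import math

def runs_needed(p, q):
    """Smallest integer t with 1 - (1 - p)**t >= q.

    p: per-run failure probability of the intermittent test
    q: required probability of seeing at least one failure in t runs
    Assumes 0 < p < 1 and 0 < q < 1.
    """
    t = math.log(1 - q) / math.log(1 - p)
    return math.ceil(t)

def detection_probability(p, t):
    """Probability of at least one failure in t independent runs."""
    return 1 - (1 - p) ** t

if __name__ == "__main__":
    t = runs_needed(p=0.1, q=0.9)
    print(t)                                        # 22
    print(round(detection_probability(0.1, t), 4))  # 0.9015
```

Note that t grows quickly as p shrinks: a test that fails only 1% of the time needs about 230 clean runs for the same 90% confidence.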

--

Hopefully you're not testing code that flies airplanes! If so then please bump q up to be MUCH nearer to 1. But preferably use some development tools that can directly trace the causes of many intermittent problems (like "valgrind").
 
Googling, I'm seeing that yes, "intermittent error" is probably a better phrase for it.

Thanks!
 