Real-life math problem from software quality assurance

dstromberg

Hi folks. First post on this forum.

Consider an automated test suite composed of individual tests, all launched together with a single command. Further consider that some of the tests have "transient errors" - that is, they fail sometimes but not always, even with the same (formal) inputs.

If a test with a transient error fails with probability p on any single invocation, how many times t do we have to run the test to reach probability q of a continued failure going undetected?

In other words, we want some assurance that a specific transient error has been corrected. Before attempting a fix, we run the test suite a bunch of times to get p as the quotient times_failed / times_run, which tells us how often that single test fails on one run. After attempting a fix, how many times t should we run the test without error to have probability q that a problem remains, undetected?
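For concreteness, here is a rough Python sketch of the estimation step. The run_test function is just a hypothetical stand-in for invoking the real flaky test, and the 10% failure rate is made up so the sketch runs on its own:

```python
import random

def run_test():
    """Hypothetical stand-in for one invocation of the flaky test.

    Fails 10% of the time here so the sketch is self-contained;
    in practice this would shell out to the actual test command.
    """
    return random.random() >= 0.1  # True = pass, False = fail

def estimate_failure_rate(times_run=200):
    """Estimate p as times_failed / times_run over repeated invocations."""
    times_failed = sum(1 for _ in range(times_run) if not run_test())
    return times_failed / times_run

if __name__ == "__main__":
    p = estimate_failure_rate()
    print(f"estimated p = {p:.3f}")
```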

Thanks!
 
You'll need to specify a confidence level to approach it mathematically.
 
If it's known to be transient, why do you have to fix it?
 
"If it's known to be transient, why do you have to fix it?"

We want to be able to trust our test suite to be a strong indicator of how well our software is working, without needing a list of tests to ignore. And we don't want to delete the tests that fail sometimes. Our test suite takes about 15 or 20 minutes to run, so if a transient error makes us resubmit the test run, that can really slow us down.
 
q was my feeble attempt at describing that. Shall we say we need a confidence level c instead?

Restating the problem with the above:
If a test with a transient error fails with probability p on any single invocation, how many times t do we have to run the test to have confidence c of a continued failure going undetected?
 

I could have worded that restatement a little better:

If a test with a transient error fails with probability p on any single invocation, how many times t do we have to run the test to have confidence c that there is no longer an undetected, continued failure in that test?
 
I am a little confused by this question. What do you mean by "transient error"? Would "intermittent error" be a better description? I will give an answer based on my best guess at your requirement - hopefully this will help.

Given the probability of a test failing is p, the probability of the test passing is (1 - p).
The probability of t repeated runs ALL passing is (1 - p)^t (assuming the runs are independent events).
Let the probability of having one or more test failures in t repeated runs be q, where q = 1 - (1 - p)^t.

Rearranging to make t the subject, t = log(1 - q)/log(1 - p)

So, if p = 0.1 and you require the probability of having at least one test failure in t repeated runs to be greater than 0.9 (or 90%), then
t > log(1 - 0.9)/log(1 - 0.1)
t > 21.85
Since t is an integer, choose t ≥ 22 (t = 22 corresponds to q = 0.9015).
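If it helps, here is a small Python sketch of that calculation, using only the formula above (the function names are mine, not from any library):

```python
import math

def runs_needed(p, q):
    """Smallest integer t with 1 - (1 - p)**t >= q.

    p: per-run failure probability of the intermittent test
    q: required probability of seeing at least one failure in t runs
    Assumes 0 < p < 1 and 0 < q < 1.
    """
    t = math.log(1 - q) / math.log(1 - p)
    return math.ceil(t)

def detection_probability(p, t):
    """Probability of at least one failure in t independent runs."""
    return 1 - (1 - p) ** t

if __name__ == "__main__":
    t = runs_needed(p=0.1, q=0.9)
    print(t)                                        # 22
    print(round(detection_probability(0.1, t), 4))  # 0.9015
```

Note that t grows quickly as p shrinks: a test that fails only 1% of the time needs about 230 clean runs for the same 90% confidence.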

--

Hopefully you're not testing code that flies airplanes! If so then please bump q up to be MUCH nearer to 1. But preferably use some development tools that can directly trace the causes of many intermittent problems (like "valgrind").
 
Googling, I'm seeing that yes, "intermittent error" is probably a better phrase for it.

Thanks!
 