I'm not using the term "accuracy" in the mathematical sense.
I am doing OCR (optical character recognition) of letters, and those sets represent how accurately the characters have been recognized.
100 means perfect accuracy, 0 means none.
The whole set represents a word, so if one set is {98,98,98} and another is {99,99,0}, then {99,99,0} is more accurate.
OCR works with probabilities of recognition, and those numbers represent those probabilities.
If I have words whose characters have recognition probabilities {98,98,98,98} and {99,99,99,90}, it is more certain that the latter word, {99,99,99,90}, is correctly recognized, so I can't just take a simple mean of the values. I have to compare those two sets in the manner I have described.
As I stated in my previous post, don't go into the background of what I need; just answer what I asked, because I can't go into an analysis of why it is so. I know why I am asking.
The reason background matters is that in order to meet your needs, we need to know exactly what your needs are. When people ask what kind of average is "best", for example, I always have to ask about their purpose, which typically depends on the context of their question, which will determine what matters most. That's all that's happening here. What's best can't be isolated from the specifics of what you are doing. And when people try to isolate a small part of a project, they usually don't get the right answer, because they are withholding information that they don't even realize is important.
So let's look at the facts as they have been gradually revealed. You have data representing the probability that each letter in a word will be recognized correctly; I'll suppose that is because, say, the software recognizes A 99% of the time but Z 90% of the time, and you are mapping each letter in a word to its probability. A word made up of more easily-recognized letters, and fewer easily-missed letters, is more likely to be correct; what you are asking for, then, is the probability that a word with a certain set of letter probabilities will be correctly recognized.
If that is what you want (and it would have been so much easier if you had said that at the beginning), then this is a basic probability problem. You are given p_i = P(letter a_i is correct), and want to know p = P(a_1 is correct and a_2 is correct and ... and a_n is correct). Assuming the letters are recognized independently of one another, that will just be the product of the probabilities, p = p_1 * p_2 * ... * p_n.
For your example, these are
- {98,98,98,98}: 0.98*0.98*0.98*0.98 = 0.922368
- {99,99,99,90}: 0.99*0.99*0.99*0.90 = 0.873269
So the former is in fact more likely to be recognized correctly, even though it has no 99's. Intuition is not always right. (That's why we have math ...)
Similarly, for your original example,
- [99,97,99,97,99,97]: .99*.97*.99*.97*.99*.97 = 0.885566
- [99,99,99,99,99,89]: .99*.99*.99*.99*.99*.89 = 0.846381
Again, the first is a little more likely to be correct (contrary to your guess), even though it has fewer 99's, because all of its probabilities are high, while the single 89 in the second pulls the whole product down.
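If you want to do this comparison in software, here is a minimal sketch in Python. It assumes the letters are recognized independently and that your values are percentages from 0 to 100; the function name word_probability and the variable names are just my own choices for illustration, not anything from your OCR package.

```python
from math import prod  # available in Python 3.8 and later


def word_probability(letter_percents):
    """Probability that every letter in the word is correct.

    Assumes independent letters and percentages in the range 0..100
    (both are assumptions made for this sketch).
    """
    return prod(p / 100 for p in letter_percents)


# The four sets discussed above.
words = [
    [98, 98, 98, 98],
    [99, 99, 99, 90],
    [99, 97, 99, 97, 99, 97],
    [99, 99, 99, 99, 99, 89],
]

# List the words from most likely to least likely to be correct.
for letters in sorted(words, key=word_probability, reverse=True):
    print(letters, f"{word_probability(letters):.6f}")
```

The printed values should match the products worked out by hand above. Note that a longer word will tend to get a lower product than a shorter one, simply because there are more letters that could go wrong; whether that is what you want depends on how you plan to use the number.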
Does that sound like what you are looking for?
By the way, do you still think that {99,99,0} is more accurate? (If the last letter can't ever be accurate, can the whole word be accurate?)
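To see what the product rule says about that last pair, here is a quick check in Python, again just a sketch under the same assumptions (independent letters, values given as percentages; the function name is my own):

```python
from math import prod  # available in Python 3.8 and later


def word_probability(letter_percents):
    # Product of the per-letter probabilities, assuming independence.
    return prod(p / 100 for p in letter_percents)


print(f"{word_probability([98, 98, 98]):.6f}")  # prints 0.941192
print(f"{word_probability([99, 99, 0]):.6f}")   # prints 0.000000
```

A single 0 makes the whole product 0, no matter how good the other letters are.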