#### MasterAria

##### New member

- Joined
- Nov 17, 2020

- Messages
- 6

- Thread starter MasterAria
- Start date

- Joined
- Nov 17, 2020

- Messages
- 6

- Joined
- Nov 12, 2017

- Messages
- 11,297

I'm not sure what you mean by "each less frequent than the previous". Can you give an example?

But the bigger issue is, how are you defining "pattern"? Often that is entirely subjective, and can't be defined mathematically. If you have a specific kind of "pattern" in mind, you'll have to define it before you can program it.

- Joined
- Nov 17, 2020

- Messages
- 6

First of all, thank you for respondingI'm not sure what you mean by "each less frequent than the previous". Can you give an example?

But the bigger issue is, how are you defining "pattern"? Often that is entirely subjective, and can't be defined mathematically. If you have a specific kind of "pattern" in mind, you'll have to define it before you can program it.

Secondly, here's my clarification:

The 1st number has a 48% chance of appearing next in the sequence, the 2nd has a 24% chance, the 3rd = 16%, 4th = 8%, and the 5th = 4%.

Since the sequence is randomly generated, I can't predict the next number. The "patterns" that I'm trying to find are how often a number would repeat and form a "group" and the frequency of each "group" forming.

(ex: A group of four 1s appearing in a row = 1111)

I was wondering if there was any way to predict when a group would form after seeing the chosen number appear in the sequence, or if the "patterns" are just as random as the numbers and are a coincidence of the probabilities.

(ex: Chosen number = 3; Sequence = 211143 < Will the 3 form into a group of 3s, or will it be an outlier like the 2 at the beginning?)

- Joined
- Nov 12, 2017

- Messages
- 11,297

I still don't understand; I need a clear example, and I think you need to use clearer terminology.The Numbers

The 1st number has a 48% chance of appearing next in the sequence, the 2nd has a 24% chance, the 3rd = 16%, 4th = 8%, and the 5th = 4%.

The Patterns

Since the sequence is randomly generated, I can't predict the next number. The "patterns" that I'm trying to find are how often a number would repeat and form a "group" and the frequency of each "group" forming.

(ex: A group of four 1s appearing in a row = 1111)

My question

I was wondering if there was any way to predict when a group would form after seeing the chosen number appear in the sequence, or if the "patterns" are just as random as the numbers and are a coincidence of the probabilities.

(ex: Chosen number = 3; Sequence = 211143 < Will the 3 form into a group of 3s, or will it be an outlier like the 2 at the beginning?)

When you say "the first number", do you mean

Maybe you're talking about a sequence of unspecified length, consisting of elements taken (with repetition) from the set {1, 2, 3, 4, 5}?

Also, this sounds like a question about the probability of "patterns", not one about writing a program to "find patterns". I'm always suspicious when a problem transmutes.

- Joined
- Nov 17, 2020

- Messages
- 6

Sorry for my poor wording, I didn't know how to phrase my question/where to ask.Maybe you're talking about a sequence of unspecified length, consisting of elements taken (with repetition) from the set {1, 2, 3, 4, 5}?

Also, this sounds like a question about the probability of "patterns", not one about writing a program to "find patterns". I'm always suspicious when a problem transmutes.

I believe you're correct,

However, my goal is to use machine learning so I can change the probabilities/length of the set/sequence and still get results. Before I can start programming a bot that can calculate the probability of patterns, I need to find out how the process of finding these probabilities works.

Thank you for your help and understanding.

Last edited:

- Joined
- Nov 17, 2020

- Messages
- 6

I just found a similar question to mine that might be helpful:

- Joined
- Nov 12, 2017

- Messages
- 11,297

Okay, let's try stating the problem clearly.Sorry for my poor wording, I didn't know how to phrase my question/where to ask.

is 5 (pre-set) numbers, (let's go with your example of {1, 2, 3, 4, 5})The set

has an undetermined length and can continue to grow if needed until enough data has been gathered.The sequence

I mentioned earlier were about the numbers in the set, not the sequence. (The number 1 has a 48% probability of showing up next in the sequence, etc. It's almost like the numbers on a roulette wheel, with 1 being the most common and 5 being the least)The probabilities

I believe you're correct,I'm trying to find the probability of "patterns".

However, my goal is to use machine learning so I can change the probabilities/length of the set/sequence and still get results. Before I can start programming a bot that can calculate the probability of patterns, I need to find out how the process of finding these probabilities works.

Thank you for your help and understanding.

A sequence is formed from a set of possible values in a given probability distribution. That is, each term has a known probability of being each of the numbers in the set, say \(\displaystyle p_1=0.48\), \(\displaystyle p_2=0.24\), \(\displaystyle p_3=0.16\), \(\displaystyle p_4=0.08\), and \(\displaystyle p_5=0.04\). We can think of this as rolling the same weighted die to choose each number.

Given the nth term of the sequence, you want to find (a) the probability that it starts a run of n terms, or (b) the expected length of a run of that value. Or something like that.

Does that sound right?

Let’s assume for now that what you mean by patterns are repetitions of any of your symbols (there does not appear to be any significance to using numerals).

There are several questions that I can think of that make sense if Dr. Peterson is basically correct in his interpretation.

What is the probability that, in a string or sequence on n symbols drawn from a set of m distinct symbols, there is at least one sub-sequence consisting of k repetitions of at least one of the symbols, given that the probability of drawing a particular symbol is constant from drawing to drawing and those probabilities are independent?

- Joined
- Nov 17, 2020

- Messages
- 6

This is exactly what I wanted to ask. Thank you for narrowing it down into a solid question.What is the probability that, in a string or sequence on n symbols drawn from a set of m distinct symbols, there is at least one sub-sequence consisting of k repetitions of at least one of the symbols, given that the probability of drawing a particular symbol is constant from drawing to drawing and those probabilities are independent?

Since I didn't know how to describe this question, I had to go through the previous shenanigans of describing the situation but only making things more complicated/confusing.

I also want to be able to ask this question for each of the distinct symbols rather than "at least one of the symbols".

Some examples:

º What is the probability of a sub-sequence of 1 of any length greater than x?

º How many sub-sequences of 1 will there be on average if the main sequence is n digits long?

º How long is the average length of a sub-sequence of 1?

(Rinse & Repeat for all the distinct symbols in the set)

I want to thank you both for helping me this far. (I've been saying thank you in almost every post, but I really do appreciate the help)

- Joined
- Nov 12, 2017

- Messages
- 11,297

Suppose you see a 2, and want to know the probability that there will be (at least) 4 of them in a row (that is, at least 3 more after the one you already have). Then you are asking for the probability of "222". The probability of the first (subsequent) term being a 2 is 0.24. For the second, it is again 0.24. For the third, it is again 0.24. So the probability of all three is 0.24*0.24*0.24 = 0.013824, or 1.4%.

You'd do similarly for elements with other probabilities.

The question about average length is what you found about the expected length of a run, and is more complicated. You can probably search for terms like "run length" to find more.

The question about how many runs is probably a lot more complicated, because the runs could be of any length.

- Joined
- Apr 12, 2005

- Messages
- 11,307

Read that a couple of times and tell me it makes any sense at all. Why would there be trends or patterns in a sequence of Random Numbers? You have done well to take encouragement and important lessons from previous posts. Keep up the good work.

- Joined
- Nov 17, 2020

- Messages
- 6

Thank you and Happy (early) Thanksgiving!

- I started working on a bot that learns to play simple tile-based games by playing against itself, then playing against people.

- I made the original version already know every possible move from the start and increase the value of moves that lead to victory while decreasing the value of losing moves after playing against people, but with games like chess, things started to get complicated.

- I got into machine learning and made a new version that only sees the score & current possible moves. It uses a neural network to select the next move, and "learns" by playing against itself.

It was pretty good at playing simple games but often ran into issues when playing against people.

The problem is,

- Tik Tac Toe (Some games like Tik Tak Toe have unbeatable strategies, but this is just an example)

- The numbers 1-9 in the set represent the player's moves (bottom-left to top-right), & 10-18 represent the computer's moves (bottom-left to top-right)

- A possible sequence of moves could be: {1, 13, 3, 11, 9, 15, 5} (Player wins)

> If someone plays multiple rounds, is there a way to identify a sequence of moves they repeat often?

> What about multiple people using the same strategy?

We have \(\displaystyle \text i \in \mathbb Z \text { and } 1 \le i \le m \implies 0 < p_i \text { and } \sum_{i=1}^m p_i = 1.\)

\(\displaystyle \therefore p_{min} = \text {min}(p_1,\ ... \ p_m) \implies 0 < p_{min} \le \dfrac{1}{m},\)

We are interested in the probability of at least one event, where an event is defined to be k > 0 successive occurrences of any one of the m symbols. Obviously, if m = 1, that probability is 1. Equally obviously, if k = 1, the probability is 1. Furthermore, it is also obvious that if k > n, the probability is 0. Thus, for the problem to be non-trivial, we must have

\(\displaystyle 2 \le k \le n \text { and } m \ge 2.\)

\(\displaystyle \therefore i \in \mathbb Z \text { and } 1 \le i \le m \implies 0 < p_i < 1 \text { and } \sum_{i=1}^m p_i = 1.\)

If we have at least one event, there must be a first event. That could start at the first selection, the second selection, and so so on all the way up to the n+1-k selection. (If n = 10 and k = 2, we cannot start a sequence of length 2 in position 10; the latest it can start is position 9.)

Let \(\displaystyle q_j\) be the probability that the first event starts at selection j > 0,

and let \(\displaystyle c_j\) be the probability that the first event starts on or before selection j > 0.

Define \(\displaystyle c_0 = 0. \)

Obviously, \(\displaystyle j > 0 \implies c_j = q_j + c_{j-1}.\)

Let r = the probability that a given selection is the start of a sequence of k occurrences of one of the m symbols.

\(\displaystyle \therefore r = \sum_{i=1}^m p_i^k \implies 0 < r < 1 \ \because \ k \ge 2.\)

The probability that the

\(\displaystyle q_j = (1 - c_{j-1}) * r.\)

And the probability that we want is \(\displaystyle c_{n+1-k}.\)

This is enough to set up a

My difficulty has been to

\(\displaystyle q_1 = (1 - c_0) * r = (1 - 0)r = r.\)

\(\displaystyle c_1 = q_1 + c_0 = r + 0 = r.\)

\(\displaystyle q_2 = (1 - c_1) * r = (1 - r)r = r - r^2.\)

\(\displaystyle c_2 = q_2 + c_1 = r - r^2 + r = 2r - r^2.\)

\(\displaystyle q_3 = (1 - c_2) * r = r - r(2r - r^2) + r^3= r - 2r^2 + r^3.\)

\(\displaystyle c_3 = q_3 + c_2 = r - 2r^2 + r^3 + 2r - r^2 = 3r - 3r^2 + r^3.\)

\(\displaystyle q_4 = (1 - c_3) * r = r - r(3r - 3r^2 + r^3) =\\

r - 3r^2 + 3r^3 - r^4.\)

\(\displaystyle c_4 = q_4 + c_3=\\

r - 3r^2 + 3r^3 - r^4 + 3r - 3r^2 + r^3 = 4r - 6r^2 + 4r^3 - r^4.\)

\(\displaystyle q_5 = (1 - c_4) * r = r - r(4r - 6r^2 + 4r^3 - r^4) =\\

r - 4r^2 + 6r^3 - 4r^4 + r^5.\)

\(\displaystyle c_5 = q_5 + c_4=\\

r - 4r^2 + 6r^3 - 4r^4 + r^5 + 4r - 6r^2 + 4r^3 - r^4 = \\

5r - 10r^2 +10r^3 - 5r^4 + r^5.\)

\(\displaystyle q_6 = (1 - c_5) * r = r - r(5r - 10r^2 +10r^3 - 5r^4 + r^5) =\\

r - 5r^2 + 10r^3 - 10r^4 + 5r^5 - r^6.\)

\(\displaystyle c_6 = q_6 + c_5= r - 5r^2 + 10r^3 - 10r^4 + 5r^5 - r^6 \\

+ 5r - 10r^2 +10r^3 - 5r^4 + r^5 = \\

6r - 15r^2 + 20r^3 - 15r^4 + 6r^5 - r^6.\)

Does anyone see a way to generalize that?

- Joined
- Nov 23, 2020

- Messages
- 19

Sounds like you're dealing with a negative binomial distribution to me? Check wikipedia? Also you posted a reply instead of an individual thread?

We have \(\displaystyle \text i \in \mathbb Z \text { and } 1 \le i \le m \implies 0 < p_i \text { and } \sum_{i=1}^m p_i = 1.\)

\(\displaystyle \therefore p_{min} = \text {min}(p_1,\ ... \ p_m) \implies 0 < p_{min} \le \dfrac{1}{m},\)

We are interested in the probability of at least one event, where an event is defined to be k > 0 successive occurrences of any one of the m symbols. Obviously, if m = 1, that probability is 1. Equally obviously, if k = 1, the probability is 1. Furthermore, it is also obvious that if k > n, the probability is 0. Thus, for the problem to be non-trivial, we must have

\(\displaystyle 2 \le k \le n \text { and } m \ge 2.\)

\(\displaystyle \therefore i \in \mathbb Z \text { and } 1 \le i \le m \implies 0 < p_i < 1 \text { and } \sum_{i=1}^m p_i = 1.\)

If we have at least one event, there must be a first event. That could start at the first selection, the second selection, and so so on all the way up to the n+1-k selection. (If n = 10 and k = 2, we cannot start a sequence of length 2 in position 10; the latest it can start is position 9.)

Let \(\displaystyle q_j\) be the probability that the first event starts at selection j > 0,

and let \(\displaystyle c_j\) be the probability that the first event starts on or before selection j > 0.

Define \(\displaystyle c_0 = 0. \)

Obviously, \(\displaystyle j > 0 \implies c_j = q_j + c_{j-1}.\)

Let r = the probability that a given selection is the start of a sequence of k occurrences of one of the m symbols.

\(\displaystyle \therefore r = \sum_{i=1}^m p_i^k \implies 0 < r < 1 \ \because \ k \ge 2.\)

The probability that theFIRSTsequence of k occurrences of one of the m symbols occurs at position j is a conditional probability equal to the probability that the first such sequence did not occur earlier times the probability that such a sequence starts at this position. In other words,

\(\displaystyle q_j = (1 - c_{j-1}) * r.\)

And the probability that we want is \(\displaystyle c_{n+1-k}.\)

This is enough to set up aVERYsimple spreadsheet to solve problems where m, k, n and the probabilities associated with each symbol are known. Moreover, messing around with such a spreadsheet seems to confirm my intuition that if n >> k, the probability of at least one sequence of at least length k of at least one symbol approaches 1. For example, with m = 4, k = 4, and probabilities for individual symbols of 0.5, 0.25, 0.15, and 0.1, the probability of at least one sequence of at least length k of at least one symbol > 99% if n > 66. It exceeds 50% if n > 10.

My difficulty has been toprovethat as k/n approaches 0, the probability approaches 1, let alone say anything about the rate of convergence. I get a pattern that is similar to (but obviously different from) Pascal's triangle, but I have not been able to generalize it.

\(\displaystyle q_1 = (1 - c_0) * r = (1 - 0)r = r.\)

\(\displaystyle c_1 = q_1 + c_0 = r + 0 = r.\)

\(\displaystyle q_2 = (1 - c_1) * r = (1 - r)r = r - r^2.\)

\(\displaystyle c_2 = q_2 + c_1 = r - r^2 + r = 2r - r^2.\)

\(\displaystyle q_3 = (1 - c_2) * r = r - r(2r - r^2) + r^3= r - 2r^2 + r^3.\)

\(\displaystyle c_3 = q_3 + c_2 = r - 2r^2 + r^3 + 2r - r^2 = 3r - 3r^2 + r^3.\)

\(\displaystyle q_4 = (1 - c_3) * r = r - r(3r - 3r^2 + r^3) =\\

r - 3r^2 + 3r^3 - r^4.\)

\(\displaystyle c_4 = q_4 + c_3=\\

r - 3r^2 + 3r^3 - r^4 + 3r - 3r^2 + r^3 = 4r - 6r^2 + 4r^3 - r^4.\)

\(\displaystyle q_5 = (1 - c_4) * r = r - r(4r - 6r^2 + 4r^3 - r^4) =\\

r - 4r^2 + 6r^3 - 4r^4 + r^5.\)

\(\displaystyle c_5 = q_5 + c_4=\\

r - 4r^2 + 6r^3 - 4r^4 + r^5 + 4r - 6r^2 + 4r^3 - r^4 = \\

5r - 10r^2 +10r^3 - 5r^4 + r^5.\)

\(\displaystyle q_6 = (1 - c_5) * r = r - r(5r - 10r^2 +10r^3 - 5r^4 + r^5) =\\

r - 5r^2 + 10r^3 - 10r^4 + 5r^5 - r^6.\)

\(\displaystyle c_6 = q_6 + c_5= r - 5r^2 + 10r^3 - 10r^4 + 5r^5 - r^6 \\

+ 5r - 10r^2 +10r^3 - 5r^4 + r^5 = \\

6r - 15r^2 + 20r^3 - 15r^4 + 6r^5 - r^6.\)

Does anyone see a way to generalize that?

Yes. I was answering the original question. I suspect I gave the OP what was needed. The rest is more sort of intellectual curiosity. I'll look up negative binomial distribution.Sounds like you're dealing with a negative binomial distribution to me? Check wikipedia? Also you posted a reply instead of an individual thread?