How do you find general patterns/trends in a sequence of random numbers?

MasterAria

New member
Joined
Nov 17, 2020
Messages
6
I'm currently working on a coding project and want to incorporate very basic pattern finding. However, the sequence of data that it needs to find patterns in is a random sequence of 5 numbers, each less frequent than the previous. How can I go about programming this? I can't seem to find any methods of doing this.
 

Dr.Peterson

Elite Member
Joined
Nov 12, 2017
Messages
9,600
I'm currently working on a coding project and want to incorporate very basic pattern finding. However, the sequence of data that it needs to find patterns in is a random sequence of 5 numbers, each less frequent than the previous. How can I go about programming this? I can't seem to find any methods of doing this.
I'm not sure what you mean by "each less frequent than the previous". Can you give an example?

But the bigger issue is, how are you defining "pattern"? Often that is entirely subjective, and can't be defined mathematically. If you have a specific kind of "pattern" in mind, you'll have to define it before you can program it.
 

MasterAria

New member
Joined
Nov 17, 2020
Messages
6
I'm not sure what you mean by "each less frequent than the previous". Can you give an example?

But the bigger issue is, how are you defining "pattern"? Often that is entirely subjective, and can't be defined mathematically. If you have a specific kind of "pattern" in mind, you'll have to define it before you can program it.
First of all, thank you for responding :)
Secondly, here's my clarification:

The Numbers
The 1st number has a 48% chance of appearing next in the sequence, the 2nd has a 24% chance, the 3rd = 16%, 4th = 8%, and the 5th = 4%.

The Patterns
Since the sequence is randomly generated, I can't predict the next number. The "patterns" that I'm trying to find are how often a number would repeat and form a "group" and the frequency of each "group" forming.
(ex: A group of four 1s appearing in a row = 1111)

My question
I was wondering if there was any way to predict when a group would form after seeing the chosen number appear in the sequence, or if the "patterns" are just as random as the numbers and are a coincidence of the probabilities.
(ex: Chosen number = 3; Sequence = 211143 < Will the 3 form into a group of 3s, or will it be an outlier like the 2 at the beginning?)
 

Dr.Peterson

Elite Member
Joined
Nov 12, 2017
Messages
9,600
The Numbers
The 1st number has a 48% chance of appearing next in the sequence, the 2nd has a 24% chance, the 3rd = 16%, 4th = 8%, and the 5th = 4%.

The Patterns
Since the sequence is randomly generated, I can't predict the next number. The "patterns" that I'm trying to find are how often a number would repeat and form a "group" and the frequency of each "group" forming.
(ex: A group of four 1s appearing in a row = 1111)

My question
I was wondering if there was any way to predict when a group would form after seeing the chosen number appear in the sequence, or if the "patterns" are just as random as the numbers and are a coincidence of the probabilities.
(ex: Chosen number = 3; Sequence = 211143 < Will the 3 form into a group of 3s, or will it be an outlier like the 2 at the beginning?)
I still don't understand; I need a clear example, and I think you need to use clearer terminology.

When you say "the first number", do you mean the first number in the sequence, or perhaps the first number in some unspecified set from which the numbers are drawn? You appear to be saying that there is a 48% chance that the first two terms in the sequence are the same. Is that what you mean? (No, it can't be, because you list a probability that the last one appears next, when there is no next! And the example 211143 consists of 6 terms, not 5.)

Maybe you're talking about a sequence of unspecified length, consisting of elements taken (with repetition) from the set {1, 2, 3, 4, 5}?

Also, this sounds like a question about the probability of "patterns", not one about writing a program to "find patterns". I'm always suspicious when a problem transmutes.
 

MasterAria

New member
Joined
Nov 17, 2020
Messages
6
Maybe you're talking about a sequence of unspecified length, consisting of elements taken (with repetition) from the set {1, 2, 3, 4, 5}?

Also, this sounds like a question about the probability of "patterns", not one about writing a program to "find patterns". I'm always suspicious when a problem transmutes.
Sorry for my poor wording, I didn't know how to phrase my question/where to ask.

The set is 5 (pre-set) numbers, (let's go with your example of {1, 2, 3, 4, 5})
The sequence has an undetermined length and can continue to grow if needed until enough data has been gathered.
The probabilities I mentioned earlier were about the numbers in the set, not the sequence. (The number 1 has a 48% probability of showing up next in the sequence, etc. It's almost like the numbers on a roulette wheel, with 1 being the most common and 5 being the least)

I believe you're correct, I'm trying to find the probability of "patterns".

However, my goal is to use machine learning so I can change the probabilities/length of the set/sequence and still get results. Before I can start programming a bot that can calculate the probability of patterns, I need to find out how the process of finding these probabilities works.

Thank you for your help and understanding.
 
Last edited:

Dr.Peterson

Elite Member
Joined
Nov 12, 2017
Messages
9,600
Sorry for my poor wording, I didn't know how to phrase my question/where to ask.

The set is 5 (pre-set) numbers, (let's go with your example of {1, 2, 3, 4, 5})
The sequence has an undetermined length and can continue to grow if needed until enough data has been gathered.
The probabilities I mentioned earlier were about the numbers in the set, not the sequence. (The number 1 has a 48% probability of showing up next in the sequence, etc. It's almost like the numbers on a roulette wheel, with 1 being the most common and 5 being the least)

I believe you're correct, I'm trying to find the probability of "patterns".

However, my goal is to use machine learning so I can change the probabilities/length of the set/sequence and still get results. Before I can start programming a bot that can calculate the probability of patterns, I need to find out how the process of finding these probabilities works.

Thank you for your help and understanding.
Okay, let's try stating the problem clearly.

A sequence is formed from a set of possible values in a given probability distribution. That is, each term has a known probability of being each of the numbers in the set, say \(\displaystyle p_1=0.48\), \(\displaystyle p_2=0.24\), \(\displaystyle p_3=0.16\), \(\displaystyle p_4=0.08\), and \(\displaystyle p_5=0.04\). We can think of this as rolling the same weighted die to choose each number.

Given the nth term of the sequence, you want to find (a) the probability that it starts a run of n terms, or (b) the expected length of a run of that value. Or something like that.

Does that sound right?
 

JeffM

Elite Member
Joined
Sep 14, 2012
Messages
5,453
I think this thread got off to a bad start by use of the word “pattern.” A random sequence by definition cannot have a pattern. Therefore, the pattern can’t be learned by a machine or anything else because it doesn’t exist.

Let’s assume for now that what you mean by patterns are repetitions of any of your symbols (there does not appear to be any significance to using numerals).

There are several questions that I can think of that make sense if Dr. Peterson is basically correct in his interpretation.

What is the probability that, in a string or sequence on n symbols drawn from a set of m distinct symbols, there is at least one sub-sequence consisting of k repetitions of at least one of the symbols, given that the probability of drawing a particular symbol is constant from drawing to drawing and those probabilities are independent?
 

MasterAria

New member
Joined
Nov 17, 2020
Messages
6
What is the probability that, in a string or sequence on n symbols drawn from a set of m distinct symbols, there is at least one sub-sequence consisting of k repetitions of at least one of the symbols, given that the probability of drawing a particular symbol is constant from drawing to drawing and those probabilities are independent?
This is exactly what I wanted to ask. Thank you for narrowing it down into a solid question.
Since I didn't know how to describe this question, I had to go through the previous shenanigans of describing the situation but only making things more complicated/confusing.

I also want to be able to ask this question for each of the distinct symbols rather than "at least one of the symbols".
Some examples:
º What is the probability of a sub-sequence of 1 of any length greater than x?
º How many sub-sequences of 1 will there be on average if the main sequence is n digits long?
º How long is the average length of a sub-sequence of 1?
(Rinse & Repeat for all the distinct symbols in the set)

I want to thank you both for helping me this far. (I've been saying thank you in almost every post, but I really do appreciate the help)
 

Dr.Peterson

Elite Member
Joined
Nov 12, 2017
Messages
9,600
In probability, it is extremely important to be very clear about what you are asking. You have several types of questions here, each of which would still need some clarification. I'll just answer one:

Suppose you see a 2, and want to know the probability that there will be (at least) 4 of them in a row (that is, at least 3 more after the one you already have). Then you are asking for the probability of "222". The probability of the first (subsequent) term being a 2 is 0.24. For the second, it is again 0.24. For the third, it is again 0.24. So the probability of all three is 0.24*0.24*0.24 = 0.013824, or 1.4%.

You'd do similarly for elements with other probabilities.

The question about average length is what you found about the expected length of a run, and is more complicated. You can probably search for terms like "run length" to find more.

The question about how many runs is probably a lot more complicated, because the runs could be of any length.
 

tkhunny

Moderator
Staff member
Joined
Apr 12, 2005
Messages
11,122
As for your first wording,

"How do you find general patterns/trends in a sequence of random numbers?"

Read that a couple of times and tell me it makes any sense at all. Why would there be trends or patterns in a sequence of Random Numbers? You have done well to take encouragement and important lessons from previous posts. Keep up the good work.
 

JeffM

Elite Member
Joined
Sep 14, 2012
Messages
5,453
To answer the question I posed is likely to turn into a rather ugly formula. I’m working on it. It may make more sense to do it as a spread sheet algorithm.
 

MasterAria

New member
Joined
Nov 17, 2020
Messages
6
Take as much time as you need for the algorithm, I don't want to impede your progress, but I had a similar question about another program I'm working on and was wondering if you could look at it after you finished the algorithm.
Thank you and Happy (early) Thanksgiving!

Summary:
- I started working on a bot that learns to play simple tile-based games by playing against itself, then playing against people.
- I made the original version already know every possible move from the start and increase the value of moves that lead to victory while decreasing the value of losing moves after playing against people, but with games like chess, things started to get complicated.
- I got into machine learning and made a new version that only sees the score & current possible moves. It uses a neural network to select the next move, and "learns" by playing against itself.

Problem:
It was pretty good at playing simple games but often ran into issues when playing against people.
The problem is, it can't learn any long-term strategy or remember moves that players make often.

Example:
- Tik Tac Toe (Some games like Tik Tak Toe have unbeatable strategies, but this is just an example)
- The numbers 1-9 in the set represent the player's moves (bottom-left to top-right), & 10-18 represent the computer's moves (bottom-left to top-right)
- A possible sequence of moves could be: {1, 13, 3, 11, 9, 15, 5} (Player wins)

Question:
> If the player plays the same corners in multiple games and the sub-sequence {1, 3, 9} shows up in previous rounds they played, how can the bot recognize the repetition and change its strategy?
> If someone plays multiple rounds, is there a way to identify a sequence of moves they repeat often?
> What about multiple people using the same strategy?
 

Romsek

Senior Member
Joined
Nov 16, 2013
Messages
1,006
looking at the Autocorrelation or Power Spectrum is a way to suss out patterns in a stream of possibly random numbers.
 
Last edited:

JeffM

Elite Member
Joined
Sep 14, 2012
Messages
5,453
We have an experiment where we select n times from a set of m arbitrary but distinct symbols. The probability of selecting a given symbol is constant and independent on each selection. \(\displaystyle p_i\) is the probability of selecting the ith symbol, i = 1, ... m.

We have \(\displaystyle \text i \in \mathbb Z \text { and } 1 \le i \le m \implies 0 < p_i \text { and } \sum_{i=1}^m p_i = 1.\)

\(\displaystyle \therefore p_{min} = \text {min}(p_1,\ ... \ p_m) \implies 0 < p_{min} \le \dfrac{1}{m},\)

We are interested in the probability of at least one event, where an event is defined to be k > 0 successive occurrences of any one of the m symbols. Obviously, if m = 1, that probability is 1. Equally obviously, if k = 1, the probability is 1. Furthermore, it is also obvious that if k > n, the probability is 0. Thus, for the problem to be non-trivial, we must have

\(\displaystyle 2 \le k \le n \text { and } m \ge 2.\)

\(\displaystyle \therefore i \in \mathbb Z \text { and } 1 \le i \le m \implies 0 < p_i < 1 \text { and } \sum_{i=1}^m p_i = 1.\)

If we have at least one event, there must be a first event. That could start at the first selection, the second selection, and so so on all the way up to the n+1-k selection. (If n = 10 and k = 2, we cannot start a sequence of length 2 in position 10; the latest it can start is position 9.)

Let \(\displaystyle q_j\) be the probability that the first event starts at selection j > 0,
and let \(\displaystyle c_j\) be the probability that the first event starts on or before selection j > 0.

Define \(\displaystyle c_0 = 0. \)

Obviously, \(\displaystyle j > 0 \implies c_j = q_j + c_{j-1}.\)

Let r = the probability that a given selection is the start of a sequence of k occurrences of one of the m symbols.

\(\displaystyle \therefore r = \sum_{i=1}^m p_i^k \implies 0 < r < 1 \ \because \ k \ge 2.\)

The probability that the FIRST sequence of k occurrences of one of the m symbols occurs at position j is a conditional probability equal to the probability that the first such sequence did not occur earlier times the probability that such a sequence starts at this position. In other words,

\(\displaystyle q_j = (1 - c_{j-1}) * r.\)

And the probability that we want is \(\displaystyle c_{n+1-k}.\)

This is enough to set up a VERY simple spreadsheet to solve problems where m, k, n and the probabilities associated with each symbol are known. Moreover, messing around with such a spreadsheet seems to confirm my intuition that if n >> k, the probability of at least one sequence of at least length k of at least one symbol approaches 1. For example, with m = 4, k = 4, and probabilities for individual symbols of 0.5, 0.25, 0.15, and 0.1, the probability of at least one sequence of at least length k of at least one symbol > 99% if n > 66. It exceeds 50% if n > 10.

My difficulty has been to prove that as k/n approaches 0, the probability approaches 1, let alone say anything about the rate of convergence. I get a pattern that is similar to (but obviously different from) Pascal's triangle, but I have not been able to generalize it.

\(\displaystyle q_1 = (1 - c_0) * r = (1 - 0)r = r.\)

\(\displaystyle c_1 = q_1 + c_0 = r + 0 = r.\)

\(\displaystyle q_2 = (1 - c_1) * r = (1 - r)r = r - r^2.\)

\(\displaystyle c_2 = q_2 + c_1 = r - r^2 + r = 2r - r^2.\)

\(\displaystyle q_3 = (1 - c_2) * r = r - r(2r - r^2) + r^3= r - 2r^2 + r^3.\)

\(\displaystyle c_3 = q_3 + c_2 = r - 2r^2 + r^3 + 2r - r^2 = 3r - 3r^2 + r^3.\)

\(\displaystyle q_4 = (1 - c_3) * r = r - r(3r - 3r^2 + r^3) =\\
r - 3r^2 + 3r^3 - r^4.\)
\(\displaystyle c_4 = q_4 + c_3=\\

r - 3r^2 + 3r^3 - r^4 + 3r - 3r^2 + r^3 = 4r - 6r^2 + 4r^3 - r^4.\)
\(\displaystyle q_5 = (1 - c_4) * r = r - r(4r - 6r^2 + 4r^3 - r^4) =\\

r - 4r^2 + 6r^3 - 4r^4 + r^5.\)
\(\displaystyle c_5 = q_5 + c_4=\\

r - 4r^2 + 6r^3 - 4r^4 + r^5 + 4r - 6r^2 + 4r^3 - r^4 = \\

5r - 10r^2 +10r^3 - 5r^4 + r^5.\)
\(\displaystyle q_6 = (1 - c_5) * r = r - r(5r - 10r^2 +10r^3 - 5r^4 + r^5) =\\

r - 5r^2 + 10r^3 - 10r^4 + 5r^5 - r^6.\)
\(\displaystyle c_6 = q_6 + c_5= r - 5r^2 + 10r^3 - 10r^4 + 5r^5 - r^6 \\

+ 5r - 10r^2 +10r^3 - 5r^4 + r^5 = \\

6r - 15r^2 + 20r^3 - 15r^4 + 6r^5 - r^6.\)
Does anyone see a way to generalize that?
 

anon_hedgepig

New member
Joined
Nov 23, 2020
Messages
19
We have an experiment where we select n times from a set of m arbitrary but distinct symbols. The probability of selecting a given symbol is constant and independent on each selection. \(\displaystyle p_i\) is the probability of selecting the ith symbol, i = 1, ... m.

We have \(\displaystyle \text i \in \mathbb Z \text { and } 1 \le i \le m \implies 0 < p_i \text { and } \sum_{i=1}^m p_i = 1.\)

\(\displaystyle \therefore p_{min} = \text {min}(p_1,\ ... \ p_m) \implies 0 < p_{min} \le \dfrac{1}{m},\)

We are interested in the probability of at least one event, where an event is defined to be k > 0 successive occurrences of any one of the m symbols. Obviously, if m = 1, that probability is 1. Equally obviously, if k = 1, the probability is 1. Furthermore, it is also obvious that if k > n, the probability is 0. Thus, for the problem to be non-trivial, we must have

\(\displaystyle 2 \le k \le n \text { and } m \ge 2.\)

\(\displaystyle \therefore i \in \mathbb Z \text { and } 1 \le i \le m \implies 0 < p_i < 1 \text { and } \sum_{i=1}^m p_i = 1.\)

If we have at least one event, there must be a first event. That could start at the first selection, the second selection, and so so on all the way up to the n+1-k selection. (If n = 10 and k = 2, we cannot start a sequence of length 2 in position 10; the latest it can start is position 9.)

Let \(\displaystyle q_j\) be the probability that the first event starts at selection j > 0,
and let \(\displaystyle c_j\) be the probability that the first event starts on or before selection j > 0.

Define \(\displaystyle c_0 = 0. \)

Obviously, \(\displaystyle j > 0 \implies c_j = q_j + c_{j-1}.\)

Let r = the probability that a given selection is the start of a sequence of k occurrences of one of the m symbols.

\(\displaystyle \therefore r = \sum_{i=1}^m p_i^k \implies 0 < r < 1 \ \because \ k \ge 2.\)

The probability that the FIRST sequence of k occurrences of one of the m symbols occurs at position j is a conditional probability equal to the probability that the first such sequence did not occur earlier times the probability that such a sequence starts at this position. In other words,

\(\displaystyle q_j = (1 - c_{j-1}) * r.\)

And the probability that we want is \(\displaystyle c_{n+1-k}.\)

This is enough to set up a VERY simple spreadsheet to solve problems where m, k, n and the probabilities associated with each symbol are known. Moreover, messing around with such a spreadsheet seems to confirm my intuition that if n >> k, the probability of at least one sequence of at least length k of at least one symbol approaches 1. For example, with m = 4, k = 4, and probabilities for individual symbols of 0.5, 0.25, 0.15, and 0.1, the probability of at least one sequence of at least length k of at least one symbol > 99% if n > 66. It exceeds 50% if n > 10.

My difficulty has been to prove that as k/n approaches 0, the probability approaches 1, let alone say anything about the rate of convergence. I get a pattern that is similar to (but obviously different from) Pascal's triangle, but I have not been able to generalize it.

\(\displaystyle q_1 = (1 - c_0) * r = (1 - 0)r = r.\)

\(\displaystyle c_1 = q_1 + c_0 = r + 0 = r.\)

\(\displaystyle q_2 = (1 - c_1) * r = (1 - r)r = r - r^2.\)

\(\displaystyle c_2 = q_2 + c_1 = r - r^2 + r = 2r - r^2.\)

\(\displaystyle q_3 = (1 - c_2) * r = r - r(2r - r^2) + r^3= r - 2r^2 + r^3.\)

\(\displaystyle c_3 = q_3 + c_2 = r - 2r^2 + r^3 + 2r - r^2 = 3r - 3r^2 + r^3.\)

\(\displaystyle q_4 = (1 - c_3) * r = r - r(3r - 3r^2 + r^3) =\\
r - 3r^2 + 3r^3 - r^4.\)
\(\displaystyle c_4 = q_4 + c_3=\\

r - 3r^2 + 3r^3 - r^4 + 3r - 3r^2 + r^3 = 4r - 6r^2 + 4r^3 - r^4.\)
\(\displaystyle q_5 = (1 - c_4) * r = r - r(4r - 6r^2 + 4r^3 - r^4) =\\

r - 4r^2 + 6r^3 - 4r^4 + r^5.\)
\(\displaystyle c_5 = q_5 + c_4=\\

r - 4r^2 + 6r^3 - 4r^4 + r^5 + 4r - 6r^2 + 4r^3 - r^4 = \\

5r - 10r^2 +10r^3 - 5r^4 + r^5.\)
\(\displaystyle q_6 = (1 - c_5) * r = r - r(5r - 10r^2 +10r^3 - 5r^4 + r^5) =\\

r - 5r^2 + 10r^3 - 10r^4 + 5r^5 - r^6.\)
\(\displaystyle c_6 = q_6 + c_5= r - 5r^2 + 10r^3 - 10r^4 + 5r^5 - r^6 \\

+ 5r - 10r^2 +10r^3 - 5r^4 + r^5 = \\

6r - 15r^2 + 20r^3 - 15r^4 + 6r^5 - r^6.\)
Does anyone see a way to generalize that?
Sounds like you're dealing with a negative binomial distribution to me? Check wikipedia? Also you posted a reply instead of an individual thread?
 

JeffM

Elite Member
Joined
Sep 14, 2012
Messages
5,453
Sounds like you're dealing with a negative binomial distribution to me? Check wikipedia? Also you posted a reply instead of an individual thread?
Yes. I was answering the original question. I suspect I gave the OP what was needed. The rest is more sort of intellectual curiosity. I'll look up negative binomial distribution.
 
Top