Probabilities concerning a sequence of numbers.

daveylibra

New member
Joined
Nov 9, 2017
Messages
16
Consider a random sequence, length n, of 0s and 1s, eg ( 0,0,1,1,0,1,1,1,0,0,1,0,1,0,1,1,1,1,1,0,0,..... )

Let us denote a sequence of one 0 or 1 as S1, a sequence of two 0s or 1s as S2, etc.
I wish to prove that S1 > (S2+S3+S4+...),
ie, the total number of S1 will probably be more than the total number of all other sequences, S2+S3+S4 etc., combined.

Further, each sequence starts after a "switch": ...0,1 or ...1,0. If we bet after a "switch" for an S1 to form, ie ...0,1,0 or ...1,0,1,
prove that we have an advantage over betting randomly.

In trying to solve this, I read that S1 = n/4 + 1/2 and that S2+S3+S4+... = n/4, but I am struggling to prove this.
I have also written a computer simulation that backs this up for large n, but cannot find a mathematical proof, so any help is appreciated!
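For reference, here is a minimal sketch of the kind of simulation described, assuming Python (the helper name `run_lengths` is my own, not from the original program):

```python
import random
from collections import Counter

def run_lengths(bits):
    """Lengths of the maximal runs of identical digits in the sequence."""
    lengths, run = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            run += 1
        else:
            lengths.append(run)
            run = 1
    lengths.append(run)  # close the final run
    return lengths

n = 10_000
bits = [random.randint(0, 1) for _ in range(n)]
tally = Counter(run_lengths(bits))              # tally[k] = number of Sk runs
t1 = tally[1]
rest = sum(v for k, v in tally.items() if k > 1)
print(t1, rest)   # t1 is usually slightly larger (roughly n/4 + 1/2 vs n/4)
```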
 
Consider a random sequence, length n, of 0s and 1s, e.g. ( 0,0,1,1,0,1,1,1,0,0,1,0,1,0,1,1,1,1,1,0,0,....., 0 or 1 ) I understand this

Let us denote a sequence of one 0 or 1, as S1. A sequence of two 0s or 1s as S2 etc. I understand this, but it is not well defined! Suppose you have (0001111): is this S3 or S4?


I wish to prove that S1 > (S2+S3+S4+...)
ie, the total number of S1 will probably be more than the total number of all other sequences, S1+S2+S3 etc. This is nonsense, as you change the meaning of S1: now S1 means the total number of S1s! Try using a different symbol, like S'1!

Further, each sequence starts after a "switch": ...0,1 or ...1,0. If we bet after a "switch" for an S1 to form, ie ...0,1,0 or ...1,0,1
prove that we have an advantage over randomly betting.

In trying to solve this, I read that S1 = n/4 + 1/2 and that S2+S3+S4+... = n/4, but I am struggling to prove this.
I have also written a computer simulation that backs this up for large n, but cannot find a mathematical proof, so any help is appreciated!
See my comments above.
 
This is very unclear, in part because you seem to be making up your own terminology.

Trying to make sense of it, I think maybe Sn is supposed to be the number of maximal subsequences consisting of n consecutive identical digits. For example, in 0,0,1,1,0,1,1,1,0,0,1,0,1,0,1,1,1,1,1,0,0, S1 = 5, S2 = 4, S3 = 1, and S5 = 1. (So here S1 is not greater than S2+S3+S4+S5.)

But then, you indicate an infinite sequence, so these would all be expected to be infinite, really. And is this supposed to be true for every such sequence, or something probabilistic? And, then what bets are you comparing?

Please tell us what you really want to do, with complete examples. Maybe showing us where you read your formula, and telling us what your simulation actually simulates, would help.
 
Hi Jomo - I really did try to be as clear as possible. (0001111): here you have one S3, then one S4. It doesn't matter whether the sequence consists of 0s
or 1s; it's the length that matters.
Yes you are right the meaning of S1 etc is vague, I did notice this and should have corrected it.
How about T1 = total number of S1, T2 = total number of S2 etc.
Then, T1 will probably be greater than (T2 + T3 + T4 + ... )
 
Hi Dr Peterson -
To quote you - For example, in 0,0,1,1,0,1,1,1,0,0,1,0,1,0,1,1,1,1,1,0,0, S1 = 5, S2 = 4, S3 = 1, and S5 = 1. (So here S1 is not greater than S2+S3+S4+S5.)

Yes, this is correct. But I am not indicating an infinite sequence. In my testing I was using n = 10,000.
It also would not necessarily be true for every sequence.
To use my own terminology, "the total number of S1 will probably be greater than the total number of all other sequences."
I became interested in these sequences after reading a book by John Allen Paulos.
Let me denote total number of S1 as T1, total number of S2 as T2 etc.
I was told by someone on another forum some time ago that the average for T1 = n/4 + 1/2, and the average for (T1+T2+T3+...) = n/2 + 1/2.
So, I assume, the average for (T2+T3+T4+...) = n/4.
This is actually what I am trying to prove mathematically, unfortunately I'm not in contact with the original poster.
My computer simulation simply generates a random sequence of 0s and 1s and counts the S1, S2 etc.
Due to variance, and computers generating pseudo-random numbers, the results are inconclusive.
Hence, I'm trying to prove this mathematically!
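To reduce the variance mentioned here, the count can be averaged over many independent trials and compared with the claimed formula; a sketch, assuming Python (the helper name `count_isolated` is my own):

```python
import random

def count_isolated(bits):
    """Count maximal runs of length exactly 1 (the S1s)."""
    total, run = 0, 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            run += 1
        else:
            if run == 1:
                total += 1
            run = 1
    if run == 1:     # close the final run
        total += 1
    return total

n, trials = 1_000, 2_000
avg = sum(count_isolated([random.getrandbits(1) for _ in range(n)])
          for _ in range(trials)) / trials
print(avg, n / 4 + 0.5)   # the sample mean should hover near n/4 + 1/2
```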
 
What you indicated (by using ".....") was an infinite sequence; what you meant was not! There's a difference.

We still have to clarify what you mean by "probably". Maybe you mean that the expected value of T1 is greater than that of all others combined (that's probably what you mean by "average"); or you could mean that T1 > T2 + T3 + ... + Tn with probability greater than, say, 1/2, or 95%, or something.

Now the question is, what theorems do you know that you could use to do a proof? How much do you know about random variables and expected value?
 
Yes, that's what I mean! - the expected value of T1 is greater than that of all others combined.

I know a series can be represented by summation notation. I know a convergent series is one whose partial sums have a limit.
So for a finite n, the series will have a limit? The equation for a convergent series is as below:

[attached image: formula for the sum of a convergent series]

Now, expected value. I would instinctively say that a switch from 0 to 1 or vice versa has a probability of 1/2, as two adjacent digits could be 0,0 or 0,1 or 1,0 or 1,1,
and 2 of the 4 qualify, = 1/2. Another switch is needed to form an S1; its probability is again 1/2. This gives a final probability of 1/4, which multiplied by n is n/4.
Now this is obviously not quite correct, as the answer is n/4 + 1/2. So this is why I am on this forum!
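One way to at least test the conjectured value before proving it is exhaustive enumeration over all 2^n strings for small n; a sketch in Python (the helper name `isolated_count` is my own):

```python
from itertools import product

def isolated_count(s):
    """Number of maximal runs of length exactly 1 in a 0/1 tuple."""
    count, run = 0, 1
    for prev, cur in zip(s, s[1:]):
        if cur == prev:
            run += 1
        else:
            if run == 1:
                count += 1
            run = 1
    if run == 1:     # close the final run
        count += 1
    return count

for n in range(2, 12):
    total = sum(isolated_count(s) for s in product((0, 1), repeat=n))
    print(n, total / 2 ** n, n / 4 + 0.5)   # the two columns agree for n >= 2
```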
 
You said you are not talking about an infinite sequence. Why do you now bring up infinite series??

What you've listed here are not theorems, and are not about expected values. I'm trying to determine what you can base a proof on (or what sort of hints you would understand), and you aren't providing any help in that area.

It might also help if you could give a specific reference to Paulos so we could be sure what you are thinking of, and to the forum where you got your formula, so we can see what it claims to be an answer to, and what justification, if any, was given.

Just to see how close I am to understanding, I think you are saying that you were told that the expected number of bits in an n-bit string that are different from both the preceding and following bits is n/4 + 1/2. So in a 4-bit string, you expect on average there will be 1.5 "S1"s. In fact, of the 16 such strings, I find the number of S1's to be:

0000 0
0001 1
0010 2
0011 0
0100 2
0101 4
0110 2
0111 1
1000 1
1001 2
1010 4
1011 2
1100 0
1101 2
1110 1
1111 0
total: 24/16 = 1.5

Interesting! I expected the formula to be asymptotic at best, but it appears to be exact.

Please confirm that this agrees with your understanding of what you are asking.
 
Hi, yes this does agree with what I am asking.
I'm sorry that I'm struggling to find my previous reference points about this, but as you can tell this is not a textbook question.
Here is a snippet from a computer forum, this was my comment-

"Well, I just generate a long string of numbers - eg 0,1,1,0,0,0,1,1,1,0,0,1,0,1,0,1,1,1,1,0,0,.........
The string should be random. In this example we have -
One 0. That's a series length 1, then
2 1s. A series length 2, then
3 0s. A series length 3, then
3 1s. A series length 3 etc...

Probability theory says (amount of series length 1) > (amount series of all other lengths). I was just trying to test this out. "

.....and this was the reply.....

"Yeah each series length has half the likely-hood of the previous with the first = 1/2,
so 1/2 + 1/4 + 1/8 + ... = 1 the probability of any length. "

Now, you may say that this is different from my original question, and I admit it was a computer programming forum, not a maths one. But it would be helpful to me if you could confirm whether my formula ("the expected number of bits in an n-bit string that are different from both the preceding and following bits is n/4 + 1/2") is correct or not, before I attempt to prove it!

By the way, I notice that 1/2 + 1/4 + 1/8 + ... approaches but never reaches 1, and is an infinite series. But I am actually referring to a finite n.
Obviously in this case the series would stop at (1/2)^n.
 
I don't yet have a proof, so I can't yet say whether it is true.

I can comment that your claim that "Probability theory says (amount of series length 1) > (amount series of all other lengths)" makes no sense taken literally, since probability says nothing about what will happen in any specific case; changing the question to asking about expected number of runs of a given length has fixed that. (And, of course, your writing your string with an ellipsis at the end implied it was infinite, which is not what you intended. And the answer you were given is irrelevant. I presume whoever gave you n/4 + 1/2 did some more careful thinking.)

I'll start thinking about proving it, now that I think I know what you are asking for. I hope someone else will, too.
 
Just as an update, I have managed to prove the formula, by finding a recursion for the total number of isolated digits among the 2^n n-digit strings. I think a similar method should yield the expected number of all larger groups of like digits combined. I'm sure there are other ways, too.
 
Well done for proving the formula! Unfortunately I am still struggling, any hints at all?
 
I gave you my hint: recursion.

Specifically, look back at my list for n=4 in post #8, and think about how you could construct that from the list for n=3, or use it to get the list for n=5. This gives a recursive formula for this function T(n) [e.g. T(4) = 24], and then the expected value is T(n)/2^n. You can work backward from the formula we are seeking for the expected value, n/4 + 1/2 = (n+2)/4 to a formula for T(n) to see if your work is headed in the right direction. It turns out that this formula applies only for n>1.
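(I don't know the recursion used here, but the target of the hint can at least be checked numerically: brute-force T(n) and compare it with the closed form T(n) = 2^(n-2)(n+2) implied by the expected value (n+2)/4. A sketch in Python, with helper names of my own choosing:)

```python
from itertools import product

def count_S1(s):
    """Number of isolated digits (maximal runs of length 1) in a 0/1 tuple."""
    count, run = 0, 1
    for prev, cur in zip(s, s[1:]):
        if cur == prev:
            run += 1
        else:
            if run == 1:
                count += 1
            run = 1
    if run == 1:     # close the final run
        count += 1
    return count

def T(n):
    """Total number of isolated digits over all 2^n strings, e.g. T(4) = 24."""
    return sum(count_S1(s) for s in product((0, 1), repeat=n))

for n in range(2, 14):
    # expected value T(n)/2^n equals (n+2)/4, i.e. n/4 + 1/2, for n > 1
    assert T(n) == 2 ** (n - 2) * (n + 2)
```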

I am hoping you or someone else will have a completely different derivation!
 