Distribution of same-bit sequences (run lengths) in n bits, where n/2 are 0s and n/2 are 1s

isuckatprobability

New member, joined May 15, 2022, 1 message
Hello,
I'm trying to make sense of groupings in a (theoretical) bit sequence that is encoded as follows:
1) scrambled into a pseudo-random code
2) 8b/10b encoded, so that on average the number of 1s and 0s should be the same (i.e. each 8-bit word is mapped onto 10 bits to even out the number of 1s and 0s over several words)

What I want to know:
What is the distribution of same-bit sequences? So, if I measure n bits, of which n/2 are 0s and n/2 are 1s: how many would be in groups of 1, 2, 3, 4, etc.? The actual value, 1 or 0, doesn't matter, only the groupings.
In other words: if I have n bits, what percentage of these n bits are:
1) b_i such that b_i != b_{i-1} and b_i != b_{i+1}, i.e. part of a sequence of ONLY one same value
2) b_i such that (b_i = b_{i+1}, b_i != b_{i-1} and b_i != b_{i+2}) or (b_i = b_{i-1}, b_i != b_{i+1} and b_i != b_{i-2}), i.e. part of a sequence of ONLY two same values
3) etc
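To make the counting concrete, here is a minimal sketch of how the groups could be tallied on a measured sequence. The function names are made up for illustration, and `balanced_random_bits` is only a hypothetical stand-in (a shuffled list with exactly n/2 zeros and n/2 ones); it does not model the scrambler or the 8b/10b encoder.

```python
import random
from collections import Counter
from itertools import groupby


def run_lengths(bits):
    """Return the length of each maximal run of identical bits, in order."""
    return [sum(1 for _ in group) for _, group in groupby(bits)]


def bit_share_by_run_length(bits):
    """Map run length k -> fraction of all bits that sit in runs of exactly k."""
    counts = Counter(run_lengths(bits))
    total = len(bits)
    # A run of length k contributes k bits, so weight each count by k.
    return {k: k * c / total for k, c in sorted(counts.items())}


def balanced_random_bits(n, seed=0):
    """Hypothetical test input: n/2 zeros and n/2 ones in shuffled order.

    Assumption: this is NOT an 8b/10b stream, just a balanced sequence.
    """
    if n % 2:
        raise ValueError("n must be even")
    bits = [0] * (n // 2) + [1] * (n // 2)
    random.Random(seed).shuffle(bits)
    return bits


if __name__ == "__main__":
    share = bit_share_by_run_length(balanced_random_bits(10_000))
    for k, fraction in share.items():
        print(f"groups of {k}: {fraction:.3%} of all bits")
```

Running the same two counting functions on a captured or simulated encoded stream would give the percentages asked for in the list above.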
I'm pretty sure it should be Gaussian, shouldn't it? But how do I put this question into a form that I can express in formulas? My brain is fried, I just can't wrap my head around it, and the harder I try the less I get it.
Telling me what kind of problem this is, so I can look up how to solve it, would also be appreciated.

The sequence is not completely random (i.e. the probability of getting a 1 or a 0 is not always the same), because it depends on how many 1s or 0s have come through before. After a sequence of several 1s, the probability of a 0 should be higher.
Of course, if I sample for an infinitely long time, I should get a perfect Gaussian distribution, but I am only looking at n bits and want to describe how many groups would be 1 bit wide, how many groups would be 2 bits wide, etc.

Background: The signal of the sequence is made up of positive and negative rectangular pulses with double data, so the main frequency should be half the data rate if it were sending alternating bits. But the main frequency component of a pulse that is two bits wide would be data rate/4. So the overall spectrum would be the sum of the spectra of all pulses, but I don't know how many "wide" pulses there would be.
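A rough sketch of the bookkeeping behind that idea: once the pulse widths are known, each width can be mapped to its main frequency component. It assumes the pattern above generalizes, i.e. that a pulse k bits wide has its main component at data rate/(2k); that generalization, the function name and the `data_rate_hz` parameter are assumptions made for illustration. It only tallies the main component per pulse and ignores harmonics, so it is not a real spectrum.

```python
from collections import Counter
from itertools import groupby


def pulse_frequency_weights(bits, data_rate_hz):
    """Map main frequency (Hz) -> fraction of bits in pulses at that frequency.

    Assumption: a pulse k bits wide has its main component at
    data_rate_hz / (2 * k), e.g. k=1 -> rate/2 and k=2 -> rate/4.
    """
    # Width of every pulse = length of every run of identical bits.
    widths = Counter(sum(1 for _ in group) for _, group in groupby(bits))
    total = len(bits)
    return {
        data_rate_hz / (2 * k): k * count / total
        for k, count in sorted(widths.items())
    }


if __name__ == "__main__":
    example = [0, 1, 1, 0, 1, 0, 0, 0]
    for freq, weight in pulse_frequency_weights(example, 1000.0).items():
        print(f"{freq:8.1f} Hz: {weight:.1%} of bits")
```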

Sorry about the ugly formulas further up, I'll fix that later, when I can look up how to do that in LaTeX.

Cheers,
isuckatprobability :)
 