Help me understand Entropy Measure formula

As I understand entropy, it means how random the components in an architecture are: the more random components you use, the more complex the architecture becomes.
 
Can you ask a more specific question?
In communications, entropy can be interpreted as the minimal number of bits required to encode an average symbol. If a symbol occurs rarely, one can afford to spend more bits encoding it (e.g. Morse code). Ideally, an optimal code would spend [imath]\log_2 \frac{1}{f_i}[/imath] bits on the [imath]i[/imath]-th symbol, whose probability is [imath]f_i[/imath]. Weighting the symbols' code lengths by their probabilities gives the average number of bits per symbol: [imath]\sum f_i \log_2 \frac{1}{f_i}[/imath].

An example: 4 symbols with two different distributions:
1) When each symbol has the same frequency 1/4 we have to use 2 bits per symbol,
2) but when the symbols' frequencies are 1/2, 1/4, 1/8 and 1/8 we can use 1, 2, 3 and 3 bits per symbol respectively which, on average, gives us 1.75 bits per symbol.
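
If it helps, here is a small Python sketch (the function name is mine) that just evaluates [imath]\sum f_i \log_2 \frac{1}{f_i}[/imath] for both distributions and reproduces the 2 and 1.75 bits per symbol figures:
[code]
import math

# Average number of bits per symbol: sum of f * log2(1/f)
def avg_bits_per_symbol(freqs):
    return sum(f * math.log2(1 / f) for f in freqs)

print(avg_bits_per_symbol([1/4, 1/4, 1/4, 1/4]))  # 2.0  (uniform case)
print(avg_bits_per_symbol([1/2, 1/4, 1/8, 1/8]))  # 1.75 (skewed case)
[/code]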

Hope this did not add to your confusion :)
 
As I understand entropy, it means how random the components in an architecture are: the more random components you use, the more complex the architecture becomes.
Informally, entropy is the average amount of information gained from the possible outcomes of a random event.

The core idea of entropy is as follows:
  • If a highly likely event occurs, the message carries very little information.
  • Whereas, if a highly unlikely event occurs, the message is much more informative.
For example, suppose it is going to rain on exactly one of the next 365 days. Learning that a particular day will NOT be rainy is a very probable message, so it tells you little about which day the rain falls. On the other hand, learning exactly which day it will rain corresponds to a very low-probability outcome, so that message is highly informative.
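
To put rough numbers on this (my own illustration, using base-2 logs so the information is measured in bits), with rain on exactly one of the 365 days:
[math]I(\text{"day } d \text{ is dry"}) = \log_2\frac{365}{364} \approx 0.004 \text{ bits}, \qquad I(\text{"it rains on day } d\text{"}) = \log_2 365 \approx 8.51 \text{ bits}[/math]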

All that is to say, information value and probability are inversely related. To capture this relationship, we define the informational value of event [imath]i[/imath], [imath]I_i[/imath]:
[math]I_i= \log\left({\frac{1}{f_i}}\right), \text{ where } f_i = \frac{x_i}{\sum x_i }\text{ is the probability of event } i[/math]
The reason we use [imath]\log[/imath] is that when [imath]f_i = 1[/imath] the event carries no surprisal ([imath]I_i = 0[/imath]), and as the probability decreases toward 0, [imath]I_i[/imath] grows without bound.

Formally, the entropy measure is defined as the weighted average of all informational values gained:
[math]\text{Entropy} = \sum_i I_i\cdot f_i= \sum_i f_i \log\left(\frac{1}{f_i}\right), \text{ where the } f_i \text{ are the weights}[/math]
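Here is a small Python sketch that ties the pieces together (the function names are mine, and I use base-2 logs so the units are bits): frequencies from raw counts, surprisal of one outcome, and entropy as the weighted average of the surprisals.
[code]
import math

# f_i = x_i / sum(x_i), as in the definition above
def frequencies(counts):
    total = sum(counts)
    return [x / total for x in counts]

# Surprisal (informational value) of one outcome, in bits: I = log2(1/f)
def surprisal(f):
    return math.log2(1 / f)

# Entropy = weighted average of surprisals: sum of f_i * I_i
def entropy(counts):
    return sum(f * surprisal(f) for f in frequencies(counts))

print(surprisal(1.0))         # 0.0   -> a certain event is not surprising
print(surprisal(0.001))       # ~9.97 -> a rare event carries a lot of information
print(entropy([4, 2, 1, 1]))  # 1.75 bits, matching the 1/2, 1/4, 1/8, 1/8 example above
[/code]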
Understanding it at a fundamental level helps you see how it can be applied in different contexts, like the example @blamocur gave, the enterprise architecture in your PowerPoint, decision trees in machine learning, etc.
 
The reason we use [imath]\log[/imath] is that when [imath]f_i = 1[/imath] the event carries no surprisal ([imath]I_i = 0[/imath]), and as the probability decreases toward 0, [imath]I_i[/imath] grows without bound.
I saw an explanation (here, among other places) which I like more: when events are independent, their total amount of information is the sum of the individual amounts.
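Spelled out, that is just the log turning products into sums: if two independent events have probabilities [imath]f_i[/imath] and [imath]f_j[/imath], the probability of both occurring is [imath]f_i f_j[/imath], and
[math]I(f_i f_j) = \log\frac{1}{f_i f_j} = \log\frac{1}{f_i} + \log\frac{1}{f_j} = I(f_i) + I(f_j).[/math]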
 
I saw an explanation (here, among other places) which I like more: when events are independent, their total amount of information is the sum of the individual amounts.
Yes, but it also satisfies conditions (1) and (2) in the link, which makes the log a perfect candidate.
 
Yes, but it also satisfies conditions (1) and (2) in the link, which makes the log a perfect candidate.
Not to be a bore, but since [imath]0\leq p \leq 1[/imath] and [imath]I(p) \geq 0[/imath], conditions (1) and (2) follow from (3).
 