How to find the parallel efficiency of a word-count algorithm?

shivajikobardan

Junior Member
Background:

(screenshots of the textbook's background material)

Problem:

Consider a very large collection of documents, say web pages crawled from the entire internet. The problem is to determine the frequency (i.e., total number of occurrences) of each word in this collection. Thus, if there are n documents and m distinct words, we wish to determine m frequencies, one for each word.

Method followed:

Now we compare two approaches to compute these frequencies in parallel using p processors:
(a) let each processor compute the frequencies for m/p words and

(b) let each processor compute the frequencies of m words across n/p documents, followed by all the processors summing their results.
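To make the two approaches concrete for myself, I wrote this small Python sketch (my own toy illustration, not from the book; the corpus and function names are made up):

[code]
from collections import Counter

# Tiny made-up corpus: n documents, m distinct words overall.
docs = ["the quick fox", "the lazy dog", "the fox naps"]
vocab = sorted({w for d in docs for w in d.split()})  # the m distinct words

def method_a(p):
    """(a) Each 'processor' owns m/p of the words and scans every document for them."""
    partials = []
    for k in range(p):
        my_words = set(vocab[k::p])  # this processor's share of the vocabulary
        partials.append(Counter(w for d in docs for w in d.split() if w in my_words))
    total = Counter()
    for part in partials:
        total.update(part)  # word sets are disjoint, so this is just a merge
    return total

def method_b(p):
    """(b) Each 'processor' counts all words in n/p documents; partial counts are then summed."""
    partials = []
    for k in range(p):
        my_docs = docs[k::p]  # this processor's share of the documents
        partials.append(Counter(w for d in my_docs for w in d.split()))
    total = Counter()
    for part in partials:
        total.update(part)  # the final all-processor summation step
    return total

# Both partitionings give the same frequencies as a plain serial count.
serial = Counter(w for d in docs for w in d.split())
print(method_a(3) == method_b(3) == serial)  # True
[/code]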

Assumptions:

We assume a distributed-memory model with a shared disk, so that each processor is able to access any document from disk in parallel with no contention. Further, we assume that the time c spent reading each word from a document is the same as the time to send it to another processor via interprocessor communication. On the other hand, the time to add to a running total of frequencies is negligible compared to the time spent on a disk read or on interprocessor communication, so we ignore the time taken for arithmetic additions in our analysis. Finally, assume that each word occurs f times in a document, on average.

Solution:

Let us take 4 documents:

D1: apple is good.
D2: doctor likes apple.
D3: apple keeps doctor away.
D4: doctor sorry apple.

So, there are 4 documents; the value of n = 4.

There are 8 distinct words (apple, is, good, doctor, likes, keeps, away, sorry); the value of m = 8.

Assume 8 processors, so p = 8.
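Just to double-check n and m with a quick Python one-off (my own check, not part of the book's solution):

[code]
# Sanity check of n (documents) and m (distinct words) for D1-D4.
docs = [
    "apple is good",
    "doctor likes apple",
    "apple keeps doctor away",
    "doctor sorry apple",
]
distinct = {w for d in docs for w in d.split()}
print(len(docs), len(distinct), sorted(distinct))
# 4 8 ['apple', 'away', 'doctor', 'good', 'is', 'keeps', 'likes', 'sorry']
[/code]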

What is the serial time for computing the frequency of each word?

My answer:


n documents × m words × f words/document × c read_time/word
= nmfc words × read_time

Why do the units come out as words × read_time? I think the result should have units of just read_time (i.e., time).
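For reference, here is the unit bookkeeping I would expect if f is read as occurrences per word per document and c as time per occurrence read (my interpretation of the book's definitions, so this may be exactly where I am going wrong):

[math]\underbrace{n}_{\text{documents}} \times \underbrace{m}_{\text{words}} \times \underbrace{f}_{\frac{\text{occurrences}}{\text{word} \,\cdot\, \text{document}}} \times \underbrace{c}_{\frac{\text{time}}{\text{occurrence}}} = nmfc \;\;\text{units of time}[/math]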


Book's answer:

(screenshot of the book's serial-time answer)


Now, to calculate the efficiency, we also need to calculate the parallel time for case (a).

My answer:

efficiency = serial time / (p × parallel time)
= nmfc / (p × parallel time)

Parallel time = ?
= (m/p) words × n documents × c read_time/word × f words/document
= (nmfc/p) words × read_time

So efficiency = nmfc / (p × nmfc/p) = 1 .......... (WRONG as per the book)


Book's answer:
(screenshot of the book's parallel-time and efficiency analysis)


I don't understand a single word after this. It all went out of my head.

Why are we adding two different terms here, instead of just nmfc as in case (a) above? I don't get it.
(screenshots of the book's analysis referred to above)

Finding whether it's scalable or not is easy, since you can just apply the guidelines given earlier and read off the answer.
 
Is there a question in the preceding?

If so, what is it?

What have you done so far?
 
You are correct. This example is obscurely expressed.

What is going on is that some very detailed and lengthy computations are being approximated, using a number of unstated simplifying assumptions that are very close to exact when the numbers involved are very large. You will never grasp it by working with small numbers, as you tried to do.

It takes time to coordinate multiple computers. We can call this overhead.

The first obscurity is the efficiency formula

[math]\epsilon = \dfrac{\text {time without overhead}}{\text {number of processors} \times \text {time with overhead}}[/math]
This is just the reciprocal of the number of processors times a dimensionless ratio; the dimension of time drops out. However, what the text actually uses is the ratio of times. There may be some benefit to this measure of parallel efficiency, but the example does not appear to show it.
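For example (made-up numbers, just to see what the formula measures): if the serial job takes 120 seconds and 4 processors together finish in 40 seconds, then

[math]\epsilon = \dfrac{120}{4 \times 40} = 0.75,[/math]

i.e. on average each processor spends 75% of its time doing useful work.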

The second obscurity is the way the math is done. Let’s do the example differently.

Let n = the number of documents in the set, w = the number of words in the set, p = the number of processors, T = total time, t = the time for the slowest processor to do its part of the task, a = the time to read a document, b = the time to determine whether a word is distinct, c = the time to update the processor's vector, and d(p) = the overhead with p processors. We then get this:

[math]T = t + d(p) \approx t \text { if } d(p) \ll t[/math]
If we use 1 processor, d(1) is obviously 0, and

[math]T_1 = na + bw + cw + 0 = na + w(b + c)[/math]
Now let's look at method A, on the assumption that d(p) is negligible and that every processor takes the same time because each gets exactly the average amount of work. Remember, method A requires each processor to read every document but, on average, to process only 1/p of the words in a document. On average, each document has w/n words.

[math]T_A = n \left ( a + (b + c) * \dfrac{w}{n} * \dfrac{1}{p} \right ) + d(p) \approx an + \dfrac{w(b + c)}{p}[/math]
Let’s look at the ratio of the times letting y = w(b + c) and x = na

[math]\dfrac{T_A}{T_1} = \dfrac{x + y \div p}{x + y} = \dfrac{px + y}{px + py} = \dfrac{px + py + y - py}{px + py} = 1 - \dfrac{p - 1}{p} * \dfrac{y}{x + y}.[/math]
Now it is obvious from inspection that if p > 1, this ratio is less than 1. So long as d(p) is negligible, we get an improvement in time by using more than one processor.

However, T_A increases if we double n, w, and p. Method A is not scalable.
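Spelling that out with the approximation above (my arithmetic):

[math]T_A(2n,\, 2w,\, 2p) \approx a(2n) + \dfrac{2w(b + c)}{2p} = 2an + \dfrac{w(b + c)}{p} \; > \; an + \dfrac{w(b + c)}{p} \approx T_A(n,\, w,\, p)[/math]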

Clear?

Let’s look at method B.

Each processor reads n/p documents and then processes all the words in those documents. On average, each document has w/n words. Treating d(p) as negligible, we get

[math]T_B \approx \dfrac{n}{p} * \left (a + (b + c) * \dfrac{w}{n} \right ) = \dfrac{an + w(b + c)}{p}[/math]
Now it is clear that if we double n, w, and p, the time stays the same. At least in this respect, method B is scalable.
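Again, spelling it out:

[math]T_B(2n,\, 2w,\, 2p) \approx \dfrac{a(2n) + 2w(b + c)}{2p} = \dfrac{an + w(b + c)}{p} \approx T_B(n,\, w,\, p)[/math]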

What about the ratio of times?

[math]\dfrac{T_B}{T_1} \approx \dfrac{an + w(b + c)}{p} \div \{na + w(b + c)\} = \dfrac{1}{p}[/math]
Provided p > 1, method B is faster than one processor.
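If it helps, here is a small Python check of these formulas with made-up constants (just plugging numbers into the expressions above; the values of n, w, p, a, b, c are arbitrary):

[code]
# Plug arbitrary numbers into T_1, T_A and T_B; overhead d(p) is ignored, as above.
n, w, p = 1_000_000, 50_000_000, 8   # documents, total words, processors (made up)
a, b, c = 5e-3, 1e-7, 1e-7           # s per document read, per distinctness test, per vector update

T1 = n * a + w * (b + c)             # one processor
TA = n * a + w * (b + c) / p         # method A: read everything, process 1/p of the words
TB = (n * a + w * (b + c)) / p       # method B: read and process 1/p of the documents

print(f"T1 = {T1:,.0f} s, TA = {TA:,.0f} s, TB = {TB:,.0f} s")
print(f"TA/T1 = {TA / T1:.3f}  (less than 1, but dominated by the an reading term)")
print(f"TB/T1 = {TB / T1:.3f}  (equals 1/p = {1 / p:.3f})")

# Doubling n, w and p: T_A grows, T_B stays the same.
TA2 = 2 * n * a + 2 * w * (b + c) / (2 * p)
TB2 = (2 * n * a + 2 * w * (b + c)) / (2 * p)
print(f"after doubling: TA = {TA2:,.0f} s (up), TB = {TB2:,.0f} s (unchanged)")
[/code]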

There may be more than this in the text, but this is what I got from it. The purpose of epsilon may become clear later on. This is not my field. I studied history and languages. If this is not helpful, there probably is a stack exchange that can help.
 