In the Google File System, "hotspots have not been a major issue because our applications mostly read large multi-chunk files sequentially." What does this mean?

shivajikobardan

Hotspot: a region of a program or system where a disproportionately high share of activity is concentrated. In the GFS context, a hotspot is a chunkserver that receives a disproportionately high share of client requests.

Lazy space allocation: https://stackoverflow.com/questions/18109582/what-is-lazy-space-allocation-in-google-file-system

With lazy space allocation, the physical allocation of space is delayed as long as possible, until a full chunk's worth of data (64 MB in GFS, according to the 2003 paper) has accumulated.
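As a concrete (non-GFS) analogy, most Linux filesystems do the same thing with sparse files: a file's logical size can be 64 MB while almost no disk blocks are allocated until data is actually written. A minimal sketch, assuming a filesystem that supports sparse files (ext4, xfs, etc.):

```python
import os

# Not GFS code: a local-filesystem analogy for lazy space allocation.
# The file's logical size is set to 64 MB (the GFS chunk size), but physical
# blocks are only allocated as real data is written.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

with open("sparse_chunk", "wb") as f:
    f.truncate(CHUNK_SIZE)  # extend to 64 MB without writing data (creates a hole)

st = os.stat("sparse_chunk")
print("logical size :", st.st_size)           # 67108864 bytes
print("physical size:", st.st_blocks * 512)   # ~0 bytes on ext4/xfs

with open("sparse_chunk", "r+b") as f:
    f.write(b"x" * (1024 * 1024))  # write 1 MB of real data at the start

st = os.stat("sparse_chunk")
print("physical size:", st.st_blocks * 512)   # ~1 MB allocated, not 64 MB

os.remove("sparse_chunk")
```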
Large chunk size in GFS:
=> A large chunk size, even with lazy space allocation, has its disadvantages.
=> A small file consists of a small number of chunks, perhaps just one.
=> The chunkservers storing those chunks may become hotspots if many clients are accessing the same file.
=> In practice, hotspots have not been a major issue because our applications mostly read large multi-chunk files sequentially.
I don't understand how hotspots are not an issue when we read large multi-chunk files sequentially. The paper says hotspots are an issue when many clients access the same small file (a file of just one chunk).

Below is the scenario where a small file (i.e., a small number of chunks, perhaps just one) is being accessed by multiple clients.


It makes sense why the chunkservers become hotspots in this case: they are all serving requests from many clients at once.
But it doesn't make sense to me when the paper says "In practice, hotspots have not been a major issue because our applications mostly read large multi-chunk files sequentially." What's the difference? If I imagine the same scenario as above, except that the file is made up of multiple chunks and everything else stays the same, what changes?
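One way to make the difference concrete is a toy load simulation. This is not GFS code, and the cluster size, client count, replication factor, and random chunk placement are all made-up assumptions for illustration. With a one-chunk file, every client's read must land on one of that chunk's three replicas, so those three chunkservers absorb all of the traffic; with a 100-chunk file, the same reads spread across most of the cluster (and sequential readers are at different offsets at any given moment, which this simple model doesn't even capture):

```python
import random
from collections import Counter

# Toy model, not GFS code. All numbers below are invented for illustration.
NUM_CHUNKSERVERS = 200
REPLICAS = 3
CLIENTS = 1000
random.seed(0)

def busiest_server_share(num_chunks):
    """Every client reads the whole file once; return the fraction of all
    chunk reads that land on the single busiest chunkserver."""
    # Place each chunk's replicas on REPLICAS distinct, randomly chosen servers.
    placement = [random.sample(range(NUM_CHUNKSERVERS), REPLICAS)
                 for _ in range(num_chunks)]
    load = Counter()
    for _ in range(CLIENTS):
        for replicas in placement:
            load[random.choice(replicas)] += 1  # each read goes to one replica
    total_reads = CLIENTS * num_chunks
    return max(load.values()) / total_reads

print("1-chunk file  :", f"{busiest_server_share(1):.1%} of all reads hit one server")
print("100-chunk file:", f"{busiest_server_share(100):.1%} of all reads hit one server")
```

With numbers like these, roughly a third of all reads of the one-chunk file pile onto a single chunkserver (a hotspot), while for the 100-chunk file no single chunkserver sees more than about one percent of the reads.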
 
In practice, hotspots have not been a major issue because our applications mostly read large multi-chunk files sequentially.
Scenario A: read large number of small files in random order
Scenario B: read small number of large files sequentially

In which scenario do file servers do less work?
 
Scenario A: read large number of small files in random order

Here, many small files' chunks need to be read in random order
=> so the location of each file's chunk has to be fetched from the master.

Scenario B: read small number of large files sequentially
Here the chunks are read sequentially, so the master only has to hand out the chunk locations once (they can be requested in a batch and cached). But how is that relevant to the question? I want to learn.
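A rough, back-of-the-envelope way to connect this to the question. The concrete numbers below are invented for illustration; the batching and client-side caching of chunk locations are described in the GFS paper. When files span many large chunks and are read sequentially, a client needs only a handful of location lookups from the master, whereas reading huge numbers of small files needs at least one lookup per file:

```python
# Back-of-the-envelope master-request count for the two scenarios.
# All concrete numbers are assumptions chosen for illustration.
CHUNK = 64 * 1024 * 1024          # 64 MB GFS chunk size
TOTAL = 100 * 1024**3             # 100 GB of data read in either scenario

# Scenario A: ~100,000 small files of ~1 MB each, read in random order.
small_files = TOTAL // (1024**2)
master_requests_a = small_files   # at least one chunk-location lookup per file

# Scenario B: 10 large files (~10 GB each), read sequentially.
large_files = 10
chunks_per_file = (TOTAL // large_files) // CHUNK   # 160 chunks of 64 MB per file
BATCH = 16                        # assumed chunk locations fetched/cached per lookup
master_requests_b = large_files * ((chunks_per_file + BATCH - 1) // BATCH)

print("Scenario A master lookups:", master_requests_a)  # 102400
print("Scenario B master lookups:", master_requests_b)  # 100
```

The relevance, as I understand it, is that GFS's typical workload looks like Scenario B: the reads are spread over many chunks (and therefore many chunkservers), the master sees comparatively little metadata traffic, and each chunkserver serves long sequential streams rather than a storm of concurrent requests for the same single chunk, so no one server turns into a hotspot.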
 