When it comes to setting up an Incredibuild grid, the same question always comes up:
“What would be best - setting up fewer but more powerful Helpers, or more but less powerful Helpers?”
The answer, like most things in life, is “it depends”, so let's look at the pros and cons of both approaches, assuming that your goal is to have 128 Helper cores.
Less powerful Helpers (8 cores):
In that case, you will have 16 Helpers which in total provide 128 cores. The benefit of this setup is better fault tolerance if one or two of the Helpers go down. The impact is small: losing two Helpers costs only 16 of the 128 cores (12.5%).
More powerful Helpers (64 cores):
In that case, 2 Helpers will be enough to cover the need for 128 cores. The benefit here is that a lot of tasks will run on one Helper, and performance-wise it might be better since a 64-core machine is in general more powerful than an 8-core machine. Also, there are fewer machines to maintain, so less headache there.
The downside is the impact on the build when one machine goes down. In that case, you will lose 64 out of 128 cores, which is 50% of your grid. Therefore, from the fault tolerance perspective, this setup is less recommended. However, from the performance perspective, it might result in better overall performance.
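The trade-off between the two setups comes down to simple arithmetic, which can be sketched as below. This is only an illustration: the function name capacity_lost and its parameters are made up for this example and are not part of any Incredibuild tool.

```python
def capacity_lost(total_cores: int, cores_per_helper: int, helpers_down: int) -> float:
    """Return the fraction of grid cores lost when some Helpers are offline.

    Hypothetical helper for illustration; assumes all Helpers are the same size.
    """
    helper_count = total_cores // cores_per_helper
    # You cannot lose more Helpers than the grid actually has.
    down = min(helpers_down, helper_count)
    return down * cores_per_helper / total_cores


if __name__ == "__main__":
    # Compare the two setups from the text: 16 x 8 cores vs. 2 x 64 cores.
    for cores_per_helper in (8, 64):
        for helpers_down in (1, 2):
            lost = capacity_lost(128, cores_per_helper, helpers_down)
            print(f"{cores_per_helper}-core Helpers, {helpers_down} down: "
                  f"{lost:.2%} of the grid lost")
```

With 8-core Helpers a single failure removes a small slice of the grid, while with 64-core Helpers a single failure removes half of it.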
Another aspect to consider is the Initiator machine spec/configuration. Note that the Initiator machine works the hardest, since on top of executing tasks it also controls the distribution and synchronization. Therefore, if you have a large grid (more than 150 remote cores), make sure that the Initiator has at least 8 logical cores. If your dedicated Initiator has fewer cores, you should use the “Avoid tasks execution on local machine when possible” setting (Agent Settings → Initiator → General) so that the Initiator executes only the tasks that can't be distributed (such as Links) and is more available to handle the grid.
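The rule of thumb above can be written down as a small check. This is a rough sketch that only encodes the two numbers mentioned in the text (150 remote cores, 8 logical cores); the function and constant names are invented for this example, and it does not read or change any Incredibuild setting.

```python
import os

# Thresholds taken from the rule of thumb in the text.
LARGE_GRID_REMOTE_CORES = 150
MIN_INITIATOR_LOGICAL_CORES = 8


def should_avoid_local_execution(remote_cores: int, initiator_logical_cores: int) -> bool:
    """Return True when the grid is large and the Initiator is below the suggested size.

    Hypothetical helper: it only mirrors the guideline, it is not an Incredibuild API.
    """
    is_large_grid = remote_cores > LARGE_GRID_REMOTE_CORES
    is_small_initiator = initiator_logical_cores < MIN_INITIATOR_LOGICAL_CORES
    return is_large_grid and is_small_initiator


if __name__ == "__main__":
    # os.cpu_count() reports logical cores; it can return None, so fall back to 1.
    local_cores = os.cpu_count() or 1
    remote_cores = 200  # example value, replace with the size of your own grid
    if should_avoid_local_execution(remote_cores, local_cores):
        print("Consider enabling 'Avoid tasks execution on local machine when possible'.")
    else:
        print("The Initiator meets the rule of thumb for this grid size.")
```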
One symptom of a heavily loaded Initiator, which might need the above setting, is that the remote bars are much longer than the local ones:
As can be seen above, the stdafx task takes much longer on remote machines, and one of the reasons (assuming there is no network issue) is that the synchronization takes a very long time due to a very busy Initiator.
If you have any questions, feel free to post them here or approach me directly.