Thank you for the feedback!
Overall, our readings are pretty close, give or take. The GPU load is generated by shaders that run entirely on the GPU, under its own internal microcode, regardless of CPU bitness. So I don't think it will depend noticeably on whether the parent app is 32- or 64-bit, interpreted or natively compiled. The number of parallel processors in the GPU, the video data bus width/throughput, and the VRAM technology are more important factors, and they may differ from chip to chip, especially between generations.
OTOH, other GPU brands like ATi or Intel may show a much wider spread due to more fundamental differences in their GPU architectures and instruction sets.