GPU accelerators have become essential to the recent advance in computational power of high- performance computing (HPC) systems. Current HPC systems’ reaching an approximately 20–30 mega-watt power demand has resulted in increasing CO2 emissions, energy costs and necessitate increasingly complex cooling systems. This is a very real challenge. To address this, new mechanisms of software power control could be employed. In this paper, a dynamic new method of limiting software power is introduced on one of the latest NVIDIA GPUs: a software tool called the Dynamic Energy- Performance Optimiser (DEPO). DEPO minimizes the energy consumption of the CUDA based GPU workloads, with respect to one of the three given metrics: minimum of energy (E), Energy-Delay product (EDP) and Energy-Delay sum (EDS). The tool gathers power measurements from NVIDIA Management Library (NVML). Measuring the application progress at runtime is based on CUDA Profiling Tools Interface (CUPTI) kernel-counting. We have evaluated the DEPO tool on the NVIDIA RTX A4500 and A100 GPUs with machine learning workloads. Depending on the application (training of neural networks: Resnet152, Densenet161, VGG- 19 or a GEMM benchmark) for the E target metric, we were able to obtain energy savings exceeding 22% for both NVIDIA A100 and RTX A4500 GPUs while the performance drop has never been higher than 20%. Using one of the bi-objective EDP or EDS metrics allowed finding configurations resulting in 15% or 18% of energy saved with only 8% of performance loss. For most of the experiments the percentage-wise performance penalty is lower than the energy savings. This demonstrates its potential for energy consumption reduction in HPC systems with GPU accelerators.
Autorzy
Informacje dodatkowe
- DOI
- Cyfrowy identyfikator dokumentu elektronicznego link otwiera się w nowej karcie 10.1016/j.future.2023.03.041
- Kategoria
- Publikacja w czasopiśmie
- Typ
- artykuły w czasopismach
- Język
- angielski
- Rok wydania
- 2023