The paper presents the architecture and design of three versions of fail-safe data storage in a distributed cache using NVRAM in cluster nodes. In the first one, cache consistency is assured through additional buffering of write requests. The second one is based on additional write log managers running on different nodes. The third one benefits from synchronization with a Parallel File System (PFS) for saving data into a new file, which allows file history to be kept at the cost of space. We have shown that the three-level fail-safe mode incorporating these versions introduces minimal overhead for a random walk microbenchmark application with a 1 GB file and checkpoints created every 2000 iterations and for computing powers of a graph with 10000 vertices, and up to 20% overhead for parallel processing of images of up to 1000 megapixels, compared to the basic NVRAM cache without fail-safe modes. We also present times for checkpoint creation and restoring for sizes up to 10 GB.
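The idea behind the second version (a write log kept on a different node than the cache itself) can be illustrated with a minimal single-process sketch. This is not the paper's implementation: the names `WriteLog`, `CacheNode` and `recover` are hypothetical, the "nodes" are plain Python objects rather than MPI processes, and NVRAM is modelled as a `bytearray`.

```python
class WriteLog:
    """Hypothetical log manager, assumed to run on a different node than the cache."""

    def __init__(self):
        self.entries = []  # list of (offset, data) pairs in arrival order

    def append(self, offset, data):
        # Store a private copy so later changes by the caller cannot alter the log.
        self.entries.append((offset, bytes(data)))


class CacheNode:
    """Hypothetical cache node holding one region of a file (NVRAM modelled as a bytearray)."""

    def __init__(self, size, log):
        self.buf = bytearray(size)
        self.log = log

    def write(self, offset, data):
        # Record the write in the remote log first, then apply it to the local cache.
        self.log.append(offset, data)
        self.buf[offset:offset + len(data)] = data


def recover(size, log):
    """Rebuild a lost cache region by replaying the logged writes in order."""
    buf = bytearray(size)
    for offset, data in log.entries:
        buf[offset:offset + len(data)] = data
    return buf


log = WriteLog()
node = CacheNode(8, log)
node.write(0, b"abcd")
node.write(2, b"XY")

# Simulate losing the cache node and rebuilding its contents from the log.
restored = recover(8, log)
print(restored == node.buf)
```

Logging before applying the write means a failure of the cache node between the two steps still leaves the write recoverable from the log; a real system would also need to truncate the log after each checkpoint, which this sketch omits.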
Authors
Additional information
- DOI
- 10.1016/j.procs.2018.08.237
- Category
- Journal publication
- Type
- publication in another foreign scientific journal (foreign language only)
- Language
- English
- Year of publication
- 2018
Data source: MOSTWiedzy.pl - publication "Three levels of fail-safe mode in MPI I/O NVRAM distributed cache"