The article addresses the problem of splitting data for k-fold cross-validation so that class proportions are preserved, under the additional constraint that the data is divided into groups that cannot be split across different cross-validation sets. This problem often arises in, for example, medical data processing, where all data samples from one patient must be placed in the same cross-validation set. Because the problem is NP-complete, the article proposes and describes a heuristic, anytime, polynomial-time algorithm, and compares it experimentally with two other, simpler algorithms.
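To make the problem statement concrete, the following is a minimal sketch of one simple greedy baseline for this kind of split. It is not the algorithm proposed in the article, whose details are not given in this abstract. The function name `greedy_stratified_group_folds`, the helper `_deviation`, and the squared-deviation cost are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def _deviation(counts, target):
    """Squared distance between a fold's class counts and the ideal counts."""
    return sum((counts[c] - target[c]) ** 2 for c in target)


def greedy_stratified_group_folds(labels, groups, k):
    """Assign whole groups to k folds, greedily balancing class counts.

    Illustrative baseline only (hypothetical helper, not the article's method).
    Returns a dict mapping each group id to a fold index in range(k).
    """
    # Count how many samples of each class every group contains.
    group_counts = defaultdict(Counter)
    for y, g in zip(labels, groups):
        group_counts[g][y] += 1

    # Ideal number of samples of each class per fold.
    totals = Counter(labels)
    target = {c: totals[c] / k for c in totals}

    fold_counts = [Counter() for _ in range(k)]
    assignment = {}

    # Place the largest groups first; ties are broken by group id.
    ordered = sorted(
        group_counts,
        key=lambda g: (-sum(group_counts[g].values()), str(g)),
    )
    for g in ordered:
        best_fold, best_delta = None, None
        for f in range(k):
            before = _deviation(fold_counts[f], target)
            after = _deviation(fold_counts[f] + group_counts[g], target)
            delta = after - before  # change in total imbalance
            if best_delta is None or delta < best_delta:
                best_fold, best_delta = f, delta
        fold_counts[best_fold] += group_counts[g]
        assignment[g] = best_fold
    return assignment


# Example: four "patients", two samples each, two classes, two folds.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
groups = ["a", "a", "b", "b", "c", "c", "d", "d"]
folds = greedy_stratified_group_folds(labels, groups, k=2)
```

Each group is placed in exactly one fold, so samples from one patient never end up in different cross-validation sets. Because the choice is greedy, the resulting class proportions are only approximately balanced in general.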
Authors
Additional information
- Category
- Monographic publication
- Type
- chapter or article in a book (collective work or textbook) in a language of international reach
- Language
- English
- Publication year
- 2014