In visual recognition, the task is to identify and localize all objects of interest in the input image. With visual data now ubiquitous, object recognition algorithms matter more than ever, with applications ranging from autonomous driving to computer-aided diagnosis in medicine. Visual recognition is currently dominated by models based on Convolutional Neural Networks (CNNs), which achieve impressive performance on many benchmarks. However, when deployed in the real world, the performance of these CNN models can drop drastically, showing that they lack the desired robustness. The cause is the so-called distributional shift, in which test-time data differ from the data observed during training; it poses one of the most important challenges in modern machine learning. At the same time, modern CNN-based models may be too expensive or too slow for general deployment. The goal of this thesis is therefore to develop robust and efficient models for visual object recognition. The experimental section focuses on autonomous driving, both because datasets are available and because the aforementioned problems are essential in that domain. The next goal was to understand the impact of model compression methods on model accuracy. Model compression works by removing some neurons or filters during training, which reduces inference time without hurting overall accuracy. Accuracy on individual classes can still suffer, however, and it was hypothesized that one reason is data imbalance: a compressed model (with smaller capacity) will first remove the neurons responsible for recognizing less common data. The experiments showed that data balancing methods helped to improve the accuracy of some classes.
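The abstract does not say which data balancing methods were used. As a minimal sketch of one common option, assuming inverse-frequency class weighting, the snippet below computes per-class weights so that rare classes contribute more to the training loss. The function name `inverse_frequency_weights` and the example labels are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Hypothetical helper: weight = n_samples / (n_classes * class_count),
    so a perfectly balanced dataset gives every class a weight of 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative, made-up label distribution: "car" is common, "cyclist" is rare.
labels = ["car"] * 8 + ["cyclist"] * 2
weights = inverse_frequency_weights(labels)
print(weights)
```

Such weights are typically passed to a weighted loss function, so that errors on rare classes are penalized more heavily; this is one way a smaller model could be discouraged from sacrificing those classes first.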
Additional information
- Category
- Doctoral theses, habilitation dissertations, nostrifications
- Type
- doctoral thesis by employees of PG and doctoral program students
- Language
- English
- Publication year
- 2022