CheXNet with COVID-19 (reproduce-chexnet-covid)

This project is an exploration of adding COVID-19 to an existing multilabel classifier like CheXNet. Review and read the GitHub repository here.

In 2020, the whole world was affected by the COVID-19 pandemic. Economies stopped. Industries switched gears to produce the supplies needed to reduce the spread and the materials needed for care. A great deal of research was carried out that year to understand the disease, much of it focused on differentiating COVID-19 from pneumonia in chest x-rays. With numerous new strains of COVID-19 spreading, it makes sense to incorporate COVID-19 into multilabel thoracic pathology classifiers.

Model and Methods

CheXNet¹ is used as the basis of the multilabel classifier. Many implementations exist, but John Zech’s implementation is used for this project. You can read more about it here.

CheXNet is a 121-layer Dense Convolutional Network (DenseNet). The final fully connected layer is replaced with one whose outputs pass through a sigmoid nonlinearity: a single output in the original pneumonia detector, and one output per pathology in the multilabel version. It is trained with a weighted binary cross-entropy loss and the Adam optimizer with standard parameters, on images from ChestX-ray14, a dataset released by the National Institutes of Health (NIH).
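As a rough illustration of the loss mentioned above, the sketch below follows the weighting described in the CheXNet paper, where the positive term is weighted by the fraction of negative cases and the negative term by the fraction of positive cases. The function names are hypothetical and chosen for this example only; this is not taken from the project's code, which may weight the loss differently.

```python
import math


def class_weights(num_pos, num_neg):
    """Return (w_pos, w_neg) = (|N| / (|P| + |N|), |P| / (|P| + |N|))."""
    total = num_pos + num_neg
    return num_neg / total, num_pos / total


def sigmoid(logit):
    """Map a raw network output to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def weighted_bce(prob, label, w_pos, w_neg, eps=1e-7):
    """Weighted binary cross-entropy for one predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(w_pos * label * math.log(prob)
             + w_neg * (1 - label) * math.log(1.0 - prob))


# Example: a label that is positive in 10 of 100 images (made-up counts).
w_pos, w_neg = class_weights(num_pos=10, num_neg=90)
loss_missed_positive = weighted_bce(sigmoid(0.0), 1, w_pos, w_neg)
loss_missed_negative = weighted_bce(sigmoid(0.0), 0, w_pos, w_neg)
```

With these weights, an uncertain prediction on a rare positive case costs more than the same prediction on a common negative case, which counteracts the class imbalance. In a multilabel model the same loss would be summed over every output.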

Images are downscaled to 224 × 224 and then normalized using the mean and standard deviation of the ImageNet training set (Deng et al., 2009²). Training also incorporates random horizontal flipping.
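The normalization and flipping steps can be sketched in NumPy as follows. The mean and standard deviation are the per-channel ImageNet statistics commonly used with pretrained models; the helper names are hypothetical, and the resize to 224 × 224 is assumed to have happened already, so this is a minimal sketch rather than the project's actual pipeline.

```python
import numpy as np

# Per-channel (RGB) ImageNet statistics for images scaled to [0, 1].
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize(image_u8):
    """Scale an HxWx3 uint8 image to [0, 1], then standardize each channel."""
    scaled = image_u8.astype(np.float32) / 255.0
    return (scaled - IMAGENET_MEAN) / IMAGENET_STD


def random_horizontal_flip(image, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    if rng.random() < p:
        return image[:, ::-1, :]
    return image


# Example on a dummy gray image that is already 224 x 224.
rng = np.random.default_rng(0)
dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
augmented = random_horizontal_flip(normalize(dummy), rng)
```

The flip is applied only during training; at evaluation time the image would be normalized but not mirrored.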

You can read more in the original CheXNet paper by Rajpurkar et al., 2017 here.

Implementation differences from the original paper:

Results and Discussion

The AUC for all fourteen original labels decreased marginally. However, the AUC for COVID-19 was practically perfect. Table I compares the AUC of the original paper, John Zech’s implementation, and the expanded CheXNet (including COVID-19).

[Table I: AUC comparison (image: Table1.png)]

Table II shows the small improvement when batch size is decreased from 32 to 16.

[Table II: AUC with batch size 32 versus 16 (image: Table2.png)]
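For readers unfamiliar with the metric, per-label AUC values like those compared in Tables I and II can be computed as the probability that a randomly chosen positive image scores higher than a randomly chosen negative one. The sketch below is a plain-Python illustration with hypothetical function names and made-up scores; it is not the evaluation code used for the tables.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a positive outranks a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def per_label_auc(label_matrix, score_matrix, label_names):
    """Compute one AUC per label (column) of a multilabel problem."""
    result = {}
    for j, name in enumerate(label_names):
        col_labels = [row[j] for row in label_matrix]
        col_scores = [row[j] for row in score_matrix]
        result[name] = roc_auc(col_labels, col_scores)
    return result


# Made-up ground truth and predictions for four images and two labels.
truth = [[1, 0], [0, 1], [1, 1], [0, 0]]
preds = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.2, 0.4]]
aucs = per_label_auc(truth, preds, ["Pneumonia", "COVID-19"])
```

Because each label is scored independently, a label whose positives all come from a visually distinct source can reach an AUC near 1.0 without the other labels changing much.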

In the GitHub repository, a Python notebook is supplied to review class activation mapping (CAM) heat maps overlaid on the images. Below are some examples.
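As background on how such heat maps are typically produced, a class activation map weights each feature map from the last convolutional layer by the final-layer weight that connects it to the chosen class, then sums the result. The NumPy sketch below assumes feature maps of shape (channels, height, width) and a hypothetical function name; it omits the upsampling and overlay steps and is not the notebook's actual code.

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    """Combine (C, H, W) feature maps with (C,) class weights into an (H, W) map.

    The result is shifted and scaled to the range [0, 1] for display.
    """
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))
    cam = cam - cam.min()
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam


# Example with random stand-in values: DenseNet-121 yields 1024 feature maps
# of size 7 x 7 for a 224 x 224 input.
rng = np.random.default_rng(0)
features = rng.random((1024, 7, 7))
weights = rng.standard_normal(1024)
heat_map = class_activation_map(features, weights)
```

The resulting low-resolution map would then be resized to the input resolution and blended with the x-ray to show which regions contributed most to a given label.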

The near-perfect COVID-19 AUC is suspicious. It may be an artifact of the labeling: every image in the added dataset carries only one label, whereas images in the NIH dataset often carry several labels for a single chest x-ray. COVID-19 positive chest x-rays may well show comorbidities that were not recorded in the aggregated Kaggle dataset.
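The labeling difference can be made concrete with a small sketch. It assumes label strings separated by `|`, as in the NIH metadata, and uses made-up example strings; the actual label format of the aggregated Kaggle dataset is an assumption here, and the function names are hypothetical.

```python
# The fourteen ChestX-ray14 pathologies plus the added COVID-19 label.
NIH_LABELS = [
    "Atelectasis", "Cardiomegaly", "Effusion", "Infiltration", "Mass",
    "Nodule", "Pneumonia", "Pneumothorax", "Consolidation", "Edema",
    "Emphysema", "Fibrosis", "Pleural_Thickening", "Hernia",
]
ALL_LABELS = NIH_LABELS + ["COVID-19"]


def encode(finding_labels):
    """Turn a '|'-separated label string into a 15-element multi-hot list."""
    present = set(finding_labels.split("|"))
    return [1 if name in present else 0 for name in ALL_LABELS]


def mean_positive_labels(label_strings):
    """Average number of positive labels per image."""
    vectors = [encode(s) for s in label_strings]
    return sum(sum(v) for v in vectors) / len(vectors)


# Made-up examples: NIH-style rows may list several findings per image,
# while the added COVID-19 rows list exactly one.
nih_like = ["Effusion|Infiltration", "No Finding", "Atelectasis|Effusion|Mass"]
covid_like = ["COVID-19", "COVID-19"]
nih_mean = mean_positive_labels(nih_like)
covid_mean = mean_positive_labels(covid_like)
```

If every COVID-19 row is zero in all fourteen other columns, the model can learn to separate the two sources rather than the disease itself, which is one way a near-perfect AUC could arise.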

Future Work

The dataset used to add COVID-19 as a label should be re-assessed by radiologists to determine whether any of the fourteen thoracic pathologies in the ChestX-ray14 dataset also appear in the COVID-19 positive x-rays.

References

[1] - P. Rajpurkar, J. Irvin, K. Zhu, B. Yang, H. Mehta, T. Duan, D. Ding, A. Bagul, C. Langlotz, K. Shpanskaya, M. P. Lungren, and A. Y. Ng. CheXNet: Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint arXiv:1711.05225, 2017.

[2] - J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pp. 248–255. IEEE, 2009.