Deep learning models are increasingly being utilised within finance. Because these models are opaque in nature and are now being deployed for internal and consumer-facing decisions, there are growing concerns about the trustworthiness of their results. We test the stability of predictions and explanations across different deep learning models that differ from one another only through subtle changes to model settings, with each model trained on the same data. Our results show that the models produce similar predictions but different explanations, even when the differences between models arise from arbitrary factors such as random seeds. We compare this behaviour with that of traditional, interpretable ‘glass-box’ models, which achieve similar accuracies while maintaining stable explanations and predictions. Finally, we present a methodology based on network analysis for comparing deep learning models. Our analysis has implications for the adoption and risk management of future deep learning models by regulated institutions.