Effective management of surface water resources requires robust hydrological models. This is especially true in regions that have a growing population, an influx of water-intensive industries, and large areas under agriculture, and that are likely to experience changes in hydrological regimes because of climate change. However, comprehensive model inter-comparison studies that quantify disparities in modeling and predicting surface water availability are lacking. In this study, we assess daily- to monthly-scale runoff and streamflow predictions at fine watershed and grid scales for the Southeastern US using three hydrological models: VIC, SWAT, and PCR-GLOBWB. We calibrate and test the models with anthropogenic water abstraction and return flow included, while ensuring minimum environmental flow conditions. Including water abstractions further improved model performance in some basins. We apply multiple statistical indices to evaluate model performance at different thresholds of simulated streamflow and runoff for each basin. While the models generally perform well in simulating high and very high flows (streamflow/runoff thresholds >50% and >75%), low-flow biases differ among models and across catchment characteristics. This study provides an in-depth analysis to support selecting the best-performing model and shows that model performance is linked to both calibration strategy and the underlying process representations.
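As an illustration of threshold-based evaluation, the following minimal sketch scores a simulated flow series only at time steps above a chosen flow threshold. It rests on assumptions that the abstract does not confirm: that the >50% and >75% thresholds are percentiles of the simulated series, and that Nash-Sutcliffe efficiency and percent bias are among the statistical indices applied. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def pbias(obs, sim):
    """Percent bias; here positive means overestimation (sign conventions vary by study)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 100.0 * np.sum(sim - obs) / np.sum(obs)


def evaluate_above_percentile(obs, sim, pct):
    """Score only the time steps where simulated flow exceeds its pct-th percentile.

    Assumption: the threshold is a percentile of the simulated series.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    mask = sim > np.percentile(sim, pct)
    return {
        "n": int(mask.sum()),
        "nse": nse(obs[mask], sim[mask]),
        "pbias": pbias(obs[mask], sim[mask]),
    }


# Synthetic series for illustration only (not data from this study):
# the "model" overestimates every observation by 10%.
obs_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
sim_demo = obs_demo * 1.1

for pct in (50, 75):
    print(pct, evaluate_above_percentile(obs_demo, sim_demo, pct))
```

Restricting the score to a flow range is what lets a uniform relative error look very different at high and low flows; a low-flow counterpart would mask with `sim <= np.percentile(sim, pct)` instead.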