Extreme events such as floods remain a global challenge, causing loss of life and extensive property damage. Although data-driven and machine-learning models have shown advantages in flood depth prediction, their accuracy hinges on extensive datasets, which are often unavailable in basins with too few monitoring gauges or limited historical records. To address this issue, recent studies have employed transfer learning to improve spatial predictions of maximum flood depth from 'unseen' and scarce data. However, the use of transfer learning to capture temporal dependencies in flood depth data has yet to be explored. This study aims to bridge this gap by investigating data-driven models (e.g., long short-term memory and/or convolutional neural networks) combined with transfer learning to transfer spatiotemporal knowledge from data-rich basins to data-scarce ones, with a focus on dynamic water level prediction. We categorize the model’s inputs into two types: spatial data (e.g., flow direction and slope) and dynamic data (i.e., water level time series). Using the spatial connections among upstream stations, their topographic attributes, and historical water level data, we train the model to predict water levels at downstream stations across multiple time intervals in the data-rich basins for different precipitation events. We then use the trained model to transfer this knowledge to the data-scarce basin. Three catchments within the Delaware River Basin in the northeastern United States serve as the study area: two as data-rich basins and one as the data-scarce basin.
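As an illustration of the two input types described above, the following minimal Python sketch shows one way training samples could be assembled: a sliding window of upstream water levels (dynamic data) is paired with fixed topographic attributes (spatial data) and with downstream water levels at several lead times. The function name `make_samples`, the parameters `lookback` and `horizons`, and the windowing scheme itself are assumptions made for illustration; the study's actual preprocessing is not specified here.

```python
from typing import List, Sequence, Tuple

# One sample: (upstream windows per station, static attributes, downstream targets)
Sample = Tuple[List[List[float]], List[float], List[float]]


def make_samples(
    upstream: Sequence[Sequence[float]],
    downstream: Sequence[float],
    static_attrs: Sequence[float],
    lookback: int,
    horizons: Sequence[int],
) -> List[Sample]:
    """Build supervised samples from water level series (hypothetical helper).

    upstream     : one water level series per upstream station, equal lengths
    downstream   : water level series at the downstream station
    static_attrs : fixed spatial attributes (e.g., slope, flow direction)
    lookback     : number of past time steps fed to the model
    horizons     : lead times in steps ahead (each >= 1) to predict
    """
    n_steps = len(downstream)
    if any(len(series) != n_steps for series in upstream):
        raise ValueError("all series must have the same length")
    if lookback < 1 or not horizons or min(horizons) < 1:
        raise ValueError("lookback and every horizon must be >= 1")

    max_h = max(horizons)
    samples: List[Sample] = []
    # t is the index of the first future step; the window covers t-lookback .. t-1
    for t in range(lookback, n_steps - max_h + 1):
        window = [list(series[t - lookback:t]) for series in upstream]
        targets = [downstream[t + h - 1] for h in horizons]
        samples.append((window, list(static_attrs), targets))
    return samples


# Toy usage with made-up numbers (not real gauge data)
upstream = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]
downstream = [100, 101, 102, 103, 104, 105]
samples = make_samples(upstream, downstream, [0.02, 3.0], lookback=2, horizons=[1, 3])
```

In a transfer learning setting, samples built this way from the data-rich basins would be used to train a sequence model, and the same routine would then prepare the much smaller set of samples from the data-scarce basin for fine-tuning.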