Green infrastructure (GI) has great potential for managing urban and transportation stormwater and is increasingly used by watershed managers, but GI design is still evolving and many design options are available. Performance monitoring data to guide GI design and assess effectiveness are difficult to collect and therefore sparse, especially for multiple design types within the same study area. Furthermore, these data can be used to evaluate GI performance in many ways, and different managers rely on different evaluation metrics. In this study, the field performance of four roadside GI practices in Northern Virginia (bioretention, grass channel [GC], compost-amended grass channel [CAGC], and bioswale) was monitored year-round to investigate the effectiveness of different designs. Stormwater runoff volumes and 12 water quality parameters (TSS, TDN, NO₃⁻, TP, PO₄³⁻, DOC, Cr, Ni, Cu, Zn, Cd, and Pb) were measured at the inlet and outlet of each practice during 27 storm events over 14 months. The four GI designs displayed a wide range of performance: on average, three acted as pollutant sinks, whereas one (CAGC) acted as a pollutant source. Different performance metrics favored different practices, but by nearly all metrics, the bioretention and grass channel outperformed the CAGC and bioswale. The results indicate that design differences (e.g., compost amendment) can significantly alter GI performance and that performance evaluations can vary with the water quality parameter monitored and the evaluation metric used. This variability highlights the need for careful design and for consideration of multiple performance metrics to ensure effective stormwater management and to avoid unintended outcomes, such as a practice exporting pollutants.
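Why different metrics can favor different practices can be illustrated with two common ways of computing removal: one based only on inlet and outlet concentrations, and one based on pollutant loads (concentration multiplied by runoff volume). The sketch below is a minimal illustration under stated assumptions: the metric definitions, the `Storm` record, and the function names are hypothetical choices for this example and are not necessarily the metrics used in this study, and the storm values are invented rather than drawn from the monitoring data. A positive result indicates a net sink; a negative result indicates a net source.

```python
"""Sketch: concentration-based vs load-based pollutant removal for one GI practice.

Assumption: these metric definitions are common choices used here for
illustration only; the storm values below are hypothetical, not study data.
"""
from dataclasses import dataclass


@dataclass
class Storm:
    inflow_volume_l: float    # runoff volume entering the practice (L)
    outflow_volume_l: float   # runoff volume leaving the practice (L)
    inflow_conc_mg_l: float   # inlet event mean concentration (mg/L)
    outflow_conc_mg_l: float  # outlet event mean concentration (mg/L)


def mean_concentration_reduction(storms):
    """Percent reduction in the average event mean concentration (ignores volume)."""
    c_in = sum(s.inflow_conc_mg_l for s in storms) / len(storms)
    c_out = sum(s.outflow_conc_mg_l for s in storms) / len(storms)
    return 100.0 * (c_in - c_out) / c_in


def summation_of_loads_reduction(storms):
    """Percent reduction in total mass (load = concentration x volume, in mg) over all storms."""
    load_in = sum(s.inflow_conc_mg_l * s.inflow_volume_l for s in storms)
    load_out = sum(s.outflow_conc_mg_l * s.outflow_volume_l for s in storms)
    return 100.0 * (load_in - load_out) / load_in


if __name__ == "__main__":
    # Hypothetical practice that retains much of the runoff volume
    # but releases outflow at a higher concentration than its inflow.
    storms = [Storm(1000, 400, 1.0, 1.5), Storm(2000, 1000, 2.0, 2.5)]
    print(f"concentration-based: {mean_concentration_reduction(storms):.1f}%")  # -33.3%
    print(f"load-based: {summation_of_loads_reduction(storms):.1f}%")  # 38.0%
```

With these invented values, the concentration-based metric labels the practice a source (about −33%), while the load-based metric labels it a sink (38%), because volume retention outweighs the higher outlet concentration.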