The use of Graphics Processing Units (GPUs) in supercomputing is motivated by the desire for greater processing throughput at lower power cost. We will review data movement within the supercomputing memory hierarchy and within GPUs. Challenges will be framed in terms of the latencies and bandwidths of the various modes of data movement, and what these imply for staging data, overlapping and hiding transfers, clustering computation, and maximizing floating-point operations (FLOPs) per data fetch to extract efficiency from GPUs. Examples in CUDA will illustrate how to access memory efficiently, organize threads, and exploit fast, local memory. Algorithmic changes required for GPU efficiency will also be addressed. The presentation closes with a survey of successful science performed on GPUs.
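As a flavor of the kind of CUDA technique discussed, the sketch below stages data into fast on-chip shared memory with coalesced global loads before computing a simple 1D three-point stencil. This is a generic illustration, not code from the presentation; the kernel and variable names are our own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each block stages a tile of the input (plus one halo element on each
// side) into shared memory, so neighboring values are fetched from DRAM
// once instead of three times, raising FLOPs per data fetch.
__global__ void stencil1d(const float* in, float* out, int n) {
    __shared__ float tile[258];          // blockDim.x (256) + 2 halo cells
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + 1;           // offset past the left halo

    if (gid < n)
        tile[lid] = in[gid];             // coalesced: adjacent threads read
                                         // adjacent global addresses
    if (threadIdx.x == 0)
        tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)
        tile[lid + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;
    __syncthreads();                     // all staging done before compute

    if (gid < n)
        out[gid] = tile[lid - 1] + tile[lid] + tile[lid + 1];
}

int main() {
    const int n = 1 << 20;
    float *h_in = new float[n], *h_out = new float[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    stencil1d<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("out[1] = %f\n", h_out[1]);   // interior points sum three ones
    cudaFree(d_in); cudaFree(d_out);
    delete[] h_in; delete[] h_out;
    return 0;
}
```

The `__syncthreads()` barrier is the key design point: it separates the staging phase (one coalesced global read per element) from the compute phase, which then runs entirely out of low-latency shared memory.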