Motivated by the advent of depth-enabled sensors and the growing demands on surveillance systems, we propose a novel framework for detecting fine-grained human attributes (e.g., carrying a backpack, talking on a cell phone, wearing glasses) in surveillance environments. Traditional detection and recognition methods generally suffer from variations in the lighting conditions, poses, and viewpoints of object instances. To tackle these problems, we build a multi-view, part-based attribute detection system that takes color-depth inputs rather than color images alone. We address several attributes that are important in surveillance environments and train multiple attribute classifiers on features inferred from 3D information to construct our discriminative model. We compare our method with several state-of-the-art methods, and the experimental results show that it is more robust under large variations in surveillance conditions.