Smartphones no longer serve only as communication devices; they have also become a first choice for collecting information. For instance, when smartphone users want information about the products on a shelf, all they have to do is take a snapshot and send it to a server. To save users time and effort, it is important to retrieve as much information as possible from a single shot. Multiple-object recognition and localization over a large-scale database of object classes is therefore the first bottleneck to overcome. To tackle this issue, we propose a bottom-up, search-based approach that localizes grid-based search candidates in a Markov Random Field (MRF). The proposed approach recognizes and localizes multiple objects simultaneously, which reduces response time while preserving accuracy. Experimental results show that the proposed method achieves a 40% relative improvement over the state-of-the-art bag-of-words model. We also evaluate the proposed method on two datasets and show that it runs 5 times faster while achieving competitive accuracy for multi-object recognition and localization.