Query-based Configuration of Text Retrieval Solutions for Software Engineering Tasks (ESEC/FSE 2015)

by Laura Moreno, Gabriele Bavota, Sonia Haiduc, Massimiliano Di Penta, Rocco Oliveto, Barbara Russo, Andrian Marcus

Text Retrieval (TR) approaches have been used to leverage the textual information contained in software artifacts to address a multitude of software engineering (SE) tasks. However, TR approaches need to be configured properly in order to lead to good results. Current approaches for automatic TR configuration in SE configure a single TR approach and then use it for all possible queries that can be formulated. In this paper, we show that such a configuration strategy leads to suboptimal results and propose QUEST, the first approach bringing TR configuration selection to the query level. QUEST recommends the best TR configuration for a given query, using a supervised learning approach that determines which TR configuration performs best for each query based on the query's properties. We evaluated QUEST in the context of feature and bug localization, using a dataset with more than 1,000 queries. We found that QUEST is able to recommend one of the top three TR configurations for a query with 69% accuracy, on average. We compared the results obtained with the configurations recommended by QUEST for every query with those obtained using a single TR configuration for all queries in a system and in the entire dataset. We found that using QUEST, we obtain better results than with any of the considered TR configurations.
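
The idea of selecting a TR configuration per query can be illustrated with a minimal sketch. It is not the implementation described in the paper: the query properties (`query_features`), the configuration labels, the toy training queries, and the k-nearest-neighbor classifier are all assumptions chosen only to show the shape of a supervised, query-level recommender.

```python
import math
from collections import Counter


def query_features(query: str) -> list[float]:
    """Hypothetical query properties (assumed, not the paper's):
    term count, unique-term ratio, and mean term length."""
    terms = query.lower().split()
    if not terms:
        return [0.0, 0.0, 0.0]
    return [
        float(len(terms)),
        len(set(terms)) / len(terms),
        sum(len(t) for t in terms) / len(terms),
    ]


def recommend(query: str, training: list[tuple[str, str]], k: int = 3) -> str:
    """Return the configuration label that is most common among the
    k training queries closest to `query` in feature space."""
    target = query_features(query)
    ranked = sorted(
        training,
        key=lambda example: math.dist(query_features(example[0]), target),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: each past query is paired with the (hypothetical)
# configuration that performed best for it.
TRAINING = [
    ("crash on save", "config-A"),
    ("null pointer on save", "config-A"),
    ("error when opening file", "config-A"),
    ("add support for exporting the diagram to scalable vector graphics format",
     "config-B"),
    ("allow the user to change the default font size used in the text editor",
     "config-B"),
    ("the search dialog should remember the last ten queries typed",
     "config-B"),
]

short_choice = recommend("crash on open", TRAINING)
long_choice = recommend(
    "please add an option for exporting the class diagram to portable document format",
    TRAINING,
)
```

In this toy setup, short bug-like queries land near the `config-A` examples and long feature-request-like queries near the `config-B` examples, so each query receives its own recommendation instead of one configuration being fixed for the whole system.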