Understanding the Impact of Support for Iteration on Code Search

Lee Martie, André van der Hoek, Thomas Kwak. 2017. Understanding the Impact of Support for Iteration on Code Search. In Proceedings of 2017 11th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Paderborn, Germany, September 4–8, 2017 (ESEC/FSE’17), 12 pages. https://doi.org/10.1145/3106237.3106293

Unlike people who search the Internet to keep up with the latest news or check the current temperature, a distinctly informational activity [60], programmers routinely search for source code on the Internet [13, 68, 72] when they are looking for solutions to their current programming problem [19]. Sometimes the programmer uses a search engine to find one specific code snippet. For example, the programmer might need to remember how to write a few lines of code (e.g., the code to open a database in PHP) [15, 25]. Other times, the programmer uses a search engine when she is not quite sure what she is searching for and does not have exactly one code snippet in mind. For example, the programmer might need to learn a concept [15, 66], such as database transactions, and needs to look at multiple examples illustrating different aspects of databases and alternative examples to clarify her understanding [15, 25, 36, 61, 66]. As another example, the programmer might need ideas [72], such as when she is designing a new game and wants to see, and sometimes compare [25], how other code handles game characters, board state, or visualization.

Given how important searching for code on the Internet is to programmers, researchers are investigating how to improve code search engines. Some, for instance, have been investigating how to support more expressive queries (e.g., searching by test case or method signatures) that afford more precise matching of code compared to keywords (e.g., [1, 10, 17, 37, 41, 45, 54, 59, 70, 75]). Others have investigated new matching and ranking algorithms (e.g., ranking code higher with method names or class names matching the keywords) so that more results presumed to better match the topic described by the keywords are returned and appear towards the top of the list (e.g., [14, 20, 27, 29, 35, 42, 43, 49, 79]).

While many different approaches for improving code search exist, these approaches are generally similar in one very visible design decision: they are non-iterative approaches. They expect a query and optimize on returning the best matching results for the query, occasionally offering filters to help scope the results (e.g., programming language or file type filters) [2, 5]. This focus on a non-iterative design for search engines is mirrored in how search engines are evaluated [46]. Typically, a group of experts score the performance of search engines by the results returned for some representative set of queries, with the score reflecting how on topic the results are.

While a search engine that returns the code the programmer is looking for after the first query appears ideal, many times the programmer is not sure what she is looking for and does not search for code with a single query. Instead, the programmer issues multiple queries [12, 15, 34, 67, 71], where, after receiving results, the programmer modifies their query by removing keywords, adding keywords, or some combination of both, and repeats this process multiple times [12, 34, 67]. That is, search looks like an iterative process where programmers often submit a query, get results, reflect on and learn from the results, submit a modified query in response to the results, get new results, and so on, until the programmer stops searching.
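The reformulation loop described above can be made concrete with a small sketch. It is illustrative only: the function name `reformulate`, the keyword-list representation of a query, and the example session are assumptions introduced here, not part of any search engine discussed in this paper.

```python
def reformulate(query, add=(), remove=()):
    """Return a new keyword list: drop `remove`, then append unseen `add` keywords."""
    dropped = set(remove)
    kept = [k for k in query if k not in dropped]
    return kept + [k for k in add if k not in kept]


# Hypothetical session: each entry is the query issued at one iteration.
session = [["php", "database"]]
# After reading the first results, the programmer adds keywords.
session.append(reformulate(session[-1], add=["open", "connection"]))
# After the next results, she removes one keyword and adds another.
session.append(reformulate(session[-1], remove=["database"], add=["mysqli"]))

for step, query in enumerate(session, start=1):
    print(step, " ".join(query))
```

Each query in the session is derived from the previous one, which is the sense in which the search is iterative rather than a sequence of unrelated queries.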

Cognitive processes in which programmers engage possibly explain why code search is often iterative. Particularly, when programmers are working on a programming problem, what they are working on, a solution, is often not immediately understood [19, 32, 69]. However, as programmers begin to look at some code or consider possible ideas, they are faced with constraints or different perspectives not previously considered, changing their understanding, and a new understanding will often change the next code and ideas considered [19, 21, 44, 53]. The implication of this for code search is that, when programmers search for code not clearly understood (e.g., cases when learning or needing ideas), code results can cause them to change their understanding of what they are looking for and, thus, the next code searched for, making the search iterative.

Our research investigates what happens when programmers are explicitly supported in searching iteratively for code. It particularly answers the following research question:

What is the impact of explicitly supporting software developers in searching iteratively on the experience, time, and success of the code search process on the Internet?

The key insight we explore to support iteration is that the code returned for a query tends to serve as inspiration or as a trigger for the next queries issued. We introduce two search engines, CodeExchange (CE) and CodeLikeThis (CLT), specifically aimed to enable the user to directly leverage the results in formulating the next query. CE [47], previously developed but now built on the Specificity ranking algorithm [43], provides a set of four features that let the programmer use characteristics of the results to find other code with or without those characteristics. For example, if a result is undesirable because it is too complex, then the user can refine her query to find code that is less complex than the undesirable result. Rather than using particular characteristics, CLT supports simply selecting an entire result to find code that is analogous, to some degree, to that result. For example, if the user receives an implementation of an AI for chess but wants to see other similar approaches to learn from, then she can select the entire result to find other similar approaches.
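To make the contrast between the two styles of result-driven refinement concrete, the following is a minimal sketch under stated assumptions. It is not the implementation of CE or CLT: the `Result` record, the integer `complexity` score, the token sets, and the use of Jaccard similarity as a stand-in for "analogous" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    tokens: frozenset   # hypothetical bag of identifiers found in the code
    complexity: int     # hypothetical complexity score (higher = more complex)


def less_complex_than(results, reference):
    """Characteristic-based refinement: keep results less complex than `reference`."""
    return [r for r in results if r.complexity < reference.complexity]


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def like_this(results, selected, top_k=2):
    """Whole-result refinement: rank the other results by similarity to `selected`."""
    others = [r for r in results if r is not selected]
    ranked = sorted(others, key=lambda r: jaccard(r.tokens, selected.tokens), reverse=True)
    return ranked[:top_k]


# Hypothetical result list for a query about chess AI.
corpus = [
    Result("minimax_chess", frozenset({"chess", "board", "minimax", "move"}), 9),
    Result("alphabeta_chess", frozenset({"chess", "board", "alphabeta", "move"}), 12),
    Result("random_mover", frozenset({"chess", "move", "random"}), 3),
    Result("db_open", frozenset({"database", "connect", "open"}), 2),
]

simpler = less_complex_than(corpus, corpus[0])
similar = like_this(corpus, corpus[0])
print([r.name for r in simpler])
print([r.name for r in similar])
```

In the first call the user points at one characteristic of a result (its complexity) and asks for code that differs on it; in the second she points at the whole result and asks for code that resembles it. In both cases the next query is derived from a result rather than typed from scratch.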