Supporting Software Engineering with LLMs

Written by Kyrill Schmid | Dec 31, 2023 11:03:56 PM

Software is pivotal for almost all services that we use in our daily lives, from running the dishwasher to driving a car. Paradoxically, although software helps us do things quickly and efficiently, building software is often a cumbersome process with many tedious, manual and mostly unautomated steps.

While there are various well-established process frameworks for building software, a lot of effort and cost stems from manual labour. Examples include the design phase (defining requirements, writing user journeys and epics), the translation of designs into software entities like classes, functions and attributes, and the writing of tests and code. Additionally, for software service providers, creating an offer in response to a request for proposal (RfP) is time-consuming, because the requirements must be fully understood before efforts and costs can be estimated.

Figure 1: Software Development Lifecycle

Recently we have seen a lot of evidence that AI is capable of generating high-quality text, images, voice and more that is often indistinguishable from human-generated content. However, when naively asking ChatGPT to generate an app to manage your personal finances or to create a website for an online shop with all the bells and whistles, there is little chance that the output will be very helpful. While you might initially get some promising results, it is very likely that at some point ChatGPT will lose track of relevant context as the conversation gets longer, or that partial results will be inconsistent. The user then needs to be technically skilled to spot mistakes and put things together correctly.

Even if GPTs and the like get much larger context windows in the near future, i.e. are able to handle more information per request, it is unlikely that a model will be able to generate high-quality artifacts along the software development lifecycle from a vague and highly underspecified system description. As LLMs predict tokens based on previous tokens, it is much more likely that at some point the model will deviate from what we need. Due to its autoregressive nature, this output is then used as input later on, amplifying the initial error at each step, which can lead to undesirable final results. What can be done to prevent such behavior?
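To make the amplification argument a bit more tangible, here is a deliberately simplified back-of-the-envelope sketch. It assumes that every generation step is correct with the same fixed probability and that steps fail independently of each other; real models do not behave this neatly, and the function name and the 95% figure are illustrative assumptions, not measurements.

```python
def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` dependent steps are correct.

    Simplifying assumption: each step is correct independently with the
    same probability, and a single wrong step spoils everything after it.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must not be negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Illustrative numbers only: a hypothetical 95% per-step accuracy.
    for steps in (1, 5, 10, 20):
        probability = chain_success_probability(0.95, steps)
        print(f"{steps:>2} steps: {probability:.3f}")
```

Under these assumptions, a chain of 20 dependent steps at 95% per-step accuracy ends up fully correct only about 36% of the time, which is why unchecked multi-step generation tends to drift.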

Figure 2: Artifacts

One effective way is to keep the human in the loop to steer the process, readjusting the model after each step and keeping it from going wild [1]. In such a scenario, we could use the model for all the time-consuming and laborious things (luckily, the things the model is typically best at). However, the model is only as good as we enable it to be by a) defining a given task as precisely as possible, for example through the right prompt, and b) providing only information that is relevant for the task at hand. Finding the right prompt to encode a task is the general goal of prompt engineering, and by now various frameworks exist to tackle different problem classes (Chain-of-Thought prompting [2] and variations like Tree-of-Thoughts [3] or Graph-of-Thoughts [4]). Providing relevant information for a given task can be achieved by keeping the data separate from the LLM, finding relevant data with semantic search and feeding only those parts to the model so that it is not distracted (also known as retrieval-augmented generation (RAG)).
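The retrieval idea described above can be sketched in a few lines. This is a minimal illustration, not a production RAG setup: a simple word-count cosine similarity stands in for real semantic search with embeddings, and the requirement texts, function names and prompt layout are all made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vector = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vector, tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(task: str, context: list[str]) -> str:
    """Put only the retrieved snippets in front of the task description."""
    context_block = "\n".join(f"- {snippet}" for snippet in context)
    return f"Context:\n{context_block}\n\nTask: {task}"


# Hypothetical requirement snippets, stored outside the model.
REQUIREMENTS = [
    "Users can export their monthly expenses as a CSV file.",
    "The shop must support payment by credit card and invoice.",
    "Expenses are grouped into categories such as rent and food.",
    "Administrators can reset passwords.",
]

task = "Write a user story for exporting expenses."
prompt = build_prompt(task, retrieve(task, REQUIREMENTS, top_k=2))
print(prompt)
```

In this toy setup only the two expense-related requirements end up in the prompt, while the payment and password requirements are left out. A real system would typically replace `tokenize` and `cosine_similarity` with an embedding model and a vector store.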

When the model has done its job of generating an artifact (story, epic, test, ...), the human can take back control by adjusting the model output, thereby preventing the model from building upon a flaw that it made in an earlier step. In return, the user can take a more holistic perspective and guide the process, because their cognitive load is not consumed by producing the writing-intensive details.
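The loop described above could look roughly like the following sketch. The model call and the human review are both replaced by stub functions, so every name here (`run_pipeline`, `stub_model`, `stub_reviewer`) is hypothetical; the point is only that each later step is generated from reviewed artifacts rather than from raw model output.

```python
from typing import Callable

# generate(step, approved_so_far) -> draft artifact
Generator = Callable[[str, list[str]], str]
# review(step, draft) -> artifact as approved or corrected by the human
Reviewer = Callable[[str, str], str]


def run_pipeline(steps: list[str], generate: Generator, review: Reviewer) -> list[str]:
    """Generate one artifact per step, with a human checkpoint after each."""
    approved: list[str] = []
    for step in steps:
        draft = generate(step, approved)       # model does the laborious part
        approved.append(review(step, draft))   # human corrects before moving on
    return approved


def stub_model(step: str, approved: list[str]) -> str:
    """Stand-in for an LLM call; builds on the last approved artifact."""
    basis = approved[-1] if approved else "system description"
    return f"{step} derived from <{basis}> [unreviewed]"


def stub_reviewer(step: str, draft: str) -> str:
    """Stand-in for a human edit; here it only strips the review marker."""
    return draft.replace(" [unreviewed]", "")


artifacts = run_pipeline(["epic", "user story", "test"], stub_model, stub_reviewer)
for artifact in artifacts:
    print(artifact)
```

Because `generate` only ever sees the `approved` list, a flaw corrected during review cannot propagate into the next artifact.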

Figure 3: Thoughts on the future of RE

[1] The Rise and Potential of Large Language Model Based Agents: A Survey https://arxiv.org/pdf/2309.07864.pdf

[2] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models https://arxiv.org/pdf/2201.11903.pdf

[3] Tree of Thoughts: Deliberate Problem Solving with Large Language Models https://arxiv.org/pdf/2305.10601.pdf

[4] Graph of Thoughts: Solving Elaborate Problems with Large Language Models https://arxiv.org/pdf/2308.09687.pdf
