How StackOverflow Is Adapting in the Face of Generative AI
OverflowAI revolutionizes content creation by streamlining the process through its powerful generative AI loop.
StackOverflow, the platform software developers most commonly use for programming support, has had a rough ride lately. Despite an impressive 69% of questions being answered, StackOverflow's traffic has been in decline. Similarweb's data shows that its traffic dropped 14% year over year (StackOverflow says it's closer to 5%). Either way, the trend is downward and is explained primarily by the emergence of AI coding products like ChatGPT and GitHub Copilot. These products have meaningful code-writing capabilities and can therefore provide programming support that is at least partly as good as what StackOverflow offers. Ironically, several of the large language models (LLMs) behind these AI products were trained using scraped StackOverflow data.
The company has received fairly harsh media coverage over these developments. Business Insider, in its article "Death by LLM," wrote:
Welcome to the future of the internet in an AI world. Online communities like Stack Overflow and Wikipedia thrived as hubs for experts and curious browsers to come together and share information freely. Now, these digital meeting places are being pillaged by big tech companies prowling for human data to train their large language models.
The new products emerging from this generative AI boom are putting the future of these online forums in doubt. The chatbots answer questions clearly, automatically, and often pleasantly — so humans don’t need to deal with other humans to get information.
In the midst of all this attention, StackOverflow has kept a steady hand and articulated a two-pronged approach to addressing this challenge:
- A few weeks back, they announced that they will start charging large AI developers who use the platform’s 50M+ questions and answers for model training (we dug into this issue in the data scraping article earlier).
- Last week, they launched OverflowAI, a set of genuinely useful generative AI features that could help kick off their second innings. This is what we will focus on today.
In this article, we’ll dive deep into:
- AI code writing tools disrupting StackOverflow.
- What OverflowAI does.
- Underlying trends from the StackOverflow strategy.
AI Code Writing Tools Disrupting StackOverflow
There are several AI code writing and editing tools available in the market today. These are either independent products (like OpenAI Codex, ChatGPT, Google Bard) or products that are natively integrated inside existing platforms (like GitHub Copilot, Replit Ghostwriter, Amazon CodeWhisperer). They have a broad range of capabilities, including code generation, code editing, autocomplete, and debugging.
The products with native distribution (like GitHub Copilot) have a significant advantage because they operate seamlessly within environments that programmers already use, and we will likely see more products trying to plug into existing environments. For example, CodeGPT has a plugin that lets developers use the product from within Visual Studio Code (a popular code editor).
Existing AI code-writing tools are good at certain tasks. For example, this Reddit thread captures feedback from several web developers about GitHub Copilot — the overarching theme is that the product is useful in a subset of situations where developers have to write net new code and don’t want to spend time writing from scratch. Even for those situations, it’s often hit or miss.
The reason is not surprising. Conceptually, large language models (LLMs) take in a ton of data and generate output on the basis of this construct: in a particular context, for the question you asked, what is the most likely word/text to follow the previous word? It’s essentially calculating the probability of a word following another and generating output based on that. Despite this construct, given the amount of data that’s gone into training these models, the results for the more general ChatGPT use cases (like drafting an email or summarizing a page) have been nothing short of impressive. But it’s important to remember that language models, by design, have limited analytical/math capabilities. In other words, when you ask the model, “What is 2+2?” it may give you the right answer — not because it knows math but because it has seen that text pattern before in its training data.
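To make the "most likely next word" idea concrete, here is a minimal sketch of a toy bigram model in Python. It is far simpler than a real LLM, and the training text, function names, and counts are all made up for illustration. It only shows the principle: the model answers "4" because that is the continuation it has seen most often, not because it can add.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training data"; real models see vastly more text.
CORPUS = [
    "2+2 = 4",
    "2+2 = 4",
    "2+2 = 4",
    "2+2 = 5",  # a wrong statement that also appears in the data
]


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn raw follow-counts for `prev` into probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in followers.items()}


def most_likely_next(counts, prev):
    """Pick the highest-probability next word, or None if `prev` is unseen."""
    probs = next_word_probs(counts, prev)
    return max(probs, key=probs.get) if probs else None


model = train_bigrams(CORPUS)
print(next_word_probs(model, "="))   # probabilities learned from counts alone
print(most_likely_next(model, "="))  # the most frequent continuation
```

Note that the sketch looks only at the single previous word, so it would give the same continuation after "=" for any sum. Real LLMs condition on much longer context, but the output is still driven by patterns in the training text rather than by arithmetic.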
Similarly, when it comes to code generation, the model does not really “know” the underlying concepts behind programming but is predicting results based on its training with a ton of text data. The consequence of this is the GitHub Copilot feedback above — it is sometimes good at generating the base code you need, but its ability to actually understand code, debug, and provide you explanations is limited. This will get better over time, but it’s hard to say if it will ever get to the point of high accuracy/high reliability.
StackOverflow CEO Prashanth Chandrasekar describes it succinctly:
One problem with modern LLM systems is that they will provide incorrect answers with the same confidence as correct ones and will ‘hallucinate’ facts and figures if they feel they fit the pattern of the answer a user seeks.
At some point, you’re going to need to know what you’re building. You may have to debug it and have no idea what was just built, and it’s hard to skip the learning journey by taking shortcuts.
This is the opportunity for StackOverflow — their traffic drop may be permanent, and it’s very likely that programmers come to StackOverflow less often for simpler questions (e.g., they might not visit StackOverflow anymore for an off-the-shelf sorting algorithm). But where the product can shine is: 1) providing high accuracy / high-reliability answers to more complex questions that language models might not have the capability to answer, and 2) providing answers to questions in new technologies/problem spaces that the models have not had previous data to train on. OverflowAI is designed to directly tap into this opportunity.
What OverflowAI Does
There are three key facets they are betting on — direct answers to questions, usability from within developer environments, and supercharging knowledge within enterprises.
OverflowAI Search provides direct answers to users in a Q&A format (similar to ChatGPT) but also provides several links to actual StackOverflow posts. Besides helping build trust, this gives users the opportunity to go deeper when the AI-provided answer does not fully solve their problem. This strikes a delicate balance: giving a direct answer when the question is simple, while guiding the user along a more exploratory path for difficult questions.
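The general pattern of "direct answer plus links to source posts" can be sketched in a few lines of Python. This is not how OverflowAI is implemented; the article does not describe its internals. The `Post` type, the sample posts, the `example.com` URLs, and the simple word-overlap ranking are all assumptions chosen to keep the illustration small.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    url: str
    body: str


# Hypothetical posts with placeholder URLs, for illustration only.
POSTS = [
    Post("How do I sort a list in Python?", "https://example.com/q/1",
         "Use sorted(items) or items.sort() to sort a list."),
    Post("How do I reverse a string in Python?", "https://example.com/q/2",
         "Use slicing: text[::-1] reverses a string."),
    Post("How do I read a file in Python?", "https://example.com/q/3",
         "Use open(path) in a with block and call read()."),
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_with_sources(question, posts, top_k=2):
    """Return the best-matching post body plus links to the top matches."""
    query = tokenize(question)
    scored = [(len(query & tokenize(p.title + " " + p.body)), p) for p in posts]
    # Keep only posts sharing at least one word, best overlap first.
    ranked = sorted((sp for sp in scored if sp[0] > 0),
                    key=lambda sp: sp[0], reverse=True)
    top = [post for _, post in ranked[:top_k]]
    if not top:
        return {"answer": None, "sources": []}
    return {"answer": top[0].body, "sources": [p.url for p in top]}


result = answer_with_sources("sort a list", POSTS)
print(result["answer"])
print(result["sources"])
```

Returning the links alongside the answer is what lets a reader verify the response or keep exploring when the direct answer falls short. A production system would presumably use far better ranking and a language model to compose the answer.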
Published at DZone with permission of Vignesh Balagopalakrishnan. See the original article here.