Evolving Software Practices for the Age of AI

For decades, we’ve meticulously honed our software development practices. Deterministic builds, established release cycles, and static threat models form the bedrock of our digital infrastructure. These disciplines remain vital. However, as we venture deeper into Artificial Intelligence (AI), particularly Generative AI (GenAI), it’s clear that simply overlaying these traditional paradigms isn’t enough. We don’t want to discard our hard-won knowledge, but we do need to think about expanding our toolkit to navigate the unique opportunities and challenges AI presents.

The temptation to rely solely on familiar processes is understandable. Yet, AI operates with characteristics that differ significantly from traditional software. As research from institutions and firms like Google, Statsig, Evidently AI, and Lumenova AI increasingly shows, AI behaves more like an evolving ecosystem than a static, packaged product. While our foundational software engineering principles provide stability, failing to adapt our approaches for AI-specific challenges can cause us to miss out on AI’s full potential.

This article explores how we can build upon existing strengths while also identifying those areas that need updating, and those that need to be written from scratch. We’ll talk about how to gear up to put AI systems into production, and how to responsibly use demos and prototypes to build trust. But first, we’ll need to investigate why AI demands a rewrite of our existing playbooks.

Why AI is a Different Beast Entirely

Why does AI demand such a radical departure? It begins with The Stability Myth. Modern software methodologies seek to provide durable and steady systems, services, and applications. Further, traditional software is (ideally) deterministic, meaning the same inputs typically yield the same outputs. On the other hand, AI has variability and randomization baked in; as they say, “it’s a feature, not a bug.” This means that identical inputs can produce wildly different outputs, even under identical starting conditions. It also means that standard test suites—those that insist on a single “correct” answer—will no longer suffice. A deterministic yardstick simply cannot measure a probabilistic engine.
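As a concrete illustration, here is a minimal sketch of what a probabilistic test might look like, assuming a hypothetical call_model() stand-in for whatever model or API you actually use: rather than asserting one exact answer, it samples the model several times and asserts on the pass rate.

```python
import statistics

def call_model(prompt: str) -> str:
    # Hypothetical stand-in for your real model client; replace with your own call.
    return "The capital of France is Paris."

def contains_required_facts(answer: str, required: list[str]) -> bool:
    """A cheap proxy for 'good enough': every key fact appears in the answer."""
    return all(fact.lower() in answer.lower() for fact in required)

def probabilistic_test(prompt: str, required: list[str],
                       runs: int = 20, min_pass_rate: float = 0.9) -> bool:
    """Sample the model repeatedly and assert on the rate of acceptable
    answers, not on a single 'correct' string."""
    results = [contains_required_facts(call_model(prompt), required)
               for _ in range(runs)]
    pass_rate = statistics.mean(results)
    print(f"pass rate over {runs} runs: {pass_rate:.0%}")
    return pass_rate >= min_pass_rate

assert probabilistic_test("What is the capital of France?", ["Paris"])
```

The acceptance criterion here is deliberately crude; in practice, teams often swap in semantic-similarity checks or model-graded rubrics for the same role.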

Beyond its probabilistic nature, many advanced AI models present an interpretability challenge. While traditional software logic can be methodically traced, complex AI systems often operate as “black boxes.” Unpacking the precise reasoning behind a specific AI output can be incredibly difficult, if not impossible. This creates a demand for new tools and techniques that offer interpretability and explanation (Explainable AI), not just for debugging, but for building user trust, meeting regulatory requirements, and ensuring the system aligns with intended ethical goals, especially in critical applications.

Given these challenges—non-determinism and limited interpretability—our approach to quality assurance (QA) and release management must change. Testing and deployment processes need to evolve; for instance, by measuring performance with confidence intervals rather than absolutes, or by using shadow deployments to evaluate against live interactions. This necessary vigilance expands traditional CI/CD into what is termed Continuous Evaluation (CE). For teams managing their own models, CE might involve retraining models or employing reinforcement learning from human feedback (RLHF). However, with the increasing reliance on vendor models and APIs, CE becomes a process of benchmarking model versions against domain-specific data, dealing with rate limits, and fine-tuning with proprietary data sets.
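As a rough sketch of what that benchmarking step could look like (the ask() helper below is a hypothetical stand-in for a versioned vendor endpoint), each model version is scored against a domain-specific eval set and reported as a confidence interval rather than a single number:

```python
import math

def ask(model_version: str, prompt: str) -> str:
    # Hypothetical stand-in for a versioned vendor API; replace with a real client.
    return "Here is a short summary of Policy X."

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval: report the pass rate as a range, not a point."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def benchmark(model_version: str, eval_set: list[dict]) -> tuple[float, float]:
    """Run a domain-specific eval set against one model version."""
    passes = sum(
        1 for case in eval_set
        if case["expected"].lower() in ask(model_version, case["prompt"]).lower()
    )
    return wilson_interval(passes, len(eval_set))

# Compare the current production version with the vendor's candidate release.
eval_set = [{"prompt": "Summarize Policy X.", "expected": "policy x"}]
for version in ("v1.2", "v1.3-preview"):
    low, high = benchmark(version, eval_set)
    print(f"{version}: pass rate 95% CI [{low:.0%}, {high:.0%}]")
```

Reporting the interval rather than a single number keeps small eval sets honest: a 90% pass rate over ten examples is a very different claim than 90% over a thousand.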

These new requirements turn AI development into what can be described as pipelines without a finish line, where the traditional idea of a stable build no longer applies. Such pipelines acknowledge that an AI system is never truly “done”; instead, it is a living system that requires constant nurturing, monitoring, and adaptation throughout its entire lifecycle. Whether you fine‑tune in‑house or track a vendor’s changelog, success hinges on adopting this new way of thinking about AI-powered software systems.

Hiding uncertainty fosters overconfidence in AI outputs, potentially leading users to accept hallucinations as fact. We must learn to leverage ambiguity, not bury it.

The commitment to perpetual improvement shifts focus from code assets to data assets. Test harnesses, datasets, and eval dashboards must be versioned alongside source code because even small data changes can alter system behavior. Crucially, this deep reliance on data introduces unique ethical dimensions and the specter of bias. AI models learn from the data they are fed; if that data reflects historical or societal biases, the model can inadvertently perpetuate, and even amplify, those harms. This demands proactive assessment and mitigation strategies throughout the AI lifecycle. Consequently, ethics reviews and data governance become critical components of the Software Development Lifecycle (SDLC).
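Dedicated tools exist for this (DVC and similar data-versioning systems), but even a simple content-hash manifest checked into the repository makes dataset drift visible in code review. A minimal sketch, assuming eval data lives in JSONL files:

```python
import hashlib
import json
import pathlib

def fingerprint(path: pathlib.Path) -> str:
    """Content hash of a dataset file; any edit changes the fingerprint."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:16]

def write_manifest(data_dir: str, manifest_file: str = "data_manifest.json") -> None:
    """Record every dataset's hash so data changes show up in review
    just like source changes do."""
    manifest = {
        str(p): fingerprint(p)
        for p in sorted(pathlib.Path(data_dir).glob("**/*.jsonl"))
    }
    pathlib.Path(manifest_file).write_text(json.dumps(manifest, indent=2))

def check_manifest(data_dir: str, manifest_file: str = "data_manifest.json") -> bool:
    """Fail CI if a dataset drifted without a deliberate manifest update."""
    recorded = json.loads(pathlib.Path(manifest_file).read_text())
    current = {
        str(p): fingerprint(p)
        for p in sorted(pathlib.Path(data_dir).glob("**/*.jsonl"))
    }
    return recorded == current
```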

Finally, when “maybe” and “I don’t know” are both reasonable answers, the need for an updated User Experience (UX) becomes critical. Hiding uncertainty can foster overconfidence in AI outputs, potentially leading users to accept hallucinations as fact. As such, we must learn to leverage ambiguity, not bury it. This means being deliberate about the affordances we employ (e.g., visible probability scores or ranked alternatives presented alongside model outputs). Tooltips, color-coded badges, and quick-feedback widgets become essential so that ambiguous outputs can be flagged and routed through feedback channels. Indeed, feedback loops become critical product infrastructure—a thumbs-up or correction from a user is QA gold.
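One way to make those affordances concrete is to treat uncertainty and feedback as first-class fields in the data the product passes around. The shapes below are purely illustrative (the field names and the 0.72 confidence score are invented for the example), but they show the idea: the UI renders the caveats and alternatives, and every thumbs-down lands somewhere durable.

```python
from dataclasses import dataclass, field, asdict
import json
import time

@dataclass
class AssistantResponse:
    """What the UI renders: the answer plus the ambiguity signals around it."""
    answer: str
    confidence: float                  # calibrated score or heuristic estimate
    alternatives: list[str] = field(default_factory=list)
    caveat: str | None = None          # shown as a tooltip or badge

@dataclass
class FeedbackEvent:
    """A thumbs-up/down or correction from a user; persist it, it is QA gold."""
    response_id: str
    rating: str                        # "up", "down", or "corrected"
    correction: str | None = None
    timestamp: float = field(default_factory=time.time)

def record_feedback(event: FeedbackEvent, path: str = "feedback.jsonl") -> None:
    with open(path, "a") as fh:
        fh.write(json.dumps(asdict(event)) + "\n")

resp = AssistantResponse(
    answer="Your claim window is likely 60 days.",
    confidence=0.72,
    alternatives=["90 days for some policies"],
    caveat="Verify against your insurer's policy documents.",
)
record_feedback(FeedbackEvent(response_id="abc123", rating="down",
                              correction="It is 60 days only for renters."))
```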

While these new ways of looking at the AI-SDLC are important to consider, they are not the only factors in judging the success of AI endeavors. Some of the biggest roadblocks AI projects face are, in fact, not technical at all. Rather, they come from predominantly human factors, including the way we talk about and manage our AI efforts.

The Perils of Mismanagement

Forcing AI into traditional software frameworks can lead to predictable and damaging outcomes—stifled innovation, significant wasted resources, and misaligned processes that frustrate skilled teams and risk talent attrition. More insidiously, it can create significant security, governance, and compliance challenges. In particular, with AI, the threat surface widens considerably. Prompt injection, data poisoning, and model extraction are now mainstream techniques for exploiting AI systems, and detecting and remediating these attacks extends far beyond the capabilities of standard software security tools. This makes new governance playbooks, like NIST’s AI Risk Management Framework, absolutely essential. It also means that leadership must adopt, and adapt to, the ever-growing list of requirements that successful AI strategies demand.
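None of this replaces a real governance framework, but even a lightweight input screen illustrates how AI-specific defenses differ from traditional ones. The patterns below are illustrative only; real deployments layer filtering with least-privilege tool access, output checks, and human review.

```python
import re

# Illustrative patterns only; a heuristic screen is one layer, never the whole defense.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous|prior) instructions",
    r"disregard (the )?(system|developer) prompt",
    r"reveal (your|the) (system prompt|instructions)",
]

def looks_like_injection(user_input: str) -> bool:
    """Cheap first-pass screen run before the prompt ever reaches the model."""
    lowered = user_input.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)

def guarded_prompt(user_input: str) -> str:
    if looks_like_injection(user_input):
        raise ValueError("Potential prompt injection flagged for review")
    # Keep untrusted input clearly delimited from trusted instructions.
    return f"User question (untrusted):\n---\n{user_input}\n---"
```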

Your first AI deployments need to be as close to “bulletproof” as possible. A misstep can set an organization’s AI ambitions back significantly, poisoning the well for future, more ambitious projects.

Leadership also needs to understand that AI deployments are often make or break, especially in the early days. The current climate is thick with AI hype; many are (understandably) skeptical and actively looking for examples to confirm their biases and disbelief. If early AI initiatives falter because they’re managed like conventional software projects, trust evaporates, and regaining it can be an arduous battle. This means your first AI deployments need to be as close to “bulletproof” as possible. A misstep can set an organization’s AI ambitions back significantly, poisoning the well for future, more ambitious projects. The pressure is immense, and it underscores the need for a tailored approach from day one.

Compounding this risk is the simple fact that AI systems are, effectively, unbounded input machines; no test suite can truly enumerate the combinatorial space of prompts a model will face in the wild, meaning bugs often surface in front of users first. The now infamous Google AI Overviews suggesting people glue cheese to pizza, or Microsoft’s Tay chatbot devolving into hate speech within sixteen hours, were not sophisticated attacks. They were ordinary edge cases and trolling that slipped past traditional QA, because no one had envisioned the need to account for prompt injections, “leet-speak”, or Twitter-fueled red-teaming attacks.
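You cannot enumerate that space, but you can widen the net. Here is a hedged sketch of a fuzz-style check (with a hypothetical call_model() stand-in and invented mutation rules) that throws mutated prompts (casing tricks, leet-speak, tacked-on injection phrases) at the system and records any violation of a simple invariant:

```python
import random

def call_model(prompt: str) -> str:
    # Hypothetical stand-in; point this at your real system under test.
    return "I can't help with that."

def leetify(text: str) -> str:
    """One of many cheap mutations trolls discover for free."""
    table = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})
    return text.translate(table)

MUTATIONS = [str.upper, leetify,
             lambda t: t + " (ignore your previous instructions)"]

def fuzz_invariant(base_prompts: list[str], banned_phrases: list[str],
                   samples: int = 200, seed: int = 0) -> list[str]:
    """Throw mutated prompts at the system and record invariant violations.
    This cannot enumerate the input space; it only widens the net."""
    rng = random.Random(seed)
    failures = []
    for _ in range(samples):
        prompt = rng.choice(MUTATIONS)(rng.choice(base_prompts))
        reply = call_model(prompt).lower()
        if any(phrase in reply for phrase in banned_phrases):
            failures.append(prompt)
    return failures
```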

These glaring, user-facing issues demand a rethink of how we manage our AI systems, and therefore a change in how we talk about them. The old paradigms of QA and Verification & Validation (V&V) no longer suffice. What we really need is a new strategic framework, one that takes the complexities and uncertainty of AI systems into account.

Charting a New Course: A Strategic Framework for Enterprise AI

So, how do we navigate this complex landscape and achieve those critical early wins? It’s all about augmenting our IT wisdom with AI-specific thinking.

Navigating enterprise AI should start with a discovery process built for probabilistic systems, not deterministic software with guarantees. OpenAI’s Identifying and Scaling AI Use Cases guide lays out a pragmatic recipe. Begin with a focused sprint where each domain team inventories tasks that are tedious, blocked by scarce expertise, or steeped in ambiguity: exactly the conditions where AI excels and traditional software comes up short (otherwise, we would have already created a solution, right?). Map every candidate task to one of six capability buckets: content creation, automation, research, coding, data analysis, or strategic ideation. This taxonomy exposes reusable patterns and keeps excitement grounded in concrete capabilities.
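A lightweight way to keep that inventory honest is to encode the six buckets directly in the backlog’s data model, so every candidate task has to declare which capability it actually relies on. A minimal sketch (the example use cases are invented):

```python
from dataclasses import dataclass
from enum import Enum

class Capability(Enum):
    CONTENT_CREATION = "content creation"
    AUTOMATION = "automation"
    RESEARCH = "research"
    CODING = "coding"
    DATA_ANALYSIS = "data analysis"
    STRATEGIC_IDEATION = "strategic ideation"

@dataclass
class UseCase:
    name: str
    team: str
    capability: Capability
    pain: str  # why the task is tedious, blocked, or ambiguous today

inventory = [
    UseCase("Summarize incident reports", "Ops", Capability.CONTENT_CREATION,
            "hours of manual reading per week"),
    UseCase("Draft unit-test scaffolding", "Platform", Capability.CODING,
            "scarce senior reviewer time"),
]
```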

Position your first deployments as trust‑building probes. Skip the ‘holy grail’ fantasies.

With the use-case catalog in hand, run each item through an Impact vs. Effort matrix alongside a preliminary Risk and Ethical Consideration review. This helps ensure that even ‘quick wins’ align with responsible AI principles. Green-light the high‑impact, low‑effort, and ethically sound quick wins, and consciously park the high‑effort moonshots for later cycles. Treat the matrix as a living backlog that gets refreshed quarterly, because model upgrades often turn tomorrow’s stretch goals into today’s easy victories.
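In code, the matrix can be as simple as a sort with an ethics gate in front of it. The field names and 1-to-5 scales below are illustrative, not a standard; the point is that the backlog is data you can re-score every quarter.

```python
def prioritize(backlog: list[dict]) -> list[dict]:
    """Order the living backlog: ethics gate first, then impact-per-effort."""
    cleared = [item for item in backlog if item["ethics_review"] == "pass"]
    return sorted(cleared, key=lambda i: i["impact"] / i["effort"], reverse=True)

backlog = [
    {"name": "FAQ assistant", "impact": 4, "effort": 1, "ethics_review": "pass"},
    {"name": "Autonomous contract negotiation", "impact": 5, "effort": 5,
     "ethics_review": "needs review"},
]
for item in prioritize(backlog):
    print(f'{item["name"]}: quick-win score {item["impact"] / item["effort"]:.1f}')
```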

Once the list is prioritized, position your first deployments as trust‑building probes. Skip the ‘holy grail’ fantasies. Instead, partner with forward‑looking teams and select low‑risk, high‑value proofs that can survive public scrutiny. The objective is twofold: prove what is possible and ship something undeniably useful. Success should be strategically planned to resonate with different internal audiences: providing tangible, undeniable results to convert the skeptics; offering clear ROI and practical benefits to validate the pragmatists; and showcasing innovative potential, even in smaller applications, to satisfy and harness the energy of enthusiasts.

Keep feedback loops open, publish what you learn, and treat models as living products rather than frozen code.

A recent anecdote from inside NASA Jet Propulsion Laboratory (JPL) illustrates the above approach. After a string of California wildfires that led to the loss of thousands of homes, I led a small team that built a Slack‑based assistant to answer natural‑language questions about FEMA processes, insurance claims, wildfire preparedness, and more. The underlying datasets were curated by a team of volunteers to ensure quality and accuracy. We incorporated prompt logging, provided a feedback funnel for bad answers, and ran regular evaluations (a core tenet of Continuous Evaluation) so model or data changes would not break canonical queries. Fixes shipped within hours. Feedback poured in from colleagues directly affected by the fires, many describing the bot as a genuine lifeline. An internal write‑up of the project quickly turned into Exhibit A for why GenAI deserves further investment.
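A minimal sketch of that kind of canonical-query check, assuming a hypothetical call_assistant() wrapper and an invented canonical_queries.jsonl file of must-not-break questions (illustrative of the pattern, not the project’s actual implementation):

```python
import json
import pathlib

def call_assistant(question: str) -> str:
    # Hypothetical stand-in for the assistant's answer pipeline.
    return "stub answer"

def run_canonical_checks(path: str = "canonical_queries.jsonl") -> list[str]:
    """Re-run the must-not-break questions after every model or data change
    and return the ones whose answers lost a required fact."""
    failures = []
    for line in pathlib.Path(path).read_text().splitlines():
        case = json.loads(line)  # {"question": ..., "required_facts": [...]}
        answer = call_assistant(case["question"]).lower()
        if not all(fact.lower() in answer for fact in case["required_facts"]):
            failures.append(case["question"])
    return failures
```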

A modest, month‑long project unlocked broad executive sponsorship and sparked a flurry of interest, demand, and buy-in for similar initiatives across the lab. Success stories like these prove the value of starting focused, proving results, and building trust. To keep that momentum, give every use-case a clear owner and sponsor, track accuracy and latency targets, and watch for drift with lightweight monitoring. Keep feedback loops open, publish what you learn, and treat models as living products rather than frozen code. Sustained, disciplined iteration turns today’s spark into an enterprise capability that can reshape the future.
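“Lightweight monitoring” can genuinely be lightweight. A sketch, assuming interactions are already being logged with a thumbs-up flag and a latency measurement (the targets below are placeholders each use-case owner would set):

```python
import statistics

def monitor_window(samples: list[dict], accuracy_target: float = 0.85,
                   latency_target_s: float = 2.0) -> list[str]:
    """Summarize a rolling window of logged interactions against the owner's
    accuracy and latency targets, returning alerts rather than a dashboard."""
    alerts = []
    accuracy = statistics.mean(1.0 if s["thumbs_up"] else 0.0 for s in samples)
    p95_latency = sorted(s["latency_s"] for s in samples)[int(0.95 * (len(samples) - 1))]
    if accuracy < accuracy_target:
        alerts.append(f"accuracy {accuracy:.0%} below target {accuracy_target:.0%}")
    if p95_latency > latency_target_s:
        alerts.append(f"p95 latency {p95_latency:.1f}s above target {latency_target_s:.1f}s")
    return alerts

samples = [{"thumbs_up": True, "latency_s": 1.2},
           {"thumbs_up": False, "latency_s": 2.8},
           {"thumbs_up": True, "latency_s": 1.0}]
print(monitor_window(samples) or "within targets")
```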

Conclusion – Leading the Charge into the AI Frontier

AI runs on probabilities, not certainties. For organizations steeped in deterministic software disciplines, this demands a profound shift in how success is measured, monitored, and secured. Treating AI as just another software update is a recipe for mediocrity and a high-stakes gamble with organizational trust.

To truly leverage AI’s power, we must acknowledge its unique, living-system nature and adapt our organizational DNA – our processes, structures, and mindset. This means fostering a culture of curiosity, strategic experimentation, and continuous learning, all while diligently managing risk and perception. It requires leadership to champion this shift, ensure those initial deployments are meticulously planned for success, and invest in the new literacies required to identify and realize high-gain use-cases.

The organizations that adapt their pipelines, roles, and interfaces to embrace variability will not only ship better AI products faster, building on a foundation of earned trust, but also unlock new avenues of innovation and redefine what’s possible within their industries. Those who delay, clinging to the elusive stability of old frameworks, will find themselves struggling to catch up.


What are your thoughts? How is your organization adapting its approach to deploy AI successfully and build lasting trust?

Leave a comment