Category: Systems

  • Technical debt and vibe-coding

    When looking at any system as it is, my perspective remains: technical debt is the state of a system that captures the understanding of the problem as it was when the system was created and no longer fully reflects today’s understanding.

    The (now obsolete) understanding applies to functional and non-functional requirements alike, so it may concern the business, technical, scaling, organizational, or another aspect of that system.

    This might NOT help “the business” understand what is meant by “tech debt”, but if you put it in terms of business understanding embodied in code, you might stand a chance of affecting the roadmap.

    Don’t forget – we will still produce (some) technical debt for the future as soon as the business or operational requirements change.

    And, yes, vibe-coding will produce a lot of technical debt. Slop is not long-term.

    Inspired by the exchange on LinkedIn.

    Knowledge map

    Additional references

    Quote: Technical Debt

    Shipping first time code is like going into debt. A little debt speeds development so long as it is paid back promptly with a rewrite… The danger occurs when the debt is not repaid. Every minute spent on not-quite-right code counts as interest on that debt. Entire engineering organizations can be brought to a stand-still under the debt load of an unconsolidated implementation, object-oriented or otherwise.

    — Ward Cunningham

    https://en.wikipedia.org/wiki/Technical_debt

    https://de.wikipedia.org/wiki/Technische_Schulden (German article is quite different and worth reading)

    Quote: Vibe coding

    … a chatbot-based approach to creating software where the developer describes a project or task to a large language model (LLM), which generates code based on the prompt. The developer does not review or edit the code, but solely uses tools and execution results to evaluate it and asks the LLM for improvements.

    https://en.wikipedia.org/wiki/Vibe_coding

  • Software quality

    Before I worked in an architect role, software quality seemed to me like a very abstract measure, based on one’s subjective perception and experience.

    While hard to define, there are some attributes commonly used to evaluate the quality of a software component or system.

    Note: like some other articles on this blog, this is a living document that will change over time.

    Quick attempt

    I think everyone would agree that well-written software should have the following attributes.

    • It does what it is supposed to do,
    • It does not contain defects or problems,
    • It is easy to read, maintain, and extend.

    While true and well intentioned, this list is exactly what I meant by abstract and prone to subjective perception.

    Detailed approach

    Even though (I find) the standards can be abstract and overly dry, the “Product quality model” of ISO/IEC 25010:2011 defines the attributes quite well.

    Aside from the specialised ISO audits, I think such attributes can be measured and improved with a pragmatic approach.

    Most of these can be regarded as non-functional attributes; only the first addresses the functional requirements, which arguably are what brings value to a software product. On the other hand, the product’s worth can quickly erode if the other attributes are not covered as well. This is important for everyone involved to understand, which can be especially hard for non-technical stakeholders.

    Here’s the list with notes and potential ways to measure and improve each.

    Functional suitability

    • Functional completeness
    • Functional correctness
    • Functional appropriateness – how well specified tasks and objectives can be accomplished

    Notes

    This is the only (!) set of attributes that addresses the fulfilment of functional requirements, even though these are the ones that arguably bring value to a software product. On the other hand, the product’s worth quickly erodes if the attributes that follow are not covered as well. This is important for all the stakeholders to understand. (I know I am repeating myself on this one.)

    Measure

    This can be measured with manual and automated functional testing, fulfilment of acceptance criteria, and user feedback.

    Apart from bug reports, in practically all the systems I have worked with, this could not be tracked or measured in operation.
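
    As an illustration of how an acceptance criterion can be automated, here is a minimal sketch in Python; the order function and the criterion are hypothetical, not taken from any particular system.

      # Minimal sketch of an automated acceptance check (runnable with pytest).
      # The create_order function and the acceptance criterion are hypothetical.

      def create_order(items):
          # Stand-in for the system under test.
          return {"status": "created", "total": sum(price for _, price in items)}

      def test_order_total_matches_acceptance_criterion():
          # Criterion: the order total equals the sum of the item prices.
          order = create_order([("book", 12.0), ("pen", 3.0)])
          assert order["status"] == "created"
          assert order["total"] == 15.0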

    Improve

    Based on user feedback, bug reports, and internally measured usage KPIs.

    Performance efficiency

    • Time behaviour – the response and processing times, and throughput rates of a product or system
    • Resource utilisation
    • Capacity

    Measure

    Performance can be measured with performance testing and resource monitoring during those tests. Operational monitoring will also bring insights, but in most cases that is too late.
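
    A very small sketch of what such a measurement can look like in code, assuming the workload is wrapped in a callable; the operation and the numbers are purely illustrative.

      import time
      import statistics

      def measure_latency(operation, runs=100):
          # Time repeated calls and report median and worst-case latency in seconds.
          samples = []
          for _ in range(runs):
              start = time.perf_counter()
              operation()
              samples.append(time.perf_counter() - start)
          return statistics.median(samples), max(samples)

      # Stand-in workload; replace with a real request or transaction.
      median_s, worst_s = measure_latency(lambda: sum(range(10_000)))
      print(f"median: {median_s * 1000:.2f} ms, worst: {worst_s * 1000:.2f} ms")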

    Improve

    Static code analysis tools can help, but profiling is still irreplaceable when it comes to actually improving performance.
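
    For illustration, a minimal profiling sketch using Python’s built-in cProfile; the workload is a placeholder for the code path under investigation.

      import cProfile
      import pstats

      def workload():
          # Placeholder for the code path under investigation.
          return sorted(str(i) for i in range(100_000))

      profiler = cProfile.Profile()
      profiler.enable()
      workload()
      profiler.disable()

      # Show the ten most expensive calls by cumulative time.
      pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)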

    Compatibility

    • Co-existence
    • Interoperability

    Notes

    This becomes extremely important in system architecture and large systems.

    Measure

    A set of questions that can help evaluate a system:

    • Is there a very specific set of requirements for deployment?
    • How about operational requirements?
    • Does the system integrate with other systems in its own, non-standard way?

    With container and serverless deployment and execution models, co-existence becomes less of a problem, while performance efficiency becomes more important.

    Improve

    Interoperability is all about API design, choreography/orchestration, and system openness. The smart endpoints and dumb pipes principle applies here as well.
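
    As a rough sketch of the idea, the example below keeps all logic in the endpoints and exchanges only an explicitly versioned, self-describing message over the “dumb pipe”; the event name and fields are made up for illustration.

      import json
      from dataclasses import dataclass, asdict

      # Hypothetical, explicitly versioned event contract; names are illustrative.
      @dataclass
      class OrderCreatedV1:
          schema: str
          order_id: str
          amount_cents: int

      def publish(event: OrderCreatedV1) -> str:
          # The pipe only carries serialized JSON; all logic stays in the endpoints.
          return json.dumps(asdict(event))

      def consume(payload: str) -> OrderCreatedV1:
          data = json.loads(payload)
          if data.get("schema") != "order.created.v1":
              raise ValueError("unsupported schema version")
          return OrderCreatedV1(**data)

      print(consume(publish(OrderCreatedV1("order.created.v1", "A-123", 4200))))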

    Usability

    • Appropriateness recognisability – how well can the user recognise whether a product or system is appropriate for their needs
    • Learnability
    • Operability
    • User error protection
    • User interface aesthetics
    • Accessibility

    This set of attributes is oriented towards measuring the fit with the end user. Software engineers (architects as well) are notoriously bad at this, and cooperating with UX/UI designers/engineers is crucial.

    Measure

    This is somewhat dependent on the user’s technical orientation, subjective relationship to the product, and previous familiarity. There are tools and platforms like UserTesting or Accessibility Insights that have defined a clear set of measurements (no affiliation or promotion, just the ones I am aware of).

    Improve

    Align the product implementation to the feedback and measurements. Introduce UX/UI design if not present. Introduce accessibility experts. Promote disability inclusion.

    Reliability

    • Maturity
    • Availability
    • Fault Tolerance (recovery)
    • Recoverability (data)

    Notes

    This is about system design, deployment and operations models, network and product configuration.

    Measure

    Documenting your system with reliability block diagrams and performing fault tree analysis.
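
    A reliability block diagram also lets you put rough numbers on the design: components in series multiply their availabilities, while a redundant (parallel) group only fails when all of its members fail. A minimal sketch with assumed availability figures:

      from math import prod

      def series_availability(availabilities):
          # Every component must be up, so availabilities multiply.
          return prod(availabilities)

      def parallel_availability(availabilities):
          # A redundant group fails only if all members fail at the same time.
          return 1 - prod(1 - a for a in availabilities)

      # Assumed figures: load balancer -> two redundant app servers -> database.
      app_servers = parallel_availability([0.99, 0.99])            # ~0.9999
      system = series_availability([0.999, app_servers, 0.995])    # ~0.9939
      print(f"estimated system availability: {system:.4f}")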

    Improve

    Chaos Monkey, redundant deployment, reduced dependencies.

    Security

    • Confidentiality – authorisation
    • Integrity
    • Non-repudiation – how well actions or events can be proven to have taken place
    • Accountability
    • Authenticity

    Notes

    Too often, this is treated as an afterthought, but it is absolutely essential to having a system run properly and preserve its data as intended.

    Measure

    Security audits, code analysis, penetration testing, bounty programmes. Identify critical business data and business risks.

    Improve

    Use coding standards, take special care of potential attack vectors, and keep the attack surface as small as possible. Do not expose anything to the internet that is not absolutely needed.
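
    One concrete example of such a coding standard is never building queries from user input by string concatenation; a minimal sketch with an in-memory SQLite database and made-up table names:

      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
      conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

      def find_user(name):
          # Parameterized query: the input is bound, never concatenated into SQL,
          # which closes one of the most common injection attack vectors.
          return conn.execute(
              "SELECT name, role FROM users WHERE name = ?", (name,)
          ).fetchall()

      print(find_user("alice"))
      print(find_user("alice' OR '1'='1"))  # returns nothing instead of every row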

    Maintainability

    • Modularity
    • Reusability
    • Analysability
    • Modifiability
    • Testability

    Notes

    Again, too often, this is taken as an afterthought, but is absolutely essential in building a sustainable system. This will have a hard impact on time to market, especially in the long run.

    Measure

    Code test coverage, code audit, pull requests, documentation (!), architecture, static code analysis, profiling.

    Improve

    Increase code test coverage, create unit tests, run tests in CI/CD, do code reviews, run static code analysis as part of the pipeline.
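
    Testability usually has to be designed in; a small sketch of one common technique, injecting a dependency (here a time source, purely illustrative) so a rule can be unit-tested without touching the production code path:

      from datetime import datetime, timezone

      def is_business_hours(now_fn=lambda: datetime.now(timezone.utc)):
          # The time source is injected, so the rule is trivially testable.
          return 9 <= now_fn().hour < 17

      def test_outside_business_hours():
          # Runnable with pytest: replace the clock instead of waiting for 3 a.m.
          fixed = lambda: datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)
          assert is_business_hours(now_fn=fixed) is False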

    Portability

    • Adaptability
    • Installability
    • Replaceability

    Notes

    Again, too often, this is taken as an afterthought, but is absolutely essential in building a sustainable system. This will have a hard impact on time to market.

    Measure

    Are the components using standard mechanisms of integration and deployment?

    If you have a mobile app in the stack – what is the device and operating system compatibility?

    Improve

    Increase code test coverage, create unit tests, run tests in CI/CD, do code reviews.
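
    One way to support adaptability and installability is to keep environment-specific values outside the code, so the same artefact can be installed unchanged on different targets; a minimal sketch with hypothetical variable names:

      import os

      def load_config():
          # Environment-specific values come from the outside; the code itself
          # stays identical across development, test, and production installs.
          return {
              "database_url": os.environ.get("DATABASE_URL", "sqlite:///local.db"),
              "listen_port": int(os.environ.get("PORT", "8080")),
          }

      print(load_config())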

    Further reading

    https://martinfowler.com/

    W3C Web Accessibility Initiative (WAI).

    Fault Tree Analysis (Wikipedia)

    Hexagonal architecture

  • Normal Accident

    The term “Normal accident” really caught my attention – how can an accident be normal?

    Are complex software systems, especially ones with technical debt and many knowledge transfers, destined to have (catastrophic) failures?

    Find below some quotes from the related Wikipedia articles (emphasis mine) and the reference.


    A system accident (or normal accident) is an “unanticipated interaction of multiple failures” in a complex system.

    This complexity can either be of technology or of human organizations, and is frequently both.

    A system accident can be easy to see in hindsight, but extremely difficult in foresight because there are simply too many action pathways to seriously consider all of them. 

    Charles Perrow first developed these ideas in the mid-1980s. William Langewiesche in the late 1990s wrote, “the control and operation of some of the riskiest technologies require organizations so complex that serious failures are virtually guaranteed to occur.”

    Safety systems themselves are sometimes the added complexity which leads to this type of accident.

    Once an enterprise passes a certain point in size, with many employees, specialization, backup systems, double-checking, detailed manuals, and formal communication, employees can all too easily recourse to protocol, habit, and “being right.” …

    In particular, it is a mark of a dysfunctional organization to simply blame the last person who touched something.

    Perrow identifies three conditions that make a system likely to be susceptible to Normal Accidents. These are:

    • The system is complex
    • The system is tightly coupled
    • The system has catastrophic potential

    Reference

    en.wikipedia.org/wiki/System_accident

    en.wikipedia.org/wiki/Normal_Accidents

    en.wikipedia.org/wiki/Groupthink

  • Grady Booch – A thread regarding the architecture of software-intensive systems.

    Quoting a Twitter thread by @Grady_Booch from 4 September 2020.

    There is more to the world of software-intensive systems than web-centric platforms at scale.

    A good architecture is characterized by crisp abstractions, a good separation of concerns, a clear distribution of responsibilities, and simplicity. All else is details.

    You cannot reduce the complexity of a software-intensive system; the best you can do is manage it.

    In the fullness of time, all vibrant architectures must evolve.

    Old software never dies; you must kill it.

    Some architectures are intentional, some are accidental, most are emergent.

    Meaningful architecture is a living, vibrant process of deliberation, design, and decision.

    The relentless accretion of code over days, months, years and even decades quickly turns every successful new project into a legacy one.

    Show me the organization of your team and I will show you the architecture of your system.

    All well-structured software-intensive systems are full of patterns.

    A software architect who does not code is like a cook who does not eat.

    Focusing on patterns and cross-cutting concerns can yield an architecture that is smaller, simpler, and more understandable.

    Design decisions encourage what a particular stakeholder can do as well as what constrains what a stakeholder cannot.

    In the beginning, the architecture of a software-intensive system is a statement of vision. In the end, the architecture of every such system is a reflection of the billions upon billions of small and large, intentional and accidental design decisions made along the way.

    All architecture is design, but not all design is architecture.

    Architecture represents the set of significant design decisions that shape the form and the function of a system, where significant is measured by cost of change.

    https://threadreaderapp.com/thread/1301810358819069952.html
    https://twitter.com/grady_booch/status/1301810358819069952?s=21