Over the past two years, I analyzed with a couple of colleagues how we judge how much quality is “good enough”. Such a statement is usually an outcome of a discussion and a consent that needs to be found on every different project and every different company.
It is also quite different, depending on your product life cycle. This talk starts with this “balance” of how-much-quality you need and investigates how this knowledge can be taken into the interactions of you and other people in the team.
For the entertainment part, it’s wrapped into an cheesy story about ancient gods and titans – arguably a bit too cheesy 😉
Yes, I am a Trekkie. And while I usually do not share such stuff it is the little bit of context setting I need for this blog post.
Since everyone moved to their home offices in the wake of corona, our team put some deliberate effort into their ceremonies, to make them something “special”, to compensate for not meeting IRL any more. Thus, I had the honor to design our first online retro. I received fantastic feedback, thus I thought it may be quite interesting for others to try something like this, too.
I chose a format that I call 2 : 4 : 8 (I believe I found this label somewhere else, but unfortunately, I wasn’t able to dig it up and link to it again).
The idea of the format is the following: In normal Retros there are up to 10 devs and 2 – 3 other roles. Due to the nature of the roles, also a lot of tech problems are usually shared. In the democratic dot-voting afterwards, the developers often promote the tech topics. I am not saying that this is bad intention, it’s just a natural outcome based on the interests of the different groups in a retro.
To avoid this effect, we went for a format which supports “minorities” and focuses on how important a topic to a certain person is. In this format usually topics come up that are super important to some people but can in the other format never be shared other than in a side note. It’s a very inclusive retro edition.
Here is how it works:
each person writes down what the two most important things are to them. There is no categorization: could be things to start or stop or change or …, there is just limitation that it could only be 2 things in total for each person. (Most retros allow unlimited amounts of contributions but restrict the categories, this was the other way around: unlimited/no categories but restricted to 2 contributions).
after each person found two issues that were most important to them, the participants were paired up. Then, 2 paired-up people need to agree on the most 2 important thing inside their pair. (So 2 people each bring 2 topics = 4 issues in the pair. Then the pair needs to pick the top 2 of them)
after the pairs found two issues they go into group discussion (2 pairs = 4 people now). Again, they should discuss the two most important issues.
finally, all get together with a total of 4 issues left. They are discussed, action items are derived and assigned. Finally, the retro is closed.
So I wrapped the retro into a Star Trek Story. I used the video conferencing system Zoom which allowed me to share video and audio from my computer. And it allowed me to design “breakout rooms” which I could then assign to the people. This was particularly helpful to have all the different discussions with different amounts of people. I also used google slides as the collaborative tool.
I think one can make it work with any tool and there are tools out there which work even better. This combination was just handy for me and I will use this as the example tool going forward.
Setting the Mood
Before the first people joined, I started my screen sharing with the presentation. The first slide had a video and background music. In this way, the people who joined early had a bit of space themed ambience music while waiting. That was already a positive and freshly surprising start.
After officially starting the retro, we watched a tiny teaser to get into the Star Trek mood:
We followed with a “Weather Check” to see how everyone felt and continued with a “Safety Check” to understand if the level of trust in the Retro was high enough. I used an anonymous google form to collect that data, that was a bit cumbersome to set up but an easy, quick and secure way to collect this information.
After that, we went into the 2 : 4 : 8 exercise: The Google Presentation Deck was shared with all and each person randomly picked a Star Trek movie character and wrote (5 min timebox) 2 things to change or stop or improve.
Then the people were asked to team up. There were prepared zoom breakout rooms in the call with names fitting the theme (“Bridge”, “Captain’s Ready Room”, …). On one slide there was a pre-selection which characters would meet in which room. In this way it was a random meet & mix of people who would share their thoughts.
After the pair discussions, we came back once more and divided into two groups with 4 or 5 people, each. Then both smaller groups reported the two main issues back to the whole group and we discussed actions for them.
This gave us quite some action items for 4 issues we found. It also turned out that no-one in the team (but me) watched Star Trek. However, everyone enjoyed the retro and some said it was “the best they ever had in their live”.
The full slide deck is here as a PDF. This is for inspiration, I don’t think that one could just copy & paste the whole thing.
This is a template for the retro. We were 9 people + me facilitating, thus, this is how it’s structured. The format itself also works for 30 people.
When great thinkers think about problems, they start to see patterns. They look at the problem of people sending each other word-processor files, and then they look at the problem of people sending each other spreadsheets, and they realize that there’s a general pattern: sending files. That’s one level of abstraction already. Then they go up one more level: people send files, but web browsers also “send” requests for web pages. And when you think about it, calling a method on an object is like sending a message to an object! It’s the same thing again! Those are all sending operations, so our clever thinker invents a new, higher, broader abstraction called messaging, but now it’s getting really vague and nobody really knows what they’re talking about any more. Blah.
When you go too far up, abstraction-wise, you run out of oxygen. Sometimes smart thinkers just don’t know when to stop, and they create these absurd, all-encompassing, high-level pictures of the universe that are all good and fine, but don’t actually mean anything at all.
These are the people I call Architecture Astronauts. It’s very hard to get them to write code or design programs, because they won’t stop thinking about Architecture. They’re astronauts because they are above the oxygen level, I don’t know how they’re breathing. They tend to work for really big companies that can afford to have lots of unproductive people with really advanced degrees that don’t contribute to the bottom line.
A recent example illustrates this. Your typical architecture astronaut will take a fact like “Bitcoins are a blockchain service to exchange money anonymiously” and ignore everything but the architecture, thinking it’s interesting because it’s blockchain, completely missing the point that it’s interesting because it allows to exchange money anonymously.
All they’ll talk about is blockchain this, that, and the other thing. Suddenly you have blockchain conferences, blockchain venture capital funds, and even blockchain backlash with the imbecile business journalists dripping with glee as they copy each other’s stories: “Blockchain: Dead!”
The Architecture Astronauts will say things like: “Can you imagine a concept like Bitcoin where you can exchange anything, not just money?” Then they’ll build applications for supply chains that they think are more general than Bitcoin, but which seem to have neglected that wee little feature that lets you send and receive money without giving away your name — the feature we wanted in the first place. Talk about missing the point. If Bitcoins weren’t blockchain but they did let you swap money anonymously, it would have been just as popular.
Another common thing Architecture Astronauts like to do is invent some new architecture and claim it solves something. Kotlin, ProtoBuffer, REST, RabbitMQ, GraphQL, IOT, Micro Services, oh lord I can’t keep up.
I’m not saying there’s anything wrong with these architectures… by no means. They are quite good architectures. What bugs me is the stupendous amount of millennial hype that surrounds them. Remember the Microsoft .NET white paper?
The next generation of the Windows desktop platform supports productivity, creativity, management, entertainment and much more, and is designed to put users in control of their digital lives.
Currently, the Apple MacBook Pro whitepaper claims:
Processor and Memory working at the speed of thought
Oh, good, so now I can finally stop thinking on my own since apple found a way to cut years of advance research in quantum computing and artificial intelligence.
Microsoft and Apple are not alone. Here’s a quote from a Sun Jini:
These three facts (you are the new sys admin, computers are nowhere, the one computer is everywhere) should combine to improve the world of using computers as computers — by making the boundaries of computers disappear, by making the computer be everywhere, and by making the details of working with the computer as simple as putting a DVD into your home theater system.
And don’t even remind me of the fertilizer George Gilder spread about Java:
A fundamental break in the history of technology…
That’s one sure tip-off to the fact that you’re being assaulted by an Architecture Astronaut: the incredible amount of bombast; the heroic, utopian grandiloquence; the boastfulness; the complete lack of reality. And people buy it! The business press goes wild!
Why the hell are people so impressed by boring architectures that often amount to nothing more than a new format on the wire for RPC, or a new virtual machine? These things might be good architectures, they will certainly benefit the developers that use them, but they are not, I repeat, not, a good substitute for the messiah riding his white ass into Jerusalem, or world peace. No, Apple, MacBooks are not suddenly going to start reading our minds and doing what we want automatically just because they got a new processor. No, Sun, we’re not going to be able to analyze our corporate sales data “as simply as putting a DVD into your home theater system.”
Remember that the architecture people are solving problems that they think they can solve, not problems which are useful to solve. gRPC may be the Hot New Thing, but it doesn’t really let you do anything you couldn’t do before using other technologies — if you had a reason to. All that Distributed Services Nirvana the architecture astronauts are blathering about was promised to us in the past, if we used DCOM, or JavaBeans, or OSF DCE, or CORBA.
It’s nice that we can use GraphQL now for the format on the wire. Whoopee. But that’s about as interesting to me as learning that my supermarket uses trucks to get things from the warehouse. Yawn. Mangos, that’s interesting. Tell me something new that I can do that I couldn’t do before, O Astronauts, or stay up there in space and don’t waste any more of my time.
All I did was to replace the mentioned technologies in this 19 year old text with recent examples, as my friend and former colleague Herman Vöcke suggested on Twitter.
The result is an up-to-date blog post. And what does that actually mean?
It means that while we developers like to concentrate on solving technical problems we still have to – also after 19 years – solve problems in the real world. And it is a good example of how everything old is new again in our industry.
Since the 90s we know that Service Integrations via a shared file storage should be avoided. The system cannot be tested very well, debugging is hard and data not trustworthy as it’s manipulated by many sources.
Surprisingly, what we QAs of ThoughtWorks have observed in different teams with different clients over the past few months is a very strong tendency to service integrations via S3 buckets in Amazon Web Services (AWS). Many people (e.g. while Inceptions at the beginning of a project) that were working for our clients for many years actually suggested to read from / write to existing buckets. The main reason for those buckets to be widely used and accepted throughout the organizations were the data science projects that were starting or even already established in the companies.
Hence, the justifications we heard for such an integration between services was along the lines of “the data is there anyways” and/or “the data scientist will also need this data”. But being a data-centric or even a data-driven company should not compromise the way the integrations are built on the technical side. Doing “Data Science” does not require anti-patterns in architecture.
Also, we believe that this setup makes it actually more difficult to do accurate data science. No one could point us to a resource that helped to get an overview of what data is stored where. It’s unclear which data is used for what. Even worse, it’s not always possible to connect any two data stores in the data analysis. If we think of more than a couple of services, the integrations via S3 buckets looks like in the following diagram:
All of those services integrate via some S3 buckets. Those buckets are distributed throughout the architecture. Most of them already have old data but no one is really sure if it’s old data or still used as they have multiple read & write accesses and there is no way to keep an overview. This results in the current big data swamps we are working with. Mind the plural: swamps. Also, we don’t see them as clean lakes. Those buckets are sometimes not connected at all and the data is a pile of mud. This is what we reference to as a “data swamp”:
As QAs, we like to have clean integrations that can easily be tested and a clean flow of data through the system that enables debugging. It does not necessarily need to be predictive for big data, but an understanding of what type of data is flowing from Service A to Service B would undoubtedly be helpful. It is very hard to write integration- or maintainable end-to-end tests for those kinds of architectures with integrations of that kind. That is obviously one of the main motivations for us QAs to look more into this issue.
We advocate for a clean data storage for the data engineers and in our role we are actually in a good spot to do this: QAs typically care more than other people about Cross Functional Requirements (CFR). Storing masses of data in a proper system is as much a valid CFR as looking out for security, performance or e.g. usability.
Hence, in the last weeks we started discussing and we are wondering where this (quite enthusiastic!) buy-in for old anti-patterns is coming from. We are trying to understand more of the root causes of this type of architecture (besides doing “data science”). Does it come down to the typical “it’s quick to develop”, “we all understand the technology” or “we did not know what else to use”?
A more sustainable approach would obviously be to enable services to directly communicate with each other and not via shared file storage. That would allow any client to have stateless services that each store just the information they need.
That would allow us to build a more scalable system. It would allow us to test the integration between services much better – or at all for that matter! It does not necessarily all be via well defined APIs in synchronous calls. When one providing service has many consumers which only need new information at some point of time or if you have synchronous calls are cascading all through your system you may want to use something like the AWS simple notification service (SNS) instead (or Kafka to not just talk amazon here or rabbitmq or…). But any separation that is more structured will enable a company to actually scale faster.
But implementing APIs alone won’t do the trick. This suggestion for improvement has the base assumption of well split domains. I explained this in detail in this blog post.
Once you build a more structured architecture and have clean integrations, not only the future development of this application will benefit, there will also be a huge improvement for the the data scientists. When each service stores the relevant data on its own one can “simply copy”¹ the data and “do” any science on it. In one single place.² That would allow the data people to work on their own without the need of product teams building reports and data pipelines.
“simply copying” the data may also involve some queues/messages. It should not be a local bash script that is literally copying. For static data analysis you won’t need real-time data anyways, in this case it’s fine if you need some minutes to aggregate, process and store the data.
“in one single place” may actually be different data stores: one for public information (like flight dates), one for not public information (like usernames/user identifiers) and yet another one for information that even most people of a company should not have (like financial data or user passwords).
Traditionally, when talking about the quality of our software, of our product, we discuss how we can ensure a certain level of quality. In this context quality is seen as a minimum requirement. The business value of quality from this perspective is in the prevention of potential losses, e.g. by downtime of a platform.
Today, we can see that this picture shifts dramatically. More and more companies move to a devops culture and we can see the same “shift left” happening in the QA space. Likewise, today’s businesses need to react fast to changes in the market. New technologies emerge faster than companies can pick them up.
We want to talk about a high quality product in this context. The high standard in our products does not just ensure our business and mitigate potential risk. High quality in our software allows us to build better software faster. It makes today’s product more resilient to changes in the future. We change our perspective and do not think how to ensure a certain (minimum) level of quality. But how high quality enables our business.
This talk is a shared effort with my colleague Nina.
Did you know you can enable your team to build better software faster while having a stronger team culture? Too good to be true?
In recent years, agile has influenced early involvement of testing in the development cycle. With this more and more testers are testing new functionality as soon as a commit is pushed. Yet such teams still fail to deliver high quality software. Why? What is missing?
Working with various diverse teams across multiple projects, Finn realised that testing doesn’t actually improve software quality. It’s just a bar assuring a certain level of quality that already exists. In order to actually improve we must get involved into much more than simply testing and think about the product as a whole.
In this session, Finn will share specific examples of how engaging with the business, engineering, process optimisation as well as the entire cross-functional team can lead to significant improvements in the product’s quality. At the end of the talk, you will know how to start with a holistic approach to improving product quality throughout the entire software delivery lifecycle.
— this is a talk I am currently doing on various conferences. Some people asked me if I can share the slides, but the files are just a bit too large. So instead I decided to include a recording of the talk (with the slides) here, as some of them only play well with the presentation.
As pointed out before: With continuous deployments the time from commit until live for any commit is the time you need until a bug can be fixed – at best! How can you push on this limit?
It usually requires that all people are aware of the code, the business, the infrastructure, the test and the pipeline. If each person has all this knowledge only then can you react quickly, without gathering the “right” people to ship a fix. In other words: you need a high performing and truly cross functional team. Let me emphasise this here once more: from our perspective there is a clear advantage for the quality of the product here: it is about reacting to (breaking) changes (even) faster.
What is the leverage you as quality advocate have? Talking about errors! Your team will do mistakes, this is good. We would be out of jobs if they did not. We are here to find them and to help the team to prevent them going forward. We are the experts on errors – in a way. So make sure that your team does not waste a mistake, but learns from it.
There are some mistakes that your team will be doing:
Sloppy mistakes. This is the most common. There are probably a dozen of them in this very blog post. The human brain can only concentrate for about 10-15 minutes, until it needs a mini break. If we skip that break, we tend to do small mistakes. A typical mitigation strategy for this is pairing of any two people. When one is not fully concentrated for a moment the other one most often is.
Aha-Moment Mistakes. This is when you encounter a (small) new learning by doing a mistake and finding out about it. It may happen if you understood something by reading or even much more so while explaining something. We also mitigate this by pairing – ideally by different skilled developers. If a more senior person explains a lot to a more junior person both have a higher chance to do this error – and learn something from it right away.
Stretch Mistakes. In this scenario, we are quite away that we step out of our comfort zone. But when you need to or want to try something new you have to at least try, even if you already know that it’s more error prone than business as usual. Our way around such requirements are so called “spikes”: small, time boxed stories that make sure we can trial-and-error in a save environment. Thus, the result of the spike can be a small prototype that is thrown away. A new service that got some basic things just working fine. Or a branch that is ready for an all-team-code review.
There are also some high-stakes mistakes. Those are equally risky as the stretch-mistakes but usually there is much less to gain. Those are not worth the effort. Prevent your team from doing these.
There are probably many more ways to categorise errors. There are also many more mitigation strategies, i.e. in our case we have the safety nets of our test-driven development. But differentiating the different types from one another is something that is typically quite easy for analysts who mostly work in the field of quality.
By encouraging your team to do errors (in a safe environment) you automatically get the basics right for a great error culture. And as soon as the impact of an error is reduced people will be much less afraid of doing errors. And if you are less afraid at work you are usually more creative. If a broken built is then nothing bad or painful (while still an urgent matter!) you will have a more relaxed, trustworthy and creative atmosphere. And guess what – in the end a lot less error happen in such an environment. Another boost for Quality!
The last fine tuning is to make sure your team has fun. Fun? Really? Yes.
Just like the point before: in an environment where people like to come to work, where they are happy to communicate and interact, where it’s fun to get work done, people will also be more focused and more passionate about what they do. As a result, they will do less errors with smaller impact. The ultimate boost for Quality!
Sometimes people say that QAs are the headlights of a project. The developers are the train engine and the BAs are the map of the surrounding area – they know where the train in the night shall ride. And it will somehow get moving. But with headlights that inform you early on about potential obstacles you can maybe find another route on your map.
Sometimes people say that QAs and BAs and devs work on the same product – just in different times: The BAs have really good idea what is coming next. They are constantly thinking about the product in one month – rather in the future. Devs are working on the product of the next days, max. next week – thus the immediate future. And you as a QA usually know best how the product behaves today – you know about the present product.
When it behaves unintended you talk to the same stakeholder as a BA who identifies intended behavior changes to be implemented. It is incredible useful to bring these perspectives together!
Usually, QAs know their way around today’s product in details and can point out weaknesses that can be fixed with new requirements as well as shortcuts when introducing new features. That is all incredible valuable for the BA. The outcome for the QA? If you consequently collaborate on analysis and story writing (to some extent, it’s not necessary to type the story together) you can think about weird edge cases and scenarios already long before the story is implemented and discuss the desired behavior with the BA right away. Baking quality in can be so timesaving!
Again, you cannot make up all possibilities in your head, you will still need to conduct some creative, exploratory testing. But by working with BAs at the beginning you reduced this effort by 60, 70 or sometimes 80%. Your work load after development decreases, because quality already increases before the software is developed. Well done!
Once the process helps us to focus on fewer things and enables us to collaborate and test as a team, I as a “tester” will have more time. And I should spend that time where the product is actually baked: I should get more involved with the developers.
Many QAs are a bit hesitant to move somewhere close to the actual application code. But they shouldn’t be: our main contribution is not that we can code. There are other people who are much more specialised in. We call them developer. They are much better coders. And as such, much better suited to write automated tests. What is then left for a QA?
We bring very strong analytical skills. One of them can analyse all tests on all levels from a strategic point of view. What is the share of unit- integration and end-to-end tests? Where do we cover what business logic, where do we have gaps in our test coverage.
Many QAs who specialize in Front-End-Test-Automation write lots and lots of End-To-End / User-Journey tests. These tests are typically the most flaky, hard to maintain and cost intensive tests you can imagine. Hence, we usually advocate to add as few of them as late as possible.
Instead QAs should aim to understand the big picture of the system architecture: what services do we have? How are they connected? How are they split? Does each service have a distinct capability? If so what is it and how does it relate to you business?
Once you figured that out you can assess how to test each of these services or domains in your service independently. If this works have a look at the communication between the domains and ensure this. If all of this is covered you may want to add a slight hint of an end-to-end test on top.
Obviously, the second approach is much more difficult – but that is what you are for and what you contribute in the team? You are the one to keep the big picture and consult your team members where to add what test in what way. What is the key assertion in a given test case? What is the right level for it? With a strong coder and your analytical abilities in testing you can ensure that things are working while they are implemented. That does not only improve quality early on, it also significantly decreases the time you need to spend testing (and reporting defects) afterwards.
Still, some defects will be released. No matter how much money you invest, there is no way to ensure bug free software at all (even if you are the NASA and spend more than 320 Million $ on a project). The second thing you can figure out with your dev team is how to identify and catch them. With the lean process (see above) that you established you can be sure to ship a potential fix very fast. The way to detect them is a helpful monitoring setup. This involves to visualise the amount of errors, as much as server/database request (and possibly a deviation to 24hours before). If you go to real professional levels you want to think about anomaly detection so that your system can notify you on its own once something is off.
The last open question is how to react to breaking changes that you may have accidentally released to production. We are running a two fold strategy here. We try to minimize our time-to-market for bug fixes and mitigate risks of other larger issues with feature toggles. Let me go into some details:
Usually, in a classic release management process you have a plan how to do your releases and if there are major problems afterwards you execute a predefined rollback. If this is – for one reason or the other – not possible there is usually a hot-fix-release-branch-deploy process defined by someone somewhere. Here is the problem: If you need a hot fix then your team is probably already on fire. In this very moment you need to follow a process that most people are unfamiliar with which usually bypasses a lot of security measures you have previously established in your release cycle. That is quite a bad habit concerning the production problems one has in this very moment.
Our goal is to make the fasted possible release process our standard process. Thus we drive our teams to deploy every single commit to production. That also means that every commit has to be shippable – with enough tests to make sure our product is still working. This is baking quality in already!
Still, things will break. But with a quick way to react and deploy a fix we do not even need rollback strategies any more. But being able to deploy a hot fix very quickly implies that you can also quickly analyse the root cause. But that is not always true. If you know what commit was faulty you can of course deploy a revert. But sometimes a new feature in it’s complexity across stories and commits is just not working right. Thus, we work a lot with feature toggles, making sure that all new functionalities are toggled off in production. We also make sure that we can toggle those features independent of deployments. Thus, we decouple the technical risk of a deploy with the business risk of a feature toggle. This reduces our needs for reverts by about 90% and most deployments run automatically and without problems. Every few days a new feature is toggled on. People are then usually more aware of the situation and monitor the apps behaviour more closely (at least they should). Problems can then be identified and either quickly be fixed with a tiny commit or, if you encounter major blockers, you toggle the feature off again.
In conclusion, we have way fewer way less troublesome releases, while we can activate new features in a very controlled way. Thus, we do not only deliver value fast, we also achieve higher quality at the same time. However, a very experienced and disciplined team is needed to work on such a high level of quality commit by commit.