Dimensions of Quality

One year ago I introduced the “muffin concept” in a small blog post, “how we do Quality at ThoughtWorks”. Since then I have been to various meetups and conferences to discuss the idea and the concepts behind it. After another year and dozens of discussions, there are more thoughts on how to bake quality in. The topic covers quite different aspects, so I decided to split this post into a mini-series of five posts.

Enjoy the read and please give me feedback: tell me what you think!


People say that quality is like the chocolate on a muffin. Is it? Let’s say the product we build really were a muffin. The business analyst brought the recipe, the developers baked it, and afterwards the testers put chocolate on top.

If I imagine that muffin, it is still a bit dull. The muffin is only perceived to be of really high quality once there are some chocolate chunks on top – just like testing in software.


The only problem is that – just like testing in the software delivery process – the chocolate is only “applied” after the major part of the product has been baked. It looks good and smells good. But does a muffin with only a few chocolate chunks on top really taste better?


No, because there is no chocolate inside the muffin – just like testing does not improve the quality of software:

When you test software you basically analyse a (hopefully) isolated system in a controlled environment. And no matter what you do, that system does not change. You may find unexpected behaviours in the system (which are the bugs / defects we are trying to find), but they were in the system already, before you started your test case, and they will be in there afterwards. No system under test ever changes its state (exception: quantum mechanics). Thus, the system does not evolve or improve (in quality) while you test it. Instead, another cycle of development is necessary to actually improve quality.

But that is quite sad. I am a quality analyst. An enthusiast. Caring about the quality of my product is my job description. Usually I am the team member most passionate about it. How can I be the only one who is not able to actually improve the quality?

With this mini-series of blog posts we want to investigate how we can get involved early on to improve software quality – how to bake chocolate into the muffin!


A typical day in the life of a tester may look like this: you pick a new build, deploy it to a test server, run a smoke test and then your extensive test suite; possibly it is (partly) automated. When no blockers are found, you monitor the production environment, ensure everything is healthy, and announce and ship the build to production. Maybe you also have a test suite running in production to verify your delivery there:

[Image: a tester’s pipeline from picking a new build to shipping it to production]

However, that is only the last bit of a longer process. Typical agile software delivery teams have a process that looks similar to the following one:

[Image: team board with columns from “in analysis” to “done”]

Each column is often reflected in tools like Mingle, Trello or Jira: “in analysis” is the step where Product Managers or Business Analysts work out the requirements for the project. Once they are done, the stories move into the next column – for example via a planning meeting in which a sprint backlog is filled; we call that backlog the “Ready for Dev” column. At some point a dev picks up a story, works on it, finishes it and puts it into “Ready for QA”, until a QA picks it up, works on it and ships it. Only then is a story finally done.

If a defect is found during the QA work, the ticket needs to go back to the devs – in the best case – or all the way back to “in analysis”. With such long feedback loops it can take a while until all the kinks are worked out of a new piece of functionality.

This is where we want to tighten the feedback loop and get involved earlier. This is exactly the point where we can improve quality early on and where we can measure it. We identified four fields where we are usually involved and where we have an actual impact on the quality of our product. You can read about each of them in an individual (small) post:

  1. How changes to your process increase the quality of your product. (2 min)
  2. How to get involved earlier in the software development life cycle: be involved! (3 min)
  3. Joint forces of the analysts: improving the quality of software even before it’s built. (1 min)
  4. How establishing a trustful error culture in your team gives you the final boost in quality. (2 min)

Those four points are our recipe for baking quality in: we add some chocolate early on through process improvements. Then we add some technical strawberries along with the right amount of cream in the business space. We finish it off with some colourful sugar toppings in the team culture and voilà … we really bake quality in!


With this holistic approach, we also step beyond being pure “Quality Analysts”. We still analyze the quality of software, but we also specialize in so many more things that lead to a better product. Thus, we truly are Product Quality Specialists.

Pure Performance

Episode 21: How ThoughtWorks helped Otto.de transform into a real DevOps Culture

Finn Lorbeer (@finnlorbeer) is a quality enthusiast working for Thoughtworks Germany. I met Finn earlier this year at the German Testing Days where he presented the transformation story of Otto.de. He helped transform one of their 14 “line of business” teams by changing the way QA was seen by the organization. Instead of a WALL between Dev and Ops, the teams started to work as a real DevOps team. Further architectural and organizational changes ultimately allowed them to increase deployment speed from 2-3 deployments per week to up to 200 per week for the best-performing teams.


Episode 22: Latest trends in Software Feature Development: A/B Tests, Canary Releases, Feedback Loops

In Part II with Finn Lorbeer (@finnlorbeer) from Thoughtworks we discuss some of the new approaches when implementing new software features. How can we build the right thing the right way for our end users?
Feature development should start with UX wireframes to get feedback from end users before writing a single line of code. Feature teams then need to define and implement feedback loops to understand how features operate and are used in production. We also discuss the power of A/B testing and canary releases, as they allow teams to “experiment” with new ideas and, thanks to tight feedback loops, quickly learn how end users are accepting them.

Are we only Test Managers?

This is a translation of the original blog post that I wrote with Diana Kruse, Natalie Volk and Torsten Mangner. While rewriting the post in another language, I took the liberty of adding some personal notes here and there in italic font.


In every development team at otto.de there is at least one tester / test manager / QA … or whatever you would call the person who shapes the mindset for quality.

Until recently, “test manager” was the dominant job title at Otto – a very rigid and bureaucratic term, although the intention behind it was good: to emphasize that we do not only execute tests, we also manage them! Meanwhile, even managing tests is only a very small part of the value we deliver.

In a bigger workshop, Finn and Natalie, two of our “test managers”, picked up on this contradiction and worked things out. We were sure that it would not be sufficient to simply write “agile” in front of “test manager”. Hence, we developed a new understanding of our role that looks and feels like this:


We are the teams’ Quality Coaches

We support the teams in understanding “quality” as a collective responsibility. We achieve this by working intensively with all roles on the team rather than talking about generic concepts. We establish knowledge and practical approaches around the topic of quality.


We See Through the Entire Story Life Cycle

Together with the team we ensure that our high quality standards are considered long before the development of our product starts: we suggest alternative solutions during the conception of a story and point out potential risks. We avoid edge-case problems later on by thinking about them while writing the story. We pair with developers, so that we know the right things are tested in the right place. Thus, we have more time to talk to our stakeholders and users during the review. With the right monitoring and alerting we are able to observe our software in production.


We Drive Continuous Delivery / Continuous Deployment

One central goal is to deploy software to the production environment as risk free as possible. Therefore we try to change as little of our codebase as possible and roll out every single commit automatically. We are using feature toggles, to switch on new functionality independent of these deployments. This has two major advantages: we can roll out our software to customers (almost) at the speed of light and get fast feedback for new developed features.


We are Balancing the Test Methods of the Testing Pyramid

We know what to test on which level of the testing pyramid. We use this concept to create a lot of fast unit tests, a moderate number of integration tests and as few end-to-end tests as possible. This not only speeds up our pipelines, it also makes our tests more stable, more reliable and easier to maintain.

Additionally, our toolbox contains all kinds of tests (acceptance tests, feature tests, exploratory tests), methods (e.g. test first, BDD) and frameworks (like Selenium or RSpec). We know how to use those tests, methods and frameworks on all levels of the testing pyramid.

(As a side note: this indeed implies running e.g. Selenium tests on a unit-test level where applicable.)
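
To make this a bit more tangible, here is a minimal, hypothetical sketch (all class and method names are made up) of pushing a check down the pyramid: the voucher logic is verified by millisecond-fast unit tests, so the slow end-to-end suite only needs a single Selenium test checking that a price is rendered at all, not one test per pricing rule.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import org.junit.jupiter.api.Test;

// Fast, pyramid-base tests for pricing rules; the browser-based suite does not
// have to repeat these cases.
class DiscountCalculatorTest {

    @Test
    void subtractsTheVoucherFromTheBasketTotal() {
        assertEquals(90_00, DiscountCalculator.applyVoucher(100_00, 10_00));
    }

    @Test
    void neverReturnsANegativePrice() {
        assertEquals(0, DiscountCalculator.applyVoucher(5_00, 10_00));
    }

    // Stand-in for the production code under test.
    static class DiscountCalculator {
        static int applyVoucher(int totalInCent, int voucherInCent) {
            return Math.max(0, totalInCent - voucherInCent);
        }
    }
}
```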


We Help the Team to Choose the Right Methods for a High Quality Product

Being specialists, we know the advantages and disadvantages of different methods and can help the team benefit from the advantages. We learned that pairing enables knowledge transfer, communication, faster delivery and higher quality. Besides pairing, test-driven development is one of the key factors in creating a high-quality product from the beginning.

Flexible software can only emerge from flexible structures. This is why we are not dogmatic about processes and methods but decide together with the team what mix of processes we really need to get our job done.


We are Active in Pairing

We do not only encourage our developers to pair, we also enjoy pairing ourselves. This way we can point out problems even while the code is being written. To avoid finding all the edge cases only during development, we also like to pair and communicate with Business Analysts, UX designers and Product Owners. Together with the operations people we monitor our software in the production environment.

Pairing with different people in different roles allows us to further develop our technical as well as our domain knowledge.


We Represent Different Perspectives

By taking on different positions we prevent one-sided discussions. We try to avoid typical biases by challenging assumptions about processes, methods, features and architectures. This enables us – from time to time – to show a different solution or an alternative way to solve a given problem. It helps us to reduce systematic errors and money pits, and to objectively evaluate risks while developing our software.


We are Communication Acrobats

We are the information hub for all kinds of things inside and outside the teams. We make special, constructive use of the grapevine, a phenomenon that occurs in practically every company with more than 7 people.

We are enablers of communication. This may be the communication within a pair of developers, between many or all team members, or between teams throughout the organization. By facilitating this coordination we can reduce ambiguities about features or system integrations and hence get our software into a deliverable state faster.


After developing this role, we engaged more and more of our “agile test managers” with the concept. They were so enthusiastic about it that they wanted to apply for the job once more right away. The only thing missing was a good name. Since every cross-functional team has different specialists, and one of those people is the driving force for high quality, we found the perfect name: the Quality Specialist.


(Side note: the German term for “quality assurance” (QA) is “QualitätsSicherung” (QS). Since the abbreviation QS now also stands for Quality Specialist, it was even easier to adopt the new term.)

Quality Specialist is a very fitting name for this stretched role. Although we are broad generalists, our core value lies in shaping a quality mindset and a culture of quality in a team.

Those were the first steps of a very exciting journey. The next thing to do is to talk with the other roles in order to find out how this new understanding of the role changes our daily work. Furthermore, almost no one fully fills this role description today. Thus, we need to grow, level up and reflect on our development. The most fun part is that we can learn a million things in different domains from different people.

Process Automation and Continuous Delivery at OTTO.de

This post is all about deploying every single commit to the production environment.

All manual steps in a release cycle can be automated – even if you want to check your designs. This post explains step by step how to automate every single one and what to consider when releasing a couple of times per day. You can find my article in the Otto dev blog. Or you can read it below.


Whenever we present how we release features and deploy our code in one of OTTO’s core functional teams, we are met with a certain set of questions, e.g.: “Why do you want to deploy more than once a week?”, “If you automate release and test management, what are the release and test managers doing?”, “How can we prevent major bugs from entering the shop?”, “Where is the final control instance that decides whether something goes live?”, or the typical question “Who is responsible if something breaks?” or simply “Why the heck would someone want to do this?”

Let us answer those questions. Let us guide you through our way of working. Let us show you what processes we have (and which ones we do not have) and give you a hint on how to increase productivity and quality at the same time (without firing the test manager). All you have to do is sit back, relax and let go of your worry about losing control. Don’t worry, you won’t lose it.

If you have a look at a general release process for a deployment, it will look similar to this scheme:

[Image: release process scheme from new release candidate to live deployment]

The image illustrates a release life cycle: every so often, a new release candidate is built. If the code compiles and the first tests are successful, we speak of it as a “green build”. The code of this release candidate is deployed to a test server, and after a smoke test a full test suite can run. Depending on the number of test servers and your (integration) test setup, you may want to repeat steps 2-4 for more than one server. If all tests pass for a specific build version, and the live platform is stable (→ monitoring step), you can announce the live deployment and ship the new build. Finally, some tests verify that the live deployment was successful.
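
To make the flow concrete, here is a minimal sketch of that life cycle expressed as plain code. Every class and step name in it is a made-up illustration; our actual pipeline is defined as code in the build tool LambdaCD mentioned at the end of this post.

```java
// A minimal sketch of the release life cycle as plain code. All names are
// hypothetical; each step aborts the run if it fails.
public class ReleasePipeline {

    public static void main(String[] args) {
        String buildVersion = args.length > 0 ? args[0] : "local-build";
        boolean shipped = new ReleasePipeline().run(buildVersion);
        System.exit(shipped ? 0 : 1);
    }

    boolean run(String buildVersion) {
        // 1. Only "green" builds (compiled + unit/acceptance tests passed) enter the pipeline.
        if (!buildIsGreen(buildVersion)) return abort("build not green");
        // 2.-4. Deploy to the test server, smoke test, then run the full suite.
        if (!deployTo("test", buildVersion)) return abort("test deploy failed");
        if (!smokeTest("https://test.example.org/status", buildVersion)) return abort("smoke test failed");
        if (!runTestSuite("test")) return abort("test suite failed");
        // 5. Ship only while the live platform looks healthy (monitoring step).
        if (!livePlatformIsStable()) return abort("live platform unstable");
        announceDeployment(buildVersion);
        if (!deployTo("live", buildVersion)) return abort("live deploy failed");
        // 6. Verify the live deployment with a small post-deploy suite.
        return runTestSuite("live");
    }

    private boolean abort(String reason) {
        System.err.println("Pipeline stopped: " + reason);
        return false;
    }

    // The methods below stand in for real integrations (CI server, deploy scripts, monitoring API).
    boolean buildIsGreen(String version) { return true; }
    boolean deployTo(String environment, String version) { return true; }
    boolean smokeTest(String statusUrl, String version) { return true; }
    boolean runTestSuite(String environment) { return true; }
    boolean livePlatformIsStable() { return true; }
    void announceDeployment(String version) { System.out.println("Deploying " + version); }
}
```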

Not a single one of those steps requires human interaction. The entire process can be automated. One of the many advantages is that you simply do not have to spend time on this process. The time that is freed up (most of the time this applies to the Quality Analyst) can be spent on other tasks. In our case, we could almost double the time the QA spends with the developers and the business designer.

Before that, the Quality Analysts were only able to evaluate the quality of a given piece of code after the implementation. If this code did not meet the expectations for “quality”, they would need to convince stakeholders and developers that the quality was not sufficient, and the developers would start the story once again. This was a very time-intensive and thus expensive process.

Now, the Quality Analysts have more time to review the business requirements, think about edge cases and report them to the developers before implementation. Furthermore, the QAs pair with the developers and can make sure that “quality” is built into the product during implementation.

1. New build

The build that triggers the entire process already has a lot of tests itself. We keep tight track of our test pyramid in this first step of our test automation: at this point we have a huge number of unit tests and a fair portion of acceptance tests. They do not only test our Java code base. We apply the same principles to our JavaScript: to reduce the number of frontend (Selenium) tests needed at the end of our build pipeline, we prefer the fast feedback of a lot of JavaScript tests in the initial build step, using Jasmine.

If all those tests pass, we consider a build “green”. Our build runs for every single git commit.

2. Deploy to a test server

The next step is to deploy a green build to a test server and continue testing the new software. Talking about deployments, one often forgets that it is code that executes all the steps necessary to provision a server with new software. Even this code can fail, and thus we recommend a small smoke test right after the deployment. This can be as easy as checking the version number on a status page or the git hash in the meta information of the front page. You will save a lot of time by not executing tests against old code.
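
Such a smoke test can be tiny. Here is a hypothetical sketch, assuming a status page that simply returns the deployed git hash as plain text:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// A minimal post-deploy smoke test: did the test server really get the version
// we just deployed? URL and status-page format are assumptions for this sketch.
public class SmokeTest {

    public static void main(String[] args) throws Exception {
        String statusUrl = args[0];       // e.g. https://test.example.org/status
        String expectedGitHash = args[1]; // the hash the pipeline just deployed

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(statusUrl)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // Fail fast if the server is unhealthy or still serving the old version;
        // this keeps the expensive test suite from running against stale code.
        if (response.statusCode() != 200 || !response.body().contains(expectedGitHash)) {
            System.err.println("Smoke test failed: expected " + expectedGitHash
                    + " but the status page returned: " + response.body());
            System.exit(1);
        }
        System.out.println("Smoke test passed, version " + expectedGitHash + " is on the test server.");
    }
}
```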

3. Test suite

Having successfully deployed the software to the test server, we continue testing. After covering the base of the test pyramid in the build step, we now take care of the top of it. Here we execute more acceptance and functional tests, some of them in Selenium. Furthermore, we can run first integration tests with other teams, other services and maybe third-party software. For integration testing, we do not rely on Selenium alone. We have a wide set of so-called CDC tests (consumer-driven contract tests) with other teams: if another team has specific requirements, e.g. for our APIs (i.e. they consume our API), they write a test that runs within our build pipeline, e.g. a pact test. In this way we can make sure that all requirements other teams have towards us are fulfilled for every single commit.
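
A real pact test is written by the consuming team and brings its own framework; as a simplified stand-in, the following sketch (the endpoint, field names and consuming team are made up) shows the idea of a consumer-driven contract check that runs as part of our pipeline:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// A simplified stand-in for a consumer-driven contract test: the consuming team
// declares which parts of our API response they rely on, and the check runs for
// every commit in our pipeline.
public class ProductApiContractTest {

    public static void main(String[] args) throws Exception {
        String baseUrl = args.length > 0 ? args[0] : "https://test.example.org";

        // Fields the (hypothetical) consuming team depends on in our product API.
        String[] requiredFields = { "\"sku\"", "\"title\"", "\"price\"" };

        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(baseUrl + "/api/products/12345")).GET().build(),
                HttpResponse.BodyHandlers.ofString());

        boolean broken = response.statusCode() != 200;
        for (String field : requiredFields) {
            if (!response.body().contains(field)) {
                System.err.println("Contract broken: response is missing " + field);
                broken = true;
            }
        }
        // Failing this step blocks the commit, so we never ship a change that
        // silently breaks a consumer of our API.
        System.exit(broken ? 1 : 0);
    }
}
```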

Maybe you do not have just one test server, but two (e.g. for different kinds or levels of tests). Then you would execute the deployment-and-test steps two or more times. In any case, the number of tests should decrease with every step, otherwise there is something fundamentally wrong with your test pyramid.

One big concern I am met with is that no one looks at the product before it goes live. “Automation is nice, yes, but nothing beats the pattern recognition of a human brain” is what people mention in response to all the automation. The statement is true, no doubt. But the point is that the human brain is not necessarily needed here and can be better applied earlier in the software development process.

4. Feature toggles

To explain this, let me tell you about one fundamental requirement for releasing automatically: the consistent use of feature toggles. Using toggles means that new features are not released by a deploy but by the flip of a button. This has two major advantages. First, the feature has a shorter time to market: just a few minutes after the last commit is pushed, the entire feature code is deployed; one does not have to wait until the end of, say, a sprint cycle. Second, despite all human and automated tests, sometimes something just goes wrong (and it does not even have to be a technical problem). If we release a feature with a toggle, we can also toggle it off in just one second. We do not have to roll back our deploys and hence we do not affect other features that were in the same deployment. The process automation made our deploys an absolute “non-event”, while the side effects of the quick deployments made feature releases a lot easier.
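
Here is a minimal sketch of the toggle idea. The toggle name and the in-memory store are illustrative only; a real setup would read the toggle state from a shared configuration source so that it can be flipped at runtime, without any deployment:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Deploys ship the new code path in a disabled state; the release happens later
// by flipping the toggle – and can be undone just as quickly.
public class FeatureToggles {

    private final Map<String, Boolean> state = new ConcurrentHashMap<>();

    public boolean isEnabled(String feature) {
        return state.getOrDefault(feature, false); // unknown toggles stay off
    }

    public void set(String feature, boolean enabled) {
        state.put(feature, enabled); // the "flip of a button" (or the one-second kill switch)
    }

    public static void main(String[] args) {
        FeatureToggles toggles = new FeatureToggles();

        renderTeaser(toggles);                     // code is deployed, feature still off
        toggles.set("new-product-teaser", true);   // release the feature
        renderTeaser(toggles);                     // customers now see the new teaser
        toggles.set("new-product-teaser", false);  // ...or switch it off again in one second,
                                                   // without rolling back the deployment
    }

    static void renderTeaser(FeatureToggles toggles) {
        // This branch ships with every commit; the toggle decides what customers see.
        if (toggles.isEnabled("new-product-teaser")) {
            System.out.println("rendering new product teaser");
        } else {
            System.out.println("rendering old product teaser");
        }
    }
}
```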

5. Image comparison

Since (almost) everything, especially frontend changes, is toggled, no deployment should ever change the face of our product. And this is difficult for humans to test. Human brains are activated by mismatching patterns: different paddings for otherwise equal elements, or a picture that is out of its box, are very easy for us to spot. But if one link in a list of maybe 20 links is missing on a page, almost no one will notice. If the link turned green, or had a different font than all the other text, we would discover it right away; if it is simply gone, we barely notice it. Hence, for our kind of deployment we need either a human with an eidetic, photographic memory – or a machine. We decided to go with the latter. Inspired by other tools, such as “wraith” by the BBC, we built a small Ruby gem (lineup) that uses Selenium to take screenshots of defined pages of our product before and after the deploy. It notices as soon as just one pixel changes and fails the test step. This lets us detect whether our feature toggles were implemented correctly and discover undesired frontend changes before they go live. Here is an example:

[Image: an entry page before the deployment (left), the pixel-level comparison (middle) and after the deployment (right)]

On the left side is an entry page before the deployment of the new code, and on the right side after the deployment. Without taking complicated measurements, no human would notice the increase in the top margin of the headline for the smartphones and the gaming console. The image comparison (middle) between the base (left) and the new (right) image reveals the difference right away by marking all pixels that changed between the left and the right image.
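
The gem itself is written in Ruby; the following Java sketch (URLs and file names are placeholders, and the real gem works on a whole list of defined pages) illustrates the core idea of taking a screenshot of the same page before and after the deployment and failing on the first differing pixel:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

// Screenshot comparison around a deployment: any changed pixel fails the step.
public class ScreenshotComparison {

    public static void main(String[] args) throws Exception {
        WebDriver browser = new ChromeDriver();
        try {
            takeScreenshot(browser, "https://test.example.org/", new File("base.png"));
            // ... the deployment to the test server happens here ...
            takeScreenshot(browser, "https://test.example.org/", new File("new.png"));
        } finally {
            browser.quit();
        }

        int changedPixels = diff(ImageIO.read(new File("base.png")), ImageIO.read(new File("new.png")));
        if (changedPixels > 0) {
            System.err.println(changedPixels + " pixels changed – failing the pipeline step.");
            System.exit(1);
        }
    }

    static void takeScreenshot(WebDriver browser, String url, File target) throws Exception {
        browser.get(url);
        File shot = ((TakesScreenshot) browser).getScreenshotAs(OutputType.FILE);
        ImageIO.write(ImageIO.read(shot), "png", target);
    }

    // Count differing pixels; a size change counts as "everything changed".
    static int diff(BufferedImage base, BufferedImage fresh) {
        if (base.getWidth() != fresh.getWidth() || base.getHeight() != fresh.getHeight()) {
            return base.getWidth() * base.getHeight();
        }
        int changed = 0;
        for (int x = 0; x < base.getWidth(); x++) {
            for (int y = 0; y < base.getHeight(); y++) {
                if (base.getRGB(x, y) != fresh.getRGB(x, y)) changed++;
            }
        }
        return changed;
    }
}
```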

6. Monitoring

If the build passes this last test, it is ready to be deployed to the live platform. To ensure that our platform is always stable enough for a deployment, we constantly monitor the servers and databases. This is (and needs to be) a shared team responsibility – just like any other step of the entire process. We achieved this by simply putting up a couple of monitors that are in the line of sight of every team member. Every day, we discuss the error rates and possible performance problems in front of the big screens. This general discussion and the come-togethers around the shared screens enhanced our culture of constant monitoring. With more and more services being built, we are now investigating ways to focus on the most important metrics. As the issues on our live servers are different every day, we cannot determine up front which metric “is key” for which service. Hence, we have to automatically analyse all our metrics and present only the most relevant ones to the team – and the most relevant ones are usually the ones behaving most strangely. Thus, our investigations currently go in the direction of anomaly detection.
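
As a first intuition of what such an automated analysis could look like, here is a deliberately tiny sketch with made-up sample data: it flags a metric whose latest value deviates strongly from its own recent history. A real implementation would read the values from the monitoring system and use something more robust than a plain z-score:

```java
// Flag the metric whose newest value deviates most from its own recent history,
// instead of hand-picking "key" metrics per service. Sample data is made up.
public class MetricAnomalyDetector {

    public static void main(String[] args) {
        double[] errorRatePerMinute = { 2, 3, 2, 4, 3, 2, 3, 2, 3, 42 };
        double score = zScoreOfLatest(errorRatePerMinute);
        if (Math.abs(score) > 3.0) {
            System.out.println("Anomaly: latest value is " + score + " standard deviations from the mean");
        }
    }

    // z-score of the newest data point against the history before it
    static double zScoreOfLatest(double[] values) {
        int n = values.length - 1;
        double mean = 0;
        for (int i = 0; i < n; i++) mean += values[i];
        mean /= n;
        double variance = 0;
        for (int i = 0; i < n; i++) variance += (values[i] - mean) * (values[i] - mean);
        double stdDev = Math.sqrt(variance / n);
        return stdDev == 0 ? 0 : (values[n] - mean) / stdDev;
    }
}
```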

7. Deployment

The growing number of services (a result of the move towards microservices) helps us to keep the impact on any system other than the deployed one as small as possible. Having only loosely coupled services removes the need to announce every deployment to all other (~dozen) teams. If other teams were affected by our changes and/or deployments, we would have a fundamental flaw in our architecture (or in our CDC tests). Developing and enforcing hard- or software locks at the end of the release process in order to limit deployments is not a solution to such a fundamental architecture problem. Hence, there is no need to announce deployments to the entire IT department. It is probably a good thing, though, to let the ops people know about our deployments in general. And one should also have a single gate that can be closed for all deployments if something is blocking deployments in general at a particular moment. Finally, the last thing we need is deployment reporting for documentation purposes. This usually only includes which git hash / build version went live at what time, together with a changelog.

8. Release

As described above, the deployment to the live servers became an absolute non-event, and thus there is nothing noteworthy to report for this step. After the deployment is finished, we run a small test suite to make sure that our code was successfully released and our core functionality is still in place.

And then we are already live, multiple times a day. And while we increase our shipping speed, we have even more time to ensure that our product is built with good quality. To execute all the steps, we have created a wide range of tools. For most steps, the available open source tools did not fit one primary need: the entire process is automated, and thus coded. This code, like any other, needs to be tested. Hence, we think of our release pipeline as testable code. This is reflected in the build tool “LambdaCD”. Additionally, we built the described image comparison tool “Lineup”. Another team at OTTO developed a monitoring solution (“Oscillator”), and even for tracking deployments, feature toggling and other events we built our own set of tools, to be open sourced soon.

For further reading, have a look at the features of our open source projects. And – please! – give us feedback about your opinions and experiences.

Your FT3 Team