How to get involved earlier in the software development life cycle: be involved!

Once the process helps us to focus on fewer things and enables us to collaborate and test as a team, I as a “tester” will have more time. And I should spend that time where the product is actually baked: I should get more involved with the developers.

Many QAs are a bit hesitant to move closer to the actual application code. But they shouldn’t be: our main contribution is not that we can code. There are other people who are much more specialised in coding. We call them developers. They are much better coders and, as such, much better suited to write automated tests. So what is left for a QA?

We bring very strong analytical skills. One of them is the ability to analyse all tests on all levels from a strategic point of view: What is the share of unit, integration and end-to-end tests? Which business logic is covered where, and where are the gaps in our test coverage?

Many QAs who specialize in front-end test automation write lots and lots of end-to-end / user-journey tests. These tests are typically the most flaky, hard-to-maintain and cost-intensive tests you can imagine. Hence, we usually advocate adding as few of them as possible, and as late as possible.

Picture: 100 End-2-End tests to rule them all

Instead, QAs should aim to understand the big picture of the system architecture: what services do we have? How are they connected? How are they split? Does each service have a distinct capability? If so, what is it and how does it relate to your business?

Once you have figured that out, you can assess how to test each of these services or domains independently. When that works, look at the communication between the domains and cover it with tests as well. Once all of this is in place, you may want to add a slight hint of end-to-end tests on top.
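To make this concrete, here is a minimal sketch (JUnit 5; the class and the pricing rule are made up) of what “testing a domain independently” can look like: the business logic of a single service is exercised directly, without starting any other service, database or browser.

```java
// Hypothetical example: the pricing domain is tested in isolation.
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class PriceCalculator {
    // plain business logic: no HTTP, no database, no UI
    double finalPrice(double basePrice) {
        return basePrice >= 100.0 ? basePrice * 0.9 : basePrice;
    }
}

class PriceCalculatorTest {
    @Test
    void appliesTenPercentDiscountFromOneHundredOn() {
        assertEquals(90.0, new PriceCalculator().finalPrice(100.0), 0.001);
    }

    @Test
    void keepsSmallBasketsUnchanged() {
        assertEquals(50.0, new PriceCalculator().finalPrice(50.0), 0.001);
    }
}
```

Tests like these form the broad base of the pyramid; the communication between domains is then covered by a much smaller set of integration or contract tests, and only a handful of end-to-end journeys sit on top.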


Obviously, the second approach is much more difficult – but that is exactly what you are there for and what you contribute to the team. You are the one who keeps the big picture and consults your team members on where to add which test, and in what way. What is the key assertion in a given test case? What is the right level for it? With a strong coder and your analytical abilities in testing, you can ensure that things work while they are being implemented. That does not only improve quality early on, it also significantly decreases the time you need to spend testing (and reporting defects) afterwards.

Still, some defects will be released. No matter how much money you invest, there is no way to guarantee bug-free software (even if you are NASA and spend more than $320 million on a project). The second thing you can figure out with your dev team is how to identify and catch those defects quickly. With the lean process (see above) that you established, you can be sure to ship a potential fix very fast. The way to detect them is a helpful monitoring setup. This involves visualising the number of errors as well as server/database requests (and possibly the deviation from the same time 24 hours before). If you want to go to a truly professional level, you should think about anomaly detection, so that your system can notify you on its own once something is off.
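As a rough illustration of the “deviation from 24 hours before” idea, here is a hedged sketch (plain Java; the threshold and numbers are made up) of a check that flags an anomaly when the current error count clearly exceeds yesterday’s baseline:

```java
// Illustrative only: compares the error count of the current time window with the
// same window 24 hours earlier and raises a flag when the ratio looks suspicious.
class ErrorRateCheck {
    private static final double MAX_RATIO = 3.0; // assumed threshold: alert when errors more than triple

    static boolean isAnomalous(long errorsNow, long errorsSameWindowYesterday) {
        long baseline = Math.max(errorsSameWindowYesterday, 1); // avoid division by zero on quiet days
        return (double) errorsNow / baseline > MAX_RATIO;
    }

    public static void main(String[] args) {
        System.out.println(isAnomalous(120, 30)); // true: four times yesterday's errors
        System.out.println(isAnomalous(35, 30));  // false: within normal variation
    }
}
```

A real setup would read these numbers from the monitoring system and feed proper anomaly detection, but the principle stays the same: compare against a baseline instead of a fixed number.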


The last open question is how to react to breaking changes that you may have accidentally released to production. We run a twofold strategy here: we try to minimise our time to market for bug fixes and mitigate the risk of larger issues with feature toggles. Let me go into some detail:

Usually, in a classic release management process, you have a plan for how to do your releases, and if there are major problems afterwards you execute a predefined rollback. If that is – for one reason or another – not possible, there is usually a hot-fix-release-branch-deploy process defined by someone, somewhere. Here is the problem: if you need a hot fix, your team is probably already on fire. In this very moment you need to follow a process that most people are unfamiliar with and which usually bypasses a lot of the safety measures you previously established in your release cycle. That is a dangerous combination, given the production problems you are facing at that very moment.

Our goal is to make the fastest possible release process our standard process. Thus, we drive our teams to deploy every single commit to production. That also means that every commit has to be shippable – with enough tests to make sure our product is still working. This is baking quality in from the start!


Still, things will break. But with a quick way to react and deploy a fix, we do not even need rollback strategies any more. Being able to deploy a hot fix very quickly implies that you can also quickly analyse the root cause – which is not always true. If you know which commit was faulty, you can of course deploy a revert. But sometimes a new feature, in all its complexity across stories and commits, is simply not working right. Thus, we work a lot with feature toggles, making sure that all new functionality is toggled off in production. We also make sure that we can flip those toggles independently of deployments. In this way, we decouple the technical risk of a deployment from the business risk of a feature release. This reduced our need for reverts by about 90%, and most deployments run automatically and without problems. Every few days a new feature is toggled on. People are then usually more aware of the situation and monitor the app’s behaviour more closely (at least they should). Problems can then be identified and either quickly fixed with a tiny commit or, if you encounter major blockers, you toggle the feature off again.
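To show what this decoupling means in code, here is a minimal toggle sketch (all names are illustrative; in reality the toggle state would live in a small store that can be changed at runtime, not in an in-memory map):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative toggle store: flipping a toggle releases the feature, not the deploy.
class FeatureToggles {
    private final Map<String, Boolean> state = new ConcurrentHashMap<>();

    void set(String feature, boolean enabled) { state.put(feature, enabled); }

    boolean isEnabled(String feature) {
        return state.getOrDefault(feature, false); // new features are off by default
    }
}

class CheckoutPage {
    private final FeatureToggles toggles;

    CheckoutPage(FeatureToggles toggles) { this.toggles = toggles; }

    String render() {
        // the new code is deployed "dark"; the toggle flip is the actual release
        return toggles.isEnabled("new-payment-flow") ? "new payment flow" : "old payment flow";
    }
}

class ToggleDemo {
    public static void main(String[] args) {
        FeatureToggles toggles = new FeatureToggles();
        CheckoutPage page = new CheckoutPage(toggles);
        System.out.println(page.render());      // old payment flow
        toggles.set("new-payment-flow", true);  // "release" without a deployment
        System.out.println(page.render());      // new payment flow
    }
}
```

If the new flow misbehaves in production, setting the toggle back to false takes it out again in seconds, without touching any deployment.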

In conclusion, we have far fewer and far less troublesome releases, while we can activate new features in a very controlled way. Thus, we do not only deliver value fast, we also achieve higher quality at the same time. However, it takes a very experienced and disciplined team to work at such a high level of quality, commit by commit.

How we do “Quality” at ThoughtWorks Germany

Thanks to the other QAs in ThoughtWorks Germany for contributing thoughts to this topic over the past months: Sarah@DizMario, @Nadineheidrich and @bratlingQA


What is testing?

Testing is a method to analyze the quality of a given piece of software. It is applied after the software is developed. If the software has an insufficient level of quality, yet another cycle of development & testing is needed to increase and then measure the quality again.

Testing is bug detection.

Metaphor: Testing is like putting chocolate on a muffin, but only after baking it.

What is QA?

There are other methods that can improve the quality of software while it is being developed. Those methods decrease the number of development & testing cycles required to reach a certain level of quality.

QA is bug prevention.

Picture: We want to consider chocolate while baking and only put some additional on top.

How we are testing

In addition to the tools and processes that allow us to build high-quality software from the first line of code (see below), we hold ourselves to the highest standards when testing software. As mentioned before, we are well aware that testing can only analyze a software’s current state and show the presence of issues/defects. Another cycle of development is needed to actually improve the quality. That leads to the well-known problem: whenever testing is “successful” (i.e. it finds defects), the cycle time of the story increases and the delivery of the new feature has to be postponed.

To minimize this delay, we apply the most efficient testing approaches to provide fast feedback for developers. This reduces the overall time to market of new features. With these methods we are able to reduce the cycle times significantly while improving the overall quality of the software in different projects:

  • Our tests are fast and effective. We run unit, integration and end-to-end tests in a well-shaped testing pyramid. This allows us to quickly check whether our application behaves as expected. Writing the right tests on the right level reduces the time we need for regression tests from days to minutes. This includes – amongst other things – 100% automation of regression tests.
  • Of all the tests in the pyramid, we take special care of the integration tests (between different services) to assure the architecture’s resilience. One of our favorites are consumer-driven contract tests. They allow different teams to work independently on loosely coupled services while ensuring that the system as a whole behaves well (a simplified sketch follows after this list).
  • For us, testing is an integrated activity within the software development team and not an independent, separate discipline. There are two ways to get a story tested:
    1. When a story needs QA attention, you move the ticket into a “QA” or “ready for QA” column. The person who is in the role of the Quality Analyst then picks up the story as soon as possible.
      (this is push & role-focused ⇒ that is the Scrum-way with experts in the team)
    2. When a story needs QA attention, you look out for the capability in the team. Any person with some capabilities in testing (often but not always the QA) rotates into the story.
      (this is pull & capability-focused ⇒ that is the Kanban-way with cross-functional people in the team)

      Guess what: I prefer the second approach. It is real team-play. And it decreases the cycle time of a single story and thus increases your velocity! Devs learn (more) about testing, and QAs can pair on the programming part, e.g. to sort unit and integration tests into the pyramid ⇒ baking the chocolate inside.

  • For exploratory testing we do not always apply the same standard methods but acknowledge the individual context we are in. Only then can we make use of the various advantages of different test methods. We find all kinds of tools in our box: Behaviour Driven Testing, Acceptance Testing, End-To-End Testing, Scenario based Testing, User Journey Testing, Integration Testing, System Testing, Risk based Testing, Penetration (Security) Testing, UX testing, performance Testing, Guerrilla Testing…
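Here is the promised sketch of the consumer-driven contract idea. It is deliberately simplified and does not use a real contract-testing library such as Pact; the endpoint and field names are made up. The point is only that the consuming team’s expectations are encoded as a check that runs in the providing team’s pipeline:

```java
import java.util.Set;

// Simplified stand-in for a CDC test: the consumer team lists the fields it actually
// reads from our /products/{id} response; our build fails if we stop providing them.
class ProductApiContractCheck {
    private static final Set<String> FIELDS_REQUIRED_BY_CONSUMER =
            Set.of("id", "name", "price", "currency");

    static boolean fulfilsContract(Set<String> fieldsWeReturn) {
        return fieldsWeReturn.containsAll(FIELDS_REQUIRED_BY_CONSUMER);
    }

    public static void main(String[] args) {
        // in a real pipeline these fields would be read from the deployed test server's response
        Set<String> currentResponseFields = Set.of("id", "name", "price", "currency", "stock");
        if (!fulfilsContract(currentResponseFields)) {
            throw new AssertionError("Breaking change: a field required by a consumer is missing");
        }
        System.out.println("Contract with the consuming team is fulfilled");
    }
}
```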

How we do QA

As mentioned before, we really need to learn all about testing, and it is a mastery to study. However, it is only a (small) part of our job and everyday life. Besides testing we look into other things, as we know that good software is only the first step towards a high-quality product.

We create a culture in the team where the aspect of quality is an important asset for each team member. In this context we can address the team’s current needs with a wide set of processes, frameworks and tools that we as ThoughtWorkers already use, or create them if they do not yet exist (e.g. Selenium).

We acknowledge that we cannot build defect-free software. Hence, we focus on defect prevention and on establishing an overall quality mindset. We assure a high level of quality in software through tools and processes that allow us to prevent defects and find errors fast:

  • Well-designed services to ensure a resilient architecture. There are so many things to work on if you want to improve resilience: you can have the best software without bugs, but it won’t help you if your servers are down for the better part of the day. From a QA point of view, we are interested in circuit breakers (self-healing systems), feature toggles, well-designed APIs and kick-ass monitoring (a minimal circuit-breaker sketch follows after this list):
  • Monitoring! This is so important, and so many people think that “only” Ops should care. What a misunderstanding. Constant monitoring of all services and environments is a shared discipline, needed to react quickly to any arising issue. No matter how well you test (see above), some defects will slip through to production. The best way to reduce their impact is a combination of good monitoring and quick deployments. If we are able to release a fix fast (best case: 20 minutes after a bug is found), we can reduce the impact significantly. This is what monitoring is for. Learn more in this podcast.
  • We have a strong focus on Continuous Integration and Continuous Deployment best practices, such as fully automated regression testing. You can read all about it in the other post, as well as in the talk in Ljubljana.
  • Test-Driven Development for high test coverage and fast feedback during development is also important. While most developers know that in TDD the tests are written first, not all know about the testing pyramid, much less about its benefits. Hence, pairing a QA and a dev while practicing TDD can be of high value. This is how you bake quality in!
  • I just mentioned consistent pair programming. Pairing leads to better designs and fewer defects from the beginning. Make sure to rotate frequently and across roles. Pairing is a general activity for most tasks in a team; pair programming is just one of them.
  • We love Feature Toggles™. They give us maximal control over the features in production and enable canary releases. A very easy method I like to use is to hand out chocolate during the stand-up to the pair that implemented a toggle the previous day. This is a fun way to talk about toggles, remember them and give a sweet incentive to build one in when it was forgotten. If you find the time to use them, they will make your lives much easier. You will not need rollback strategies any more, and they are a very, very good safety net. Quality “assurance” at its best!
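As referenced in the first bullet, here is a minimal circuit-breaker sketch (illustrative only, not the API of any particular library): after a number of consecutive failures the breaker “opens” and calls fail fast with a fallback instead of piling up on an unhealthy downstream service; after a cool-down period, a trial call is let through again.

```java
import java.util.function.Supplier;

// Illustrative circuit breaker: thresholds and timing are made-up defaults.
class CircuitBreaker {
    private final int failureThreshold;
    private final long cooldownMillis;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long cooldownMillis) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
    }

    <T> T call(Supplier<T> downstream, T fallback) {
        boolean open = consecutiveFailures >= failureThreshold
                && System.currentTimeMillis() - openedAt < cooldownMillis;
        if (open) {
            return fallback; // fail fast: do not hit the struggling service at all
        }
        try {
            T result = downstream.get();
            consecutiveFailures = 0; // downstream is healthy again, close the breaker
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = System.currentTimeMillis(); // (re)open the breaker
            }
            return fallback;
        }
    }
}
```

Wrapped around a call to a slow or failing dependency, such a breaker degrades one page element gracefully instead of letting a single dependency take the whole page down.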

The mix for the win.

Of course we combine the two aspects of QA and testing. And this is the biggest challenge for us: where do we focus at which point in time? Where do we need more attention, and which part of the application / system / team is running smoothly? Ultimately, we try to pick the right tools and create the right mindset to build a high-quality product.

Pure Performance

Episode 21: How ThoughtWorks helped Otto.de transform into a real DevOps Culture

Finn Lorbeer (@finnlorbeer) is a quality enthusiast working for Thoughtworks Germany. I met Finn earlier this year at the German Testing Days where he presented the transformation story at Otto.de. He helped transform one of their 14 “line of business” teams by changing the way QA was seen by the organization. Instead of a WALL between Dev and Ops the teams started to work as a real DevOps team. Further architectural and organizational changes ultimately allowed them to increase deployment speed from 2-3 per week to up to 200 per week for the best performing teams.


Episode 22: Latest trends in Software Feature Development: A/B Tests, Canary Releases, Feedback Loops

In Part II with Finn Lorbeer (@finnlorbeer) from Thoughtworks we discuss some of the new approaches when implementing new software features. How can we build the right thing the right way for our end users?
Feature development should start with UX wireframes to get feedback from end users before writing a single line of code. Feature teams then need to define and implement feedback loops to understand how features operate and are used in production. We also discuss the power of A/B testing and canary releases, as they allow teams to “experiment” with new ideas and, thanks to tight feedback loops, quickly learn how end users are accepting them.

Process Automation and Continuous Delivery at OTTO.de

This post is all about deploying every single commit to the production environment.

All manual steps in a release cycle can be automated – even if you want to check your designs. This post explains step by step how to automate every single one and what to consider when releasing a couple of times per day. You can find my article in the Otto dev blog. Or you can read it below.


Whenever we present how we release features and deploy our code in one of OTTO’s core functional teams, we are met with a certain set of questions, e.g.: “Why do you want to deploy more than once a week?”, “If you automate release and test management, what are the release and test managers doing?”, “How can we prevent major bugs from entering the shop?”, “Where is the final control instance to decide if something goes live?”, or the typical question “Who is responsible if something breaks?” or simply “Why the heck would someone want to do this?”

Let us answer those questions. Let us guide you through our way of working. Let us show you what processes we have (and which ones we do not have) and give you a hint on how to increase productivity and quality at the same time (without firing the test manager). All you have to do is sit back, relax and let go of your fear of losing control. Don’t worry, you won’t lose it.

If you have a look at a general release process for a deployment, it will look similar to this scheme:

Picture: scheme of a general release process

The image illustrates a release life cycle: occasionally, a new release candidate is built. If the code compiles and the first tests are successful, we speak of a “green build”. The code of this release candidate is deployed to a test server, and after a smoke test a full test suite can run. Depending on the number of test servers and your (integration) test setup, you may want to repeat steps 2-4 for more than one server. If all tests pass for a specific build version and the live platform is stable (→ monitoring step), you can announce the live deployment and ship the new build. Usually, some tests will then verify that the live deployment was successful.

Not a single one of those steps requires human interaction. The entire process can be automated. One of the many advantages is that you simply do not have to spend time on this process. The time that is freed up (most of the time this applies to the Quality Analyst) can be spent on other tasks. In our case, we could almost double the time the QA spends with the developers and business designers.

Before that, the Quality Analysts were only able to evaluate the quality of a given piece of code after the implementation. If the code did not meet the expectations for “quality”, they needed to convince stakeholders and developers that the quality was not sufficient, and the developers would start the story all over again. This was a very time-intensive and thus expensive process.

Now, the Quality Analysts have more time to review the business requirements, think about edge cases and report them to the developers before implementation. Furthermore, the QAs pair with the developers and can make sure that “quality” is engraved in the product during implementation.

The build that triggers the entire process already has a lot of tests itself. We keep tight track of our test pyramid in this first step of our test automation. At this point we have a huge amount of unit tests and a fair portion of acceptance tests. They not only test our Java code base; we apply the same principles to our JavaScript: to reduce the number of frontend (Selenium) tests needed at the end of our build pipeline, we prefer the fast feedback of many JavaScript tests in the initial build step, using Jasmine.

If all those tests pass, we consider a build “green”. Our build runs for every single git commit.

The next step is to deploy a green build to a test server and continue testing the new software. Talking about deployments, one often forgets that it is code that executes all the steps necessary to provision a server with new software. Even this code can fail, and thus we recommend a small smoke test to be executed right after the deployment. This can be as easy as checking the version number on a status page or the git hash in the meta information of the front page. It saves you a lot of time not to execute tests against old code.
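Such a smoke test can be very small. Here is a hedged sketch (the status URL and the page content are assumptions): it simply checks that the status page of the test server reports the git hash we just deployed.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative post-deployment smoke test: fail fast if the server still serves the old build.
class DeploymentSmokeTest {
    static boolean correctVersionIsLive(String statusUrl, String expectedGitHash) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(statusUrl)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200 && response.body().contains(expectedGitHash);
    }

    public static void main(String[] args) throws Exception {
        String statusUrl = "https://test-server.example.com/status";   // assumed endpoint
        String expectedGitHash = args.length > 0 ? args[0] : "abc1234"; // hash of the build we just deployed
        if (!correctVersionIsLive(statusUrl, expectedGitHash)) {
            throw new IllegalStateException("Smoke test failed: the new build is not live on the test server");
        }
    }
}
```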

Having successfully deployed the software to the test server, we continue testing. After covering the base of the test pyramid in the build step, we now take care of its top. Here we execute more acceptance and functional tests, some of them in Selenium. Furthermore, we can run first integration tests with other teams, other services and maybe third-party software. For integration testing we do not rely on Selenium alone: we have a wide set of so-called CDC tests (consumer-driven contract tests) with other teams. If another team has specific requirements, e.g. for our APIs (= they consume our API), they write a test that runs within our build pipeline, e.g. a pact test. In this way we can make sure that all requirements other teams have towards us are fulfilled for every single commit.
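To give a feel for the Selenium-based functional tests mentioned above, here is a minimal sketch (URL, selector and page structure are made up): it opens a page on the test server and asserts that a central element is rendered.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

// Illustrative top-of-the-pyramid check: only a few of these should exist.
class SearchPageAcceptanceTest {
    public static void main(String[] args) {
        WebDriver driver = new FirefoxDriver();
        try {
            driver.get("https://test-server.example.com/search?q=television"); // assumed test-server URL
            boolean resultsShown = !driver.findElements(By.cssSelector(".product-tile")).isEmpty();
            if (!resultsShown) {
                throw new AssertionError("Search page renders no product tiles on the test server");
            }
        } finally {
            driver.quit(); // always close the browser, even when the assertion fails
        }
    }
}
```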

Maybe you do not have just one test server but two (e.g. for different kinds or levels of tests). Then you would execute the deploy-and-test steps two or more times. In any case, the number of tests should decrease with every step; otherwise there is something fundamentally wrong with your test pyramid.

One big concern I am often confronted with is that no one looks at the product before it goes live. “Automation is nice, yes, but nothing beats the pattern recognition of a human brain” is what people mention in response to all the automation. The statement is true, no doubt. But the point is that the value of a human brain is not necessarily needed here and can be better applied earlier in the software development process.

To explain this, let me tell you about one fundamental requirement for releasing automatically: the consistent use of feature toggles. Using toggles means that new features are not released by a deploy but by the flip of a button. This has two major advantages. First, the feature has a shorter time to market: just a few minutes after the last commit is pushed, the entire feature code is deployed; one does not have to wait until the end of e.g. a sprint cycle. Second, despite all human and automated tests, sometimes something just goes wrong (and it does not even have to be a technical problem). If we release a feature with a toggle, we can also toggle it off in just one second. We do not have to roll back our deploys, and hence we do not affect other features that were part of the same deployment. The process automation made our deploys an absolute “non-event”, while the side effects of quick deployments made feature releases a lot easier.

Since (almost) all changes, especially the frontend ones, are toggled, no deployment should ever change the face of our product. And this is difficult to test for humans. Human brains are activated by mismatching patterns. Different paddings for otherwise equal elements or a picture that is out of its box are very easy for us to spot. But if one link in a list of maybe 20 links is missing on a page, almost no one will notice. If the link turned green, or had a different font than all the other text, we would discover it right away. If it is simply gone, we barely notice it. Hence, for our kind of deployment we need either a human with a photographic memory – or a machine. We decided to go with the latter. Inspired by other tools, such as the BBC’s “wraith”, we built a small Ruby gem (lineup) that uses Selenium to take screenshots of defined pages of our product before and after the deploy. It notices as soon as a single pixel changes and fails the test step. This lets us detect whether or not our feature toggles were implemented correctly and discover undesired frontend changes before they go live. Here is an example:

Picture: image comparison of an entry page before and after a deployment

On the left side is an entry page before the deployment of the new code; on the right side, after the deployment. Without complicated measurements, no human would notice the increased top margin of the headline for the smartphones and the gaming console. The image comparison (middle) between the base (left) and new (right) image reveals the difference right away by marking all pixels that have changed between the two images.
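The core of such an image comparison is surprisingly small. Here is a rough sketch of the idea behind Lineup (greatly simplified and not its actual code; it assumes both screenshots have identical dimensions): take a screenshot of the same page before and after the deployment and fail as soon as a single pixel differs.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;

// Illustrative pixel-by-pixel comparison of two Selenium screenshots.
class ScreenshotComparison {
    static BufferedImage screenshotOf(WebDriver driver, String url) throws IOException {
        driver.get(url);
        byte[] png = ((TakesScreenshot) driver).getScreenshotAs(OutputType.BYTES);
        return ImageIO.read(new ByteArrayInputStream(png));
    }

    static boolean identical(BufferedImage before, BufferedImage after) {
        if (before.getWidth() != after.getWidth() || before.getHeight() != after.getHeight()) {
            return false; // a different page size already counts as a visual change
        }
        for (int x = 0; x < before.getWidth(); x++) {
            for (int y = 0; y < before.getHeight(); y++) {
                if (before.getRGB(x, y) != after.getRGB(x, y)) {
                    return false; // one changed pixel is enough to fail the build step
                }
            }
        }
        return true;
    }
}
```

In the pipeline, the “before” image comes from the currently live version and the “after” image from the freshly deployed one; a failed comparison stops the build, and a human decides whether the change was intended.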

If the build passes this last test, it is good to be deployed to the live platform. To ensure that our platform is always stable enough for a deployment, we constantly monitor the servers and databases. This is (and needs to be) a shared team responsibility – just like every other step of the entire process. We achieved this by simply putting up a couple of monitors in the line of sight of every team member. Every day, we discuss the error rates and possible performance problems in front of the big screens. This daily discussion and the come-togethers around the shared screens strengthened our culture of constant monitoring. With more and more services being built, we are now investigating ways to focus on the most important metrics. As the issues on our live servers are different every day, we cannot determine in advance which metric “is key” for which service. Hence, we have to automatically analyse all our metrics and present only the most relevant ones to the team – and the most relevant ones are usually the weirdest. Thus, our investigations currently go in the direction of anomaly detection.

The growing number of services (a result of the move towards microservices) helps us to keep the impact of a deployment on any system other than the deployed one as small as possible. Having only loosely coupled services removes the need to announce every deployment to all other (~dozen) teams. If other teams were affected by our changes and/or deployments, we would have a fundamental flaw in our architecture (or in our CDC tests). Developing and enforcing hard- or software locks at the end of the release process in order to limit deployments is not a solution for such a fundamental architecture challenge. Hence, there is no need to announce deployments to the entire IT department. It is probably a good idea, though, to let the ops people know about our deployments in general. One should also have a single gate that can be closed for all deployments if something is blocking deployments in general at a particular moment. Finally, the last thing we need is deployment reporting for documentation purposes. This usually only includes which git hash / build version went live at what time, including a changelog.

As described above, the deployment to the live servers became an absolute non-event, and thus there is nothing noteworthy about this step for this blog entry. After the deployment is finished, we run a small test suite to make sure that our code was successfully released and our core functionality is still in place.

And then we are already live, multiple times a day. While we increase our shipping speed, we gain even more time to ensure that our product is built with good quality. To execute all the steps, we have created a wide range of tools. For most steps, the available open-source tools did not fit one primary need: the entire process is automated, and thus coded. This code, like any other, needs to be tested. Hence, we think of our release pipeline as testable code. This is reflected in the build tool “LambdaCD”. Additionally, we built the described image comparison tool “Lineup”. Another team at OTTO developed a monitoring solution (“Oscillator”), and even for tracking deployments, feature toggling and other events we built our own set of tools, to be open sourced soon.

For further reading check out:

Have a look at the features of our open source projects. And – please! – give us feedback about your opinion and experiences.

Your FT3 Team