A Guide to Testing Microservices

Over the last couple of months I have often been asked about microservices, especially how to test them. Building a small (i.e. a micro) service usually requires a different approach to the architecture and to the services themselves. In many cases the team structure changes as well (Conway's Law, just the other way around). The question that arises then is how we can compensate for this while testing. Or, to be more precise: how to adapt our way of working to ensure high-quality software.

There are three major themes that need to be taken care of when we think about testing microservices:

  1. The often-referenced testing pyramid principle applies here just as much as for a monolith, and not just for one single service. The principles are equally important when it comes to testing the interactions between different services.
  2. Automation plays a key role in speeding up your build and deployment pipelines. Maybe this applies even more to microservices than to other architectures.
  3. It's not easy to start with a good microservice landscape, and there certainly is some overhead in setting things up (the first time). But the benefit of small, independent services is the fast feedback you can receive.

The second and third points are mentioned most of the time when someone talks or writes about working with microservices. The testing pyramid is not really new, yet there is usually little detail on how to test those little services. So we will look into this in more depth, as there are indeed some pitfalls you should avoid in order to end up with a properly testable service landscape.

If we have a look at a monolith, things are straightforward: it is pretty clear where and how to apply the testing pyramid. The basic components of our service are unit-tested, their interaction is integration-tested, and then there is probably one end-to-end test around the entire service.

[Image: the testing pyramid applied to a single microservice. The scheme of the service is by Toby Clemson from martinfowler.com]

Now a “micro” service is (in this abstraction) nothing but a small monolith. In other words: treat the service itself as you always did. Be sure that the unit test coverage is really good. Find the places where you need to test the integration and then put as few as possible (in some cases: 0) end-to-end tests on top.
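To make the base of the pyramid concrete, here is a minimal RSpec sketch against a hypothetical MessageFilter class inside the message service – the kind of small, fast unit test that should make up the bulk of the tests (the class and its behaviour are made up for this example):

require "rspec"

# Hypothetical unit inside the message service: it picks out the unread messages.
class MessageFilter
  def unread(messages)
    messages.reject { |message| message[:read] }
  end
end

RSpec.describe MessageFilter do
  it "returns only the unread messages" do
    messages = [{ id: 1, read: true }, { id: 2, read: false }]
    expect(MessageFilter.new.unread(messages)).to eq([{ id: 2, read: false }])
  end
end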

Up to here, there is nothing groundbreaking. And any project dealing with only one microservice has no big issues with testing. The real problems arise once you have two, three or many services. The big difference to the monolith world is that you have a lot more interaction via HTTP clients. Let's have a look at an example.

Let's assume that the product we want to build is some kind of messaging system. There are three major features (see the green boxes in the image):

  1. The dashboard page, which the user lands on after login and which shows a greeting. The dashboard also displays unread messages.
  2. The inbox itself, where all messages can be viewed.
  3. The entire messaging system has some super strong encryption.
[Image: the example messaging system with its three features (green boxes) and the domain circles]

Although those features are very clear, let us also have a look at the business domains. The first domain we can identify is the user ("Finn", the black circles). The user has a dashboard and an inbox. The second domain is the messages (yellow circles). They are displayed on the dashboard and in the inbox. The messages are also encrypted. The third domain is the encryption itself (blue circle). The encryption only works with messages.

Those circles represent the domains. If we cut our system according to the circles, we will end up with a clear domain for every service.

Service A deals with the user (and probably owns the HTML). Service B takes care of the messages. And Service C does the de-/encryption. To each one of those services we apply, as described above, the testing pyramid. Then we put them together and view the entire system. And we consult our pyramid once more:

[Image: Services A, B and C, each covered by its own testing pyramid]
(side note: the level of tests we are discussing here is completely independent of the tools you may use. The integration tests could be Pact tests or other CDCs, they could be written with Selenium, or you could use Appium. But we do not want to talk about individual tools here, rather about the general concept)

The single service itself is the unit. We know it is properly tested and the unit is working as expected. There are a lot of tests in place to make sure this is the case. The base of the pyramid is fine.

The integration is where it becomes tricky. This is new and was/is not needed in the world of monolithic applications. There is one thing you really – really! – need to be aware of: test the interaction itself, not two services.
If you have to get two services up and running to make sure that one of them works properly, that does not sound too bad. But if you scale and have to get 27 services up and running to test a single one, it just will not work. That is too much complexity for a test run.
The challenge is that you will want (need) to test the integration of "your" service without the other ones. An approach that works really well is consumer-driven contract tests. It would go beyond the scope of this post to explain the concept here again, so I wrote a separate post only about those CDCs. Make sure to dig into it!
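To give an idea of what such a CDC test could look like, here is a minimal sketch using the Ruby pact gem; the service names, endpoint and payload are invented for the messaging example above:

# pact_helper.rb
require "pact/consumer/rspec"

Pact.service_consumer "Dashboard Service" do
  has_pact_with "Message Service" do
    mock_service :message_service do
      port 1234
    end
  end
end

# dashboard_unread_messages_spec.rb
require_relative "pact_helper"
require "net/http"
require "json"

describe "fetching unread messages", pact: true do
  before do
    message_service
      .given("user finn has one unread message")
      .upon_receiving("a request for unread messages")
      .with(method: :get, path: "/users/finn/messages", query: "unread=true")
      .will_respond_with(
        status: 200,
        headers: { "Content-Type" => "application/json" },
        body: { "messages" => Pact.each_like("id" => "1", "subject" => "Hello Finn") }
      )
  end

  it "returns the unread messages from the message service" do
    # The consumer only ever talks to the Pact mock service, never to a running Message Service.
    response = Net::HTTP.get_response(URI("http://localhost:1234/users/finn/messages?unread=true"))
    expect(response.code).to eq("200")
    expect(JSON.parse(response.body)["messages"]).not_to be_empty
  end
end

The pact file generated by this test is later verified against the real provider in its own pipeline, so neither side ever needs the other service up and running for its tests.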

If you got the trick with the integration, the rest is "easy". You will run the standard set of end-to-end tests. And once you have a good feeling about the integration and contract tests, you will probably also reduce the number of costly end-to-end tests. If you get all of this in place, it would be perfect.
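As a rough idea of that remaining end-to-end layer, a small selenium-webdriver script could log in once and check the dashboard; the URL and page structure below are made up:

require "selenium-webdriver"

driver = Selenium::WebDriver.for :firefox
begin
  # Log in and check that the dashboard greets the user.
  driver.navigate.to "https://www.example.com/login"
  driver.find_element(name: "username").send_keys("finn")
  driver.find_element(name: "password").send_keys("secret")
  driver.find_element(css: "button[type='submit']").click

  wait = Selenium::WebDriver::Wait.new(timeout: 10)
  greeting = wait.until { driver.find_element(css: ".greeting") }
  raise "No greeting shown on the dashboard" unless greeting.text.include?("Hello")
ensure
  driver.quit
end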

So much for the theory.

While this seems to be a simple principle, the real world looks very different. In many cases I have seen microservices being carved out of an already existing monolith. Furthermore, this usually happens under time constraints and has to be done fast.

This is exactly where a major error occurs, one that afterwards propagates through the entire software development cycle: when teams start with the first, small service, the business domain is often unclear. As a result, the system with Services A, B and C is designed not as above but very differently. Remember the three major features of our example? Giving only a quick thought to how to cut the services, many teams will cut their architecture along those features:

[Image: the same system cut along the three features instead of the domains]

One service handles the dashboard (and thus needs to know about the user and the encrypted messages, black circle). One service handles the inbox and displays messages (and thus needs to know about the user, too, as well as the encrypted messages, yellow circle). The encryption service holds/stores all the messages (blue circle).

That means that the domains of "messages" and "users" are not contained in one service but instead propagate through the system. And this is the critical point: if we now look at how to unit-test those domains, the units spread across the services. We need two or three services to write the unit tests. Writing integration tests becomes incredibly complex – or, more accurately: hell. It is then no wonder that some people tend to leave the integration tests aside and rather cover the entire system with end-to-end tests. The result will look similar to this:

[Image: the resulting test setup, with the system covered almost entirely by end-to-end tests]

In this situation, the tests are most likely very unstable: as already indicated in the picture, there is no clear scheme of what to test where. The integration points of the domains sit randomly somewhere in the services. It will be difficult to mock things. If all services always need to be available, tests in pre-production environments usually become "flaky", and it becomes very hard – if not impossible – to test on a local machine. For every failure of the end-to-end tests someone needs to check the error log in order to find out whether it is a "real" error or not. Let me repeat: this is going to be incredibly complex – or, more accurately: hell.

If we then start to automate the entire thing… (you remember: fast feedback and so on) we will end up with a workload that is way beyond what we experienced with the monolith. At this point it is perfectly reasonable to do a reality check on whether or not things have become easier with the microservices. All teams I have worked with that chose this "approach" eventually figured out that they had not improved compared to a monolith. But realizing this is important and valuable:

If your finding is that things got more complicated, then better stop and think about how to improve. Otherwise, as soon as you start to build more and more services and scale your application, it will only get worse. So do yourself a favor and spend a lot of time thinking about the domain split. It will be easier to test – and scalable!

[Image: the domain-cut service landscape]

And lastly, the biggest piece of advice I can give to people – teams (!) – that start working with smaller services is to

“Test” in time.

What do I mean? The team needs to be involved early:
Engage with the people designing the services. Talk to the business and understand the domains you are about to form in your services. And make sure that all people in your team have a clear understanding of what you are doing and why you are doing it. This is the time where our role stretches far beyond the testing area. This allows all of the former testers to grow out of the test manager role. We can make a real difference in the flexibility that our service landscape will provide to our business. Please make sure that your team and your business benefit from the microservice approach. Show them where the pitfalls are. And guide them.

To summarize:

  1. Apply the testing pyramid. Make testing cheap and reliable. (as usual)
  2. Automate everything (that you can). Each manual step is a real showstopper in a microservice environment.
  3. Make it fast. Value the fast feedback from quick running pipelines. Be flexible.
  4. Start in time. Get a clear picture of your business domains and how you think they are – or should be – reflected in your services.

The beauty is: once testing is easy and helpful, it is a real amplifier when you continue to build and improve your system. Have fun 🙂

Are we only Test Managers?

This is a translation of the original blog post that I wrote with Diana Kruse, Natalie Volk and Torsten Mangner. While writing this blog once more just in another language, I took the liberty of adding some personal notes here and there in italic font.


In every development team at otto.de there is at least one tester / test manager / QA… or whatever you would call the person who shapes the team's quality mindset.

Until recently, "test manager" was the dominant description at Otto – a very rigid and bureaucratic term, although the intention was good: to emphasize that we do not only execute tests, we also manage them. Meanwhile, even managing tests is only a very small part of the value we deliver.

In a bigger workshop, Finn and Natalie, two of our "test managers", picked up on this contradiction and worked things out. We were sure that it would not be sufficient to simply write "agile" in front of "test manager". Hence, we developed a new understanding of our role that looks and feels like this:


We are the teams’ Quality Coaches

We support the teams in understanding "quality" as a collective responsibility. We achieve this by working intensively with all roles in the team rather than talking about generic concepts. We establish knowledge and practical approaches around the topic of quality.


We See Through the Entire Story Life Cycle

Together with the team we make sure that our high standards of quality are considered long before the development of our product starts: we suggest alternative solutions during the conception of a story and point out potential risks. We avoid edge-case problems later on by thinking about them while writing the story. We pair with developers, so that we know the right things are tested in the right place. Thus, we have more time to talk to our stakeholders and users during the review. With the right monitoring and alerting we are able to observe our software in production.


We Drive Continuous Delivery / Continuous Deployment

One central goal is to deploy software to the production environment with as little risk as possible. Therefore we try to change as little of our codebase as possible per deployment and roll out every single commit automatically. We use feature toggles to switch on new functionality independently of these deployments. This has two major advantages: we can roll out our software to customers (almost) at the speed of light, and we get fast feedback on newly developed features.


We are Balancing the Test Methods of the Testing Pyramid

We know what to test on which level of the testing pyramid. We use this concept to create a lot of fast unit tests, a moderate number of integration tests and as few end-to-end tests as possible. This not only speeds up our pipelines but also makes our tests more stable, more reliable and easier to maintain.

Additionally, our tool box contains all kinds of tests (acceptance tests, feature tests, exploratory tests), methods (e.g. test first, BDD) and frameworks (like Selenium or RSpec). We know how to use those tests, methods and frameworks on all levels of the testing pyramid.

(as a side note: this indeed implies running e.g. Selenium tests at the unit test level where applicable)


We Help the Team to Choose the Right Methods for a High Quality Product

Being specialists, we know the advantages and disadvantages of the different methods and can help the team benefit from the advantages. We have learned that pairing enables knowledge transfer, communication, faster delivery and higher quality. Besides pairing, test-driven development is one of the key factors for creating a high-quality product from the beginning.

Flexible software can only emerge from flexible structures. This is why we are not dogmatic about processes and methods but decide together with the team what mix of processes we really need to get our job done.


We are Active in Pairing

We do not only encourage our developers to pair, we also have fun pairing ourselves. This way we can point out problems while the code is being written. To avoid finding all the edge cases only during development, we also like to pair and communicate with Business Analysts, UX Designers and Product Owners. Together with the operations people we monitor our software in the production environment.

The pairing with different people and different roles allows us to further develop our technical as well as domain knowledge.


We Represent Different Perspectives

By taking on different positions we prevent one-sided discussions. We try to avoid typical biases by challenging assumptions about processes, methods, features and architectures. This enables us – from time to time – to show a different solution or an alternative way to solve a given problem. It helps us reduce systematic errors and money pits, and to objectively evaluate risks while developing our software.


We are Communication Acrobats

We are the information hub for all kinds of things inside and outside the teams. We make special, constructive use of the grapevine, a phenomenon that occurs in practically every company with more than 7 people.

We are enablers of communication. This may be the communication within a pair of developers, among many or all team members, or between teams throughout the organization. By facilitating this coordination we can reduce ambiguities about features or system integrations and hence get our software into a deliverable state faster.


After developing this role, we engaged more and more of our "agile test managers" with the concept. They were so enthusiastic about it that they wanted to apply for the job once more right away. The only thing missing was a good name. Since, as in every cross-functional team, we have different specialists, and one of these people is the driving force for high quality, we found the perfect name: the Quality Specialist.


(Side note: the German term for “quality assurance” (QA) is “QualitätsSicherung” (QS). Using the same abbreviation made it even easier to adopt the new term.)

Quality Specialist is a very fitting name for this stretched role. Although we are broad generalists, our core value lies in shaping a quality mindset and a culture of quality in a team.

Those were the first steps of a very exciting journey. The next thing to do is to talk with the other roles in order to find out how this new understanding of the role changes our daily work. Furthermore, almost no one fulfills this role description today. Thus, we need to grow, level up and reflect on our development. The most fun part is that we can learn a million things in different domains from different people.

Process Automation and Continuous Delivery at OTTO.de

This post is all about deploying every single commit to the production environment.

All manual steps in a release cycle can be automated – even if you want to check your designs. This post explains step by step how to automate every single one and what to consider when releasing a couple of times per day. You can find my article in the Otto dev blog, or you can read it below.


Whenever we present how we release features and deploy our code in one of OTTO's core functional teams, we are met with a certain set of questions, e.g.: "Why do you want to deploy more than once a week?", "If you automate release and test management, what are the release and test managers doing?", "How can we prevent major bugs from entering the shop?", "Where is the final control instance that decides whether something goes live?", the typical "Who is responsible if something breaks?", or simply "Why the heck would someone want to do this?"

Let us answer those questions. Let us guide you through our way of working. Let us show you which processes we have (and which ones we do not have) and give you a hint on how to increase productivity and quality at the same time (without firing the test manager). All you have to do is sit back, relax and let go of your fear of losing control. Don't worry, you won't lose it.

If you have a look at a general release process for a deployment, it will look similar to this scheme:

[Image: scheme of a general release process]

The image illustrates a release life cycle: occasionally, a new release candidate is built. If the code compiles and the first tests are successful, we speak of a "green build". The code of this release candidate is deployed to a test server, and after a smoke test a full test suite can run. Depending on the number of test servers and your (integration) test setup, you may want to repeat steps 2-4 for more than one server. If all tests pass for a specific build version and the live platform is stable (→ monitoring step), you can announce the live deployment and ship the new build. Finally, some tests ensure that the live deployment was successful.

Not a single one of those steps requires human interaction. The entire process can be automated. One of the many advantages is that you simply do not have to spend time on this process. The time that is freed up (most of the time this applies to the Quality Analyst) can be spent on other tasks. In our case, we could almost double the time the QA spends with the developers and business designers.

Before that, the Quality Analysts were only able to evaluate the quality of a given piece of code after the implementation. If the code did not meet the expectations for "quality", they would need to convince stakeholders and developers that the quality was not sufficient, and the developers would start the story once again. This was a very time-intensive and thus expensive process.

Now, the Quality Analysts have more time to review the business requirements, think about edge cases and report them to the developers before implementation. Furthermore, the QAs pair with the developers and can make sure that "quality" is ingrained in the product during implementation.

Step 1 – New build: The build that triggers the entire process already has a lot of tests itself. We keep tight track of our test pyramid in this first step of our test automation. At this point we have a huge number of unit tests and a fair portion of acceptance tests. They do not only test our Java code base: we apply the same principles to our JavaScript. To reduce the number of frontend (Selenium) tests needed at the end of our build pipeline, we prefer the fast feedback of a lot of JavaScript tests, written in Jasmine, in the initial build step.

If all those tests pass, we consider a build “green”. Our build runs for every single git commit.

Step 2 – Deploy to the test server: The next step is to deploy a green build to a test server and continue testing the new software. Talking about deployments, one often forgets that it is code that executes all the steps necessary to provision a server with new software. Even this code can fail, and thus we recommend a small smoke test right after the deployment. This can be as easy as checking the version number on a status page or the git hash in the meta information of the front page. You will save a lot of time by not executing tests against old code.
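Such a smoke test can be very small; here is a sketch assuming a hypothetical /status endpoint that returns the deployed version as JSON:

require "net/http"
require "json"

# Hypothetical status endpoint and expected version, e.g. passed in by the pipeline.
STATUS_URL = URI("https://test-server.example.com/status")
expected_version = ENV.fetch("EXPECTED_BUILD_VERSION")

response = Net::HTTP.get_response(STATUS_URL)
abort "Status page not reachable: #{response.code}" unless response.is_a?(Net::HTTPSuccess)

deployed_version = JSON.parse(response.body).fetch("version")
if deployed_version == expected_version
  puts "Smoke test passed: version #{deployed_version} is live on the test server."
else
  abort "Smoke test failed: expected #{expected_version}, found #{deployed_version}."
end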

Step 3 – Test suite: Having successfully deployed the software to the test server, we continue testing. After covering the base of the test pyramid in the build step, we now take care of the top of it. Here we execute more acceptance and functional tests, some of them in Selenium. Furthermore, we can run first integration tests with other teams, other services and maybe third-party software. For integration testing, we do not rely on Selenium alone. We have a wide set of so-called CDC tests (consumer-driven contract tests) with other teams. If another team has specific requirements for our APIs (i.e. they consume our API), they write a test that runs within our build pipeline, e.g. a Pact test. In this way we can make sure that all requirements other teams have towards us are fulfilled for every single commit.
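On the provider side, such Pact files can be verified within the build pipeline. A minimal sketch of that setup with the Ruby pact gem (service names and pact file location are made up):

# spec/service_consumers/pact_helper.rb
require "pact/provider/rspec"

Pact.service_provider "Message Service" do
  honours_pact_with "Dashboard Service" do
    # In a real setup the pact file usually comes from a broker or from the consumer's build artifacts.
    pact_uri "pacts/dashboard_service-message_service.json"
  end
end

Running the gem's provider verification then replays every recorded consumer expectation against the provider on every single commit.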

Maybe you do not have just one test server, but two (e.g. for different kinds or levels of tests). Then you would execute the deployment-and-test steps two or more times. In any case, the number of tests should decrease with every step, otherwise there is something fundamentally wrong with your test pyramid.

One big concern I am met with is that no one looks at the product before it goes live. "Automation is nice, yes, but nothing beats the pattern recognition of a human brain" is what people mention in response to all the automation. The statement is true, no doubt. But the point is that the human brain is not really needed here and is better applied earlier in the software development process.

Step 4 – Feature toggles: To explain this, let me tell you about one fundamental requirement for releasing automatically: the consistent use of feature toggles. Using toggles means that new features are not released by a deployment but by the flip of a switch. This has two major advantages. First, the feature has a shorter time to market: just a few minutes after the last commit is pushed, the entire feature code is deployed. One does not have to wait until the end of e.g. a sprint cycle. Second, despite all human and automated tests, sometimes something just goes wrong (and it does not even have to be a technical problem). If we release a feature with a toggle, we can also toggle it off again within a second. We do not have to roll back our deployments and hence we do not affect other features that were part of the same deployment. The process automation made our deploys an absolute "non-event", while the side effect of the quick deployments made feature releases a lot easier.
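The mechanism itself can be very simple. A minimal sketch of the idea (the toggle class and feature name are made up; in practice the toggle state lives in a database or config service so it can be flipped at runtime):

class FeatureToggles
  def initialize(store = {})
    # Maps feature names to on/off; here just an in-memory hash for illustration.
    @store = store
  end

  def enabled?(feature)
    @store.fetch(feature, false)
  end

  def enable(feature)
    @store[feature] = true
  end

  def disable(feature)
    @store[feature] = false
  end
end

toggles = FeatureToggles.new

# The new code path ships with every deployment, but only runs once the toggle is flipped.
puts(toggles.enabled?(:new_inbox_layout) ? "render new inbox" : "render old inbox")

toggles.enable(:new_inbox_layout)   # the "release": a flip of the toggle, no deployment
puts(toggles.enabled?(:new_inbox_layout) ? "render new inbox" : "render old inbox")

toggles.disable(:new_inbox_layout)  # and the instant "rollback" if something goes wrong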

Step 5 – Image comparison: Since (almost) everything, especially the frontend changes, is behind toggles, no deployment should ever change the face of our product. And this is difficult for humans to test. Human brains are triggered by mismatching patterns: different paddings for otherwise equal elements, or a picture that is out of its box, are very easy for us to spot. But if one link in a list of maybe 20 links is missing on a page, almost no one will notice. If the link turned green, or had a different font than all the other text, we would discover it right away. If it is simply gone, we barely notice it. Hence, for our kind of deployment we need either a human with an eidetic, photographic memory – or a machine. We decided to go with the latter. Inspired by other tools such as "wraith" by the BBC, we built a small Ruby gem (Lineup) that uses Selenium to take screenshots of defined pages of our product before and after the deployment. It notices as soon as a single pixel changes and fails the test step. This lets us detect whether our feature toggles were implemented correctly and discover undesired frontend changes before they go live. Here is an example:

[Image: before/after screenshots of an entry page with the pixel-difference comparison in the middle]

On the left side is an entry page before the deployment of the new code, and on the right side the same page afterwards. Without resorting to complicated measurements, no human would notice the increased top margin of the headline above the smartphones and the gaming console. The image comparison (middle) between the base (left) and new (right) image reveals the difference right away by marking all pixels that changed between the two images.
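This is not the Lineup API itself, just a sketch of the underlying idea using selenium-webdriver and the chunky_png gem; the URL and file names are made up:

require "selenium-webdriver"
require "chunky_png"

# Take a screenshot of a page; called once before and once after the deployment.
def take_screenshot(url, filename)
  driver = Selenium::WebDriver.for :firefox
  driver.navigate.to url
  driver.save_screenshot(filename)
ensure
  driver&.quit
end

# Count the pixels that differ between the two screenshots.
def changed_pixels(base_file, new_file)
  base  = ChunkyPNG::Image.from_file(base_file)
  fresh = ChunkyPNG::Image.from_file(new_file)
  count = 0
  base.height.times do |y|
    base.width.times do |x|
      count += 1 if base[x, y] != fresh[x, y]
    end
  end
  count
end

take_screenshot("https://www.example.com/entry-page", "base.png")
# ... the deployment to the test server happens here ...
take_screenshot("https://www.example.com/entry-page", "new.png")

diff = changed_pixels("base.png", "new.png")
abort "Undesired frontend change: #{diff} pixels differ" if diff > 0
puts "No visual changes detected."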

Step 6 – Monitoring: If the build passes this last test, it is ready to be deployed to the live platform. To ensure that our platform is always stable enough for a deployment, we constantly monitor the servers and databases. This is (and needs to be) a shared team responsibility – just like any other step of the entire process. We achieved this by simply putting up a couple of monitors that are in the line of sight of every team member. Every day, we discuss the error rates and possible performance problems in front of the big screens. This general discussion and the come-togethers around the common screens enhanced our culture of constant monitoring. With more and more services being built, we are now investigating ways to focus on the most important metrics. As the issues on our live servers are different every day, we cannot determine in advance which metric "is key" for which service. Hence, we have to automatically analyse all our metrics and present only the most relevant ones to the team. The most relevant ones are usually the weirdest. Thus, our investigations currently go in the direction of anomaly detection.

Step 7 – Deployment: The growing number of services (a result of the move towards microservices) helps us keep the impact on any system other than the deployed one as small as possible. Having only loosely coupled services removes the need to announce every deployment to all the other (~dozen) teams. If other teams were affected by our changes and/or deployments, we would have a fundamental flaw in our architecture (or in our CDC tests). Developing and enforcing hardware or software locks at the end of the release process in order to limit deployments is not a solution to such a fundamental architecture problem. Hence, there is no need to announce deployments to the entire IT department. It is probably a good idea, though, to let the ops people know about our deployments in general. One should also have a single gate that can be closed for all deployments if something is blocking deployments in general at a particular moment. Finally, the last thing we need is deployment reporting for documentation purposes. This usually only records which git hash/build version went live at what time, together with a changelog.

Step 8 – Release: As described above, the deployment to the live servers has become an absolute non-event, so there is nothing noteworthy to report for this step. After the deployment has finished, we run a small test suite to make sure that our code was successfully released and our core functionality is still in place.

And then we are already live, multiple times a day. And while we increase our shipping speed, we have even more time to ensure that our product is built with good quality. To execute all these steps, we have created a wide range of tools. For most steps, the available open source tools did not fit one primary need: the entire process is automated, thus coded, and this code, like any other, needs to be tested. Hence, we think of our release pipeline as testable code. This is reflected in the build tool "LambdaCD". Additionally, we built the described image comparison tool "Lineup". Another team at OTTO developed a monitoring solution ("Oscillator"), and even for tracking deployments, feature toggling and other events we built our own set of tools. To be open sourced soon.

For further reading, have a look at the features of our open source projects. And – please! – give us feedback about your opinions and experiences.

Your FT3 Team

Testing with Frank

This is the first post I ever wrote, explaining how to implement testing with Frank, an open source tool for the automated testing of iPhone apps. Read the full article in the Wimdu tech blog or continue below.


A long, long time ago… we started to work on an iPhone app for the merchants of Wimdu (= our hosts). We wanted to provide them with a handy tool to manage all their bookings. We call it our "host app".

As the project came to the final stages of development, we started to think about testing the application. It was clear from the very beginning that we wanted to build automated tests to easily check every release for stability, quality and reliability.

We had a very specific set of requirements for our tests. Our mobile app is not a stand-alone or isolated platform: there is interaction between the hosts (merchants) using the iPhone and our guests. Those interactions are mainly messages between the two parties, as well as the actual bookings. If we want to simulate and test end-to-end user behaviour, we need to run simulations on our web and mobile apps simultaneously. The tests for the web app are written in Ruby, using Watir to hook into Selenium and drive Firefox. So we needed a system that could run the scripts for the web app simulation. Furthermore, we wanted to develop test-driven, so we needed quite specific feature descriptions and scenarios for our test cases.

Naturally, the first step was to look into the UI Test Automation that is built into Xcode. Thinking that, as it is fully integrated into Xcode, it would be the perfect tool to work with, we tried to build some small test cases. The limitation of this tool is quite obvious: it can only test the iOS app. There would be a lot of manual steps involved in switching between the test cases of the iOS app and the web app. Hence, the built-in tool box was not sufficient for our approach.

Amongst others, we looked into Frank, built by Thoughtworks, and it was the best fit for our needs. It uses the two languages we need: Cucumber, so that the product side can define and read the test cases, and Ruby, which it is built on. We love Ruby, so we tried it.

Frank is charming; what happens is pretty much explained in one picture:

[Image: how Frank works]

However, after running frank setup we stumbled upon the first problem, as we were using CocoaPods for dependency management in our projects. You can fix it by passing the workspace and scheme to Frank like this:

frank build --workspace Host.xcworkspace --scheme Host --configuration Debug

If you build Frank in release mode, the "Frank Driver" will not be able to access the "Frank Server" any more. Other than that, the build will succeed and we can follow Peter's blog to write our first tests.

Getting started with the Cucumber tests, we quickly realized that it would be a good idea to strictly separate the two "languages", Ruby and Cucumber. It creates a bit of overhead for the smallest step definitions, but as soon as things get a little more complicated it helps to organize the code.

For example:

Cucumber reflects the specification from the Product Manager.
It lives in the default folder features/request.feature

Scenario: As a host who has a booking request, I want to see the request in the app with the guest name displayed
  Given I have a booking request
  Then I should see 1 request
  And I should see the name of the guest

The step definitions also live in the default location, features/step_definitions/request_steps.rb. Here we decided to write as little Ruby as possible, so as not to have two places where the actual code runs. We only require the corresponding class:

require_relative "lib/requests"

requests = Requests.new

Given /^I have a booking request$/ do
  requests.pending_booking
  steps %{
    Given I log into the app with user #{requests.host}
    Given I touch "Requests"
  }
end

And /^I should see the name of the guest$/ do
  requests.find_guest_name
end

We tell the Ruby part of our tests to create a new booking request for a new host and gather the information. We can also include other Cucumber steps if it fits the purpose of the scenario. The last step is to write the actual test code; as you can see, it lives in features/step_definitions/lib/requests.rb:

require "date"

class Requests
  include Frank::Cucumber::FrankHelper

  attr_accessor :host, :guest, :offer

  def initialize
    # Webapp wraps all interaction with our web application (see below).
    @app = Webapp.new
    self.host = @app.host
    self.offer = @app.create_offer(host)
    self.guest = @app.guest
  end

  def pending_booking
    # Example dates: a one-night stay, one week from now.
    checkin_day = Date.today + 7
    checkout_day = Date.today + 8
    @app.create_booking(offer, guest, checkin_day, checkout_day)
  end

  def find_guest_name
    name_displayed = frankly_map("label marked:'Guests Name'", 'text')
    unless name_displayed.include?(guest.name)
      raise "Guest with #{guest.name} not found in the view"
    end
  end
end

All interaction with our web application is handled in the Webapp class. In this case, a new host is created, then a new offer for this host, and a guest to book the offer. As we want to develop end-to-end user tests, we do not create dummy entries via the app's API but rather interact with both apps at once. The label we use in frankly_map is set in the Xcode project, so that the element is accessible to Frank.
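The Webapp class itself is out of scope for this post; under the hood it drives the site with Watir. To give a rough, hypothetical idea of that kind of interaction (the URL and form fields are made up; the details follow in the next post):

require "watir"

browser = Watir::Browser.new :firefox

# Log in as a host and open the bookings page.
browser.goto "https://www.example.com/login"
browser.text_field(name: "email").set "host@example.com"
browser.text_field(name: "password").set "secret"
browser.button(type: "submit").click
browser.goto "https://www.example.com/bookings"

puts browser.title
browser.close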

How we use Watir to write tests and simulations for our web app will be the topic of the next post.