Gherkin Golf

Over on LinkedIn, John Ferguson Smart posted an interesting comparison of two Gherkin scenarios illustrating the same business rule. It's taken from his excellent book, BDD in Action.

The original scenario is:

Scenario: Earning point test case
  Given I open the Purchase Flights page
  And I purchase a Return ticket from London to New York
  And I upgrade the return flight to Business
  And I complete the flight
  When I log on to the Frequent Flyer site
  Then I go to the account summary page
  When I click on "Earned Points"
  Then I see the total number of points of 1350

This is a common sight in the wild. It's usually a sign that a manual test script (remember those?!) was transcribed, like for like, into an automated test. Or, if it's greenfield, that it was written by someone with extensive experience of manual script-based testing who is still applying that frame of reference when formulating scenarios.

It's okay. Not great, though. This kind of test does a lot, and in doing so it masks the actual business rule under test. The scenario is tightly coupled to the user interface, which makes it brittle and difficult to reason about. Usually that's because the test is trying to verify both the business rule and one particular user interface, and it loses focus as a result. And woe betide anyone who has to debug one of these when it fails for seemingly arbitrary reasons!

John suggests a much-improved version:

Background:
  Given the following flight points schedule:
    | From   | To       | Class    | Points |
    | London | New York | Economy  | 550    |
    | London | New York | Business | 800    |
  And Stacey is a Frequent Flyer member

  Scenario: Flights earn points based on distance travelled and cabin class
    Given Stacey has purchased the following tickets:
      | From     | To       | Class    |
      | London   | New York | Economy  |
      | New York | London   | Business |
    When she completes the flights
    Then she should earn 1350 points

This does so much right. It's decoupled from the UI: how the system presents itself to users doesn't change the business rule, so it doesn't affect how the rule under test is expressed. Sure, under the hood these steps do more work than the steps in the original example. The use of tense makes the context, action and expectation clear. I really like this solution.

Here's my take on how I'd formulate this business rule:

Feature: Frequent flyer points allocation

  Rule: Points allocated upon return flight completion

    Scenario: Transatlantic flight with upgraded return leg
      Given frequent flyer Freda has purchased a return journey
        | From     | To       | Class    | Points |
        | London   | New York | Economy  | 550    |
        | New York | London   | Business | 800    |
      When she completes the return flight
      Then she has earned 1350 points

It's in a very similar spirit to John's example, but with some key differences. I'll outline what I think are the important parts.

Drop the I, It's Not About Me

In The BDD Books (Vol. 2: Formulation), Seb Rose and Gáspár Nagy discourage the use of "I":

Use of the first person "I" in scenarios is common throughout industry and is advocated in many tutorials. However, we encourage people to avoid this ... because it encourages imprecision. In particular, consider that the reader of a scenario will identify themselves as "I". This leads to the meaning of a scenario being dependent on the person who is reading it

Instead, we recommend that all scenarios are written in the third person, identifying all actors by role or persona

So, like John, I've used a persona. I originally called her Stacey because he did, but I've changed it to Freda because the alliteration ("Freda the Frequent Flyer") helps me recall the context easily. Gherkin has no formal support for personas, so I normally create a personas.md file alongside the .feature files to describe each one.

Rule, Feature, Scenario

The Rule keyword is a relatively recent addition to the Gherkin language (it arrived in Gherkin 6). A feature may be made up of several distinct rules, each of which can be illustrated by a series of example scenarios. Looking at John's solution, I was curious that frequent flyer points were only allocated on completion of a return flight. But from experience (don't ask), not every journey has a return leg. So what happens then? Calling out the behaviour specifically for a return flight seems to have value. That's what a Rule is for!

Furthermore, that rule can be broken down into explicit scenarios. The example given is one where the return leg of the journey is in Business class. Sounds like someone got an upgrade! Using realistic data helps retain the context of the test. Matt Wynne's Example Mapping, a fantastic activity to run during the Discovery stage of BDD, recommends giving examples names that follow the naming convention of Friends episodes: "The one where...". I decided to tell a bit of a story: Freda has booked a transatlantic trip (how exotic) and received a Business class upgrade on the return leg.

Is the Background Necessary? No!

A big difference between John's example and mine is that I've dropped the Background in favour of putting the data in the scenario. It's the reason mine is more concise.

John argues that the Background section is useful for establishing shared facts; his example clearly maps to the "set up"-style methods common in automation tools. Feature files with lots of different scenarios benefit from the top-level shared context. Advocates of "don't repeat yourself" will find this appealing.

However, Rose and Nagy argue against a Background section in most cases because:

it has a negative effect on the readability of the scenarios. When reading a scenario, it is necessary to bear in mind what steps (if any) are included in the Background ... which is tedious and error-prone.

It's hard to disagree with that. Higher cognitive load, higher maintenance burden, difficult to reason about the impact of a change. Don't like!

My perspective is that the Given step establishes the context. It names the persona, so the implementation of the Given step would set up the persona appropriately behind the scenes.

The flights and the frequent flyer points live in the Given step's data table, too. Does it make sense to express the frequent flyer points at this level, or should they have stayed in the Background? I can see arguments both ways here. What swung it for me is that in the real world there'd be more than one scenario in this file, and I didn't want a Background section containing a large mapping of journey/class combinations to points. Each scenario should be distinct enough to encapsulate its own data. On balance, given that the Given step already has a data table outlining the journey(s) under test, adding an extra Points column didn't feel too naughty.
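To make the "behind the scenes" part a little more concrete, here's a minimal, framework-agnostic Python sketch of how the three steps in my scenario might be implemented. It assumes the BDD runner hands the Given step its data table as a list of row dictionaries; `FakeLoyaltySystem` and the step function names are hypothetical stand-ins, not any real tool's API, and a real suite would drive the actual system under test instead.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLoyaltySystem:
    """Hypothetical stand-in for the real frequent flyer system."""
    schedule: dict[tuple[str, str, str], int] = field(default_factory=dict)
    balances: dict[str, int] = field(default_factory=dict)
    bookings: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)

    def enrol(self, member: str) -> None:
        self.balances[member] = 0

    def complete_journey(self, member: str) -> None:
        # Credit the points for every leg the member has booked.
        for leg in self.bookings.get(member, []):
            self.balances[member] += self.schedule[leg]


def given_member_has_purchased_return_journey(system, persona, table):
    # Persona set-up happens here, out of sight of the scenario's reader.
    system.enrol(persona)
    legs = []
    for row in table:
        leg = (row["From"], row["To"], row["Class"])
        # The Points column seeds the schedule for this scenario only,
        # which is what replaces the shared Background.
        system.schedule[leg] = int(row["Points"])
        legs.append(leg)
    system.bookings[persona] = legs


def when_she_completes_the_return_flight(system, persona):
    system.complete_journey(persona)


def then_she_has_earned(system, persona, expected):
    assert system.balances[persona] == expected


# Freda's scenario, with the data table as the runner might pass it (all strings).
system = FakeLoyaltySystem()
table = [
    {"From": "London", "To": "New York", "Class": "Economy", "Points": "550"},
    {"From": "New York", "To": "London", "Class": "Business", "Points": "800"},
]
given_member_has_purchased_return_journey(system, "Freda", table)
when_she_completes_the_return_flight(system, "Freda")
then_she_has_earned(system, "Freda", 1350)
```

The scenario text only ever says "Freda"; everything needed to make her a member lives in the step implementation, which is the point of keeping persona set-up out of the Gherkin.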

Have I Lost Something In Doing This?

Ooh, good question. Note that the focus of my test is actually slightly different from John's. John's scenario is titled "Flights earn points based on distance travelled and cabin class", whereas my business rule is "Points allocated upon return flight completion". Perhaps John's example was written against a version of Gherkin that didn't support Rule, so his scenario serves the dual purpose of outlining a rule AND an example. But it does mean that the point John made explicit has become implicit in mine. Maybe that's because the business logic for calculating points needs to be separated out entirely from the rules about when those points are allocated to a user.
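One way to picture that separation is a minimal Python sketch in which the calculation rule (points by route and cabin class) and the allocation rule (nothing credited until the return leg is flown) are two separate functions. The figures come from John's Background table; `Leg`, `Journey`, `points_for_leg` and `points_allocated` are hypothetical names, and treating a route as earning the same points in either direction is an assumption (John's scenario implies it, since his New York to London leg earns the London to New York Business figure).

```python
from dataclasses import dataclass, field

# Hypothetical schedule using the figures from John's Background table.
# Keyed on the unordered pair of airports plus cabin class, ASSUMING a
# route earns the same points in either direction.
POINTS_SCHEDULE = {
    (frozenset({"London", "New York"}), "Economy"): 550,
    (frozenset({"London", "New York"}), "Business"): 800,
}


@dataclass(frozen=True)
class Leg:
    origin: str
    destination: str
    cabin: str


def points_for_leg(leg: Leg) -> int:
    """Calculation rule: points depend only on route and cabin class."""
    return POINTS_SCHEDULE[(frozenset({leg.origin, leg.destination}), leg.cabin)]


@dataclass
class Journey:
    legs: list[Leg]
    flown: set[int] = field(default_factory=set)  # indices of completed legs

    def complete_leg(self, index: int) -> None:
        self.flown.add(index)


def points_allocated(journey: Journey) -> int:
    """Allocation rule: nothing is credited until every leg has been flown."""
    if len(journey.flown) < len(journey.legs):
        return 0
    return sum(points_for_leg(leg) for leg in journey.legs)


trip = Journey([Leg("London", "New York", "Economy"),
                Leg("New York", "London", "Business")])
trip.complete_leg(0)
print(points_allocated(trip))  # 0: the return leg hasn't been flown yet
trip.complete_leg(1)
print(points_allocated(trip))  # 1350
```

Split like this, each concern could plausibly sit under its own Rule with its own examples, which is one way John's explicit point about distance and cabin class might be made explicit again.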

One to chew on a bit more, I think. And a good example that we all come at these problems from different angles. Finding consensus and consistency is key!

What About the Should?

A few years back I wrote about my dislike of the word "should" in test cases. In short, "should" is less precise than "must". So in my example I removed it in favour of a more active, direct expression.

If the test passes, it's because Freda has 1350 points. If it fails, it's because she doesn't.

Daniel Terhorst-North and Liz Keogh both disagree, advocating for the word "should". Their reasoning is compelling, so read it for yourself. As John points out over on LinkedIn, its usage in BDD is deliberate, as it's intended to make you question what's there:

The word "should" has a long history in BDD, and the uncertainty it introduces is actually very deliberate: it is designed to make you think "should it?" rather than accepting a rule on face value.

This is an important point and I don't disagree. My mind works the same way. Where we differ is that I've found the "should it?" discussion is more fruitful during discovery and early formulation. By the time the scenario is formulated and essentially ready to be committed to the repository, I'd prefer those "shoulds" to be "musts". Otherwise, there's more discovery to do! In my experience, existing tests expressed as "shoulds" leave me less certain about the current state of the system, which hinders my reasoning about the impact of the change I'm making.

But I'm used to being told, frequently and reliably, that I'm wrong on this one!

Golfin' Makes Me Feel Good

It's good to flex these muscles on exercises like this when you're not under the pressure of a deadline. It helps clarify your thinking and builds experience of where principles and the real world intersect. Whether you agree or disagree with my solution or my reasoning, I don't mind. I'd love to see your take on this (please do get in touch!). If nothing else, I hope it helps you deepen your knowledge of how to write clear Gherkin.