The lead developers have been experimenting with story points recently. I wrote a few weeks back about how I thought we might decide on the meaning of a point.

We decided that our scale would range from 1 to 377, so that the upper end catches the complex, major projects that we sometimes work on.

Then we decided that we’d start with 1 story point being the sort of thing we’d traditionally have expected to take a quick, experienced developer about an hour.  Working that all the way up the scale, a 377-point story is the sort of thing we’d traditionally have estimated at 6 months or so.

Then we dug around our recent work and our future plans for things that fitted along the scale.  We put them all in a big pot and picked two or three for each level on the scale that would make the most useful exemplars for sizing future work.  We also came up with a few useful rules for when you might want to move up a level because something makes the work more complicated:

  • adding a lot of Behat and/or unit testing;
  • lots of JavaScript and/or CSS;
  • working in an area that is already very complex; and
  • working with a third-party community / system integration.

We’ve recently been given a single prioritised backlog from our Learning & Teaching colleagues, so we decided to try out our shiny new story points on their list.  This morning we played planning poker for the very first time.

We set up a spreadsheet with the business description, a supplementary techie speak description, and a column for each person.  Each person then had a couple of days to use the story point exemplars and enter their estimate.  Each person hid their column on completion so that no-one else’s estimate was biased.

To be fair, I found I mostly thought in days still and worked back to story points.  It’s going to take some time to be able to look at a description and think in points instead.

Today we worked out the median of everyone’s scores, picked the nearest point on the scale and decided what we wanted to announce as our final estimate.  I was pleasantly surprised by the amount of consensus we achieved.  There were some items with wildly differing opinions, but that usually just highlighted that the request was still poorly defined and we had made differing assumptions about what we would do.

We decided upon one or two more “rules”:

  • If the range is very diverse because the item is ill-defined, we refuse to estimate until we have more detail.
  • If the median is the same as the estimate from the person who’s the subject matter expert for the area of work, we go with that for our final estimate.
  • If the median is one step either side of the estimate from the subject matter expert, that person has final say to overrule or accept the median.
  • If the median is more than one step either side of the estimate from the subject matter expert, further discussion is required to clarify and agree a final estimate.  There were actually very few stories which fell into this category.
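The median-then-SME workflow above can be sketched in a few lines of Python. This is entirely illustrative: the scale constant, the function names and the outcome labels are my invention, not the team’s actual tooling.

```python
from statistics import median

# Illustrative only: a Fibonacci-style scale like the one described
# in the post, extended up to the 377 upper bound.
SCALE = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]

def nearest_point(value):
    """Snap a raw median onto the nearest point of the scale."""
    return min(SCALE, key=lambda p: abs(p - value))

def team_estimate(estimates, sme_estimate):
    """Apply the rules above: agree with the SME, let the SME
    decide when one step apart, or flag for further discussion."""
    m = nearest_point(median(estimates))
    steps_apart = abs(SCALE.index(m) - SCALE.index(sme_estimate))
    if steps_apart == 0:
        return m, "agreed"
    if steps_apart == 1:
        return m, "SME decides"
    return m, "discuss further"
```

So a round where the team scores cluster near the SME’s number resolves immediately, while a wide spread (often a sign the story is ill-defined, as noted above) gets sent back for discussion rather than averaged away.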

Using this approach we managed to get 6 people to estimate just under 100 stories in just under an hour.  That’s a lot quicker than any of us expected.  We ended up with an estimate of just over 2,500 story points for 3 months work by 8.4 people.  Gut instinct suggests that we might manage about 75% of that.  But since there were about a dozen things we couldn’t estimate, maybe that’ll be more like half in reality?

So with my project manager hat on, I now have to work out whether we should have estimated for the “top slice” tasks: support, bug fixing, advice, technical debt stories…  We probably should have, but I’m not sure if the same approach works. Especially for “keep the wheels on” server monitoring activity.

And now that I have a set of estimates for most of the things our business partners want, we have to communicate that back to them.  There will, presumably, be some iteration while we refine the requirements for the things we refused to estimate, attach names to tasks that no-one thought they were doing but which had high priorities, and drop some stuff off the bottom of the list.

I’m really interested to see what we actually deliver at the end of January, so we can for the first time get a feel for what our development team’s velocity might be.  Remind me to tell you about that when the time comes!

Well, not prizes in this case!  No, I’m referring to story points.  One of the next things on my to-do list is to look into exemplar stories with points, to give us a baseline to start from as we move to estimating development in points rather than hours.

In some ways this seems pretty easy:

1) define a 1 point story
2) define your scale
3) think of some things that fit at the higher story points.

One is OK. The smallest piece of Moodle work that I can think of is to update a language string, or change a style on a discrete screen element. These are the sort of thing that take a few minutes to do, plus a bit of testing. Maybe an hour in total. So that’s what I’m starting from as a single point.

Two is OK. We’ll use the common Fibonacci approach, so our scale will be 1, 2, 3, 5, 8, 13, 21, 34. And that’s probably far enough. If one point starts off looking a bit like 1 hour, then 34 points is a little over a week’s work. And the theory says that if you’re estimating something as more than a week’s work, then it’s an epic not a story. We might quote for epics in bigger numbers – let’s say 55, 89 and 144. At 144 story points, we’re talking about 4–5 weeks’ work and our ability to estimate accurately is probably quite poor. Then there would be 233, 377 and 610. Now we’re in the region of 5 months’ work, which we would probably describe as a theme, with even less accuracy. And there’s not a lot of point in counting points after this, so the next step is infinity or “nobody knows”.
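For illustration, the scale and the size classes described above can be generated in a few lines. A hedged sketch only: the function names are invented, and the boundaries just follow the rough rule of thumb in the paragraph above.

```python
def fib_scale(top=610):
    """The Fibonacci-style story point scale, starting 1, 2, 3, 5, ..."""
    scale, a, b = [], 1, 2
    while a <= top:
        scale.append(a)
        a, b = b, a + b
    return scale

def category(points):
    """Rough size class from the paragraph above: up to 34 points is
    a story, up to 144 an epic, up to 610 a theme, and beyond that
    'nobody knows'."""
    if points <= 34:
        return "story"
    if points <= 144:
        return "epic"
    if points <= 610:
        return "theme"
    return "nobody knows"
```

Here `fib_scale()` runs 1 through 610, and anything past the top of the scale falls into the “nobody knows” bucket rather than getting a number at all.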

But for my current focus, I want to categorise up to 34. And it is at step 3 that it starts to get a bit more challenging. Here are the kinds of things that I’ve been thinking about:

* add a configuration setting to control an existing feature
* write a block to display a message
* add a capability to control permission over an existing feature
* add a field to a form and save the result in the database
* add a log event on a specific page

I seem to be able to think of lots of really little things, and lots of really big things, but I’m rather struggling with the things in the middle.

So, do you use story points to estimate Moodle development work? If so, are you willing to share what yours look like?

Two weeks ago I posted that the OU Moodle development team were going to spend a fortnight writing Behat scripts.

I promised to write up how we got on, so this post fulfills that promise, I hope.


Our aims for the fortnight were

  1. to generate automated Behat tests for a significant proportion of OU plugins that we can use in future;
  2. to create the know-how within the team of how to create and run those tests; and
  3. to establish writing tests like this as part of what we do in any development.

Of the 15 developers, 4 had written Behat scripts before and acted as mentors for the team.  By the end of the fortnight another 8 developers had got Behat set up and 7 had successfully written scripts.

The OU has over 200 plug-ins in our Moodle code base.  Before the sprint only a couple had any Behat coverage.  Now about 16% do.

We wrote over 3,000 lines of Behat-related code in the past fortnight, about 10% of which are for custom steps.

Our handover checklist now says that all new features should come with Behat scripts, unless there’s a really good reason not to.

People generally found the fortnight useful, both because learning the same thing together meant there were people able to help each other out, and because Behat itself is beneficial to our work.  Some found bugs and fixed them as they went along, so there has been a (small) positive impact on system quality.  All agreed that it is enjoyable to see the scripts working and that this sort of automation is what computers are for.  Hopefully we’ll use this approach when we need to learn something new again.

Obviously, that’s not the end of the story.  We had a retrospective and came up with some tips & tricks to share with each other and a few questions.  And we’ll carry on doing Behat and getting better coverage for more of our plug-ins.

Tips and tricks:

  • The Site admin tree sometimes takes too long to expand, so it times out before the next script step, which then fails.  Either restart the PC or, if that fails, write an “I expand” step followed by “I navigate to”.
  • Sometimes when filling in forms, care is required over the enabling of other fields or buttons before moving to the next step.  You may need to alter the order of the steps, or inject extra steps such as checking that the button exists.
  • Cron only runs once per minute, so if your script runs cron repeatedly with the “I trigger cron” step, you may need a step to reset cron so that it will work.  This bug has been reported to HQ.  In any case, if you have a cron task which doesn’t fire on every cron run, you will need to set this up specifically in your script so that your conditions are met.
  • It is possible to use PHPUnit generators as a way of doing some background setup, e.g. if you want to set up a number of activities with specific properties.  It is quicker to set up the course this way than by writing Behat steps to do it through the UI.
  • Lots of our plug-ins are interdependent, with “if x is installed” checks in the PHP so that we can still share them individually.  If you want to test integration between plugins, you can write a custom step that checks whether the other plug-in is installed and throws a SkipException if it is not.
  • Use names visible in the UI to identify screen elements rather than CSS ids.  This is (a) more readable by humans and (b) if your feature is built this way, it is likely to be more accessible to screen readers.
  • If you have to use a complex XPath to identify a screen element, add a comment to explain to the human reader which bit of the page you’re trying to test.
  • If you are including test files, it is OK to put them in the fixtures folder, but because they’re in the webroot they must only include publicly available materials.
  • If you are repeatedly writing the same collection of steps, you should convert these to a custom step.
  • Don’t forget to update the steps list occasionally to see what’s available.  Take care when relying on a custom step stored in another plugin – can you be sure the plugin will be there (e.g. for contributed plugins)?  You may need to copy the custom step into your plug-in.
  • When writing custom steps, consider whether they are Moodle-generic; if so, they should be contributed to core (e.g. a date picker).
  • Write scripts where you set up a feature as one user and then log in as an account with lower permissions to test the actual output.
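The last tip, switching users, might look something like this in a feature file. This is a hedged sketch only: the user name and visible text are invented, “I log in as” and “I log out” are standard Moodle Behat steps, and “I should not see” comes from the standard Mink steps.

```gherkin
Feature: Students only see what they are allowed to see

  Scenario: Content set up by an admin is hidden from a student
    Given I log in as "admin"
    # ... steps to set the feature up as a privileged user ...
    And I log out
    When I log in as "student1"
    Then I should not see "Admin-only content"
```

Testing as the lower-permission account catches the common mistake of only ever exercising a feature as the admin who configured it.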

Area of concern:

The latest Firefox upgrade, to version 32, causes Behat to fail.  In summary, don’t upgrade, or use Chrome for a while.  This has also been reported today in the GDF forum.

There was some discussion over the difference between test scripts and Behat scripts – they should not be identical.  Test scripts should test more or different things than the Behat scripts, or cover the bits that cannot easily be Behat-scripted.



Perhaps you have a view on some of these, or some better tips and tricks?  If so, please share them in the comments.



Writing Behat scripts!

Here at the OU’s learning systems development team we’ve been watching others work with Moodle and Behat for some time.  I first saw it demo’d at the Perth Hackfest about 18 months ago.  We’ve experimented with automated testing with Selenium and Ranorex in the past, but with little success, so we’ve been a little cautious about jumping on the Behat bandwagon.

But the stars have aligned to change that.  We have recently updated our codebase to Moodle 2.7.x, which heralds the usual spree of regression testing to make sure that nothing broke.  That, along with the fact that Moodle HQ are looking for Behat scripts as part of integration testing now, prompted Sam Marshall to suggest that we should use this opportunity to write Behat tests for our plug-ins.

It seems like a good idea to write the scripts to perform the regression testing that we’d otherwise be doing by hand.  Hopefully for 2.8.x this process will then be quicker!  So we decided to dedicate the entire team to a fortnight’s Behat sprint.  People will be planning, writing and/or running automated tests all at the same time, so we can learn from each other as we go along.

Our aims for the fortnight are

  1. to generate automated Behat tests for a significant proportion of OU plugins that we can use in future;
  2. to create the know-how within the team of how to create and run those tests; and
  3. to establish writing tests like this as part of what we do in any development.

How will we do?  Check back in a fortnight to find out!


As I mentioned in a previous post, I see software as a creative process, an art form.  So this is a post about the other creative art form that I enjoy, bobbin lace making … like this.

A small part of a Bedfordshire lace edging, pattern by Christine Springett from an 18th C collar, completed 2014

One of the reasons my brain starts to fry when I’m asked about suitable metrics to measure the performance of a development team is that I’m struggling to answer the question “how do you measure art?”  So, how do I measure my lace?

Well sometimes with a tape measure, sure … wedding garters tend to need to be about 40 inches to get the ruffles right.

But there are other ways too.  You could count the stitches, but that would be a bit pointless.  We often count the number of bobbins used, and whether the threads “run” or have to be sewn together in separate sections; these are signs of the complexity of the piece.  And yes, I can count how long it takes.  A garter normally takes about 1.5 hours per inch; the piece above was worked slowly over about 2 years.  Of course, I often have an idea of when I want something finished by – I may give my lace away, so the deadline is Christmas or a birthday – so hitting the target date (actual vs estimate) is sometimes meaningful.  And sometimes not – I did several other pieces in the 2 years I spent on the piece above, because they had a higher priority.  Does that mean the edging is “bad”?  No.

You can also count the number of times I rework parts of a pattern to get it right.  Or the number of defects left in the finished article.  And you can turn it over to look at the back, and see how neatly it is finished (like doing a code review).

The things that are most important to me are more subjective … Does it have “wow” factor?  Does it do what I (or the person receiving it) want it to do?  What do my lace-maker friends think of it? These fall more into the realm of user & peer feedback.

And finally, did I learn something?  My first attempt at Bedfordshire lace was not very good, but this piece is the culmination of 20+ years of practice.  I got better (neater, quicker, more complex, more wow) along the way.  Some learning pieces come off the pillow and languish in a cupboard because we are not proud of them.  But they were still worth it, because we learned.

When I start a piece, I know what’s going to be important.  Right now, I’m doing a nativity set for Christmas, a bit like this.  So, this is a simple lace that I’ve done many times before.  Nothing to learn.  Neatness and wow factor are important because they’ll be on display every year, as is durability.  Time is important if I want them on display this year.  The next piece on my backlog though is a first go at Flanders.  For that, time will be meaningless, neatness and wow are unlikely unless existing skills transfer readily, but learning is critical.

Does this teach us something about software metrics?  We can judge the complexity of a request, and we can do code & peer review and count defects after release.  These are all worthwhile in my view.  But I’d like to see us judge what is important at the start of the piece, and measure against that at the end.

So “add the permission control to do roll out reports to faculty” is something we’ve done many times before: nothing to learn, neatness and durability important, time critical.  But “make annotation work on PDFs and images” will be an innovation piece – a work of art.  It has no time deadline, will definitely provide wow and requires significant learning.  Does it really make sense to estimate how long creation takes and then judge ourselves against it?  I think not.

Last night I went to the valedictory lecture of Simon Buckingham Shum as he leaves the OU for the other side of the world.  Simon gave an interesting lecture about his research over the years at the Knowledge Media Institute, focussing on the knowledge mapping tools that we integrated into the first iterations of OpenLearn LabSpace.

During the presentation Simon referred to KMi researchers and developers as inventors.  That word got me thinking.  He talked about software as a creative process, which is something I firmly believe in – for me it is an art form.  (Look out, if I remember, for a follow-up blog post on metrics and art, but that’s a tangent from what I want to say today.)

I have been honoured over the years to work on many innovation projects … the first generations of intranets, accessibility in Macromedia Director, OpenLearn and web annotation.  The part I’ve enjoyed most is the invention, the creative process of working out what you want to do and then bending the computer to your will.  For me, the technical challenge is enough, provided there’s a customer, usually an enthusiastic academic, who can tell me the value of the work.

OK, so most of my projects have been greeted by “that’s a colossal waste of time”, “why would we want to do that”, “who’s going to use that” etc.  But that’s the point with innovation / invention, isn’t it?  Sometimes it fails, or lasts a while then gets overtaken by the next new thing.  So my early intranets are long gone, Macromedia finally put out their own accessibility libraries, and the jury is still out on whether students are ready for web annotation yet.  For me, all these projects get boring when the invention part is over.  As they enter the mainstream, they tend to lose their inventiveness and become more risk-averse.  At that point, there’s less creativity, less art and less fun.  And a massive backlog of little things to keep you busy.

There’s been a lot of discussion here recently about innovation and risk.  I think it was an inevitable response to the recent funding changes that we tightened our belts and became much more careful where we spent our money.  It’s no bad thing that there are always people checking whether our project is a colossal waste of time or not … because it keeps us honest.

But I’d like to find a way to bring the invention, the innovation back into my life and the developers around me.  We should all have something creative and fun to do.  Even if we are reinventing the wheel, ours should be the best darn wheel out there.

I look forward to working out how we do this.  Maybe it’s 20% time, maybe a “dating agency” for academics with ideas and developers with enthusiasm, maybe rotation through an R&D team?  Maybe you have an idea you can share?





I went to the CETIS annual conference this week.  One of the sessions that I didn’t get to was entitled “ebooks: the learning platform of the future”.  I needed to clone myself that morning – all of the sessions looked interesting.  So I had a quick chat over coffee with someone who did go to that session.

Warning – when it comes to e-books, I know not of what I speak…

… and neither did the person I was chatting to.  The key message he took away was that e-books do other stuff too – media, discussion, assessment.  Isn’t that cool for a book!

My immediate reaction was “VLEs do that stuff too”.  So my naive question: why have VLEs at all, why not just have ePubs?  I guess that explains the title of the presentation!

One obvious answer may well be that the discussion and assessment tools in e-books may not be as “good” as those in VLEs (if you disagree – remember the warning please!).  I’ve lost count of the times we’ve been told about these wonderful things where the assessment turns out to be just multiple choice with very little flexibility for feedback.

So I tweeted my initial thoughts and got back a couple of good responses:

from @tjhunt You can have a discussion waiting for the lift. You still need meeting rooms.

from @bmuramatsu still need a repo for assets & assessments–you don’t want it all in epub or vle–& a plat to collect, analyze & display results

Later in the day, reflecting on those and on @audreywatters’ comment during her keynote that VLEs – or LMSs as they’re known across the pond – are Management (that’s what the M stands for) and admin systems, I started to think about the increased demand we’re seeing for student dashboards – everything all in one place, summarised, neatly packaged, easy to find.  And at the conference it became clear that it isn’t just OU students who have that requirement.

But what does that tell us about our VLEs?  Aren’t they meant to be the spine that everything hangs off?  The one place students find their content, assessment, grades, discussions?  If they want another place for that stuff, are they really saying that the VLE isn’t good enough?

How should that change the way we build the next generation of (better integrated, personalised, flexible …) systems for our students?

