Could Phoenix’s fiasco be a decision management problem?
- Hafedh Mili
- Sep 11, 2017
Unless you have been living under the proverbial (IT news) rock, you surely have heard of the problems with the new Phoenix payroll system of the government of Canada.
• http://thechronicleherald.ca/novascotia/1488695-no-end-in-sight-for-phoenix-pay-fiasco
• http://www.ctvnews.ca/politics/phoenix-pay-problems-spike-again-amid-retroactive-labour-deals-summer-hiring-1.3441079
• https://www.tpsgc-pwgsc.gc.ca/remuneration-compensation/paye-centre-pay/mise-a-jour-phenix-phoenix-updates-eng.html
Let us start off with some facts. The Spring 2010 Report of the Auditor General to the Canadian Parliament had identified aging information technology systems as a major risk for many government departments, which run the risk of failing to deliver the services that taxpayers depend on in case of a major IT failure[1]. Well, there is no service more vital to taxpayers than receiving their wages, especially when those wages are owed by the government itself! Ironically, failing to deliver wages is exactly what happened when the federal government tried to modernize its payroll systems. We knew that the road to hell is paved with good intentions, but this is one for the books!
Just to get an idea of the scope of Phoenix: it was built to manage the payroll of over 300,000 government employees, from close to a hundred organizations, including the various ministries and the multitude of government agencies. By extension, Phoenix embodies over one hundred collective agreements, each with its own job categories, pay scales, rules for vacation, overtime, admissible expenses, and the like. Phoenix is meant to process over 9 million transactions per year, totalling over $20 billion in payments. On paper[2], the goals and general strategy pursued by the main sponsors of the project sounded eminently reasonable:
Phoenix was meant to replace a system that was more than 40 years old, and human resource intensive—we will come back to this later
The old system used obsolete technology, which had become harder and harder to maintain[3]
The processes were fragmented and laborious
The payroll business knowledge was becoming scarce, through a combination of attrition (retirement) and heavy turnover; you can’t motivate young IT professionals straight out of college to work with 40-year-old technology[4]
Employees and managers required more flexible services.
At a projected cost of a bit over $300 million, the modernization initiative consisted of a two-pronged approach:
Building a single (parameterized) IT system to handle payroll for the 100+ government agencies and 100+ collective agreements, for a cost of about $190 million
Centralizing the payroll processing for the various agencies in a new payroll center in New Brunswick, for about $120 million.
Once deployed, the new system was to result in recurring yearly savings of close to $80 million.
Early progress reports on the initiative by deputy ministers from Public Works and Government Services Canada (fall of 2013 and 2014) stated that the project was on schedule. Hmm... However, they identified several lessons learned from this multi-year initiative. These lessons touched upon many aspects relevant to initiatives of this level of complexity, but also some specific to dealing with the government. Reading between the lines of the various euphemisms, one can recognize some of the lessons that we, as a profession, ought to have learned, but that we keep relearning over and over again, such as: 1) adopting SQA best practices; 2) clearly defined operational requirements; 3) high-level (and sustained/sustainable) management commitment; 4) parsimonious use of the initiative’s contingency funds, as in not using them early in the project :-); 5) careful and detailed risk analysis, and continuous risk monitoring; 6) involving end users early in the process; 7) clearly defined and delineated responsibilities between the various stakeholders; and 8) timely decision-making.
What makes a software project successful? To each their list of success factors, but let us say that, generally speaking, it involves a combination of people, process, and technology, where "people" encompasses all of the stakeholders, including competent and harmonious development teams, committed upper-level management, competent project management, and engaged and happy clients, and "process" encompasses development processes, as well as managerial and governance processes.
To be fair, a lot can go wrong in initiatives of this level of complexity, in terms of functional requirements, duration (a five-year initiative), the variety of stakeholders with potentially conflicting interests, both among themselves and with the project’s overall goal (the subject of another article :-)), the scope of the required organizational change, and Politics, with a capital P: I mean on an intergalactic scale, with 100 government organizations involved, including several ministries.
I won’t speculate on what should or could have been done differently, from the previous government’s approach to ‘rationalizing’ government operations, to the choice of the commercial software, to the risk management and project governance processes. Instead, I will focus on the one nail that my decision management hammer can identify, and which, in my opinion, would at least have eased the pain of dealing with the systemic complexities of the initiative.
The decision logic is in people’s heads
You might have a decision management problem if a lot of the decision logic needed to perform a given business process is still held in people’s heads, even though you have an IT system that is supposed to automate those processes[5].
In its effort to “rationalize” the workforce, the previous government eliminated close to 700 people working on payroll, and the resulting skills gap has been recognized by all the stakeholders, from the various unions to the deputy minister(s) responsible for the project, as a major problem. But wait: how come you need that many people with compensation knowledge to process the payroll of 300,000 employees? Let us do some rough math: 700 compensation specialists (and that is just the ones who departed) for 300,000 employees works out to about 430 employees per compensation specialist. If each employee gets paid once every two weeks, that means that each of our compensation specialists spends an average of 10 minutes per federal employee per pay period[6]. Heck, they could do it by hand at this rate! And that is just the positions that were eliminated!
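For the skeptics, here is that back-of-the-envelope calculation spelled out in a few lines of Python (the 70 working hours per two-week period is the assumption from footnote [6]):

```python
# Back-of-the-envelope check of the workload estimate.
employees = 300000      # federal employees paid through the system
specialists = 700       # compensation specialists (just the eliminated positions)
hours_per_period = 70   # assumed working hours per two-week pay period (footnote [6])

paychecks_per_specialist = employees / specialists                 # ~429
minutes_per_paycheck = hours_per_period * 60 / paychecks_per_specialist

print(round(paychecks_per_specialist), "paychecks per specialist,",
      round(minutes_per_paycheck, 1), "minutes per paycheck")
# 429 paychecks per specialist, 9.8 minutes per paycheck
```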
How does an IT system get to be so labor-intensive, and how can a process “as simple as payroll” require so many “case workers”? Stuff like that happens to systems that are 40 years old. Either the decision logic was never put in the system in the first place, or it started out much simpler and grew increasingly complex. The difficulty of maintaining such decision logic leads to patches and workarounds that only people familiar with the most intimate details of the implementation will know about. Thus, generations of developers or business analysts develop arcane knowledge about how to exploit the ‘under-parameterized system’ to do things way beyond its initial range. Eventually, we run out of “special codes” to trick the system into doing things it was not meant to do (that is the subject of another future entry ;-)).
I suspect that the “compensation knowledge” that was lost, as the deputy ministers (or their communication staff) referred to it, is not necessarily labor law knowledge, or familiarity with the 100+ collective agreements, both of which can be readily looked up at any time: it is most likely arcane knowledge about ways to work around the limitations of the system. Such knowledge was certainly needed to work with the old system. However, it need not be (as) useful with the system that replaced it.
One of the tenets of decision management practice is that business-level decision logic (labor law and collective agreements) be catalogued, codified, and shared. Had this been done properly, it would have lent itself to automation, with no need for an army of compensation analysts.
As a decision management specialist, I have worked with medical doctors to capture health insurance policy underwriting rules, with accountants to implement a tax reporting and withholding application, and with engineers to capture rules used for rolling stock preventive maintenance. That is the level at which these ‘conversations’ should be had.
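To make this concrete, here is a minimal sketch of what catalogued, codified decision logic can look like: a hypothetical overtime clause expressed as declarative data, rather than buried in procedural code. The agreement names, thresholds, and multipliers are all invented for illustration:

```python
# A hypothetical overtime clause, codified as data rather than hard-coded.
# The agreement names, thresholds, and multipliers are invented; real values
# would come from the applicable collective agreements.
OVERTIME_RULES = {
    "AGREEMENT-A": {"weekly_threshold": 37.5, "multiplier": 1.5},
    "AGREEMENT-B": {"weekly_threshold": 40.0, "multiplier": 2.0},
}

def overtime_pay(agreement_id, hours_worked, hourly_rate):
    """Compute overtime pay by looking up the catalogued rule."""
    rule = OVERTIME_RULES[agreement_id]
    extra_hours = max(0.0, hours_worked - rule["weekly_threshold"])
    return extra_hours * hourly_rate * rule["multiplier"]

print(overtime_pay("AGREEMENT-A", 42.0, 30.0))  # 4.5 h x $30 x 1.5 = 202.5
```

The point is not the five lines of code: it is that the rule itself is explicit, inspectable by a business analyst, and shareable, instead of living in someone’s head.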
The decision logic is jurisdiction-specific
You might have a decision management problem if you have jurisdiction-specific decision logic.
Phoenix is supposed to handle payroll for about a hundred different organizations, and to implement 100+ collective agreements. When an hourly-wage worker submits a timesheet, or the clock hits 00:01 on pay day, each pay stub goes through one heck of an if statement. Figure 1 shows the very high-level topology of the decision logic.
[Figure 1: very high-level topology of the payroll decision logic]
Roughly speaking, we have different job categories corresponding to the different professions employed by the federal government, e.g. hourly-wage workers (e.g. janitorial services), contractual workers, clerical staff, managers, engineers, accountants, etc. Each ministry or organization can have a subset of such categories, along with various specializations specific to that organization. Then we have different collective agreements, each of which may cover a number of such job classifications. Some, if not many, agreements span several organizations, when the same union represents workers of a particular category at different organizations. Conversely, the same job category at a given organization may be covered by two different collective agreements, if two unions are active at that organization.
A closer look at the collective agreements will probably reveal a number of clauses that are common to all federal workers, or to all contractual federal workers, or to all permanent federal workers; others may be specific to an organization, regardless of job category (e.g. pension fund); others yet may be mandated by federal law (e.g. a driver may not be on the road for more than 12 hours at a stretch). A thorough cartography of the decision logic, with a particular eye towards variability/commonality analysis, would go a long way towards helping payroll business analysts (‘compensation analysts’) navigate the 100+ collective agreements, and towards helping IT implement the payroll decision logic accurately and faithfully, regardless of the implementation technology. This would increase straight-through processing, and reduce the need for manual labor / case management.
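To illustrate, here is a minimal sketch of that layered topology as data, from the most general clauses (all federal workers) down to organization- and agreement-specific ones. The organization, agreement, and clause names are all hypothetical:

```python
# Hypothetical, simplified model of the decision-logic topology: rules are
# layered from the most general (all federal workers) to the most specific
# (one job category under one agreement). All names are invented.
from dataclasses import dataclass

@dataclass
class Rule:
    clause: str
    applies_to: dict  # conditions, e.g. {"organization": "ORG-1"}

RULES = [
    Rule("statutory-holiday-pay", {}),                          # all federal workers
    Rule("pension-fund-deduction", {"organization": "ORG-1"}),  # org-wide, any category
    Rule("shift-premium", {"agreement": "CA-7", "category": "clerical"}),
]

def applicable_rules(employee):
    """Select every rule whose conditions all match the employee's context."""
    return [r for r in RULES
            if all(employee.get(k) == v for k, v in r.applies_to.items())]

emp = {"organization": "ORG-1", "category": "clerical", "agreement": "CA-7"}
print([r.clause for r in applicable_rules(emp)])
# ['statutory-holiday-pay', 'pension-fund-deduction', 'shift-premium']
```

A commonality/variability analysis of the actual agreements would tell us which clauses belong at which layer, which is precisely the cartography argued for above.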
The decision logic evolves
You might have a decision management problem if the decision logic changes often enough that it becomes an operational issue.
According to a CTV report, Phoenix’s problems have worsened over the summer, in part because of the need to (retroactively) encode newly signed collective agreements:
• http://www.ctvnews.ca/politics/phoenix-pay-problems-spike-again-amid-retroactive-labour-deals-summer-hiring-1.3441079
The bad news is that this will be a regular occurrence. The average lifespan of a collective agreement is 5 years. With 100+ collective agreements at play, one can expect an average of 20 collective agreement changes every year, i.e. one every 18 calendar days. Surely, some of these changes will be simple sliding salary scales, but how many IT departments do you know that can implement, test, pre-release, and release a non-trivial business logic update in 12 business days?
The good news, of course, is that decision management automation in general, and business rule management systems in particular, enable us to separate the computational infrastructure of a business application, which is meant to evolve slowly, from the fast-changing decision logic, which is treated as data by the business rule execution services. Thus, the compensation rules embodied in a new collective agreement can be encoded, tested, and deployed without modifying the computational infrastructure, using a much lighter lifecycle. I have worked on mission-critical business applications where the decision logic was updated daily, and in some cases hourly.
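As a sketch of what that separation buys you, consider the following toy ‘rule service’: the pay rates live in an external file that can be redeployed and reloaded without touching the application code. The file format and the rates are invented for illustration; a real business rule management system adds rule authoring, testing, versioning, and audit trails:

```python
# Toy illustration of the BRMS idea: decision logic as externalized data,
# hot-reloadable without redeploying the application. The JSON schema and
# the rates are invented for illustration.
import json

# Simulate deploying a rule set (e.g. rates from a new collective agreement).
with open("pay_rules.json", "w") as f:
    json.dump({"clerical": 28.50, "engineer": 52.75}, f)  # hypothetical rates

class RuleService:
    """The slow-changing computational infrastructure."""
    def __init__(self, rules_path):
        self.rules_path = rules_path
        self.reload()

    def reload(self):
        """Pick up a newly deployed rule set, without a code release."""
        with open(self.rules_path) as f:
            self.rates = json.load(f)

    def gross_pay(self, category, hours):
        return self.rates[category] * hours

svc = RuleService("pay_rules.json")
print(svc.gross_pay("clerical", 75.0))   # 28.50 x 75 hours = 2137.5
# When a new agreement is signed: deploy a new pay_rules.json and call
# svc.reload(); the rest of the system is untouched.
```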
Conclusion
Such lofty goals, such … underwhelming outcome! A lot has been said and written about Phoenix, and alas, a lot more will be, as its problems are nowhere near being solved. It is a miracle that any project of this level of organizational and functional complexity ever succeeds, given all of the potential pitfalls. However, I believe that a decision management approach to the functional requirements and implementation strategy of Phoenix would have minimally provided:
a timely and accurate encoding of the decision logic (pay computation!) through the kind of business analysis that comes with any well-planned decision management initiative, with less reliance on manual labor / case management
an agile implementation and maintenance approach, enabling Phoenix to accommodate frequent changes to the collective agreements it embodies
a clear(er) delineation of responsibilities between the various stakeholders, as functional requirements—in the form of compensation rules—are front and center of the overall initiative, as is typical in decision management initiatives.
However, this would not have alleviated many of the systemic complexities of dealing, politically and administratively, with so many organizations (100), nor with the organizational changes that were to accompany the technology aspects of the initiative.
Acknowledgements
I would like to thank my friend Louis Martin, a veteran IT developer, manager, and educator, who supplied me with the research on Phoenix. Louis collected the material in preparation for interviews he gave to Radio-Canada about Phoenix. See:
· http://ici.radio-canada.ca/nouvelle/1034624/ne-blamez-surtout-pas-phenix
· http://ici.radio-canada.ca/emissions/Les_matins_d_ici/2015-2016/chronique.asp?idChronique=428617
· http://ici.radio-canada.ca/nouvelle/1015847/origine-rate-phenix-logiciel-decision
Footnotes
[1]http://www.oag-bvg.gc.ca/internet/English/parl_oag_201004_01_e_33714.html#ex2
[2] Or presentations by government officials.
[3] I know of a public institution that runs one of its mission-critical systems on hardware so old that the IT department has to buy spare parts on eBay, the original vendor having changed names/owners about half a dozen times since that generation of hardware first came out.
[4] If they knew what was good for them, they would learn COBOL instead of Node.js.
[5] A take on Jeff Foxworthy, an American stand-up comic famous in the ’90s for a series of redneck jokes that all started with “you might be a redneck if …”.
[6] If each compensation specialist works 70 hours per two-week pay period to process about 430 paychecks, that comes to 10 minutes per paycheck.