Ops Report Card: applying IT best practices to library systems teams

I used to joke that my career goal was to not be running a help desk when I’m 50.

I was the IT Manager for the University of Melbourne’s student union for many years. I only decided to become a librarian after my department was centralised and I was offered a voluntary redundancy for that role.

When I actually turned 50, I was the Team Leader Customer Support for OCLC in Australia and New Zealand. So I was still running a help desk. But I was also studying my Masters of Information Management, and it was the middle of Melbourne’s brutal COVID lockdowns, and I was happy to have a stable job that could be done from home.

After the lockdowns ended, I moved to La Trobe University to be a systems librarian. And for the last six months I’ve been managing the library systems team at RMIT University Library.

Which technically is not a help desk. But there are a lot of similarities.

The recent worldwide CrowdStrike outage and the British Library’s report on the lessons learned from the October 2023 cyber-attack on its systems both emphasise that library systems administration is still IT systems administration, and that the same best practices apply.

Which is my roundabout way of linking to the Ops Report Card.

The Operations Report Card

In the site creators’ own words:

The Ops Report Card is a list of 32 fundamental “best practices” or “capabilities” that high performance sysadmin teams do. Use it as a checklist to examine where your team needs improvement.

I found this site several months ago, thought it seemed both sensible and useful, then promptly lost it in the great jungle of the internet because I couldn’t remember what it was called. I kept thinking it had “maturity” in the title, so all my searches for “IT maturity” or “help desk maturity” didn’t work.

I found it again by a suitably circuitous route – I’d linked to it in a Reddit comment, so I tracked it down by trawling through my old comments one by one.

I have now added the link to my Toolkit > Library Matters page, so I can find it again.

And while I’m at it…

The 32 best practices

A. Public Facing Practices

  1. Are user requests tracked via a ticket system?
  2. Are “the 3 empowering policies” defined and published?
  3. Does the team record monthly metrics?

B. Modern Team Practices

  1. Do you have a “policy and procedure” wiki?
  2. Do you have a password safe?
  3. Is your team’s code kept in a source code control system?
  4. Does your team use a bug-tracking system for their own code?
  5. In your bugs/tickets, does stability have a higher priority than new features?
  6. Does your team write “design docs?”
  7. Do you have a “post-mortem” process?

C. Operational Practices

  1. Does each service have an OpsDoc?
  2. Does each service have appropriate monitoring?
  3. Do you have a pager rotation schedule?
  4. Do you have separate development, QA, and production systems?
  5. Do roll-outs to many machines have a “canary process?”

D. Automation Practices

  1. Do you use configuration management tools like cfengine/puppet/chef?
  2. Do automated administration tasks run under role accounts?
  3. Do automated processes that generate e-mail only do so when they have something to say?

E. Fleet Management Processes

  1. Is there a database of all machines?
  2. Is OS installation automated?
  3. Can you automatically patch software across your entire fleet?
  4. Do you have a PC refresh policy?

F. Disaster Preparation Practices

  1. Can your servers keep operating even if 1 disk dies?
  2. Is the network core N+1?
  3. Are your backups automated?
  4. Are your disaster recovery plans tested periodically?
  5. Do machines in your data center have remote power / console access?

G. Security Practices

  1. Do Desktops, laptops, and servers run self-updating, silent, anti-malware software?
  2. Do you have a written security policy?
  3. Do you submit to periodic security audits?
  4. Can a user’s account be disabled on all systems in 1 hour?
  5. Can you change all privileged (root) passwords in 1 hour?

Applying this

I’ve put a note in my team calendar to work through this list in December. That seems like a good time to review what we’ve been doing, and to think about which practices we’d like to adopt in 2025.
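For what it’s worth, here is a rough sketch of how I might tally the answers once we’ve worked through the list. It’s only an illustration: the `answers` data below is made up, and the `summarise` and `gaps` helpers are my own hypothetical names, not anything the Ops Report Card site provides.

```python
from collections import defaultdict

# Hypothetical answers, keyed by (section letter, question number).
# True = yes, False = no, None = not yet reviewed.
answers = {
    ("A", 1): True,
    ("A", 2): False,
    ("A", 3): None,
    ("B", 2): True,
    ("F", 3): True,
    ("F", 4): False,
}


def summarise(answers):
    """Return {section: (yes_count, answered_count)}, skipping unreviewed items."""
    totals = defaultdict(lambda: [0, 0])
    for (section, _number), answer in answers.items():
        if answer is None:
            continue  # not reviewed yet, so it counts neither way
        totals[section][1] += 1
        if answer:
            totals[section][0] += 1
    return {section: tuple(counts) for section, counts in sorted(totals.items())}


def gaps(answers):
    """Return the (section, number) keys answered 'no', in checklist order."""
    return sorted(key for key, answer in answers.items() if answer is False)


if __name__ == "__main__":
    for section, (yes, answered) in summarise(answers).items():
        print(f"Section {section}: {yes} of {answered} answered yes")
    print("Candidates to adopt:", gaps(answers))
```

The “no” answers are the interesting part: they become the shortlist of practices to consider adopting.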

About davidwitteveen

IT person. Zine Maker. Library Nerd. Doctor Who fan.