Project · Team and Process Design

The team I built to close a gap nobody owned

There are two things that can happen to a complex support issue at a SaaS company. It waits in the product queue until something external forces it to the top, or it blows up a sprint to get crammed in. I had been navigating that binary for years, doing everything I could reach inside support and packaging the rest as precisely as possible for the handoff. But there was a category of problem that belonged in neither queue nor sprint. It needed a layer that did not exist. So I built it, around three specific people and what each of them was best at.

WellSky · Human services and aging/disability SaaS for state agencies · Senior Support Engineer · 2021 to 2023

What This Proves

Diagnosed an organizational gap nobody owned and designed a structural team to close it
Built and developed people deliberately, around individual strengths and a growth path
Stood up a knowledge-transfer system that moved tribal knowledge out of a few people's heads
Measured improvement: 60% faster on team-resolved work, 40% faster overall support resolution, 30% faster new-hire ramp
Saw the durability risk from inside it: structure has to outlast the individuals holding it up

The Problem

Better tickets, same two endings

I had already spent a year rebuilding how support handed work to product. Tickets going up the chain were cleaner, more complete, and easier to act on. The quality problem was solved. What was not solved was what happened next.

Complex issues still had only two fates. They waited in the product queue until something external made them urgent enough to move, whether a client escalation or a federal reporting deadline two weeks out. Or they forced their way into a sprint and disrupted a team that had other plans. Either way, the window for resolution was driven by pressure, not readiness. The work that should have been happening in the middle (data cleanups, scripted fixes to underlying data problems, carefully packaged tickets handed directly to a developer) was already happening informally, through me. It just had no structure around it.

People and Impact

State agencies running programs people depend on

These were state agency clients administering human services programs, real agencies with federal reporting obligations and real consequences when those obligations failed. When a complex issue sat unresolved in the product queue, the cost was not an abstract SLA miss. It was a broken workflow for an agency that could not afford to wait.

One client had been quietly using a required NAMRS (National Adult Maltreatment Reporting System) reporting field for custom configuration across tens of thousands of records, corrupting their annual federal report in the process. The issue had been invisible until someone looked. Under the structure that existed at the time, nobody was positioned to look, understand the full scope of it, and move on it before a deadline made it a crisis. That was the actual gap: not a slow queue, but no one in the right position to act.

Root Cause

The job was resolution, not escalation

The reframe was the whole thing. I had been treating the goal as "get a good ticket to the right people." That was wrong. The actual goal was "resolve the customer's issue," and once I named it that plainly, two paths appeared: either resolve it directly inside the support engineering layer, or build work product complete enough that product could act without reinvestigating.

Both paths needed the same thing that did not exist: a team positioned between client-facing support and product, with enough technical depth to diagnose at the source. Not a better escalation path. A place where the hard problems could actually stop and get taken apart.

I had been trying to get the work to the right people. The fix was to build the right layer between them.

And the moment I framed it as a layer rather than a handoff, a second thing became possible. I did not have to staff it with generalists. I could build it around specific people whose depth was being underused everywhere else they sat in the org.

Solution and Iteration

Built around three people on purpose

My role covered everything the layer needed one person to own at once: client-facing work with agencies across forty-two states, SQL troubleshooting, process design, training, and deployment work that ran from cleanup scripts to Azure Pipelines, at whatever hour the work required. I was in program director meetings Monday morning. I was deploying to production Monday night.

Alex was the SQL specialist, and before AI tools existed for that work, her syntactic precision and her ability to reason through a system's backend made her a genuine force multiplier. The handoff between us was structured: I documented problems in a specific format in Salesforce, capturing what I understood about the client environment and the likely area of impact, then brought her into a Teams session to walk through the issue live so she could build her own picture of it. Most of the SQL routed to her, not because I couldn't do it but because she was extraordinary at it and it was what she loved to do. On the hardest problems we paired.

Sunil's role was designed as a breadth-first position. The plan was for him to run first pass on high-priority, non-critical tickets (anything that was not an active fire) and to serve as backup for both Alex and me. It was also the best teaching structure available to him: exposure to the full range of work, watching how both of us approached problems, enough volume and variety to build real intuition over time.

Then the mechanisms. A daily fifteen-minute standup, as much a human check-in as an early warning system for anything starting to burn, followed every day by a full hour of office hours open to all eight support staff. The rule for what you could bring: a problem you had been stuck on for seven days or more, or anything genuinely high priority no matter how fresh. Everything else got sent back, because the point was to build independent problem-solving capacity, not to absorb everyone else's queue. The format was walk-me-through-it: talk through the issue live until we understood what it actually was.

Those hours mattered more than the number suggests, because of what existed before them. Non-client-facing documentation at WellSky was nearly nonexistent. Technical specs lived on a server that only product and engineering could access. Before this team, support and implementation staff had no central point for technical help. They sent messages to people in product and engineering and waited to see if anyone responded. When I moved into the solution analyst role and took over from the founding product manager, one of the first things she told me was that we were losing tribal knowledge every quarter and I needed to find a way to stop it. I built a Teams channel and started writing actual documentation and process flows, piecing them together in the margins of whatever I was troubleshooting at the time. Office hours became the live version of that same transfer: the channel through which what lived in a few people's heads could actually move to the people who needed it.

The NAMRS situation I described earlier is what the layer looked like when it was working. A support staff member brought a broken NAMRS report to one of our meetings, unable to diagnose it. I dug in and recognized quickly that the issue was serious and that we had approximately two months before the federal reporting deadline. I pulled the NAMRS spec, confirmed which fields were and were not part of the report, and identified that the client had been writing custom configuration data into a spec field across tens of thousands of records for years. I brought the client in, walked them through my findings, and proposed the fix: migrate the configuration data to an unused non-spec field via cleanup script, validated in sandbox before any production deployment. They agreed. I ticketed engineering for a sandbox refresh, Alex and I wrote the script together, we ran it against the client's data in sandbox, the client tested and approved, and it deployed to production. Resolved through the support engineering layer, not through a sprint disruption, and not in the two weeks before the deadline when someone would have finally cared.
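The cleanup script itself is not reproduced here. As a rough sketch of the shape such a migration can take, the Python below uses the standard `sqlite3` module in place of the client's real environment. The table name `client_records`, the columns `spec_field` and `unused_field`, and the `ALLOWED_SPEC_CODES` values are all hypothetical, and treating every value outside the allowed codes as configuration data is an assumption made for illustration, not the rule the real script applied.

```python
import sqlite3

# Hypothetical stand-ins for the values the reporting spec allows in this field.
ALLOWED_SPEC_CODES = ("1", "2", "3")


def migrate_config_values(conn: sqlite3.Connection) -> int:
    """Move non-spec values from spec_field into unused_field.

    Returns the number of rows migrated. The update runs in one
    transaction, so a failed check rolls it back.
    """
    placeholders = ",".join("?" for _ in ALLOWED_SPEC_CODES)
    # Assumption: any value present that is not an allowed code is config data.
    misused = f"spec_field IS NOT NULL AND spec_field NOT IN ({placeholders})"

    with conn:  # commits on success, rolls back if an exception is raised
        expected = conn.execute(
            f"SELECT COUNT(*) FROM client_records WHERE {misused}",
            ALLOWED_SPEC_CODES,
        ).fetchone()[0]

        # Refuse to overwrite data already sitting in the target field.
        occupied = conn.execute(
            f"SELECT COUNT(*) FROM client_records "
            f"WHERE {misused} AND unused_field IS NOT NULL",
            ALLOWED_SPEC_CODES,
        ).fetchone()[0]
        if occupied:
            raise RuntimeError(f"{occupied} target rows are not empty")

        # Right-hand sides use the pre-update values, so the copy and the
        # clear can happen in a single statement.
        cursor = conn.execute(
            f"UPDATE client_records "
            f"SET unused_field = spec_field, spec_field = NULL "
            f"WHERE {misused}",
            ALLOWED_SPEC_CODES,
        )
        if cursor.rowcount != expected:
            raise RuntimeError("row count changed during migration")

    return expected
```

Running a function like this against a refreshed sandbox copy first, and comparing the returned count with what the client expects, mirrors the validate-then-deploy order described above.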

The goal was to start in human services, prove the model, and expand. The longer-term vision was a central support engineering layer above all of WellSky's support teams, a position that could surface real data on issue patterns and application health directly to leadership rather than having it filtered through escalation chains. That expansion was the next step. It was not something we completed.

Proof

The numbers held up

60% faster on team-resolved work

40% faster overall support resolution time

30% shorter new-hire ramp time

The work the support engineering team took on was resolved about sixty percent faster than it would have been by going to development or waiting in the queue. That is the team's own throughput on the problems it took directly: the class of issue that used to sit a month or two until a deadline forced it, now turned around in days.

Separately, overall support resolution time improved about forty percent. That number is the whole support organization getting faster, not just this team. I measured it from ticket open and close dates about a month into the office hours, against the prior baseline.
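The comparison behind a number like this is simple arithmetic over ticket dates. Here is a minimal sketch, assuming each ticket is an (opened, closed) pair of dates and that the median is the statistic being compared. The function names are hypothetical, and the original measurement may have used a different summary, such as the mean.

```python
from datetime import date
from statistics import median


def resolution_days(tickets: list[tuple[date, date]]) -> list[int]:
    """Days from open to close for each (opened, closed) pair."""
    return [(closed - opened).days for opened, closed in tickets]


def percent_faster(
    baseline: list[tuple[date, date]],
    current: list[tuple[date, date]],
) -> float:
    """Percent reduction in median resolution time versus the baseline."""
    base = median(resolution_days(baseline))
    now = median(resolution_days(current))
    if base <= 0:
        raise ValueError("baseline median must be positive")
    return (base - now) * 100 / base
```

The median is used here because a handful of very old tickets would otherwise dominate the figure. That choice is an assumption of the sketch.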

New-hire ramp time dropped thirty percent. The documentation and process flows I had been building since my first days in the role, written in the margins of troubleshooting work because that was the only time available, meant the knowledge that had lived in a few people's heads finally had somewhere else to live.

Handoff

What I could see from inside it

I could see the shape of the problem while I was still building the solution. Everything in the layer that required judgment (which tickets needed Alex, which ones Sunil should run first pass on, which client situations required me directly on the call) centralized on me naturally, because I was the one who understood all of it. That was effective. It was also a dependency, and I knew it while it was happening.

The team worked because of who was in it: Alex's technical depth, the structured handoff process we built together, the trust in the room that made it safe for a support staff member to say they had been stuck on something for two weeks. Those were real and they mattered. They were also particular to the people who created them.

Something built around people lasts exactly as long as those people are in their seats. Durability has to live in the structure, not in the individuals holding it up by hand. I understood that from inside this one, before anything forced me to. It is why I build differently now.

Where this went next

Everything after this was built with that in mind

This is the earliest of the three projects on this site. It is the one that shaped the approach the other two are built on. I did not learn the durability lesson after this team ended. I could see it forming while I was still inside it: the load centralizing, the judgment calls routing through one person, the structure depending on who happened to be in the seats. Recognizing it was the thing that changed how I build.

The invoice automation at ATI is the clearest example of what that shift looks like in practice. Same instinct for diagnosing the real break in a process. Different relationship with what happens after I walk away.
