The Most Valuable Tool I’ve Used in My Engineering Career

27 Aug, 2024

I was in a panel interview with a group of engineers last week, and one of them asked me an interesting question that I hadn't thought about before. She asked me, "If you could have one tool from a previous job to bring with you, what would it be?"

It's a great question. It's open-ended enough to show how a candidate thinks, but it's also very practical for a job on an engineering team. I won't name names, of course, but bravo, K.

It took me a few minutes to think through my eight years of experience. I thought about IDEs. I thought about custom CLI applications that make common, company-specific tasks easier. I thought about finance teams putting together thoughtful and thorough guides on how to navigate stock options. I thought about a custom application that a colleague wrote to consolidate documentation from disparate sources like Confluence, GitHub READMEs, Jira, &c (which is a super cool project, if anyone's interested). All of these were massively helpful (well, aside from the CLI; by the time I was hired at that company, it was out of date and no longer maintained).

After some consideration, I decided that having strictly, thoroughly defined schemas of our data model was the thing I miss the most. The data model in question was massively complex, and maddeningly ambiguous. We had spent years, as data engineers, just guessing at what a certain column in a CSV or JSON value was intended to represent. Inevitably, this led to issues, churn, unhappy clients, and pushed deadlines. I won't go into details because I truly cannot afford to burn any bridges right now (hire me!), but here's the gist.

It was my team's fault, insofar as we didn't figure out a better way to deal with the problem until much later, but a lot of the blame went on the clients and our internal business partners. It turned out that nobody knew what this stuff meant. Of course, if we got it wrong, that was a massive issue, but no one was able to tell us what was right. We eventually decided to spend several months holding several meetings a week that would go as long as necessary. These meetings involved internal domain experts, client liaisons, and the clients themselves when everyone else was stumped, and oftentimes the clients were stumped as well. We decided that wasn't an acceptable answer. We created several JSON schemas, each several thousand lines long, and stayed in the room until each and every field was not only defined in concrete and unambiguous terms, but also had examples and constraints.
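To make "defined, with examples and constraints" a little more concrete, here's a rough sketch of what one small piece of a schema like that could look like, plus a check that flags any field nobody has documented yet. The field names (`account_id`, `status`, `balance_cents`) and the `undocumented_fields` helper are invented for illustration; they are not from the real schemas, which I'm not going to reproduce here.

```python
import json

# Hypothetical schema fragment. Every field name and constraint here is
# made up for illustration; the keywords themselves (description, examples,
# pattern, enum, minimum) are standard JSON Schema vocabulary.
SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["account_id", "status"],
    "properties": {
        "account_id": {
            "type": "string",
            "description": "Client-assigned account identifier, unique within one client.",
            "pattern": "^[A-Z]{3}-[0-9]{6}$",
            "examples": ["ABC-004217"],
        },
        "status": {
            "type": "string",
            "description": "Lifecycle state of the account as of the export date.",
            "enum": ["active", "suspended", "closed"],
            "examples": ["active"],
        },
        # Deliberately left undocumented so the check below has something to find.
        "balance_cents": {
            "type": "integer",
            "minimum": 0,
        },
    },
}


def undocumented_fields(schema):
    """Return {field name: [missing keys]} for properties that lack a
    non-empty description or a non-empty list of examples."""
    problems = {}
    for name, spec in schema.get("properties", {}).items():
        missing = [key for key in ("description", "examples") if not spec.get(key)]
        if missing:
            problems[name] = missing
    return problems


if __name__ == "__main__":
    # Prints the fields that still need someone to stay in the room for them.
    print(json.dumps(undocumented_fields(SCHEMA), indent=2))
```

A check like this could run in CI so that nobody can add a field without saying what it means, though that's my suggestion here, not something I'm claiming we did.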

This took months. It was painful, everybody hated it, and people tried to avoid these meetings. But at the end of it, we had a strict schema of our data model. We were able to use it in a huge variety of ways. We would send it to a new client and insist that they abide by our schema. If they refused, they got to help us figure out precisely how to translate their custom fields to our data model's fields. We set up a simple API that served the schema, and were able to check incoming files against our schema. If constraints or tests failed, then the data file went back to the client to be fixed. The number of downstream issues that we avoided was significant, and catching these issues early generally makes them much easier to fix.
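For the curious, the "check incoming files against the schema" step can be sketched in a few dozen lines. This is a hand-rolled toy, not what we actually ran: it only understands a handful of JSON Schema keywords (`required`, `type`, `enum`, `pattern`, `minimum`), the schema and field names are hypothetical, and `validate_record` and `check_file` are names I made up. In real life you'd more likely reach for an existing validator library such as `jsonschema` than write your own.

```python
import json
import re

# Hypothetical schema; field names and constraints are invented for illustration.
SCHEMA = {
    "type": "object",
    "required": ["account_id", "status"],
    "properties": {
        "account_id": {"type": "string", "pattern": "^[A-Z]{3}-[0-9]{6}$"},
        "status": {"type": "string", "enum": ["active", "suspended", "closed"]},
        "balance_cents": {"type": "integer", "minimum": 0},
    },
}

# Maps JSON Schema type names to the Python types json.loads produces.
TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate_record(record, schema):
    """Return a list of human-readable problems found in one record."""
    errors = []
    for name in schema.get("required", []):
        if name not in record:
            errors.append(f"{name}: required field is missing")
    for name, spec in schema.get("properties", {}).items():
        if name not in record:
            continue
        value = record[name]
        type_name = spec.get("type")
        expected = TYPES.get(type_name)
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        is_stray_bool = isinstance(value, bool) and type_name != "boolean"
        if expected and (not isinstance(value, expected) or is_stray_bool):
            errors.append(f"{name}: expected {type_name}, got {type(value).__name__}")
            continue  # the remaining checks assume the type is right
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: {value!r} is not one of {spec['enum']}")
        if "pattern" in spec and isinstance(value, str) and not re.search(spec["pattern"], value):
            errors.append(f"{name}: {value!r} does not match {spec['pattern']}")
        if "minimum" in spec and isinstance(value, (int, float)) and value < spec["minimum"]:
            errors.append(f"{name}: {value} is below the minimum of {spec['minimum']}")
    return errors


def check_file(text, schema):
    """Validate a JSON array of records. Returns (accepted, errors)."""
    errors = []
    for index, record in enumerate(json.loads(text)):
        errors.extend(f"record {index}: {e}" for e in validate_record(record, schema))
    return (not errors, errors)


if __name__ == "__main__":
    incoming = '[{"account_id": "abc", "status": "pending", "balance_cents": -5}]'
    accepted, problems = check_file(incoming, SCHEMA)
    if not accepted:
        # In the workflow described above, this list is what goes back to the client.
        print("\n".join(problems))
```

The point of returning every problem, instead of stopping at the first one, is that the client gets one complete list to fix instead of a slow back-and-forth.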

Anyway, that's the tool I'd want at every job.

About two seconds ago I got an email rejecting me from the job that this interview question came from. So... I'm gonna sign off, and seriously consider if this is even worth pursuing any longer.

Take care y'all. Be kind to one another.

#data engineering #labor #personal #work #writing