Don’t Repeat Yourself. DRY. It’s an oft-repeated axiom among software developers, but do we apply it broadly enough? Where we do apply it, do we apply it too aggressively? Today I’ll look at this fine old acronym three different ways: avoid repeating code, avoid making the same decisions, and avoid doing work over again.
The Usual Suspect
The first thing we think of when we think “DRY” is avoiding cut-and-paste code, or repetitive code of any kind. Avoiding repetitive code carries three primary benefits:
- You only have to get it right once.
- If you need to change it, you only change it once.
- You go faster overall as a result.
Hopefully that’s pretty darn self-explanatory, so I won’t go into a whole lot of detail. I’ll just leave it nice and simple like that.
Decision, Decision, Decision
One place we often overlook repetitive code is in our friendly old if statement, or any other conditional logic. The same goes for any other type of control structure. Very few systems will be created that aren’t loaded with if statements, but it’s surprisingly common to see the same if repeated throughout a single chain of method calls.
Consider the scenario where you have a “people storage” service. The service ends up storing the “person” portion of both an Employee and a Customer. However, it also needs to do a couple of other things as part of attempting to save the person. Fair warning: this example is extraordinarily contrived and transcendentally ridiculous in the interest of making my point simple to understand.
- Validate the person is valid.
- If the person is an employee, the employee must also have a manager who is also an employee unless their role is “CEO”.
- This also means they must have their employee-manager relationship stored in the EmployeeManagerRelationship table.
- If the person is a customer, they must have a payment method associated with them.
- This also means they must have their payment method stored in the CustomerPaymentMethod table.
So, taking this ridiculous assumption that we’re building a single service for this, let’s look at our hypothetical Save method.
    public Guid Save(Person person)
    {
        try
        {
            if (person is Employee employee)
            {
                if (employee.Manager == null)
                    throw new ValidationException("Employee does not have a manager.");
            }
            else if (person is Customer customer)
            {
                if (customer.PaymentMethod == null)
                    throw new ValidationException("Customer does not have a payment method.");
            }
            Guid id = this.personRepository.Save(person);
            if (person is Employee)
                this.employeeManagerRepository.Save(person as Employee);
            else if (person is Customer)
                this.customerManagerRepository.Save(person as Customer);
            return id;
        }
        catch (Exception) { throw; } // the handler body is not shown here
    }
Notice that the decision of whether the person is an employee or a customer is repeated. We could avoid that repetition and stay in the spirit of this code, but you either repeat the call to this.personRepository.Save(person) or you put it first and do your type-specific validations after you have called that first save. The second option isn’t great in most databases, as you’re holding a write lock on a row open a little longer. That could cause a lock escalation and thus performance issues (or worse). In that event, and in my opinion, the lesser of two evils is repeating the call to this.personRepository.Save(person).
The next problem comes along when you need to add a whole other type of person (maybe a vendor?) and now you’re back to editing this code. This is where SOLID ^[If the link to Bob Martin’s site stops working, here is Wikipedia: https://en.wikipedia.org/wiki/SOLID_(object-oriented_design)] comes into play. But first, let’s ask, “Are we repeating a decision that’s already been made?”
Yes. Of course we are. The caller of the Save method almost certainly knows it is dealing with an employee or a customer. Trying to make a “do anything” API is just as likely to frustrate them as help them. Oh, sure, it might be easier to call, but is it easier to handle the result of the call? In most cases, no. Why? Because the response will have some differences and therefore they’ll have to handle it and repeat the same decision over and over again!
In short, look for your decision points early and let them drive a chain of calls. In this example, you’re probably better off with an Employee service and a Customer service that are separate from one another. All the reuse you get is really in that call to the personRepository and some trivial transaction manager boilerplate that could be handled with a delegate or other construct.
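Here is a minimal sketch of what that split might look like. Everything beyond the names already used above is an assumption made for illustration: the Role and PaymentMethod fields, the in-memory PersonRepository stand-in, and the Transactional.Run helper are all hypothetical, and the transaction handling is reduced to comments.

```csharp
using System;
using System.Collections.Generic;

public class Person { public string Name = ""; }
public class Employee : Person { public Employee Manager; public string Role = ""; }
public class Customer : Person { public string PaymentMethod; }

public class ValidationException : Exception
{
    public ValidationException(string message) : base(message) { }
}

// Hypothetical in-memory stand-in for the shared person repository.
public class PersonRepository
{
    public readonly List<Person> Saved = new List<Person>();
    public Guid Save(Person person) { Saved.Add(person); return Guid.NewGuid(); }
}

// The shared boilerplate lives in one place and takes the real work as a delegate.
public static class Transactional
{
    public static T Run<T>(Func<T> work)
    {
        // A real implementation would begin a transaction here...
        try
        {
            T result = work();
            // ...commit here...
            return result;
        }
        catch
        {
            // ...and roll back here before rethrowing.
            throw;
        }
    }
}

// The caller already knows it has an Employee, so no type check is needed.
public class EmployeeService
{
    private readonly PersonRepository personRepository;
    public EmployeeService(PersonRepository personRepository) { this.personRepository = personRepository; }

    public Guid Save(Employee employee)
    {
        if (employee.Manager == null && employee.Role != "CEO")
            throw new ValidationException("Employee does not have a manager.");
        // Saving the employee-manager relationship would also go inside this delegate.
        return Transactional.Run(() => this.personRepository.Save(employee));
    }
}

public class CustomerService
{
    private readonly PersonRepository personRepository;
    public CustomerService(PersonRepository personRepository) { this.personRepository = personRepository; }

    public Guid Save(Customer customer)
    {
        if (customer.PaymentMethod == null)
            throw new ValidationException("Customer does not have a payment method.");
        // Saving the payment method would also go inside this delegate.
        return Transactional.Run(() => this.personRepository.Save(customer));
    }
}
```

Each service makes its decision exactly once, at the point where the caller picks which service to call. Validation also happens before any transaction would be opened, so nothing is locked while we are still deciding whether the input is acceptable.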
You Want That AGAIN?
Most of our applications are multi-tiered affairs. There’s some user interface (even when the user is another system calling an API), some sort of business logic layer, and then some sort of persistence layer. Maybe it’s all a single executable, but the layers are usually present in some fashion in even simple applications.
What this means is that while you may have the most non-repetitive code in the world, at run time that code often does exactly the same thing over and Over and OVER again. Let’s say you have a website and that website needs to serve up the Message of the Day to inform users that there will be maintenance performed this weekend. Let’s assume that message is managed by a content administrator, so it’s stuffed in the database. The simple scenario is this:
1. User navigates to “mysite.com”.
2. This sends a request to a “home” service controller.
3. “Home” calls into a logic layer.
4. The logic layer queries the data layer to see if there are any announcements.
5. The data layer calls across the network to the database and grabs the current announcements.
6. That gets passed back up the stack to the logic layer.
7. The logic layer sends it back to “Home”.
8. “Home” receives that response and renders the HTML page.
9. The user gets that page sent back to them.
10. User 2 navigates to “mysite.com”. Repeat from step 2.
This is the simple case, and many of you are screaming, “Just cache it!” And we do. But we often leave it at that and miss other caching opportunities.
- Do you have a search function? Are you caching those search results?
- Are you caching user data? A great many sites have some sort of personalization feature. Are you caching a user’s settings while they’re on the site?
- What about very slowly changing data? How often does the list of US states change? Hardly ever. Given that each US state has an official two-letter abbreviation, we could possibly treat the abbreviation as a primary key and never actually retrieve the list of states from the DB.
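For the slowly changing case, a rough sketch is a lookup compiled into the application and keyed by the two-letter abbreviation. The UsStates class and TryGetName method are names I made up for illustration, and only three entries are listed to keep it short.

```csharp
using System;
using System.Collections.Generic;

public static class UsStates
{
    // Keyed by the two-letter abbreviation; lookups ignore case.
    private static readonly Dictionary<string, string> NamesByCode =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "CA", "California" },
            { "NY", "New York" },
            { "TX", "Texas" },
            // The remaining entries are omitted to keep the sketch short.
        };

    // Returns true and sets name when the abbreviation is known; no database call involved.
    public static bool TryGetName(string code, out string name)
    {
        return NamesByCode.TryGetValue(code, out name);
    }
}
```

The trade-off is that a change to the list means a deployment instead of a data update, which is only acceptable because the list changes so rarely.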
I would suggest avoiding session storage for user data and focusing on a cache provider instead. Session storage is only useful where you have a session (duh). But what if you have a separate worker service that sends emails and needs user data too? A dedicated cache service helps anywhere you need it and helps manage memory on your web servers.
I’m using cache as the go-to, but any “faster storage” is better than repeating whole DB calls. In today’s cloud-based “NoSQL” world, we can explore more freely than ever before. Azure and AWS both offer caching services you can use with minimal effort.
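Whatever storage you pick, the idea is the same. As a rough in-process sketch of the Message of the Day case, the hypothetical ExpiringValue class below (my own name, not a library type) loads a value through a delegate and reuses it until a lifetime elapses. The optional clock parameter exists only so the expiry can be exercised without waiting.

```csharp
using System;

public class ExpiringValue<T>
{
    private readonly Func<T> load;
    private readonly TimeSpan lifetime;
    private readonly Func<DateTime> now;
    private readonly object gate = new object();
    private T value;
    private DateTime expiresAt = DateTime.MinValue;

    public ExpiringValue(Func<T> load, TimeSpan lifetime, Func<DateTime> now = null)
    {
        this.load = load;
        this.lifetime = lifetime;
        this.now = now ?? (() => DateTime.UtcNow);
    }

    public T Get()
    {
        lock (gate)
        {
            DateTime current = now();
            if (current >= expiresAt)
            {
                // Only here do we pay for the trip down the stack to the database.
                value = load();
                expiresAt = current + lifetime;
            }
            return value;
        }
    }
}
```

Wrapping the announcements query in something like new ExpiringValue<string>(LoadMessageOfTheDay, TimeSpan.FromMinutes(5)), where LoadMessageOfTheDay stands in for your own data call, would mean every visitor in a five-minute window shares one database round trip.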
When you cache, here are a few things to think about:
- If you use a caching service, you’ll have at least a small hop of network latency. It won’t be as fast as an in-memory cache.
- If you use an in-memory cache, how will you add servers? While you can safely use it for slowly changing data like the list of US states, user data will have problems if you add another server into the mix, because each server keeps its own copy.
- Don’t think about it as just caching database calls. It’s often better to cache entire constructed objects so you’re not repeating the construction logic too. That is: think about caching the results of your business layer or even presentation layer, not your data layer.
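To illustrate that last point, here is a sketch that caches the finished object produced by the business layer instead of the raw rows behind it. The HomePageModel, IHomePageBuilder, HomePageBuilder, and CachedHomePageBuilder types are all hypothetical; the cache itself is a plain ConcurrentDictionary, so the multi-server caveat above applies to it as written.

```csharp
using System;
using System.Collections.Concurrent;

// The fully constructed object the presentation layer needs.
public class HomePageModel
{
    public string Greeting;
    public string[] Announcements;
}

public interface IHomePageBuilder
{
    HomePageModel Build(string userId);
}

// Stand-in for the real business logic; counts how often construction runs.
public class HomePageBuilder : IHomePageBuilder
{
    public int BuildCount;

    public HomePageModel Build(string userId)
    {
        BuildCount++;
        return new HomePageModel
        {
            Greeting = "Welcome back, " + userId,
            Announcements = new[] { "Maintenance this weekend." }
        };
    }
}

// Decorator: same interface, but construction is skipped on a cache hit.
public class CachedHomePageBuilder : IHomePageBuilder
{
    private readonly IHomePageBuilder inner;
    private readonly ConcurrentDictionary<string, HomePageModel> cache =
        new ConcurrentDictionary<string, HomePageModel>();

    public CachedHomePageBuilder(IHomePageBuilder inner) { this.inner = inner; }

    public HomePageModel Build(string userId)
    {
        // Under contention the factory may run more than once for a key,
        // but only one result is kept and returned to every caller.
        return cache.GetOrAdd(userId, id => inner.Build(id));
    }

    // Call this when the user's settings or the announcements change.
    public void Invalidate(string userId)
    {
        cache.TryRemove(userId, out _);
    }
}
```

Because the decorator implements the same interface as the thing it wraps, callers don’t need to know whether they are getting a cached model or a fresh one.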
As usual, there are no brilliant ideas here, just my attempt to communicate some of my thoughts and practices. This was a long one because I got it all done together and didn’t feel like being cheesy and breaking it apart. Let me know on LinkedIn if you have any favorite ways to reduce repetition in your applications. I’d love to hear them.