
Design for the Future, Not the Past!

Every organization that manages data fights the same battle on a regular basis. Should systems be architected quickly to solve a current problem, or should they be built to withstand the test of time?

This question sounds like a no-brainer. Who wouldn't want to build systems that scale, are well-documented, are easy to maintain, and can handle the inevitably massive torrent of data that will flood them in the future?

As is often the case in software development, theory and reality differ greatly. Building a system or feature correctly the first time is often seen as expensive and time-consuming. Deadlines are real, and they often influence design as much as the features themselves. If you have ever worked on a project whose goal was an MVP (minimum viable product), then you understand how this can feel!

Tackling this problem requires accepting that both sides of it are valid and identifying areas where scalability and security can be achieved without hefty investments. Maybe we cannot build a product with all the bells and whistles, but perhaps we can identify the specific changes that will have the greatest impact and focus on them.

A great starting point is to create a simple list of design considerations that are inexpensive but easy to neglect. Here are a handful based on my experiences over the years:

Speed Matters!

Make sure a performance-minded person takes part in the development, QA, and testing processes. Determine how quickly data will grow over time, how large it will get, and whether retention or archiving policies should be built. These are easy things to forget, and forgetting them can result in painful emergencies when the forces of reality begin to beat up on poorly designed data structures.
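As a sketch of what a retention policy might look like in practice, the following purges rows past a 90-day window in small batches so the delete never holds long-running locks on a busy table (the table and column names here are hypothetical):

```sql
-- Hypothetical example: enforce a 90-day retention window on dbo.EventLog,
-- deleting in small batches to avoid long blocking on a busy table.
DECLARE @RetentionCutoff DATETIME2(3) = DATEADD(DAY, -90, SYSUTCDATETIME());
DECLARE @RowsDeleted INT = 1;

WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (5000)
    FROM dbo.EventLog
    WHERE EventTime < @RetentionCutoff;

    -- Loop until a batch deletes nothing, meaning the purge is complete.
    SET @RowsDeleted = @@ROWCOUNT;
END;
```

Batching like this keeps each transaction short, which limits log growth and lets concurrent activity continue between batches.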

When testing new applications or features, be sure to test against data that is relatively large in size and realistic in content. Any process will be blazingly fast when its data source consists of twenty rows of contrived sample data.
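One inexpensive way to get realistic volume is to generate test data in bulk. This sketch (table name and distributions are invented for illustration) fills a test table with a million rows of varied dates and amounts:

```sql
-- Hypothetical example: populate a test table with one million rows of
-- varied data so performance tests reflect realistic volumes.
CREATE TABLE dbo.SalesOrderTest
(
    SalesOrderId INT IDENTITY(1,1) PRIMARY KEY,
    OrderDate DATE NOT NULL,
    Amount DECIMAL(10,2) NOT NULL
);

INSERT INTO dbo.SalesOrderTest (OrderDate, Amount)
SELECT TOP (1000000)
    -- Random dates within roughly the last three years.
    DATEADD(DAY, -1 * (ABS(CHECKSUM(NEWID())) % 1095), CAST(GETDATE() AS DATE)),
    -- Random amounts between 0.00 and 999.99.
    (ABS(CHECKSUM(NEWID())) % 100000) / 100.0
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;
```

The cross join of system views is simply a cheap row source; any numbers table or GENERATE_SERIES-style approach works equally well.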

As an aside, compliance, contractual obligations, and privacy laws may all impact how you manage others' data. If you know you are subject to these regulations, make sure you do not dig yourself into a hole by failing to take them into account during development.

Architect Data Structures Based on Workload

Not all data are the same. When designing new data structures, consider how the data will be used before making the important design decisions.

Will data be transactional or analytical? Pick the technology based on that usage pattern and make every effort not to mix transactional and reporting data unless you are sure of what you are doing!
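In SQL Server, one concrete way this choice shows up is in index architecture: an analytic fact table often benefits from a clustered columnstore index, while an OLTP table is usually better served by a traditional rowstore clustered index. A minimal sketch, with a hypothetical fact table:

```sql
-- Hypothetical example: an analytic fact table stored as columnstore.
-- Scans and aggregations over millions of rows compress and read far
-- faster than rowstore, at the cost of slower singleton lookups/updates.
CREATE TABLE dbo.SalesFact
(
    SaleDate DATE NOT NULL,
    ProductId INT NOT NULL,
    Quantity INT NOT NULL,
    Revenue DECIMAL(12,2) NOT NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_SalesFact ON dbo.SalesFact;
```

The same data in a transactional system would instead get a narrow rowstore clustered key tuned for point reads and writes.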

Is data temporary or permanent? Will it be created and destroyed a week later? Is it used for ETL or one-time data loads? Is the data no longer needed if your server or app restarts? The answers to these questions can allow you great liberty in how data structures are architected. If data does not need to be permanent, it can be put in memory, into temporary data structures, or into fast but non-durable tables.
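Two sketches of non-permanent storage in SQL Server (object names are hypothetical; the second requires a database with a memory-optimized filegroup):

```sql
-- A temporary table for intermediate ETL results; it disappears when the
-- session ends, so there is no cleanup or durability cost to worry about.
CREATE TABLE #StagingOrders
(
    OrderId INT NOT NULL,
    OrderTotal DECIMAL(12,2) NOT NULL
);

-- A memory-optimized, non-durable table: its contents do not survive a
-- restart, but reads and writes avoid disk I/O and logging entirely.
CREATE TABLE dbo.SessionCache
(
    SessionId UNIQUEIDENTIFIER NOT NULL PRIMARY KEY NONCLUSTERED,
    Payload NVARCHAR(4000) NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
```

DURABILITY = SCHEMA_ONLY is exactly the trade described above: if the data can vanish on restart, you can trade durability for speed.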

Does data need to be real-time, or can it be delayed? When data can be loaded daily or hourly, rather than real-time, we can optimize processes to load the data fast in aggregate and then ensure that the data source is as speedy as possible when it is time to read from it. Maintaining data asynchronously is far easier and less expensive than having to adhere to transactional logic.

Will contention be high or non-existent? If a table is expected to be a hot spot of activity, design it to be as lightweight and fast to query as possible so that it can service a high volume of requests efficiently. This helps avoid the locking, blocking, deadlocks, and waits that lead to unacceptable latency.
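A common hot-spot scenario is a work-queue table with many concurrent consumers. One sketch of a lightweight design (table and column names are hypothetical) keeps the table narrow and uses locking hints so consumers skip past each other rather than block:

```sql
-- Hypothetical example: a narrow, hot-spot queue table. Small rows and a
-- small clustered key mean inserts and lookups touch as few pages as possible.
CREATE TABLE dbo.WorkQueue
(
    WorkItemId BIGINT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_WorkQueue PRIMARY KEY CLUSTERED,
    Status TINYINT NOT NULL,
    CreateTime DATETIME2(3) NOT NULL
        CONSTRAINT DF_WorkQueue_CreateTime DEFAULT SYSUTCDATETIME()
);

-- Dequeue pattern: claim a single row, skipping rows other sessions have
-- locked (READPAST) so concurrent consumers do not block one another.
UPDATE TOP (1) q
SET Status = 1
OUTPUT inserted.WorkItemId
FROM dbo.WorkQueue q WITH (READPAST, ROWLOCK, UPDLOCK)
WHERE q.Status = 0;
```

The READPAST/UPDLOCK combination is a well-known queue pattern; the broader point is that contention is a design input, not something to discover in production.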

Be Consistent

Creating and abiding by a set of design conventions does not take significant effort and will greatly improve maintainability over time. Even if the rules are imperfect, there is great value in knowing that data is structured the same way throughout a database, regardless of the developer that created it.

Naming conventions are often shrugged off as a matter of personal preference, but they can greatly reduce future development time when we ask where data is located, what it is called, or what a name means. Should objects be named using camelCase, PascalCase, or snake_case? Should meaningful prefixes or suffixes be used on object names to assist in sorting, searching, and identification? In general, longer and more descriptive names are easier to read and understand. A table called "employee_address_detail" is very clear in its purpose, whereas "EMP_ADD_DET" may be short but is hard to understand.

Use consistent data types for the same kind of data wherever possible. If a date/time is needed and must be accurate to the millisecond, then DATETIME2(3) is ideal. A date can be stored in a DATE column rather than a DATETIME or DATETIME2. A duration should be a number, not a string of days, hours, minutes, seconds, and milliseconds. Strings should be as large as needed and no larger. NULL and NOT NULL tell developers whether a column is required or optional; making columns NULLable for no reason is confusing.
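These guidelines can be sketched in a single hypothetical table whose column types each match the data they hold:

```sql
-- Hypothetical example: column types chosen to fit the data.
CREATE TABLE dbo.ShiftLog
(
    ShiftLogId INT IDENTITY(1,1) PRIMARY KEY,
    ShiftDate DATE NOT NULL,            -- date only; no time component needed
    ClockIn DATETIME2(3) NOT NULL,      -- accurate to the millisecond
    ClockOut DATETIME2(3) NULL,         -- genuinely optional until the shift ends
    DurationSeconds INT NULL,           -- a duration as a number, not a string
    Notes NVARCHAR(500) NULL            -- sized for the data, not MAX by default
);
```

Every NULLable column here is nullable for a reason, and every type is the narrowest one that fits the data.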

Will international data be stored? Will foreign addresses, currencies, phone numbers, and more be supported? If so, build data structures with room and support for international data up front, and avoid the need for a messy localization project in the future.

Consider time zones and data storage. Storing data as DATETIMEOFFSET, or as UTC dates/times, ensures that your data is not dependent on the server's time or some other arbitrary time zone that may be problematic in the future.
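Both approaches can be sketched in one hypothetical audit table; a real design would pick one of the two columns, not both:

```sql
-- Hypothetical example: two time-zone-safe storage options side by side.
CREATE TABLE dbo.AuditEvent
(
    AuditEventId BIGINT IDENTITY(1,1) PRIMARY KEY,
    -- Option 1: keep the offset with the value via DATETIMEOFFSET.
    EventTimeLocal DATETIMEOFFSET(3) NOT NULL
        CONSTRAINT DF_AuditEvent_Local DEFAULT SYSDATETIMEOFFSET(),
    -- Option 2: normalize everything to UTC.
    EventTimeUtc DATETIME2(3) NOT NULL
        CONSTRAINT DF_AuditEvent_Utc DEFAULT SYSUTCDATETIME()
);
```

SYSDATETIMEOFFSET() captures the server's current offset alongside the value, while SYSUTCDATETIME() sidesteps offsets entirely by normalizing to UTC at write time.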

Documentation

Finally: document. Please. Comment your code and maintain a repository or application for searchable, current documentation. The time saved here is immeasurable. The alternative is the frequent exercise of tracking down a person knowledgeable in some aspect of an app and consuming their time with questions. The even worse alternative is having to personally go code-diving for answers that do not exist anywhere else.

Even mediocre documentation (if current) will save immense time and energy in the future as at least the basic questions of code or a data structure can be answered quickly and without the need to schedule meetings or perform exhaustive research.

What’s Next?

This is a discussion focused on the future and centered on scalability, maintainability, and quality. Nothing presented here is overly complex or expensive, yet designing and architecting apps with these ideas in mind will produce a wide variety of positive results: higher uptime, fewer bugs, less maintenance, fewer wasted resources, and less technical debt. What's not to love about that!?

Edward Pollack
About the author

Ed Pollack has over 20 years of experience in database and systems administration, which has developed his passion for performance optimization, database design, and making things go faster. He has spoken at many SQLSaturdays, 24 Hours of PASS, and PASS Summit. This led him to organize SQLSaturday Albany, which has become an annual event for New York's Capital Region. Sharing these experiences with the community is a top priority, as is encouraging everyone to take the leap into public speaking and engaging with others.
