June 28, 2010

Database Design 101a, #SQLU

Posted in Learning, SQL University at 9:00 am by David Taylor

What is #SQLU

SQL University, known as #SQLU on Twitter, is the brain child of Jorge Segarra, who blogs at The SQL Update Statement, and can also be found championing the Social side of DBAness as @SQLChicken on Twitter. The community owes Jorge a deep debt of gratitude for making #SQLU possible. The gathering and dissemination of knowledge is one of the highest achievements to which a community can aspire, and it takes great leadership to do such, which is why, I am sure, Jorge has been nominated for the prestigious Exceptional DBA Award.

Thanks for allowing my involvement

I would like to take this opportunity to express my heartfelt thanks to Chancellor Segarra for inviting me to lecture in this esteemed forum. It is said that everybody with some knowledge is capable of sharing that knowledge; I hope I can do this University the justice it deserves.

Welcome back to Monday, I hope you enjoyed the weekend

I know Professor McCown left you Friday with a lot to think about regarding women in technology. I hope you found time in your busy lives to take initiative with any women you know that are teetering on the edge to get them into this field, and will always keep that in mind.

This has been a busy semester for you all, and I know there has been a lot covered. What I’d like you to focus on this week might be construed as a bit more basic than some of the technical lessons you have received, but I feel that getting the fundamentals right will do worlds of good when those higher steps are taken.

To that end, this week’s class at SQL University, we are going to discuss database design.

Plan for the week

My plan for this week is to start with a high level overview of database design, then discuss further aspects of design, such as why it is important, who will be using it, how to implement it, what tools are available to help the designer, and what the consequences of both good and bad design, or even no design could be.

There will be no test at the end; your own work will show you how well you have absorbed the lessons. There will be homework every day of class, and keep in mind that Coach Larock will probably also have work for you to do.

Database design is a complex subject, but, as I have mentioned, getting it right is fundamental to your database being usable, having good performance, being scalable, and, above all, being pertinent to the customer.

The 50,000 foot view of DB Design

Why do we care

We data professionals need to care about the design of the databases under our care for multiple reasons. First and foremost, we make our livings from these entities, and we must justify our earnings. Also, the majority of us are charged with the maintenance of the databases we work with, and good design facilitates easier maintenance. Finally for this discussion, we all hope the businesses that entrust their data to us will grow, and the databases underlying that business will very likely grow along with it. A well designed database can handle multi-fold growth in stride – correct steps taken at the outset will allow later growth to be unimpeded.

What managers/customers want vs. what they need

It is a truism that managers and customers have an idea what they want, but they don’t know what they need until what they want has been in implementation for some period, and they can see how things work. The database designer needs to know at the outset the end user’s business needs, rather than what the customer thinks she wants, because it’s what is needed that matters. We will spend a fair amount of timing discussing how to determine those needs when we get into discussing Business Requirements.

What tools can help us out

There are several tools available, both free and not, for assisting with database design. Some are better than others, and, as in any aspect of software design, there are defacto standard tools out there. These tools can range all the way from the back of a napkin and a pencil to a full blown ERM (Entity Relationship Model) design software tool. We will briefly touch on several in this range as time allows throughout the week.

Why be Normal?

Normal in this case does not refer to where one stands in society! Rather I am talking here about data normalization. This is a vast subject unto itself, with a great number of great minds having chimed in on the subject over the years. We will be discussing the basics of normalization, the different normal forms and when one would choose the different forms as the database gets designed. Normalization is data simplification, an elimination of repetition of data within the data store.

Does it fly?

Again, I am alluding to a concept here. Databases can scale to incredible heights. Transactions per second can be in the thousands, as can the number of concurrent users. Good design will allow the database engine to handle such large numbers of transactions while not so good designs can cause such conditions to bog down servers irrevocably. This is probably one of the most crucial reasons for learning good design.

Why worry about Database Design

They’re just Excel tables inside SQL Server aren’t they?

As mentioned, most managers and business data owners think they know what they want. Just about anyone in the head office can open a copy of Excel and start typing in a collection of information and generate useful reports from that information, all within Excel. This causes those same people to think that what is deep inside SQL Server is of the same nature, that the data professional is simply building spreadsheets for users to store data from their various forms.

If you’ve not yet been exposed to good database design, you may think the same thing. I will make a confession, here, when I started out is SQL Server, I did the same. Each group of pieces of information was contained in single tables, and I thought the power of SQL Server was only that it had great facility to slice and dice the data. What I found was that it was very difficult to write queries against such tables to obtain the necessary consolidation of data into useable information, and further, I found that such arcane and convoluted queries suffered terrible performance.

Performance

Which is the reason I gave performance its own heading. SQL Server is fast. Unbelievably, almost magically, fast. If what you are doing with it is designed correctly. To give a concrete example, one of my own business tasks was to generate a report of yields from a manufacturing process. In the beginning, with my ‘Excel tables’ and convoluted queries, it took almost two minutes for the report to be rendered each time it was opened. The management team relying on the report was not happy, to say the least. No one wants to stare at a spinning indicator for two minutes, waiting for a daily report to appear.

I learned, then employed, the concepts I will be teaching this week, and found that a properly designed database yielded simple querying techniques which reduced those two minutes to sub-second performance. Because of that (and other things), I remain employed.

Business Requirements can’t be stressed enough

Performance, normalization, querying techniques – none of these matter if requirements are not met. Remember that the business’ data in many cases *IS* the business, and that without it the business would at best suffer, and at worst, cease to exist. The data within the database needs to be understood by the data professional, in enough detail that proper tables can be constructed to house the data efficiently, and that querying devices such as views, stored procedures, etc. can be easily built to gain real information from the raw data to assist the business owners and employees in the success of the business.

Talking to customers to get the requirements

The only real way the data professional can understand the business and its data sufficiently is to simply talk with the customers. The customers, in this case, means any consumer of the data, be they someone that hired you as a consultant, or they are simply higher up in the chain of command within the business. Meeting with the producers and users of the data to gain insight into what the data is and how it will be used is often the best and sometimes the only way of gaining enough understanding to properly design the database.

Understanding the business

Understanding the business is the next level of detail required. Along with tables, queries and the like, the database designer needs to also consider such concepts as security access and data usage to answer such questions as who can see and interact with the data, and what parts of the data will be used for which parts of the business? Again, these are parts of the Business Requirements which can be gleaned and understood from meeting with and talking to the customers and managers and users of the data.

Understanding the data

A corollary to understanding the business is to simply understand the data itself. Is it a collection of financial data, as in a bank or mortgage loan business? Is it orders and customers, as would be found in a retail business? These types of data are vastly different both in their collection and their usage, and understanding, at least in general, if not in detail, the data that will make up the content of the database to be designed will end in a vastly better database than one which was designed without that knowledge.

Wrap it up

What we’ll cover next time

For our next class, we will delve into actually building a (simple) database, based on requirements that I will describe. We’ll build it two ways, one bad and one better, so you can learn to discern the difference. We’ll cover normalization in a bit more depth, and I will describe some different ways of building the database, utilizing a couple of the different tools available. We’ll discuss also some methods of getting information from data, to illustrate why the design is important.

Check this out for homework

In the meantime, it will be helpful for you to learn what not to do, before we even start to do anything. To that end, I would like you to read a piece by  Dr. SQL himself, Louis Davidson, author of SQL Server 2005 Database Design and Optimization. The specific blog post I would like you to read is entitled Ten Common Database Design Mistakes. There is a wealth of information to be had here, and I would like you to have this information in mind as we discuss the proper design of a database.

Until next class, then, enjoy yourselves.

Leave a comment