July 2, 2010
Database Design 101c, #SQLU
Welcome Back Once Again!
For today, our last class together at SQL University, (#SQLU on Twitter, the brain child of Jorge Segarra, who blogs at The SQL Update Statement, and can also be found championing the Social side of DBAness as @SQLChicken on Twitter), let’s talk about keys, both primary and foreign, discuss indexing, and finally, how to get information from data. Busy day today, so let’s get into it!
Thanks for my involvement (one more time!)
I would like to once more express my heartfelt thanks to Chancellor Segarra for inviting me to lecture in this esteemed forum. I’d also like to thank Coach Larock for re-emphasizing what I am telling you about here – his lecture was better than all three of mine combined! Read it, if you haven’t already!
Where are my Keys?
Keys are important!
When we talk about keys in a database, we are talking about a couple of different things. Primary keys are those fields in a table which help define and delineate the table, like an order or customer number. These are not repeated in a table because they are used to identify each row of data, and so can be used to index a table, as we will discuss shortly.
Foreign keys are important, too!
Foreign keys are sometimes a tricky concept to understand. Tables are related in what are known as Parent-Child relationships, and it is the keys that define and control that relationship. In my previous lecture, we laid out a data model for Customers and Orders. When we normalized the table, we ended up with one table that had Orders, separated out from the Customers table, as illustrated below.
In this table, the Order# field is the Primary key for this table, and the Cust# field is the foreign key. It relates this table back to the Customers table, so that orders can be matched to customers.
Another thing Primary and Foreign keys allow besides relationships is a concept known as constraints. When the database is set up properly, the rows of data in both tables are constrained to each other. This means that the relationship between them is such that, for example, you can not delete a customer without first deleting that customer’s orders. Imagine if that constraint was not in place, and someone deleted a customer. The orders would still be out there, with a foreign key pointing to nothing.
Indexes, or how do I find things?
Primary Keys Define Indexes
The Primary keys that you define when constructing a table are generally also used to define the index for the table. Database indexing is a large subject unto itself, in this class we will just go over them briefly. Essentially, an index is how data gets found within a table, much like an index is used in a book to find information.
In a book, you would turn to the index to look up a certain word, and the index, in turn, would point you to the page on which that word was found. A database index works much the same way. When a table is queried, and the query contains a Where clause to limit data, the database engine takes the predicate of that Where clause and looks, or seeks, through the index looking for that predicate. Once found, the rest of the record is also located, and the query returns the data to the user.
Without an index, the database engine would need to search through every row of the table looking for the predicate before it could return the row in question, so you can see how much greater performance will be with an index.
Indexes can be defined as something other than the Primary key, but in general, that’s what the index is based on.
Getting Information from Data
Raw Data is hard to Read
When a database is designed properly, with the correct amount of normalization for the data, in its raw form it is not very intelligible to the human mind. There are keys and tables, bits of data here in this table, other bits in that table, and simply reading through a table will not get you very far if you want to know what’s going on in your business. Properly queried, though, there is real information in your tables for the business to keep tabs on their business, find trends in data, and help the business stay afloat and even grow.
All of this is made possible with the right set of queries to pull from the data. In my previous lecture, I showed how not to build a table, and would like to illustrate here why it shouldn’t be built like that. To remind, here is the obnoxious offender.
|John||Smith||123 Any St||Big City||MA||02134||001012||345||Disc|
|Jane||Doe||456 Main St||Small City||MA||01234||001453||651||Bowl|
|William||Smith||2 Mulberry Ln||Small City||MA||01234||001232||345||Disc|
I will ask again, for the sake of time, to imagine that there are more orders added to the right side of the table, 3 columns for each, and, further, imagine that a sales manager would like to know how many of what product is being sold to a certain city. The query would have to include lines that take each order column and collate them together into their own column, bringing the city only once for each row, and only then being able to get counts for each city’s orders. This would involve messy and low performing pivots or CASE statements to get the orders separated out, so bad I am not even going to try to right it out here, please just imagine it. And then imagine that this little table is thirty or forty columns wide with millions of records. The manager may as well start the query on Monday and hope for a result by the end of the week!
In a properly designed database, though, the query would simply get the cities from one table, join to orders in another, possibly tying them together with customers as a binder, and his count of product orders per city could be returned in seconds. And if such data could be returned quickly, other ways of querying can be used also taking mere seconds each to get concepts like trends over time, trends over time and area, customer payment histories, etc.
I realize this has been a very simplified discussion, but (A) I’m up against having to take care of other things, and so must finish this lecture, and (B) this is a 100 level class, so I am just trying to get the concepts relayed to you, with the hope that you will endeavor to take up the ideas and go further with them yourselves in your homework and research on your own.
Wrap it up
Thanks so much for joining me!
This week has been an incredible experience for me being able to share with you all this information. I thank you so much for joining me! It is my sincere hope that you’ve gotten something to take away from it, if even only a single concept. If you have any questions, be sure to post them in the Comments, below, and I will answer them ASAP.
Check this out for homework
For homework, I am going to once again suggest someone *much* smarter than I to reference on the subject of Indexing. I am giving here a 100 level class; the lady to whom I refer can take you as far as you want to go. I speak here of the Incredible Kalen Delaney, who literally wrote the book, Microsoft® SQL Server® 2008 Internals, the definitive guide to the internals and architecture of the Microsoft SQL Server 2008 relational database engine. She can be found at http://www.sqlserverinternals.com/, and her blog is at http://sqlblog.com/blogs/kalen_delaney/default.aspx. I would simply suggest reading everything there, you won’t be sorry!
Thank you all for joining me this week, I hope I was able to teach you something you can use.