Databases and the Virtue of Laziness

Is this you?

* You think database access should be behind-the-scenes, transparent to the programmer
* You find it easier to write a complicated Perl object than a clever SQL query
* You like to keep your SQL queries as simple as possible

Does this sound familiar?

* Your average database-related task requires more than one query
* Certain queries (or inserts) must be done through a particular layer of Perl code, or else the answers come out wrong
* Your Perl code performs checks on data before inserts
* Your Perl code transforms some of your data every time it is fetched
* Your database contains data that is garbage, missing, or orphaned
* You sprinkle small, fast database queries guiltlessly throughout your scripts and modules
* Your database queries (web pages, etc.) return results slowly
* Your queries/pages return results quickly, but you can't serve as many users as you'd like on your hardware
* It takes for-frikkin-ever for new hires/collaborators to make sense of the database access in your code
* You feel really sorry for the poor sap who will inherit your code after you leave your job
* When your program has a bug in a database query, it takes you all day to find and fix it

I'm here to help. Here's what I'll do for you today. I'll give you some ideas on how to:

* make your programs and database queries faster
* make your code easier to read, write, and maintain
* ensure the correctness and integrity of your data, without tons of code

I'll also give you some tricks you can apply RIGHT NOW to your projects.

Take a look at this example. Here's a nice easy SQL query. Most of you can write this stuff in your sleep.
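A "nice easy" query of that kind might look like this (the foo/bar/baz tables and their columns are made up for illustration):

```sql
-- every baz, with the bar and foo it belongs to
SELECT foo.name, bar.name, baz.name
FROM foo
JOIN bar ON bar.foo_id = foo.id
JOIN baz ON baz.bar_id = bar.id;
```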
[sql join]

But when you start thinking about your program's code, you might do something stupid like this:

[software loop]

Some of you are laughing at this code now, but maybe some of you are scratching your heads, saying, "What's so stupid about that?" Well, you're making one query to bar for every row in foo, and one query to baz for every row in bar. That's like making 20 trips to the grocery store because there are 20 things on your shopping list.

This example is obvious, but sometimes you're working with an object that has database queries buried deep below the surface. You think, "Well, I'm creating one object for every foo, but that's not so bad." But what if the constructor makes a database query? And what if it inherits from something else that makes a few more?

As with any optimization, it's up to you whether the performance benefit outweighs the effort - perhaps your interns find the less efficient version easier to write. But here are two things to remember about performance.

First: even if the difference isn't perceptible to the user (say, one second vs. five), it may be perceptible to the system (say, supporting 1000 users vs. 5000).

Second: my boss often increases his technical debt by saying, "Wow! It'll only be a problem when we have ten million rows in that table? Well, that'll be great! We'll be such a big successful website! I'd love to have that problem!" But think about that. When your business is growing, your database is bulging at the seams, and you're hiring new people and buying new hardware - will you really say, "Yay, we finally get to fix all those slow queries we've written over the years!"?

Benchmarking: how to tell that your query/approach is actually faster. Use psql's \timing, EXPLAIN ANALYZE, or Perl's Benchmark module - but remember that on later runs the pages will already be cached by the filesystem, so the first run is the honest one.

(Now, I can't teach you all of SQL in 20 minutes, or the intricacies of your query planner. I'll recommend some resources for that at the end.)
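The loop-per-row pattern above, and the single-query rewrite, might look roughly like this with DBI (the connection details, tables, and columns are hypothetical):

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=test', 'user', 'password',
                       { RaiseError => 1 });

# The slow way: one query to bar for every foo, and one query to baz
# for every bar -- 20 trips to the grocery store.
my $foos = $dbh->selectall_arrayref('SELECT id FROM foo');
for my $foo (@$foos) {
    my $bars = $dbh->selectall_arrayref(
        'SELECT id FROM bar WHERE foo_id = ?', undef, $foo->[0]);
    for my $bar (@$bars) {
        my $bazes = $dbh->selectall_arrayref(
            'SELECT id FROM baz WHERE bar_id = ?', undef, $bar->[0]);
        # ... do something with each baz ...
    }
}

# The lazy way: one trip to the store, whole shopping list.
my $rows = $dbh->selectall_arrayref(q{
    SELECT foo.id, bar.id, baz.id
    FROM foo
    JOIN bar ON bar.foo_id = foo.id
    JOIN baz ON baz.bar_id = bar.id
});
for my $row (@$rows) {
    # ... do something with each (foo, bar, baz) combination ...
}
```

The database does the looping for you, in one round trip instead of hundreds.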
SEARCHES: Consider an application like a search. We want a clean, easy module that lets callers combine arbitrary search conditions. There are (at least) three solutions to this very common need:

1. String munging: appending WHERE conditions, etc. Usually not fully tested.
2. Sophisticated query building: the same idea, but with maximum flexibility. Shrinks the query as much as possible.
3. Combining the conditions in the database: the INTERSECT approach.

integrity

"referential integrity" - FOREIGN KEYS

- Some rows are missing. Did you lose data by accident? Did your former coworker delete the data because it was incorrect? Which foos did those bars belong to, anyway? Should I go and delete some bazes?
- Have you heard of "legacy code"? Sooner or later someone will inherit your "legacy database".

"data integrity" - CONSTRAINTS

- In addition to relationships, you also want to keep your data intact.
- Where does bad data come from? Data often comes from humans. Some people like to check values in the Perl code that loads data into the database. But what are the chances that that one program will always be the only thing ever used to insert data into the database? In my experience, data also comes from:
  - scripts written by other people
  - scripts written by you, after you forget about checking that field
  - people making "quick fixes" at the psql prompt

With constraints:

- you don't have to remember, with each program you write, what you have to check for
- you know that nobody else's programs or wacky psql antics are going to put in data that shouldn't be allowed
- you have an extra (though small) measure of security for INSERTs from the web user
- you won't be scratching your head six months from now wondering how the wrong data got into that column

You might think, "But I'm a good programmer! I wouldn't put the wrong thing in a column!" ... but where does your data come from? For most of us, it comes from untrustworthy sources, such as ... HUMANS.

triggers in perl

- What is a trigger?
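A trigger is a function the database runs for you automatically when a row is inserted, updated, or deleted. In PostgreSQL you can write the trigger function in Perl itself, using PL/Perl. A sketch, assuming the plperl language is installed and a hypothetical foo table with an email column:

```sql
-- Trigger function in Perl. PL/Perl passes trigger data in $_TD:
-- $_TD->{new} is a hashref of the incoming row.
CREATE FUNCTION clean_email() RETURNS trigger AS $$
    my $email = $_TD->{new}{email};
    $email =~ s/^\s+|\s+$//g;        # trim surrounding whitespace
    $_TD->{new}{email} = lc $email;  # normalize to lowercase
    return 'MODIFY';                 # tell Postgres we changed the row
$$ LANGUAGE plperl;

-- Run it on every insert or update, before the row is stored.
CREATE TRIGGER clean_email_trg
    BEFORE INSERT OR UPDATE ON foo
    FOR EACH ROW EXECUTE PROCEDURE clean_email();
```

Now every program that touches foo gets clean email addresses, whether or not its author remembered to scrub them.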
- syntax
- performance tradeoff

transactions

- Why transactions? Sandbox. Testing. A consistent view of the DB.
- what AutoCommit does
- savepoints
- deferred FK constraints

views & materialized views

Fun facts:

* Views don't have to be read-only. You can define rules saying what should happen when you insert or update data in a view.
* We just made views on tables, but you can make views on other views, if that helps you.
* You can grant different permissions on a view than on the tables that make it up. (Example: create view public_foo as select * from foo where censored = 0;)
* When you change your database's structure (it will happen someday), you won't have to update queries in programs that use views; you can just update the view definition. Similarly, you can create backward compatibility after a db redesign by creating a view that looks just like one of the old tables.

Since regular views are generated more-or-less dynamically, a big query that takes a long time to run will still be a PITA - although you do benefit from the query being prepared ahead of time.

- A materialized view gets its results precomputed, so that querying it is like using a lookup table. It's like a cache of query results.
- Neither postgres nor mysql officially supports materialized views, but you can achieve the same effect by creating a table and then updating its contents, either from a cron job or from an insert trigger:

  create table big_hairy_query_mview as ...;

ACTION LIST: Things you can do RIGHT AWAY to lazy up your programs.

* The next time you write a query, pack as much action into it as you can.
* Find a table in your database that needs a foreign key, or a constraint. Add it, and relax.
* Add a transaction where there was no transaction before.
* Find a function that makes two database calls. Combine them.
* Find Perl code that validates information before inserting it into the database. Replace that code with a database constraint.
* Find a query you run too often. Make a view out of it.
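For the "add a transaction" item, here is a minimal DBI sketch (the connection details and account table are hypothetical):

```perl
use strict;
use warnings;
use DBI;

# AutoCommit => 0 means nothing is permanent until we commit.
my $dbh = DBI->connect('dbi:Pg:dbname=test', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });

eval {
    $dbh->do('UPDATE account SET balance = balance - 100 WHERE id = 1');
    $dbh->do('UPDATE account SET balance = balance + 100 WHERE id = 2');
    $dbh->commit;    # both updates become visible together
};
if ($@) {
    warn "transaction aborted: $@";
    $dbh->rollback;  # neither update happened - no half-moved money
}
```

Either both updates happen or neither does, no matter which statement fails or when the program dies.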