Getting Along with Very Large Databases
In tech jargon, a “Very Large Database,” or VLDB, is a database that contains several billion rows of data, or occupies more than a thousand gigabytes of storage space. Strictly speaking, most databases won’t ever become very large.
But databases can grow in complexity as well as in size. And practically speaking, there’s a point when any database can become large enough, or complex enough, to require some special considerations for working with it effectively.
The first consideration is your time. Don’t ever accept slow performance from your database! Database performance requires tuning, and over time, every database requires a tune-up. It’s important to ensure that the voice of your database users is heard when they encounter poor database performance, so that the technical team can make useful adjustments.
If you’re waiting for a long time for the results of a selection query, for example, then ask the following question, “Could we be missing an index here?” Database indexes are comparable to indexes in books. These indexes store the location of data, and enable the database server software to retrieve data as quickly as possible. Surprisingly, even small tables can benefit from the addition of indexes.
You can’t index everything, however, because the maintenance of those indexes would become a time-consuming process for your database system, in and of itself. So there can be too much of a good thing. And while you can “throw more hardware” at a database performance problem by purchasing more server memory and/or a new, more powerful database server, that is generally a costly decision that is frequently unnecessary.
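To make the “Could we be missing an index here?” question concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The customers table, its columns, and the index name idx_customers_zip are hypothetical, made up for illustration, and your own database server will have its own way of showing a query plan. The idea is simply to ask the database how it intends to run a query before and after an index is added.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable steps SQLite reports for running `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the descriptive text.
    return [row[-1] for row in rows]

# Hypothetical sample data: 5,000 customers spread over 500 zip codes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, zip_code TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO customers (zip_code, name) VALUES (?, ?)",
    [(str(10000 + i % 500), f"customer {i}") for i in range(5000)],
)

sql = "SELECT name FROM customers WHERE zip_code = ?"

# Without an index, the plan is a scan of the whole table.
plan_before = query_plan(conn, sql, ("10042",))

# With an index on the filtered column, the plan becomes an index search.
conn.execute("CREATE INDEX idx_customers_zip ON customers (zip_code)")
plan_after = query_plan(conn, sql, ("10042",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the shift from a scan to a search that names the index is what to look for. Every index also has to be updated whenever rows are added or changed, which is why indexing every column is not the answer.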
Direct Partners has observed that most database performance problems are caused by overly complex queries that include numerous filter criteria and/or join numerous large tables at the same time.
You can often achieve faster selection times by breaking up large, slow-performing queries into a series of smaller queries, as if you were telling someone a story. You wouldn’t tell someone a story in one breath, and you can take your analysis one step at a time. By selecting subsets of data into temporary tables, and writing successive queries that build upon these subsets to compile a final result set, you give your database server less work to do at any one time and free it up to perform at its best.
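The step-by-step approach can be sketched as follows, again using Python’s built-in sqlite3 module. The customers and orders tables, their columns, and the sample rows are all hypothetical. Whether the stepwise version actually runs faster than a single large query depends on your database server and your data, so treat this as an illustration of the pattern rather than a guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data for illustration only.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_year INTEGER,
    amount REAL
);
INSERT INTO customers (id, state) VALUES (1, 'NY'), (2, 'NY'), (3, 'CA');
INSERT INTO orders (customer_id, order_year, amount) VALUES
    (1, 2010, 50.0), (1, 2010, 25.0), (2, 2009, 40.0), (3, 2010, 99.0);
""")

# Step 1: narrow the audience to one small subset.
conn.execute("""
    CREATE TEMP TABLE ny_customers AS
    SELECT id FROM customers WHERE state = 'NY'
""")

# Step 2: build on that subset, joining it to only the orders we need.
conn.execute("""
    CREATE TEMP TABLE ny_orders_2010 AS
    SELECT o.customer_id, o.amount
    FROM orders AS o
    JOIN ny_customers AS c ON c.id = o.customer_id
    WHERE o.order_year = 2010
""")

# Step 3: compile the final result set from the last subset.
totals = conn.execute("""
    SELECT customer_id, SUM(amount)
    FROM ny_orders_2010
    GROUP BY customer_id
    ORDER BY customer_id
""").fetchall()

print(totals)
```

A side benefit of this pattern is that each temporary table can be counted and spot-checked on its own, which makes it easier to validate the analysis before moving on to the next step.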
Sometimes it’s not the database performance that’s slowing you down, however. It’s navigating the contents of the database to find what you’re looking for! Databases are often enhanced to address new business needs with the addition of new data tables. And frequently, existing tables are no longer needed to store particular sources of data, but are retained in the database because they contain valuable data that’s appropriate to keep for analysis.
As new tables are introduced to store additional data sets, the relationships between all data sets become more nuanced. Furthermore, as the ranges of possible data values increase within your database tables, it’s easy for new data to get overlooked.
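One way to keep track of what a growing database contains is to ask the database itself for an inventory of its tables and columns. The sketch below does this for SQLite, using Python’s built-in sqlite3 module. The responses_2009 and responses_2010 tables are hypothetical, and other database servers expose this information differently, so this is only an illustration of the idea.

```python
import sqlite3

def describe_tables(conn):
    """Map each user table name to the list of its column names."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    inventory = {}
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; position 1 is its name.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        inventory[table] = [column[1] for column in columns]
    return inventory

# Hypothetical tables: an older one kept for analysis, and a newer one
# that added a column to meet a new business need.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE responses_2009 (customer_id INTEGER, channel TEXT);
CREATE TABLE responses_2010 (customer_id INTEGER, channel TEXT, device TEXT);
""")

inventory = describe_tables(conn)
for table, columns in inventory.items():
    print(table, columns)
```

Comparing one inventory against an earlier one is a quick way to spot new tables or columns that an existing analysis does not yet take into account.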
It’s crucial to keep everyone informed about database changes, via email, intranet blogs, and/or technical documentation and user guides. I think even written documentation can benefit from word-of-mouth notification that there’s been a database change. So take a moment in your regularly scheduled in-person meetings to ask “What’s new?” in your database systems.
Being informed about the latest database enhancements doesn’t necessarily translate into speed and ease when working with the contents of a database. It always takes time to produce well-reasoned, validated results, especially when working with a complex, evolving database. While you can demand speed from your database server software, you shouldn’t expect all of your analyses to be completed quickly.
All of these considerations become magnified as the number of people using a database system grows. And so you’ll know you’re getting along with your database system, no matter how large it becomes, when it’s handling the additional usage without sacrificing performance and when you’re not having to revise your analyses because relevant data was overlooked.