Open source software and data warehousing go together like beer and pretzels

One of the areas where we're seeing more and more adoption of open source technology is in data warehousing. If you think about it, the reasons are pretty obvious. First of all, there's more and more data! Every online application, whether it's a Web site or an e-commerce, travel, retail, or advertising system, is generating more and more data. People are doing more and more analysis and looking for patterns in that data. After all, if further analysis of your data helps you figure out a more efficient way to serve your customers or a more compelling offer to make to them, then you have an advantage in the market. Heck, that's what online giants like Google, Yahoo, and Amazon are doing all the time. They're constantly evaluating data and tuning algorithms in order to take smarter actions.

The second factor driving open source adoption in this space is that traditional approaches to data warehousing, built on proprietary hardware, are just too darned expensive. Old-school data warehousing systems can cost millions of dollars to implement. So it's no surprise that we find over 25% of MySQL users are doing some kind of data warehousing, typically using the MyISAM storage engine and often in conjunction with open source BI or reporting tools such as Pentaho or JasperSoft. While MyISAM is fast, it was never designed for large-scale data warehouses, and it can tap out at a couple of terabytes in many cases.

But given MySQL's pluggable storage engine API, we're starting to see more and more choices for developers. The most recent update is from InfoBright. They've delivered a great column-oriented storage engine for MySQL that enables people to develop much larger data warehouses, up to 10 TB and beyond, without requiring specialized hardware. Column-oriented databases enable much faster aggregate analysis of data than traditional row-based databases.
Because each column holds values of the same kind, the data can also be compressed, reducing storage by a factor of 10 or more. The net result is a huge gain in performance. Some queries that could take hours in a traditional database system can be completed in seconds.

InfoBright has announced that they are open sourcing their storage engine with a new InfoBright Community Edition (ICE), available for download at www.infobright.org.

Also, Sun has made an investment in InfoBright's latest funding round, in addition to its earlier investment in Greenplum, which uses a heavily parallelized approach for data warehouses that go much larger than 10 TB.
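
To make the column-oriented idea a bit more concrete, here is a toy sketch in Python. It is not how InfoBright's engine works internally; the sample rows, the column names, and the choice of run-length encoding as the compression scheme are all assumptions made purely for illustration. It shows two things: an aggregate only has to read the one column it needs, and a column full of repeated values shrinks down nicely.

```python
from itertools import groupby

# Hypothetical sample data, stored row by row: (order_id, country, amount)
rows = [
    (1, "US", 20),
    (2, "US", 35),
    (3, "US", 15),
    (4, "DE", 40),
    (5, "DE", 10),
]


def to_columns(rows, names):
    """Pivot a list of row tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, ["order_id", "country", "amount"])

# An aggregate touches only the "amount" column, not every field of every row.
total_amount = sum(columns["amount"])

# Values within one column tend to repeat, so simple encodings compress them well.
encoded_country = run_length_encode(columns["country"])

print(total_amount)
print(encoded_country)
```

In a real warehouse the columns live on disk and hold millions of values, so skipping the columns a query doesn't need and reading the rest in compressed form is where the big savings would come from.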