Bigtable: A Distributed Storage System for Structured Data

To improve applicability, scalability, performance, and availability in storing very large data sets, the authors designed, implemented, and deployed a distributed storage system called Bigtable; this is the main motivation of the paper. To manage data at that scale, the system gives clients a simple data model with dynamic control over data layout and format, as described in the following paragraph.

As for contributions, the authors spent roughly seven person-years on design and implementation. They introduce an interesting data model: a sorted map indexed by row key, column key, and timestamp, in which column keys are grouped into column families that form the basic unit of access control. The refinements and the performance evaluation described in the paper also show improvements. Three real applications or products have used the Bigtable implementation and its concepts successfully.
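To make the data model concrete, here is a minimal in-memory sketch of a map keyed by row, column, and timestamp. It is only an illustration under my own assumptions: the class name `ToyTable` and its methods `put`, `get`, and `scan` are hypothetical and are not Bigtable's actual API, and the sample rows loosely follow the paper's Webtable example.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for the data model (not Bigtable's API)."""

    def __init__(self):
        # (row key, "family:qualifier") -> {timestamp: value as bytes}
        self._cells = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._cells[(row, column)][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get((row, column), {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, end_row):
        """Yield the newest value of every cell whose row key is in [start_row, end_row)."""
        for row, column in sorted(self._cells):
            if start_row <= row < end_row:
                yield row, column, self.get(row, column)


table = ToyTable()
table.put("com.cnn.www", "contents:", 3, b"<html>v1")
table.put("com.cnn.www", "contents:", 5, b"<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")

latest = table.get("com.cnn.www", "contents:")               # newest version
older = table.get("com.cnn.www", "contents:", timestamp=4)   # version as of time 4
```

Keeping row keys in sorted order is what makes range scans over neighboring rows cheap in this sketch; a real system would also have to split the key range across servers and persist the data, which this toy version does not attempt.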

The paper’s most noticeable deficiencies are ones the authors themselves already describe. For example, Bigtable does not consider the possibility of multiple copies of the same data, and it lets the user specify which data belongs in memory and which should stay on disk rather than trying to determine this dynamically. Lastly, there are no complex queries to execute or optimize. Bigtable seems to take data manipulation to a whole new level; however, my remaining question concerns networking, since latency appears to play an important role in retrieving or displaying query results. In my opinion, there is still a bottleneck: because the system runs on distributed servers, it requires a high-performance network infrastructure to achieve its best performance.

I would rate the significance of the paper 5/5 (breakthrough) because the Bigtable model is impressive: it can adapt to handle very large data sets, and it is used in many popular applications we use today, for example Google products such as Google Earth and Google Analytics. The concept of adding new machines when more performance is needed for database operations is spectacular. I believe Bigtable will be very useful in the future, and we will most likely see upcoming products from similar companies adopt this model to improve their use of databases.

Bigtable: A Distributed Storage System for Structured Data, F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. Gruber, Proc. of the 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), November 2006, pp. 205-218.

