In yesterday’s blog post we learned the importance of NewSQL. In this article we will understand the role of the Cloud in the Big Data story.
What is Cloud?
Cloud has been the biggest buzzword of the last few years. Everyone knows about the Cloud and it is extremely well defined online. In this article we will discuss the cloud in the context of Big Data. Cloud computing is a method of providing shared computing resources to applications that require dynamic resources. These resources include applications, computing, storage, networking, development and various deployment platforms. The fundamental idea of cloud computing is that it shares pretty much all of these resources and delivers them to end users as a service.
Google and Amazon.com are examples of companies that combine Cloud Computing and Big Data. Both have fantastic Big Data offerings built with the help of the cloud. We will discuss them later in this blog post.
There are two different Cloud Deployment Models: 1) The Public Cloud and 2) The Private Cloud
Public Cloud is cloud infrastructure built by commercial providers (Amazon, Rackspace etc.). The provider creates a highly scalable data center that hides the complex infrastructure from the consumer and provides various services.
Private Cloud is cloud infrastructure built by a single organization, which manages a highly scalable data center internally.
Here is a quick comparison between Public Cloud and Private Cloud from Wikipedia:
| |Public Cloud|Private Cloud|
|---|---|---|
|Initial cost|Typically zero|Typically high|
|Privacy|No (host has access to the data)|Yes|
|Scaling up|Easy while within defined limits|Laborious but no limits|
Hybrid Cloud is cloud infrastructure built from a composition of two or more clouds, such as a public and a private cloud. A hybrid cloud gives the best of both worlds because it combines multiple cloud deployment models.
Cloud and Big Data – Common Characteristics
Many characteristics of Cloud Architecture and Cloud Computing are also important for Big Data. The two overlap heavily, and in many places it simply makes sense to use the power of both architectures to build a highly scalable framework.
Here is a list of the characteristics of cloud computing that are important in Big Data:
- Ad-hoc Resource Pooling
- Low Cost to Set Up Infrastructure
- Pay on Use or Pay as you Go
- Highly Available
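To make the "Pay as you Go" point concrete, here is a minimal Python sketch comparing the bill for a cluster that runs only while a job needs it with the bill for the same cluster left running all month. The rate, node count and hours are hypothetical numbers chosen for illustration, not real prices from any provider.

```python
def pay_as_you_go_cost(node_hours: float, rate_per_node_hour: float) -> float:
    """Cost when you are billed only for the node-hours actually used."""
    return node_hours * rate_per_node_hour


def always_on_cost(nodes: int, hours_in_period: float, rate_per_node_hour: float) -> float:
    """Cost when the same nodes stay up for the whole billing period."""
    return nodes * hours_in_period * rate_per_node_hour


# Hypothetical workload: 20 nodes, 3 hours per day, 30 days.
RATE = 0.5  # assumed price per node-hour, for illustration only
on_demand = pay_as_you_go_cost(node_hours=20 * 3 * 30, rate_per_node_hour=RATE)
always_on = always_on_cost(nodes=20, hours_in_period=24 * 30, rate_per_node_hour=RATE)

print(f"pay as you go: {on_demand:.2f}")
print(f"always on:     {always_on:.2f}")
```

With these assumed numbers the nodes are busy for only one eighth of the month, so the on-demand bill is one eighth of the always-on bill. The gap shrinks as utilization goes up.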
Leading Big Data Cloud Providers
There are many players in the Big Data Cloud space, but we will list only a few of the well-known ones here.
Amazon is arguably the most popular Infrastructure as a Service (IaaS) provider. The history of how Amazon started in this business is very interesting. They started out with a massive infrastructure to support their own business. Gradually they figured out that their own resources were underutilized most of the time. They decided to get the maximum out of the resources they had, and hence they launched their Amazon Elastic Compute Cloud (Amazon EC2) service in 2006. Their products have evolved a lot since then, and cloud services are now one of their primary businesses besides retail.
Amazon also offers Big Data services under Amazon Web Services. Here is the list of the included services:
- Amazon Elastic MapReduce – It processes very high volumes of data
- Amazon DynamoDB – It is a fully managed NoSQL (Not Only SQL) database service
- Amazon Simple Storage Service (S3) – A web-scale service designed to store and accommodate any amount of data
- Amazon High Performance Computing – It provides low-latency tuned high performance computing clusters
- Amazon Redshift – It is a petabyte-scale data warehousing service
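For readers who have not seen the MapReduce model that Elastic MapReduce is named after, here is a small single-machine Python sketch of its three steps (map, shuffle, reduce) applied to word counting. It does not use any Amazon API; the function names are hypothetical, and a real cluster would spread these steps across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence on a list of text records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big cloud"])
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both phases can be split across as many machines as the data requires, which is what makes the model a good fit for elastic cloud resources.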
Though Google is best known for its search engine, we all know that it is much more than that. Its Big Data services include:
- Google Compute Engine – It offers secure, flexible computing from energy efficient data centers
- Google BigQuery – It allows SQL-like queries to run against large datasets
- Google Prediction API – It is a cloud based machine learning tool
Besides Amazon and Google, there are other players in the Big Data market. Microsoft is also addressing Big Data in the Cloud with Microsoft Azure. Additionally, Rackspace and NASA together initiated OpenStack. The goal of OpenStack is to provide a massively scaled, multitenant cloud that can run on any hardware.
Things to Watch
Cloud-based solutions integrate very well with the Big Data story and are also very economical to implement. However, there are a few things one should be very careful about when deploying Big Data on cloud solutions. Here is a list of a few things to watch:
- Data Integrity
- Initial Cost
- Recurring Cost
- Data Access Security
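As one illustration of the Data Integrity item, a common precaution is to compute a checksum before data leaves your systems and compare it with a checksum of what you read back from the cloud. The Python sketch below uses SHA-256 from the standard library; the function names are hypothetical, and how you actually upload and download the bytes depends on your provider.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_transfer(original: bytes, retrieved: bytes) -> bool:
    """True only if the retrieved bytes hash to the same digest as the original."""
    return sha256_of(original) == sha256_of(retrieved)


# Stand-ins for data before upload and after download.
sent = b"order_id,amount\n1,100\n2,250\n"
received_ok = b"order_id,amount\n1,100\n2,250\n"
received_bad = b"order_id,amount\n1,100\n2,25\n"

print(verify_transfer(sent, received_ok))
print(verify_transfer(sent, received_bad))
```

In practice you would store the digest alongside the data so that the check can be repeated later, not only right after the upload.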
Every company has a different approach to Big Data and different rules and regulations. Based on these factors, a company can implement its own custom Big Data solution on a cloud.
In tomorrow’s blog post we will discuss various Operational Databases supporting Big Data.
Reference: Pinal Dave (http://blog.sqlauthority.com)