Big Data – Role of Cloud Computing in Big Data – Day 11 of 21

In yesterday’s blog post we learned the importance of the NewSQL. In this article we will understand the role of Cloud Computing in Big Data Story

What is Cloud?

Cloud is the biggest buzzword around from last few years. Everyone knows about the Cloud and it is extremely well defined online. In this article we will discuss cloud in the context of the Big Data. Cloud computing is a method of providing a shared computing resources to the application which requires dynamic resources. These resources include applications, computing, storage, networking, development and various deployment platforms. The fundamentals of the cloud computing are that it shares pretty much share all the resources and deliver to end users as a service.

 Examples of the Cloud Computing and Big Data are Google and Both have fantastic Big Data offering with the help of the cloud. We will discuss this later in this blog post.

There are two different Cloud Deployment Models: 1) The Public Cloud and 2) The Private Cloud

Public Cloud Computing

Public Cloud is the cloud infrastructure build by commercial providers (Amazon, Rackspace, etc.) creates a highly scalable data center that hides the complex infrastructure from the consumer and provides various services.

Private Cloud Computing

Private Cloud is the cloud infrastructure build by a single organization where they are managing highly scalable data center internally.

Here is the quick comparison between Public Cloud and Private Cloud from Wikipedia:

Public CloudPrivate Cloud
Initial costTypically zeroTypically high
Running costUnpredictableUnpredictable
PrivacyNo (Host has access to the dataYes
Single sign-onImpossiblePossible
Scaling upEasy while within defined limitsLaborious but no limits

Hybrid Cloud

Hybrid Cloud is the cloud infrastructure build with the composition of two or more clouds like public and private cloud. Hybrid cloud gives best of the both the world as it combines multiple cloud deployment models together.

Cloud and Big Data – Common Characteristics

Big Data - Role of Cloud Computing in Big Data - Day 11 of 21 big-data-cloud There are many characteristics of the Cloud Architecture and Cloud Computing which are also essentially important for Big Data as well. They highly overlap and at many places it just makes sense to use the power of both the architecture and build a highly scalable framework.

Here is the list of all the characteristics of cloud computing important in Big Data

  • Scalability
  • Elasticity
  • Ad-hoc Resource Pooling
  • Low Cost to Setup Infastructure
  • Pay on Use or Pay as you Go
  • Highly Available

Leading Big Data Cloud Providers

There are many players in Big Data Cloud but we will list a few of the known players in this list.


Amazon is arguably the most popular Infrastructure as a Service (IaaS) provider. The history of how Amazon started in this business is very interesting. They started out with a massive infrastructure to support their own business. Gradually they figured out that their own resources are underutilized most of the time. They decided to get the maximum out of the resources they have and hence  they launched their Amazon Elastic Compute Cloud (Amazon EC2) service in 2006. Their products have evolved a lot recently and now it is one of their primary business besides their retail selling.

Amazon also offers Big Data services understand Amazon Web Services. Here is the list of the included services:

  • Amazon Elastic MapReduce – It processes very high volumes of data
  • Amazon DynammoDB – It is fully managed NoSQL (Not Only SQL) database service
  • Amazon Simple Storage Services (S3) – A web-scale service designed to store and accommodate any amount of data
  • Amazon High Performance Computing – It provides low-tenancy tuned high performance computing cluster
  • Amazon RedShift – It is petabyte scale data warehousing service


Though Google is known for Search Engine, we all know that it is much more than that.

  • Google Compute Engine – It offers secure, flexible computing from energy efficient data centers
  • Google Big Query – It allows SQL-like queries to run against large datasets
  • Google Prediction API – It is a cloud based machine learning tool

Other Players

Besides Amazon and Google we also have other players in the Big Data market as well. Microsoft is also attempting Big Data with the Cloud with Microsoft Azure. Additionally Rackspace and NASA together have initiated OpenStack. The goal of Openstack is to provide a massively scaled, multitenant cloud that can run on any hardware.

Thing to Watch

The cloud based solutions provides a great integration with the Big Data’s story as well it is very economical to implement as well. However, there are few things one should be very careful when deploying Big Data on cloud solutions. Here is a list of a few things to watch:

  • Data Integrity
  • Initial Cost
  • Recurring Cost
  • Performance
  • Data Access Security
  • Location
  • Compliance

Every company have different approaches to Big Data and have different rules and regulations. Based on various factors, one can implement their own custom Big Data solution on a cloud.


In tomorrow’s blog post we will discuss about various Operational Databases supporting Big Data.

Reference: Pinal Dave (

Cloud Computing, SQL Server
Previous Post
Big Data – Buzz Words: What is NewSQL – Day 10 of 21
Next Post
Big Data – Operational Databases Supporting Big Data – RDBMS and NoSQL – Day 12 of 21

Related Posts

2 Comments. Leave new

  • We all understand the future is going towards cloud computing and using other companies to host their data and applications. One of the concerns is the performance of these applications in the cloud. Since your application is on a shared resource you may not know how fast your application will run until it is fully implemented. Do you truly know how much bandwidth is available to your application? Hopefully with time better metrics will be provided.


Leave a Reply