BIG DATA


Introduction

The term has been in use since the 1990s, with some crediting John Mashey with coining it, or at least popularizing it. Big data usually refers to data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. Big data encompasses unstructured, semi-structured, and structured data; however, the main focus is on unstructured data.

Big data “size” is a constantly moving target; as of 2012 it ranged from a few dozen terabytes to many petabytes of data. Big data requires a set of techniques and technologies with new forms of integration to reveal insights from data sets that are diverse, complex, and of massive scale.

 

Big data refers to data sets that are so large or complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.

There are five dimensions to big data, known as volume, variety, velocity, and the more recently added veracity and value.

Lately, the term “big data” tends to refer to the use of predictive analytics or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. Big data techniques are used to work with large data sets in areas including scientific work (such as genomics and biology), business informatics, internet search, urban informatics, and environmental research. Data sets grow rapidly, in part because they are increasingly gathered by cheap and numerous information-sensing Internet of Things devices such as mobile devices, aerial (remote sensing) equipment, software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks.

                 

Background

The rise of cloud computing and cloud data stores has been a precursor and facilitator to the emergence of big data. Cloud computing is the commoditization of computing time and data storage by means of standardized technologies.

It has significant advantages over traditional physical deployments. However, cloud platforms come in several forms and sometimes have to be integrated with traditional architectures.

This leads to a dilemma for decision makers in charge of big data projects: which cloud offering, and in what form, is the optimal choice for their computing needs, especially for a big data project? Such projects regularly exhibit unpredictable, bursty, or immense computing-power and storage needs. At the same time, business stakeholders expect swift, inexpensive, and dependable products and project outcomes. This article introduces cloud computing and cloud storage, outlines the core cloud architectures, and discusses what to look for and how to get started with cloud computing.

Characteristics

Big data can be described by the following characteristics:

  • Volume

The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can actually be considered big data or not.

  • Variety

The type and nature of the data. This helps people who analyze it to effectively use the resulting insight.

 

  • Velocity

In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.

  • Variability

Inconsistency of the data set can hamper processes to handle and manage it.

 

  • Veracity

The quality of captured data can vary greatly, affecting the accuracy of analysis.

 

 

Relationship between cloud computing and big data

Big data can make use of the elasticity of the cloud, because big data back ends need to scale up and down with processing load and storage.

On the other hand, many cloud applications, i.e. distributed systems, have a lot of fast data to track, typically by transforming immutable event streams and pre-aggregating them into OLAP stores for later querying.
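
A minimal sketch of this pre-aggregation idea in plain Python, assuming a hypothetical stream of page-view events; the event fields and the hourly rollup key are illustrative only, not the schema of any particular system:

    from collections import defaultdict
    from datetime import datetime

    # Immutable event stream: each event records a page and a timestamp (hypothetical data).
    events = [
        {"page": "/home",    "ts": "2024-05-01T10:05:00"},
        {"page": "/home",    "ts": "2024-05-01T10:40:00"},
        {"page": "/pricing", "ts": "2024-05-01T11:15:00"},
    ]

    # Pre-aggregate the raw events into an OLAP-style rollup keyed by (page, hour),
    # so later queries scan a small summary table instead of the full event stream.
    rollup = defaultdict(int)
    for event in events:
        hour = datetime.fromisoformat(event["ts"]).strftime("%Y-%m-%d %H:00")
        rollup[(event["page"], hour)] += 1

    for (page, hour), views in sorted(rollup.items()):
        print(f"{hour}  {page}  {views} views")

In a real deployment the same aggregation would typically run continuously over the incoming stream and write its results to the OLAP store, but the shape of the computation is the same.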

In any case, cloud applications need to scale in their database back end as well, and that is where NewSQL, BigSQL, NoSQL, and Fast Data (a variation of Big Data) come into play.

The intersection between distributed applications and big data is so large that the capstone project of the Coursera Cloud Computing specialization has a data science orientation, built on top of Hadoop MapReduce, Spark, Storm, and Spark Streaming.
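
As a small illustration of the map-and-reduce style of processing that Hadoop MapReduce and Spark provide, the following PySpark sketch counts words in a text file; the input path is a placeholder and a working Spark installation is assumed:

    from pyspark import SparkContext

    sc = SparkContext(appName="WordCount")

    # Split each line into words, map each word to a count of 1,
    # and sum the counts per word in parallel across the cluster.
    counts = (sc.textFile("hdfs:///data/sample.txt")   # placeholder input path
                .flatMap(lambda line: line.split())
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))

    for word, count in counts.take(10):
        print(word, count)

    sc.stop()

The same map-and-reduce shape underlies jobs on Hadoop MapReduce and Spark; only the execution engine differs.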

 

Highlights of BIG DATA

  • Problem solving: Whether the user is deploying traditional relational database management systems, massively distributed implementations such as Hadoop, or an integrated solution for both, Cloud4C has a solution.
  • Real-World Advantages: At Cloud4C, data is properly gathered, stored, sorted, and analyzed, yielding business intelligence and knowledge that translates into real-world advantages for organizational analyses.
  • Tools: Users also have the power to create personalized topologies with custom network and security settings to work with any Big Data module.
  • Big Data Analytics: With the ability to capture real-time data, big data and analytics play an important role for insurers in reaching potential customers.

Features

  • Scalable: Hadoop stores and distributes very large data sets across hundreds of inexpensive servers and operates on them in parallel, unlike an RDBMS.
  • Cost Effective: Hadoop is open-source software that runs on commodity hardware. The cost per terabyte, for both storage and processing, is much lower than on older proprietary systems.
  • Flexible: It easily accesses new data sources and handles both structured and unstructured data.
  • Faster: Hadoop efficiently processes both static and dynamic data.
  • Customizable: Performance is highly customizable to suit all possible requirements.
  • Resilient to Failure: Hadoop is fault tolerant. Data is replicated to other nodes in the cluster, so in the event of a failure another copy is available for use (a small sketch of setting the replication factor follows this list).
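
As a concrete illustration of the replication point in the last bullet, here is a small sketch that drives the standard hdfs dfs command line from Python to set and then read back a file's replication factor; the path is a placeholder and a configured Hadoop client is assumed:

    import subprocess

    # Placeholder HDFS path; assumes the `hdfs` client is installed and configured.
    path = "/data/example.csv"

    # Ask HDFS to keep three copies of each block of the file, so the data
    # survives the loss of a node (-w waits until replication completes).
    subprocess.run(["hdfs", "dfs", "-setrep", "-w", "3", path], check=True)

    # Read back the current replication factor (%r) to confirm the setting.
    result = subprocess.run(["hdfs", "dfs", "-stat", "%r", path],
                            capture_output=True, text=True, check=True)
    print("Replication factor:", result.stdout.strip())

If a node holding one copy fails, HDFS serves reads from the remaining replicas and re-replicates the missing blocks in the background.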
