Big Data has taken the software industry by storm. A few statistics illustrate the point:
- Much has been written about the exploding volume of data in the world. According to a recent study, the amount of information created and replicated will surpass 1.8 zettabytes (1.8 trillion gigabytes) in 2011, growing by a factor of 9 in just five years.
- Big data is a top business priority and drives enormous opportunity for business improvement. A Wikibon study projects that big data will be a $50 billion business by 2017.
- Market research firm IDC has released a forecast showing the big data market growing from $3.2 billion in 2010 to $16.9 billion in 2015.
- 94% of Hadoop users perform analytics on large volumes of data that were not possible before, 88% analyze data in greater detail, and 82% can now retain more of their data.
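Growth figures like these are easier to compare as a compound annual growth rate, CAGR = (end / start)^(1 / years) - 1. The short Python sketch below applies that formula to the IDC forecast ($3.2 billion in 2010 to $16.9 billion in 2015) and to the "factor of 9 in five years" data-growth figure. The helper name `cagr` is purely illustrative.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: the constant yearly rate that turns
    start_value into end_value over the given number of years."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# IDC forecast: $3.2 billion (2010) -> $16.9 billion (2015), i.e. 5 years.
market_growth = cagr(3.2, 16.9, 5)

# Data volume growing by a factor of 9 in five years (start normalised to 1).
data_growth = cagr(1.0, 9.0, 5)

print(f"Implied big data market CAGR: {market_growth:.1%}")
print(f"Implied annual data growth:   {data_growth:.1%}")
```

Worked through, the market forecast implies growth of just under 40% a year, while a ninefold increase in data over five years implies roughly 55% a year.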
Big Data seems to have appeared almost out of nowhere. In reality, though, Big Data is not new; it is simply moving into the mainstream and getting a lot more attention. Its growth is being enabled by inexpensive storage, a proliferation of sensor and data-capture technology, increasing connections to information via the cloud and virtualised storage infrastructure, and innovative software and analysis tools. It is no surprise, then, that business analytics as a technology area is rising on the radars of CIOs and line-of-business (LOB) executives.
To me, no other technology in the recent past has directly affected the lives of so many people: the developer, the product owner, the marketing department, the CIO and, finally, the customer.
And if something touches so many lives, it is only natural that it means different things to different people.
Most people define Big Data in terms of the three V's: the volume, velocity and variety of data.
Big Data is not so much about the content that is created, nor is it only about consumption. It is more about analysing the data and how that analysis needs to be done. Although the variety of content (unstructured, semi-structured) does play a huge role, Big Data is not really a ‘thing’ but rather a dynamic activity that crosses many IT boundaries.
As Big Data goes mainstream, a range of new technologies has hit the market. The table below gives an overview of these technologies with their associated context (the list is not exhaustive).
| Technology | Context |
|---|---|
| Bigtable | Google's proprietary distributed database system, built on the Google File System. Inspiration for HBase. |
| Data Warehouse & BI | An integrated set of servers, storage, operating system(s), database, business intelligence, data mining and other software, specifically pre-installed and pre-optimised for data warehousing. |
| Hadoop | A framework for distributed computing: multiple computers, communicating through a network, work on a common computational problem. The problem is divided into multiple tasks, each of which is solved by one or more computers working in parallel. Benefits include an improved price/performance ratio, higher reliability and greater scalability. |
| NoSQL / Key value store | A non-relational database does not store data in tables (rows and columns), in contrast to a relational database. Key value stores allow for the management of schema-less (NoSQL) entities. Examples include HBase, Cassandra, Couchbase and MongoDB. |
| Machine Learning | Machine learning is a field that is closely related to data mining and often uses techniques from statistics, probability theory, pattern recognition, and a host of other areas. It's used to build systems like those at Netflix and Amazon that recommend products to users based on past purchases, or systems that find all of the similar news articles on a given day. It can also be used to categorize Web pages automatically according to genre (sports, economy, war, and so on) or to mark e-mail messages as spam. |
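The Hadoop row describes splitting one problem into tasks that run in parallel. Hadoop's MapReduce model does this in three steps: a map phase over each input split, a shuffle that groups intermediate results by key, and a reduce phase per key. The plain-Python sketch below mimics that flow for a word count on a single machine. It is only an illustration of the model, not Hadoop's actual API; the thread pool simply stands in for a cluster of worker nodes, and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_task(chunk: str) -> List[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all intermediate values by key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_task(item: Tuple[str, List[int]]) -> Tuple[str, int]:
    """Reduce phase: combine all values for one key into a single result."""
    word, counts = item
    return word, sum(counts)


def word_count(splits: List[str], workers: int = 4) -> Dict[str, int]:
    # Each input split is handed to a separate worker, much as each
    # cluster node would process its own block of data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_task, splits))
        grouped = shuffle(mapped)
        return dict(pool.map(reduce_task, grouped.items()))


if __name__ == "__main__":
    splits = ["big data is big", "data needs analysis", "Big analysis"]
    print(word_count(splits))
    # {'big': 3, 'data': 2, 'is': 1, 'needs': 1, 'analysis': 2}
```

The point of the design is that map and reduce tasks are independent of each other, so adding machines (or here, workers) lets more of them run at once.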
Below is an architecture diagram showing how the whole thing comes together. I will cover the diagram and its components in more detail in upcoming posts.
Conclusion
Apart from the three V's mentioned above, Big Data is equally about a fourth V: 'Value'.
It is about creating value out of data using some or all of the above technologies. It entails data analytics: using technologies like NoSQL to store vast amounts of historical data, and processing that data on commodity hardware with technologies like Hadoop. The real power of data comes from correlating different sources of related data, so it is imperative to use Big Data in conjunction with the data stored in existing data warehouses.
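As a rough, small-scale illustration of correlating data sources, the sketch below joins schema-less event records (the kind of data a NoSQL store might hold, modelled here as Python dicts whose fields vary) with customer rows from a warehouse-style table, using a shared customer ID as the key. All records, field names and the function name are hypothetical; at real Big Data volumes this join would run on the processing platform or in the warehouse rather than in memory.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical warehouse rows: fixed schema, one row per customer.
warehouse_customers = [
    {"customer_id": 1, "name": "Asha", "segment": "premium"},
    {"customer_id": 2, "name": "Ben", "segment": "standard"},
]

# Hypothetical schema-less events, as a NoSQL store might hold them:
# each record can carry different fields.
events = [
    {"customer_id": 1, "type": "page_view", "url": "/offers"},
    {"customer_id": 1, "type": "purchase", "amount": 120.0},
    {"customer_id": 2, "type": "page_view", "device": "mobile"},
    {"customer_id": 3, "type": "page_view"},  # no matching warehouse row
]


def events_per_segment(customers: List[dict], raw_events: List[dict]) -> Dict[str, Counter]:
    """Correlate events with warehouse data: count event types per customer segment."""
    # Index the warehouse table by its key so each lookup is O(1).
    segment_by_id = {row["customer_id"]: row["segment"] for row in customers}
    result: Dict[str, Counter] = {}
    for event in raw_events:
        segment = segment_by_id.get(event.get("customer_id"))
        if segment is None:
            continue  # inner-join semantics: skip events with no known customer
        # Fields are optional in schema-less data, so fall back to "unknown".
        result.setdefault(segment, Counter())[event.get("type", "unknown")] += 1
    return result


print(events_per_segment(warehouse_customers, events))
```

Neither source alone answers "which customer segments are buying?"; only the combination of behavioural events and warehouse attributes does.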
In future articles, I will deal with each of the above (NoSQL, Hadoop and Machine Learning) individually.
