What is data?
Computers generally use many different types of data stored in digital format, such as video, audio, text, and documents.
What is big data?
Big data is a collection of audio, video, text, and documents in such large volume that it cannot be stored or processed within a given time using a traditional computer system.
The term is also used to describe a massive volume of both structured and unstructured data that is too large and complex to process using traditional databases and software.
How huge can the data be?
When we refer to big data, a common misconception is that the data must run to gigabytes, terabytes, or petabytes. Data of that size may well be big data, but size alone is not the complete definition.
Even a small amount of data can be referred to as big data, depending on how it is used.
Example:
If we try to attach a 100 MB file to an email, we won't be able to do so, because mail services typically cap attachments at a much smaller size.
Thus, with respect to email, this 100 MB of data can be referred to as big data.
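The idea that "big" is relative to what a system can handle can be sketched in a few lines of Python. The function name and the 25 MB limit below are assumptions made for illustration, not values taken from any particular mail service.

```python
MB = 1024 * 1024  # bytes in one mebibyte


def is_big_for(size_bytes: int, capacity_bytes: int) -> bool:
    """Return True when the data exceeds what the target system can handle."""
    return size_bytes > capacity_bytes


# Hypothetical attachment limit for a mail system (an assumption for illustration).
ASSUMED_MAIL_LIMIT = 25 * MB

attachment = 100 * MB
print(is_big_for(attachment, ASSUMED_MAIL_LIMIT))  # True: too big for this mail system
print(is_big_for(attachment, 500 * 1024 * MB))     # False: small next to a 500 GB disk
```

The same 100 MB file gives two different answers, which is the point of the example: whether data counts as big depends on the system that has to handle it.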
Real-world examples of big data
For example, YouTube, Facebook, Twitter, and Google receive a lot of data on a daily basis.
Facebook alone reportedly receives 500 TB of data per day, Google 10 TB per day, and Twitter 400 million tweets per day.
As the number of users on these sites increases, storing and processing the data becomes more difficult. Because the data holds so much information, traditional computer systems are not sufficient to hold it or to process it within a short span of time.
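To get a feel for these rates, the daily figures quoted above can be converted into per-second rates. This is only arithmetic on the article's numbers. It assumes decimal units (1 TB = 10^12 bytes) and a load spread evenly over the day, which real traffic is not.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
TB = 10**12  # decimal terabyte, an assumption about the units used above
GB = 10**9


def per_second(amount_per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return amount_per_day / SECONDS_PER_DAY


facebook_bytes_per_s = per_second(500 * TB)
tweets_per_s = per_second(400_000_000)

print(f"{facebook_bytes_per_s / GB:.2f} GB/s")  # 5.79 GB/s
print(f"{tweets_per_s:.0f} tweets/s")           # 4630 tweets/s
```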
Example 2:
While they are flying, aircraft keep transmitting a lot of data to the air traffic control located at airports. Air traffic control uses this data to track and monitor the status and progress of each flight.
Since multiple aircraft transmit a huge amount of data simultaneously, managing and processing it becomes a challenging task. Thus this is also referred to as big data.
What generates big data?
Some sources that generate big data:
1. Scientific instruments, such as satellites
2. Social media and networks
3. Mobile devices
4. Security cameras
Types of big data
1. Structured big data - Data that is already stored in a database in an ordered manner, with fixed fields.
2. Unstructured big data - Data that has no clear format, such as free text, images, audio, and video.
3. Semi-structured big data - Data that does not fit a fixed table layout but still carries tags or keys that organize it, such as JSON or XML.
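A small sketch of the three categories, using made-up sample records. CSV stands in for structured data and JSON for semi-structured data. Both are common examples, chosen here for illustration.

```python
import csv
import io
import json

# Structured: fixed columns, so every row has exactly the same fields.
structured = io.StringIO("id,name,city\n1,Asha,Pune\n2,Ravi,Delhi\n")
rows = list(csv.DictReader(structured))

# Semi-structured: keys describe the data, but records may differ in shape.
semi_structured = json.loads(
    '[{"id": 1, "name": "Asha", "tags": ["vip"]},'
    ' {"id": 2, "email": "ravi@example.com"}]'
)

# Unstructured: free text with no fields at all.
unstructured = "Asha called on Monday to say the delivery to Pune arrived late."

print(rows[0]["city"])                     # Pune
print(sorted(semi_structured[1].keys()))   # ['email', 'id']
print("Pune" in unstructured)              # True, but only via raw text search
```

The structured rows can be queried by column name, the JSON records must be checked key by key, and the free text offers nothing but the raw string to search.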
Characteristics of big data
1. Volume
Organizations collect data from social media, business transactions, and machine-to-machine communication, all of which requires a large volume of storage.
2. Velocity
Data flows in at a high rate.
Organizations generate new data at rapid speed and need to respond to it in real time.
3. Variety
Data comes in all kinds of formats, from structured numeric data to unstructured text, audio, and video.
Why is big data prominent?
Big data has become prominent because the infrastructure needed both to store data and to process it quickly is now cheap.
Previously, cheap infrastructure was available for storing data, but processing it was more expensive.
Challenges faced in big data
An October 2016 report found that organizations were stuck in the initial stages of big data. Only 15% of businesses reported deploying their big data project to production, effectively unchanged from the previous year. This clearly shows that organizations face some major challenges when implementing their big data strategies.
1. Storing and analyzing data
Organizations must store and analyze data that is large and rapidly growing, and then decide precisely how to handle it.
2. Securing big data
Security is also a big concern for organizations with big data stores.
In a survey, organizations said that they use additional measures to secure big data, such as data encryption and data segregation.
3. Validating data
Organizations often get similar pieces of data from different systems, and the data in those systems does not always agree. For example, a hospital's electronic health record may have one address for a patient while the pharmacy has a different address on record. The process of getting those records to agree, and of making sure that they are accurate, usable, and secure, is called data governance.
Solving these problems is complex, and data management solutions are designed to simplify the task.
Other challenges may occur while integrating big data. These include the availability of skilled people and the cost of the solution.
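The record-mismatch problem described under validating data can be sketched as a comparison between two systems. The record layout, patient IDs, and addresses below are invented for illustration. A real data governance process would also have to decide which system is correct.

```python
def normalize(value):
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    if value is None:
        return None
    return " ".join(str(value).lower().split())


def find_mismatches(system_a: dict, system_b: dict, field: str) -> list:
    """Return IDs present in both systems whose `field` values disagree."""
    mismatched = []
    for record_id in sorted(system_a.keys() & system_b.keys()):
        a_val = system_a[record_id].get(field)
        b_val = system_b[record_id].get(field)
        if normalize(a_val) != normalize(b_val):
            mismatched.append(record_id)
    return mismatched


# Hypothetical records keyed by patient ID.
hospital = {
    "p1": {"address": "12 Lake Road"},
    "p2": {"address": "5 Hill St"},
}
pharmacy = {
    "p1": {"address": "12  lake road"},  # same address, different formatting
    "p2": {"address": "9 Park Ave"},     # genuinely different address
    "p3": {"address": "1 Main St"},      # unknown to the hospital, so not compared
}

print(find_mismatches(hospital, pharmacy, "address"))  # ['p2']
```

Normalizing before comparing keeps purely cosmetic differences from being reported, so only the genuine disagreement for p2 is flagged.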
Tools
1. Apache Hadoop
It is a Java-based, open-source software framework that can efficiently store large amounts of data in a cluster. The framework runs in parallel on the cluster and allows us to process data across all of its nodes.
Hadoop Distributed File System (HDFS) is the storage system of Hadoop. It splits big data into blocks and distributes them across many nodes in the cluster.
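The splitting and distribution step can be illustrated with a toy simulation. This is not Hadoop code and does not use any Hadoop API. The 128 MB block size and replication factor of 3 match HDFS defaults, but the node names are hypothetical and the round-robin placement is a simplification. Real HDFS placement is rack-aware.

```python
MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int) -> list:
    """Return the size of each block that a file of `file_size` bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_blocks(num_blocks: int, nodes: list, replication: int) -> dict:
    """Assign each block to `replication` distinct nodes, round-robin (simplified)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + i) % len(nodes)] for i in range(replication)
        ]
    return placement


blocks = split_into_blocks(300 * MB, 128 * MB)
print([b // MB for b in blocks])  # [128, 128, 44]

nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
for block_id, replicas in place_blocks(len(blocks), nodes, replication=3).items():
    print(block_id, replicas)
```

Because every block lives on three different nodes, losing any single node still leaves two copies of each block available.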
2. Microsoft HDInsight
It is a big data solution from Microsoft, powered by Apache Hadoop and available as a service in the cloud. HDInsight uses Azure Blob storage as its default file system. This provides high availability at low cost.
3. NoSQL
Traditional SQL can be used efficiently to handle large amounts of structured data, but we need NoSQL (Not only SQL) to handle unstructured data. NoSQL can give better performance when storing massive amounts of data. NoSQL databases store unstructured data with no particular schema, i.e. each row can have its own set of columns.
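The "each row can have its own set of columns" idea can be sketched with plain Python dictionaries. This is an in-memory toy, not the API of any particular NoSQL database. The records and the find helper are made up for illustration.

```python
# A toy in-memory "collection": each record is a dict with its own keys.
collection = [
    {"id": 1, "name": "Asha", "email": "asha@example.com"},
    {"id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]},
    {"id": 3, "title": "Sensor reading", "value": 21.5, "unit": "C"},
]


def find(records, **criteria):
    """Return records whose fields match all criteria; missing fields never match."""
    return [r for r in records if all(k in r and r[k] == v for k, v in criteria.items())]


def all_fields(records):
    """Collect every field name used anywhere in the collection."""
    return sorted({key for record in records for key in record})


print(find(collection, name="Ravi"))
print(all_fields(collection))
# ['email', 'id', 'name', 'phones', 'title', 'unit', 'value']
```

No schema is declared up front: the third record has no name at all, yet it sits in the same collection and is simply skipped by a query on name.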
4. Hive
Provides distributed data management for Hadoop.
Supports HiveQL, an SQL-like query language, to access big data.
Used for data mining.
5. Sqoop
Connects Hadoop with various relational databases to transfer the data.
6. Big data in Excel
We can also connect to the stored data using Excel. Many people find it easier to handle data with Excel.
Applications of big data
1. It has a wide range of applications in many areas, such as business, healthcare, manufacturing, scientific research, media and entertainment, and weather forecasting.
2. It helps companies make more informed business decisions by analyzing large volumes of data.
3. It is used with data from web servers, social media content, customer emails, and phone call details.
4. Big data has improved healthcare by letting researchers mine the data to see which treatments are effective for a particular condition.
Big data is a term used to refer to data sets that are too large or complex for traditional data-processing application software to adequately deal with. It was originally associated with three key concepts: volume, variety, and velocity.