Articles 29th January 2024

Big Data Breakdown: Understanding the 5V Challenges

Data can be many things – in the right hands, it can be transformative, illustrative, and powerful. A doctor, with data, may be able to identify the cause of an issue that’s afflicting a patient. An analyst, empowered with data, may be able to identify characteristics of a company that may be an acquisition opportunity.

No matter whether you’re a big business or a sole trader, data can be of critical importance. Let’s face it, though – data can be tough to work with. There is only one place in the world that has clean data all the time, and that’s the classroom. Let’s explore how the skills and strategies acquired through an online master’s in business analytics can help you work through some of the issues that occur in data sets all over the world.

Volume – Can You Turn Down The Data?

Data is everywhere. From the images you take on your camera phone to the words you type when searching the web, each of us generates tens of thousands of data points that get fed into multitudes of systems on a daily basis.

Social media platforms and the shopping list you save on your phone are only part of the picture. For an organization, data can be anything from customer records to service support documentation; with queries being collected from all over the organization, many thousands, and sometimes millions, of records can be generated on a daily basis.

The amount, or volume, of data generated represents a physical storage problem for businesses. While a small amount of data may be manageable, the amount captured can grow rapidly, sometimes exponentially, as a business and its operational requirements grow.
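
To make the growth concern concrete, here is a minimal sketch in Python that projects storage needs under a constant monthly growth rate. The starting size, the growth rate, and the function name project_storage_gb are illustrative assumptions, not figures from any real business.

```python
def project_storage_gb(start_gb: float, monthly_growth: float, months: int) -> list[float]:
    """Return projected storage (in GB) for month 0 through `months`,
    assuming the same percentage growth every month (compound growth)."""
    sizes = []
    size = start_gb
    for _ in range(months + 1):
        sizes.append(round(size, 2))
        size *= 1 + monthly_growth
    return sizes


# Hypothetical scenario: 500 GB today, growing 10% per month for one year.
projection = project_storage_gb(500, 0.10, 12)
print("Today:", projection[0], "GB")
print("In 12 months:", projection[-1], "GB")
```

Under these assumed numbers, storage more than triples within a year, which is why it pays to question what is being kept before the bill arrives.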

Being able to scale up with data is immensely important for business – if you’re choking on the volume of data you’re working with, you’re setting yourself up to fail. Volume presents a singular challenge for business: assessing what data you store. What are you using, and what do you really need to position your business for future success?

Velocity – How Fast Do You Want It?

While volume represents one type of data problem for a business, the speed at which data is generated presents our next problem: velocity.

Consider the realm of transactional data. You jump online, find a product that you like, rush through checkout, and in a few short days, your order is at your door. Every step of that purchase produces transactional data – and if you’re a company that does any form of sales, it can come in fast. Consider the amount of data generated by online shopping platforms like Amazon – transacting billions of dollars’ worth of sales per day can generate a lot of receipts and customer information.

Understanding how fast data is being generated can be of critical importance to analysts and decision-makers within an organization. When millions of records can be generated in a day, is it still relevant to try to analyze data over extended periods of time, particularly when those periods are impacted by events such as the recent pandemic?

Velocity presents a unique challenge for analysts – based on the speed at which data is created, an analyst must ask, when is the best time to review and assess the data generated? Is it appropriate to wait an extended period before making a business decision, or is there enough information to make an informed decision immediately?
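
One way to get a feel for velocity is to count how many records arrive in each time window. The sketch below groups order timestamps by the hour using only the Python standard library. The timestamps and the function name records_per_hour are made up for illustration, and hourly windows are just one possible choice.

```python
from collections import Counter
from datetime import datetime


def records_per_hour(timestamps: list[str]) -> Counter:
    """Count records in each hour, given ISO 8601 timestamp strings."""
    counts = Counter()
    for ts in timestamps:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        counts[hour] += 1
    return counts


# Hypothetical order timestamps.
orders = ["2024-01-29T09:05:00", "2024-01-29T09:40:00", "2024-01-29T10:15:00"]
hourly = records_per_hour(orders)
for hour, count in sorted(hourly.items()):
    print(hour.isoformat(), count)
```

If the hourly counts are large and stable, a short review cycle may be enough to support a decision; if they are small or erratic, waiting for a longer window may be the safer call.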

Variety – Would You Like (Hard) Copies With That?

While the data points we’ve been discussing in this article have explored transactional data so far, let’s be realistic – data is far more complex than simply what is stored on the internet. Data can be conveyed and presented in a lot of different mediums. Consider, for example, the automated reads from one of the more than 110 million electricity meters outside properties in the United States, the printed shipment notes attached to packages from a warehouse, or the delivery paperwork managed at a big box retailer.

Data can be stored in many ways – from structured data (such as that stored in databases) to semi-structured data (such as forms). In fact, there’s also some data that’s stored in an unstructured form – such as that presented in text, audio, or video format. This article is one such example of unstructured data – an analyst may find it beneficial to use tools such as frequency analysis to determine if, as a writer, I have a preference for certain word phrasings.
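
As a rough illustration of frequency analysis on unstructured text, the sketch below counts the most common words in a short sample. The sample sentence and the function name word_frequencies are assumptions made for this example; a real analysis would likely also strip out common filler words.

```python
import re
from collections import Counter


def word_frequencies(text: str, top_n: int = 3) -> list[tuple[str, int]]:
    """Return the `top_n` most common words in `text`, ignoring case."""
    # Treat runs of letters (and apostrophes) as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Hypothetical sample text.
sample = "Data is everywhere. Let's face it, data can be tough, but data can be powerful."
top_words = word_frequencies(sample)
print(top_words)
```

Run over a whole article, the same count would start to reveal a writer’s favorite words and phrasings.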

Variety presents a unique challenge for business analysts, and raises questions about how data can be transformed to create meaningful outcomes for a business. If data is structured, it may be simple to work with – but does a database provide the same level of information about a survey as, say, handwritten feedback cards?

Value – What Does This Mean?

Data can be generated in a range of different ways, but just because data can be collected doesn’t mean it is always useful.

Consider, for example, a set of directions produced by a GPS system in a car. The directions make sense for a driver on their way to a destination – but for a transportation company, metrics such as the miles traveled and fuel consumed may be of more importance than simply which roads a vehicle is traveling on.

The data points that are generated by any single system may be costly to store. As a result, it becomes imperative for those who design databases and work with information to ensure that only necessary data is stored – and not frivolous, unnecessary information that is likely to be a financial burden on a business in the future.
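
A minimal sketch of that idea, using the transportation example: keep the handful of fields the business question needs, derive one metric from them, and leave the rest out of storage. The field names, the sample trip, and the function summarize_trip are all hypothetical.

```python
def summarize_trip(trip: dict) -> dict:
    """Keep only the fields this (hypothetical) analysis needs,
    plus one derived metric: miles per gallon."""
    miles = trip["miles_traveled"]
    gallons = trip["fuel_gallons"]
    return {
        "vehicle_id": trip["vehicle_id"],
        "miles_traveled": miles,
        "fuel_gallons": gallons,
        # Guard against division by zero when no fuel was recorded.
        "miles_per_gallon": round(miles / gallons, 2) if gallons else None,
    }


# Hypothetical raw record, including turn-by-turn detail the company may not need.
raw_trip = {
    "vehicle_id": "TRUCK-7",
    "miles_traveled": 120.0,
    "fuel_gallons": 15.0,
    "turn_by_turn": ["Left on Main St", "Right on 2nd Ave"],
}
summary = summarize_trip(raw_trip)
print(summary)
```

Whether the turn-by-turn detail is truly frivolous depends on the questions being asked, so a filter like this is best agreed on with the people who will use the data.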

For business analysts, the value problem of data will always make them consider the following – with all the data that is live and available to them, what data points provide the most appropriate answers to the questions that they’re facing?

Veracity – Can I Trust What You Say?

A classic Cold War saying is the Russian rhyming proverb “trust, but verify”. Popularised by US President Ronald Reagan in the 1980s, it captured the idea that the United States and the then USSR could trust each other on nuclear disarmament, provided each could verify that the other was taking steps towards denuclearisation.

While data has come a long way from the dot matrix printers of the 1980s, the saying applies rather nicely to the data sets of today’s modern world. Data can be generated by many things, including large language models such as ChatGPT. In fact, some large language models can do an excellent job of producing spectacularly incorrect data.

Veracity presents an ongoing challenge to a business analyst and highlights the need for ongoing verification, testing, and validation. Veracity makes us ask, if data can be brought in from a wide variety of different sources, what methods and techniques can be used on the data to ensure that it is from a trusted source, and not simply false information?
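
In the spirit of trust, but verify, here is a small sketch of the kind of checks an analyst might run before accepting a record. The list of trusted sources, the field names, and the function validate_record are assumptions for illustration; real validation rules depend entirely on the data set.

```python
from datetime import date


def validate_record(record: dict, trusted_sources: set[str]) -> list[str]:
    """Return a list of problems found; an empty list means the record passed."""
    problems = []
    if record.get("source") not in trusted_sources:
        problems.append("untrusted source")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("amount missing or negative")
    try:
        # Rejects missing values and impossible dates such as 30 February.
        date.fromisoformat(str(record.get("order_date")))
    except ValueError:
        problems.append("order_date is not a valid ISO date")
    return problems


# Hypothetical trusted systems and sample records.
TRUSTED = {"pos_system", "web_checkout"}
good = {"source": "web_checkout", "amount": 19.99, "order_date": "2024-01-29"}
bad = {"source": "chatbot_export", "amount": -5, "order_date": "2024-02-30"}
print(validate_record(good, TRUSTED))
print(validate_record(bad, TRUSTED))
```

Checks like these catch obviously broken records, but they cannot prove that a plausible-looking value is true, which is why ongoing testing against trusted sources still matters.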

No matter whether you work in a small business or a large corporation, data issues can present themselves in a variety of ways. For the seasoned business analyst, understanding how the 5Vs can cause challenges in data is a great way of narrowing focus and working towards addressing some of the issues presented by the volume, velocity, variety, value, and veracity of data in today’s world.