Big data refers to data sets that stand out on three criteria, all beginning with the letter V:
- Volume: large amounts of data
- Variety: different, often unstructured data formats from different sources
- Velocity: tight time requirements for data analysis.
Big data analyses are often based on sources such as social media and mobile apps. But even in-house data contains a wealth of information that conventional techniques have so far been unable to extract.
One example is payment-processing data: the payments are posted to accounts in the conventional way, but analysing them for typical fraud patterns would exceed the capacity of ordinary database servers. Likewise, every company compiles huge logs of activity on its servers, but the quantity of data is so large that the logs are analysed only after the fact, and only in exceptional cases.
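To make the idea of a "typical fraud pattern" concrete, here is a minimal sketch in Python. It assumes one very simple pattern, an unusual burst of payments from a single account within a short time window. The function name `flag_payment_bursts`, the record layout and the thresholds are illustrative assumptions, not part of any particular product; real fraud detection uses far richer patterns and, at big data scale, distributed processing.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def flag_payment_bursts(payments, max_count=3, window=timedelta(minutes=10)):
    """Return the accounts that made more than max_count payments within window.

    payments: iterable of (account_id, timestamp) tuples, sorted by timestamp.
    The thresholds are hypothetical defaults chosen for illustration only.
    """
    recent = defaultdict(deque)  # account id -> timestamps inside the current window
    flagged = set()
    for account, ts in payments:
        times = recent[account]
        times.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while ts - times[0] > window:
            times.popleft()
        if len(times) > max_count:
            flagged.add(account)
    return flagged


# Made-up sample data: account "A" pays four times within four minutes.
payments = [
    ("A", datetime(2024, 1, 1, 12, 0)),
    ("B", datetime(2024, 1, 1, 12, 1)),
    ("A", datetime(2024, 1, 1, 12, 2)),
    ("A", datetime(2024, 1, 1, 12, 3)),
    ("A", datetime(2024, 1, 1, 12, 4)),
    ("B", datetime(2024, 1, 1, 13, 0)),
]
print(flag_payment_bursts(payments))
```

Because the function reads the payments as a stream and keeps only the timestamps inside the current window, it never needs the full history in memory, which is the property that matters once the volume outgrows a single database server.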
External data, such as that from social networks, can reveal a lot of useful information about customer preferences. Videos and blog entries contain reviews of products, both your company's and your competitors'. Monitoring company websites can also provide helpful clues about growth, future investments or potential problems.
Making Unstructured Data Usable
The examples given above already indicate the variety of data formats out there. Often the data is so unstructured that there is hardly any format to speak of, as is the case with text, video and audio files. What is needed is a way to extract the relevant information from the unstructured data and make it available for analysis.
Even then, however, the extracted data still cannot be analysed like data from a relational database. Mapping the data onto a relational structure would require the kind of in-depth understanding that is available only after extensive analysis.
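As a small illustration of extracting relevant information from unstructured text, the following Python sketch counts how often known product names are mentioned in free-form reviews. The function name `count_product_mentions`, the product names and the sample texts are invented for this example; practical text extraction usually needs more than pattern matching, for instance language-aware tokenisation.

```python
import re
from collections import Counter


def count_product_mentions(texts, product_names):
    """Count case-insensitive mentions of each product name across the texts."""
    # One alternation pattern for all names; re.escape guards special characters.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in product_names) + r")\b",
        re.IGNORECASE,
    )
    # Map every lower-cased match back to the spelling given by the caller.
    canonical = {name.lower(): name for name in product_names}
    counts = Counter()
    for text in texts:
        for match in pattern.findall(text):
            counts[canonical[match.lower()]] += 1
    return counts


# Hypothetical blog snippets and product names.
reviews = [
    "I love the Alpha Phone, but the alpha phone battery is weak.",
    "BetaPad beats it on screen size.",
]
mentions = count_product_mentions(reviews, ["Alpha Phone", "BetaPad"])
print(mentions)
```

The result is a small structured table (product name, number of mentions) derived from text that had no usable format beforehand.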
Therefore, the data is analysed with inductive statistics and pattern recognition. It takes an expert user intimately familiar with a system to determine the information that can be obtained from the data, the patterns that seem plausible and which refinements may still be necessary to make it ready for analysis.
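One very simple instance of this statistical approach is to flag values that deviate strongly from the rest of a series. The sketch below uses a z-score test from Python's standard library. The function name `find_outliers`, the threshold of two standard deviations and the sample figures are assumptions made for illustration; choosing a suitable method and threshold is exactly the kind of judgement that requires the expert user described above.

```python
from statistics import mean, pstdev


def find_outliers(values, threshold=2.0):
    """Return the indices of values more than threshold standard deviations from the mean."""
    avg = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [
        index
        for index, value in enumerate(values)
        if abs(value - avg) / spread > threshold
    ]


# Made-up daily transaction counts; the last day is unusually high.
daily_counts = [100, 102, 98, 101, 99, 250]
outlier_days = find_outliers(daily_counts)
print(outlier_days)
```

A flagged value is only a candidate for closer inspection. Whether it represents a real pattern or an artefact of the data still has to be decided by someone who knows the system.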
New Roles for IT and Other Departments
Big data is changing how IT and the other departments in a company work together. Analysis has moved beyond the old idea of running reports with everything calculated to the last cent. Instead, each department needs to be able to perform exploratory analysis of its data from various perspectives, which means new sources of data need to be integrated swiftly.
The traditional sales report drawn from the traditional database will continue to have a place, but it will be supplemented by, for example, much more precise data about customers and their needs, obtained through big data analysis.
Software Innovation Campus
As a member of the Software Innovation Campus initiative in Paderborn, S&N is advancing the topic of big data in the relevant working group.