Taming Big Data

Big Data can be a beast. Data volumes are growing exponentially. The types of data being created are likewise proliferating. And the speed at which data is created, along with the need to analyze it in near real time to derive value from it, is increasing with each passing hour.

But Big Data can be tamed. We’ve got living proof. Thanks to new approaches for processing, storing and analyzing massive volumes of multi-structured data, such as Hadoop and massively parallel processing (MPP) analytic databases, enterprises of all types are uncovering new and valuable insights from Big Data every day.
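To make the Hadoop approach concrete, here is a minimal sketch of the MapReduce pattern it popularized, written as a Hadoop Streaming mapper and reducer in Python. The word-count task and file names are illustrative assumptions, not drawn from any particular deployment.

```python
#!/usr/bin/env python3
# mapper.py -- Hadoop Streaming mapper: emit a (word, 1) pair per token.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word.lower()}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- Hadoop Streaming reducer: sum the counts for each word.
# Hadoop hands the reducer its input already grouped and sorted by key.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

The same pair can be tested locally with `cat input.txt | ./mapper.py | sort | ./reducer.py`, since the shell sort stands in for Hadoop’s shuffle phase.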

Leading the way are Web giants like Facebook, LinkedIn and Amazon. Following close behind are early adopters in financial services, healthcare and media. And now it’s your turn. From marketing campaign analysis and social graph analysis to network monitoring, fraud detection and risk modeling, there’s unquestionably a Big Data use case out there with your company’s name on it.

In an era where Big Data can greatly impact a broad population, many novel opportunities arise, chief among them the ability to integrate data from diverse sources and “wrangle” it to extract novel insights. One example is MATTERS, an analytic platform with dynamic modeling capabilities developed collaboratively by the Massachusetts High Tech Council, WPI and other institutions, and conceived as a tool to help both expert and non-expert users better understand public data. MATTERS is an integrative source of high-fidelity cost and talent competitiveness metrics. Its goal is to extract, integrate and model rich economic, financial, educational and technological information from heterogeneous web data sources, ranging from the U.S. Census Bureau and the Bureau of Labor Statistics to the Institute of Education Sciences, that capture factors known to be critical to the economic competitiveness of states. The MATTERS demonstration illustrates how its developers tackle data acquisition, cleaning, integration, wrangling into appropriate representations, visualization and storytelling with data, all in the context of state competitiveness in the high-tech sector.
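To illustrate the kind of acquisition-and-integration work described above, here is a small Python/pandas sketch that merges state-level extracts from three sources into one comparable table. The file names, column names and derived metric are hypothetical stand-ins, not the actual MATTERS pipeline or schemas.

```python
# Illustrative wrangling sketch in the spirit of MATTERS: integrate
# state-level extracts from heterogeneous sources into one table.
# File names, column names, and the derived metric are assumptions.
import pandas as pd

# Hypothetical per-source extracts, each keyed slightly differently.
census = pd.read_csv("census_population.csv")  # columns: state, year, population
bls = pd.read_csv("bls_tech_wages.csv")        # columns: State, Year, avg_tech_wage
ies = pd.read_csv("ies_stem_degrees.csv")      # columns: state_name, year, stem_degrees

# Cleaning: normalize the join keys that each source spells differently.
bls = bls.rename(columns={"State": "state", "Year": "year"})
ies = ies.rename(columns={"state_name": "state"})
for df in (census, bls, ies):
    df["state"] = df["state"].str.strip().str.title()

# Integration: one row per (state, year) across all sources.
merged = census.merge(bls, on=["state", "year"]).merge(ies, on=["state", "year"])

# A derived metric that makes states of different sizes comparable.
merged["stem_degrees_per_capita"] = merged["stem_degrees"] / merged["population"]

print(merged.sort_values("stem_degrees_per_capita", ascending=False).head())
```

As the demonstration emphasizes, most of the effort goes into reconciling keys and units that each source spells differently; the modeling itself only starts once the table above exists.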

Unstructured data refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may also contain dates, numbers and facts. The result is irregularities and ambiguities that make such data far harder for traditional programs to interpret than data stored in fielded form in databases or annotated (semantically tagged) in documents.
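A small example makes the contrast concrete: in a database these values would live in typed columns, but in free text they must be recovered with pattern matching. The note text and regular expressions below are illustrative assumptions.

```python
# Minimal sketch: pulling dates and dollar amounts out of free text.
# In fielded storage these would be typed columns; in unstructured
# text they must be recovered with pattern matching.
import re

note = ("Met with the vendor on 03/14/2023; they quoted $12,500 for "
        "the first phase and another $4,200 due by 2023-04-01.")

dates = re.findall(r"\b(?:\d{2}/\d{2}/\d{4}|\d{4}-\d{2}-\d{2})\b", note)
amounts = re.findall(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?", note)

print(dates)    # ['03/14/2023', '2023-04-01']
print(amounts)  # ['$12,500', '$4,200']
```

Even this tiny example shows the ambiguity problem: the same concept, a date, arrives in two different formats, which is exactly the irregularity that fielded storage avoids.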

Data in greater volume, velocity and variety has some business leaders riding a big data analytics tiger in search of new commercial opportunities. Now, several years into the big data era, some taming of the tiger seems to be in order. That’s according to several data professionals from large enterprises on hand at IBM’s recent Information on Demand (IOD) conference in Las Vegas. While they see potential for a new breed of data-driven applications, they also see a need to rein in unbridled efforts, which means applying more rigorous planning, refining analytics skills and instituting more data governance.

Data variety and ‘dangerous insights’

Telecommunications giant Verizon Wireless, headquartered in New York, has always had data volume and velocity to deal with. What’s new is the variety of data that big data analytics must work with, said Ksenija Draskovic, a Verizon predictive analytics and data science manager, who discussed the implications for predictive analytics in another IOD session.

C-suite buy-in of the big data kind

Madhavan said he and his JPMorgan Chase colleagues have worked to create better planning methodologies for dealing with big data. Steps are in place to ensure business users know beforehand what kind of data they want to work with, what business goals they hope to achieve, and what kind of revenue can be expected if the new application is wildly successful.