Big Data

Big data is a term for data sets that are so large or complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.

Lately, the term “big data” tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. “There is little doubt that the quantities of data now available are indeed large, but that’s not the most relevant characteristic of this new data ecosystem.” Analysis of data sets can find new correlations to “spot business trends, prevent diseases, combat crime and so on.” Scientists, business executives, medical practitioners, advertisers and governments alike regularly meet difficulties with large data sets in areas including Internet search, fintech, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations, biology and environmental research.

Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data. The work may require “massively parallel software running on tens, hundreds, or even thousands of servers”. What counts as “big data” varies depending on the capabilities of the users and their tools, and expanding capabilities make big data a moving target. “For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration.”
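To make the “massively parallel” idea concrete, the sketch below shows the map-reduce pattern that underlies many big-data frameworks: the data is split into chunks, each chunk is processed independently, and the partial results are merged. This is only an illustration, not any particular framework’s API; the names (count_words, lines, chunks) are hypothetical, and local worker processes stand in for the separate servers a real system would use.

```python
# Illustrative map-reduce sketch (hypothetical names, standard library only):
# split the data into chunks, count words in each chunk in parallel, then
# merge the partial counts. Real big-data frameworks apply the same pattern
# across many servers rather than local worker processes.
from collections import Counter
from multiprocessing import Pool


def count_words(chunk):
    """Map step: count word occurrences in one chunk of text lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


if __name__ == "__main__":
    # Toy data set; in a real cluster each chunk would live on a different node.
    lines = [
        "big data is big",
        "data moves fast",
        "big clusters scale out",
        "fast analysis finds trends",
    ]
    chunks = [lines[i::4] for i in range(4)]  # four roughly even chunks

    with Pool(processes=4) as pool:
        partial_counts = pool.map(count_words, chunks)  # run map steps in parallel

    # Reduce step: merge the per-chunk counts into a single result.
    total = sum(partial_counts, Counter())
    print(total.most_common(5))
```

In a production cluster the same split-process-merge pattern runs across hundreds or thousands of machines, with the framework handling data distribution, scheduling and fault tolerance.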

  • The Brain’s On-Ramps
When facing a challenging scientific problem, researchers often turn to supercomputers. These powerful machines crunch large amounts of data and, with the right software, spin out images that make the data easier to understand. Advanced computational methods and technologies, for instance, can provide unprecedented maps of the human brain. In the image at left, the colors represent white matter pathways that allow distant parts of the brain to communicate with each other. For over 30 years, the National Science Foundation has invested in high-performance computing, both pushing the frontiers of advanced computing hardware and software and providing access to supercomputers for researchers across a range of disciplines. Use of NSF-supported research cyberinfrastructure resources is at an all-time high and continues to increase across all science and engineering disciplines.

  • Stellar Turbulence
Simulations help astrophysicists understand and model the turbulent mixing of star gases. This image, created at the Pittsburgh Supercomputing Center (PSC), depicts a 3-D mixing layer between two fluids of different densities in a gravitational field. In this case, a heavy gas is on top of a lighter one. This type of mixing plays an essential role in stellar convection. Understanding mixing dynamics will help researchers with a long-term goal of visualizing the turbulent flows of an entire giant star, one similar to the sun. PSC is a leading partner in NSF’s eXtreme Science and Engineering Discovery Environment (XSEDE), which provides researchers and educators with the most powerful collection of advanced digital resources in the world.

  • Solar Fury
This glowing maelstrom results from magnetic arcs (orange lines) that shoot hundreds of thousands of kilometers above the sun’s surface. When the electrically charged arcs destabilize, they cause plasma to erupt from the sun’s surface. If one eruption follows another, the second will spew forth faster than the first, catching up and merging with it. Researchers at the National Center for Supercomputing Applications’ Advanced Visualization Laboratory are simulating events that trigger solar eruptions to aid prediction of future solar storms. Such advances will improve preparations to mitigate and prevent the storms’ worst effects, such as knocking out the electric power grid and disrupting satellite communications.

  • Churning Out a Supernova
This 3-D simulation captures the dynamics that lead to the explosive birth of a supernova. Red indicates hot, chaotic material, and blue shows cold, inert material. Two giant polar lobes form when strongly magnetized material ejected from the star’s center distorts and fails to launch cleanly away. The simulation was created on Stampede, a supercomputer at the Texas Advanced Computing Center.
  • Spinning Hydrogen
On the bridge of the AlloSphere, one of the largest immersive scientific instruments in the world, researchers interact with a spinning hydrogen atom. The bridge runs through the center of the spherical display, which includes stereo video projectors covering the entire visual field, immersive audio, and devices to sense, track and engage users. Located at the University of California, Santa Barbara, the AlloSphere allows researchers to visualize, explore and evaluate scientific data too small to see and hear. By magnifying the information to the human scale, researchers can better analyze the data to gain new insights into challenging problems.