University of Virginia
Networks are ubiquitous and are now part of the public vocabulary, with news articles and magazines frequently using the term "networks" to refer to interconnected entities. Network science and engineering has emerged as a formal field over the last twenty years and has seen explosive growth. Ideas from network science are central to companies such as Akamai, Twitter, Google, Facebook, and LinkedIn. Its concepts and techniques have also been used to address fundamental problems and make scientific progress in diverse fields (e.g., biology, economics, social sciences, psychology, power systems, telecommunications, public health, and marketing), and are now part of most university curricula. Yet resources for the effective use of network science techniques are largely dispersed and stand-alone, of small scale, home-grown for personal use, and/or do not cover the broad range of operations that need to be performed on networks. Compositions of these diverse capabilities are rare. Furthermore, many researchers who study networks are not computer scientists and therefore lack easy access to computing and data resources, which creates a barrier to their work. This project will develop a sophisticated cyberinfrastructure that brings together various resources to provide a unifying ecosystem for network science that is greater than the sum of its parts. The resulting cyberinfrastructure will benefit researchers and students from various disciplines by facilitating access to tools for synthesizing and analyzing large networks, and by providing access points for contributors of new software and data. An important benefit of the system is that it can be readily used even by researchers who have no formal training in computer programming.
The cyberinfrastructure resulting from this work will foster multi-disciplinary and multi-university research and teaching collaborations. As part of this project, comprehensive education and outreach programs will be launched by the participating institutions, spanning educators and K-12 students. These programs will include network science courses that reach students from minority and under-represented groups, as well as students at smaller institutions who do not have easy access to high-performance computing resources.
Resources for doing network science are largely dispersed and stand-alone (in silos of isolated tools), of small scale, or home-grown for personal use. What is needed is a cyberinfrastructure that brings together various resources to provide a unifying ecosystem for network science that is greater than the sum of its parts. The primary goal of this proposal is to build a self-sustaining cyberinfrastructure (CI) named CINES (Cyberinfrastructure for Sustained Innovation in Network Engineering and Science) that will be a community resource for network science. CINES will be an extensible and sustainable platform for producers and consumers of network science data, information, and software.
CINES will have: (1) a layered architecture that systematically modularizes and isolates messaging, infrastructure services, common services, a digital library, and APIs for change-out and updates; (2) a robust and reliable infrastructure for applications (apps) that is designed to accommodate technological advances in methods, programming languages, and computing models; (3) a resource manager to enable jobs to run on the target machines for which they are best suited; (4) an engine to enable users to create new workflows by composing available components and to distribute the resulting workload across computing resources; (5) orchestration among system components to provide CI-as-a-service (CIaaS) that scales under high system load to networks with a billion or more vertices; (6) a digital library with 100,000+ networks of various kinds that allows rich services for storing, searching, annotating, and browsing; (7) structural methods (e.g., centrality, paths, and cuts) and dynamical models of various contagion processes; (8) new methods to acquire data, build networks, and augment them using machine learning techniques; (9) a suite of industry-recognized tools such as SNAP, NetworkX, and RStudio that make it easier for researchers, educators, and analysts to do network science and engineering; (10) a suite of APIs that allows developers to add new web apps and services, based on an app-store model, and allows access to CINES from third-party software; and (11) metrics and a Stack Overflow model, among other features, for producers and consumers to interact in real time and guide the evolution of CINES. CINES will enable fundamental changes in the way researchers study and teach complex networks. The use of state-of-the-art high-performance computing (HPC) resources to synthesize, analyze, and reason about large networks will enable researchers and educators to study networks in novel ways.
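As a rough illustration of the kinds of structural methods and contagion models named in item (7), the sketch below computes degree centrality, a shortest-path length, and a simple threshold contagion on a toy graph. It is a minimal sketch in plain Python, not the CINES implementation: the function names, the toy edge list, and the choice of a threshold model are all assumptions made for illustration.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each node divided by (n - 1), the maximum possible degree."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def shortest_path_length(adj, source, target):
    """Number of edges on a shortest path (breadth-first search); None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return None


def threshold_contagion(adj, seeds, threshold):
    """Synchronous threshold model: an inactive node activates once the
    fraction of its neighbors that are active reaches the threshold."""
    active = set(seeds)
    while True:
        newly = {
            v for v, nbrs in adj.items()
            if v not in active and nbrs
            and sum(w in active for w in nbrs) / len(nbrs) >= threshold
        }
        if not newly:
            return active
        active |= newly


# Hypothetical toy network: a triangle (a, b, c) with a tail c - d - e.
toy = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])
centrality = degree_centrality(toy)
hops = shortest_path_length(toy, "a", "e")
spread = threshold_contagion(toy, {"a"}, 0.5)
```

On this toy graph, node c has the highest degree centrality, and a contagion seeded at a with threshold 0.5 eventually reaches every node, whereas a threshold of 0.6 leaves it confined to the seed.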
CINES will allow scientists to address fundamental scientific questions; for example, biologists can use network methods to reason about genomics data that is now available in large quantities due to fast and effective sequencing and the NIH Microbiome Program. It will enable educators to harness HPC technologies to teach network science to students spanning various academic levels, disciplines, and institutions. CINES, which will be useful to researchers supported by many NSF directorates and divisions, will be designed for scalability, usability, extensibility, and sustainability. This project will also advance the fields of digital libraries and cloud computing by stretching them to address challenges related to network science. Given the multidisciplinary nature of the field, CINES will provide a collaborative space for scientists from different disciplines, leading to important cross-fertilization of ideas.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.