The first speaker will present the Person-event Data Environment (PDE)—the Army’s enterprise platform for integrating data. The PDE brings researchers into a secure data environment where hundreds of Department of Defense datasets are housed. Researchers use the PDE to describe, investigate, and predict a variety of phenomena from attrition to injury to sexual assault.
The second speaker will describe the Army Synthdata project, which addressed important privacy concerns by generating synthetic datasets from over 400,000 Active Duty Soldiers. From a machine learning and statistical perspective, these datasets resemble the real US Army but contain no individual person’s data. This paper presents a feasible method for securing privacy in Big Data when simple deidentification is insufficient.
The third paper presents lessons learned and useful approaches for integrating Big Data, illustrated by several real challenges we faced. We share five strategies for integrating data within sources (e.g., modeling the length of time between instances of longitudinal data) and five for integrating between sources (e.g., examining bounded and biased distributions among datasets). There are no magic solutions to Big Data integration problems; however, careful planning and evaluation of data in advance of model-building are best practices.
Lastly, the discussant will summarize the Army’s approach to working with Big Data, including current and future challenges. He will facilitate a discussion intended to provide SPR attendees with actionable strategies and best practices to apply in their own research.