Session: US Army Solutions to Big Data Challenges with Technology, Privacy, and Data Integration (Society for Prevention Research 27th Annual Meeting)

3-050 US Army Solutions to Big Data Challenges with Technology, Privacy, and Data Integration

Thursday, May 30, 2019: 3:00 PM-4:30 PM
Seacliff B (Hyatt Regency San Francisco)
Theme: Big Data Integration
Symposium Organizer:
Alycia Perez
Stacy Ann Hawkins
This symposium describes several of the US Army’s focused solutions for using Big Data safely, securely, ethically, and in accordance with scientific best practices. The Human Capital Big Data Initiative is the Army’s overarching policy and strategy for using human data. The Army uses its large stores of data for descriptive analysis, policy evaluation, research, and prediction. The sensitivity of the Army’s data and operations requires steadfast protection. In this symposium, researchers at the Army Analytics Group’s Research Facilitation Laboratory detail best practices for using Big Data.

The first speaker will present the Person-event Data Environment (PDE)—the Army’s enterprise platform for integrating data. The PDE brings researchers into a secure data environment where hundreds of Department of Defense datasets are housed. Researchers use the PDE to describe, investigate, and predict a variety of phenomena from attrition to injury to sexual assault.

The second speaker will describe the Army Synthdata project, which addressed important privacy concerns by generating synthetic datasets from over 400,000 Active Duty Soldiers. From a machine learning and statistical perspective, these datasets resembled the real US Army but contained no individual person’s data. This paper presents a feasible method for protecting privacy in Big Data when simple deidentification is insufficient.

The third paper presents lessons learned and useful approaches for integrating Big Data, illustrated by several real challenges we faced. We share five strategies for integrating data within sources (e.g., modeling the length of time between instances of longitudinal data) and five for integrating between sources (e.g., examining bounded and biased distributions among datasets). There are no magic solutions to Big Data integration problems; however, careful planning and evaluation of data in advance of model-building are best practices.

Lastly, the discussant will summarize the Army’s approach to working with Big Data, including current and future challenges. He will facilitate a discussion intended to give SPR attendees actionable strategies and best practices to apply in their own research.

Big Data, Compliance, & the US Army: Cleaving the Gordian Knot
Francisco Huante, BA, Army Analytics Group/Research Facilitation Lab
Beyond Duct Tape and Baling Wire: Realistic Strategies for Integrating Large Datasets
Alycia Perez, PhD, Army Analytics Group/Research Facilitation Lab