The government has been collecting and storing data for as long as digital systems have existed to do so. With petabytes upon petabytes of data, and more created every moment, there is no such thing as small data in government – only big.
With vast quantities and diversity of data, challenges are certain to follow. The data does not exist in one accessible place, it is not always clean and ready to use, and the storage platforms of the past were not built for the work being done today. Government agencies and private industry alike face the same question: How can we unlock and make the most of this data while still protecting it as the critical asset it is?
Pure Storage’s Nick Psaki was joined by the Department of Homeland Security’s (DHS) Donna Roy, the U.S. Army’s Tom Sasala, the National Oceanic and Atmospheric Administration’s (NOAA) Jonathan O’Neil, MarkLogic’s Brigham Bechtel, and Cloudera’s Henry Sowell. The panel explored this open question as part of Federal News Network’s Federal Executive Forum program on Big Data Analytics in Government.
As agencies across the government work to turn data into innovation, they are learning the importance of clean data. DHS has realized that if computing is increasingly done at the edge, there must be data quality at the edge. Yet this isn’t being routinely discussed or addressed. Agencies must start thinking about how to create smart data that cleans itself at the edge, so that when it comes time to use the data, it’s ready – regardless of where it’s needed.
Smart data must also be secure data – that is non-negotiable. According to MarkLogic, both the private and public sectors must ensure data is governed, curated, and secured properly before it is fed into algorithms. Agencies agree: according to DHS, artificial intelligence (AI) and machine learning (ML) produce either really good or really frightening mission results – and the outcome depends on the data fed in.
From the private-sector perspective, Cloudera sees the beginning of a shift in which activity happening in the private sector is finding its way into the government space. Passionate government employees want to see and be part of progress – and they want to leverage proven methods and technologies.
As agencies come to understand the importance of clean data, the need for a fundamental grasp of basic data management emerges. Many agencies discover, the first time they use their data, that its quality isn’t what they expected. And while real-time data was once thought essential for mission success, in practice access to clean data matters more. Managing data correctly and keeping progress on track requires the right teams. The challenge facing government is gathering and building the necessary skill sets: it takes more than just data scientists – it takes an entire ecosystem to support this work.
As the public and private sectors work to make data mission-ready, they recognize the tremendous value of sharing it and making it available. NOAA, for example, is innovating today: through its “big data project,” it is experimenting with making public data available on the cloud free of charge. The initiative is helping advance understanding of bird and marine mammal migration patterns. And while NOAA has seen usage of this valuable data increase, its administrative workload has decreased – cloud access makes the process seamless, freeing staff time for agencies and fostering innovation.
As part of the discussion, Cloudera highlighted the importance of delivering ways to clean, understand, catalog, and make data available in an enterprise cloud environment. Awareness of this need is growing across government, and the private sector is stepping up to craft solutions. A prime example is the work being done on the 2020 U.S. Census, where groups are determining the best way to collect data at the edge, clean it, and use it to help shape policy.
Big data has the power to transform how the public sector accomplishes its mission, making it more effective and efficient. The key is for the public and private sectors to continue working together to create an incubator for success. Agencies are looking for ways to jump-start the move to a data-centric architecture, and the private sector is responding with infrastructure that supports it – enabling smart, informed decision-making.
The full webcast can be viewed here. For more information on how Pure is helping agencies accelerate adoption of data-centric architectures and make the most of their data, visit purestorage.com.