We all know that in any company, adhering to security and compliance requirements is a challenging task. This is especially true because protecting the business must be done in a way that does not slow down or interfere with business objectives. It is therefore vital that your business keep up with changing compliance regulations, business environments, and technologies. This is particularly true for companies focused on data as a growth engine. The reasons for these hardships vary, but let’s look at some of the main complexities.
- Data is a moving target
In the past (or in companies that are not data-oriented), data was an object to be stored or handled by specific teams only; it was hardly used, kept behind a “DBA wall”. This structure severely limited the impact that data had on decision-making.
In the modern era, with modern data architectures, data is optimized for value and thus handled by a large number of users. Data is analyzed and its value extracted, but it is also stored, enriched, and duplicated. While there is tremendous benefit from this agile data, it is significantly harder to govern and protect, and to ensure it meets compliance requirements.
- Users are also a moving target
In the same way that data is much more dynamic in nature, users (and their requirements) are also more dynamic. More organizationally diverse teams and users clamoring for access to different datasets place a significant burden on data engineering teams. These access requests may be for ongoing projects or temporary ones, meaning a user requires access to a dataset only for a certain period of time, or to solve a specific problem. Such variability in access and personnel makes it hard for data engineering teams to keep up, and in many cases results in them not adhering to the principle of least privilege.
- Where is our sensitive data?
Sensitive data is often the core data we’re trying to protect. From a risk point of view, non-sensitive data, even if exposed, should not pose a significant risk, especially when compared with exposure of data containing PII, PHI, and the like.
However, in most companies dealing with large amounts of data, it is far from trivial to actually know where sensitive data is stored, and what types of sensitive data there are. This is because many teams access data across different databases, data warehouses, and data lakes, and some of this sensitive data is found in unexpected or hard-to-find locations (siloed databases, within semi-structured data, etc.).
The challenge is simple: if we don’t know where the sensitive data is, it’s much harder to protect it.
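To illustrate why continuous discovery matters, here is a minimal, hypothetical sketch of regex-based PII detection over sampled column values. The patterns, threshold, and data are illustrative assumptions only; production scanners use far richer detection (dictionaries, validators, ML classifiers) and run across all data stores.

```python
import re

# Hypothetical patterns for a few common PII types (illustrative, not exhaustive)
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify_column(sample_values, threshold=0.5):
    """Label a column as a PII type if enough sampled values match a pattern."""
    values = [v for v in sample_values if v]
    if not values:
        return None
    for label, pattern in PII_PATTERNS.items():
        hits = sum(1 for v in values if pattern.search(str(v)))
        if hits / len(values) >= threshold:
            return label
    return None

# Example: emails hiding inside a semi-structured free-text "notes" column
notes = ["call back", "reached at jane.doe@example.com", "left msg for bob@corp.io"]
print(classify_column(notes))  # -> "email"
```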
- Even when we know all of this, lowering risk is not simple
Even when we have all the facts, and know the location of the sensitive data, who requires access to it, and why, we still need to make sure that data access is done safely. For example, the analytics team may know the dataset they need and where it is located, but should consume it in a de-sensitized form (anonymized, masked, redacted, encrypted, and so on). In other cases, teams should only have access to certain data records. For example, according to data localization requirements, they should only be able to access records for customers from a specific region.
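To make this concrete, below is a minimal sketch of two such controls: masking a direct identifier and filtering records by the consumer’s region. The schema, column names, and regions are hypothetical assumptions for illustration, not a real implementation.

```python
import hashlib

def mask_email(email: str) -> str:
    """Replace the local part with a stable hash, so joins still work."""
    local, _, domain = email.partition("@")
    digest = hashlib.sha256(local.encode()).hexdigest()[:8]
    return f"{digest}@{domain}"

def serve_for_analytics(rows, allowed_region):
    """Return only rows from the allowed region, with emails masked."""
    return [
        {**row, "email": mask_email(row["email"])}
        for row in rows
        if row["region"] == allowed_region
    ]

customers = [
    {"id": 1, "email": "ana@example.com", "region": "EU"},
    {"id": 2, "email": "bo@example.com", "region": "US"},
]
# An EU analytics team sees only EU records, and never raw emails.
print(serve_for_analytics(customers, allowed_region="EU"))
```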
Adhering to compliance requirements and enforcing these security policies across the different data access locations is challenging.
Adapting to “Modern” Data Use
Overcoming such challenges is crucial, as companies rely on fast, efficient, available data. However, this has to be done while ensuring that security and compliance requirements are met. Here are the main pillars of solving this problem in a “DataSecOps”-friendly way.
- Build security and compliance requirements into data projects. This means that data stakeholders shouldn’t learn about requirements as an afterthought, but should account for them from design through implementation.
- Security policies should be clear and deterministic. Vague “open a ticket and we’ll take it from there” policies don’t work, in terms of either speed or risk reduction. There should be clear directives about which teams can access which datasets, and in what form, for example only after certain data is anonymized (a sketch of such deterministic rules follows this list).
- Once security policies are clear and deterministic, automate as much as possible (for example, by building a self-service data portal) to reduce the friction of manual authorization by engineering teams.
- The same policies should be applied to data across different technologies, to prevent “pockets” of non-compliance.
- Do not map/inventory your sensitive data “for an audit”. Because data changes frequently, stale metadata can increase risk; scanning for sensitive data has to be done continuously.
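As referenced above, here is a minimal, hypothetical sketch of what a deterministic, automatable access policy can look like in code. The rule fields, team names, and datasets are illustrative assumptions; a real self-service portal would load such rules from configuration and enforce them at the data access layer.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class AccessRule:
    team: str
    dataset: str
    form: str       # "raw" or "masked"
    expires: date   # supports temporary, project-scoped access

# Illustrative rules: who may access which dataset, in what form, until when
RULES = [
    AccessRule("analytics", "customers", "masked", date(2025, 12, 31)),
    AccessRule("billing", "customers", "raw", date(2025, 6, 30)),
]

def decide(team: str, dataset: str, today: date) -> str | None:
    """Return the permitted data form, or None. No ticket, no ambiguity."""
    for rule in RULES:
        if rule.team == team and rule.dataset == dataset and today <= rule.expires:
            return rule.form
    return None

print(decide("analytics", "customers", date(2025, 1, 15)))  # -> "masked"
print(decide("marketing", "customers", date(2025, 1, 15)))  # -> None
```

Because the decision function is deterministic, it can back a self-service portal directly: a request either matches a rule and is granted in the prescribed form, or it is denied with a clear reason.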
Overall, it is essential to ensure that expectations are aligned among the different stakeholders: business owners, data producers, data consumers, and the governance, security, and compliance teams. Once everybody is on the same page (that data helps you win, but only if it’s secure), it is easier to understand the reasoning behind requirements and to find agile ways to apply them effectively.
Ben is the Chief Scientist for Satori, the DataSecOps platform, simplifying access to data at scale.