
Over the past couple of decades, the relationship between modern tech and personal data privacy has become increasingly complex.

Governments around the world have thankfully recognised corporate misuse of personal data and have brought in legislation to give citizens more rights over their data. GDPR, CCPA, PIPEDA, APPI and more give individuals around the world varying levels of protection and control over their personal information.

It’s this precarious data landscape in which we see AI starting to reach mainstream adoption. Many of us will be aware of the well-documented data privacy and copyright concerns reported in the press surrounding AI. But don’t be fooled into thinking that these worries are only present for the likes of OpenAI and Anthropic!

Even small and medium-sized organisations need to carefully navigate data privacy when implementing their own AI-driven tech.

Preface

We’re not privacy lawyers, and none of this article is intended as legal advice. However, we would like to point you to the ICO’s guidance on using artificial intelligence within the confines of GDPR.

In that guidance, they rightly point out:

It is not possible to list all known security risks that might be exacerbated when you use AI to process personal data. The impact of AI on security depends on:

  • the way the technology is built and deployed;
  • the complexity of the organisation deploying it;
  • the strength and maturity of the existing risk management capabilities; and
  • the nature, scope, context and purposes of the processing of personal data by the AI system, and the risks posed to individuals as a result.

Due to the vast scope of potential use cases that AI presents, the precise way that you protect and secure user data within such a system is largely dependent on the scope, function, and construction of that system.

With this in mind, any SME exploring the use of AI and automation within their organisation needs to be aware of the below seven AI and data privacy considerations, at the very least.

1. Data Transparency Can Be Murkier Than You Think

Under GDPR, all European and British organisations now need to think more carefully about what personal data they collect, what risks they introduce by working with that data, and how to keep that data secure.

However, AI can introduce certain temptations when it comes to data processing. 

AI is incredible at filtering through and making sense of large amounts of data. Many organisations have a lot of siloed information that they desperately need to assimilate and understand, and charging AI with that task can look like a silver bullet.

Yet there can be real data risks in lobbing chunks of personally identifiable data into the AI meat-grinder, just to see what comes out the other end!

One of the guiding tenets of GDPR is transparency. Data processors need to be honest and transparent about what data they collect, why they collect it, and how they use that data. AI adoption can present two stumbling blocks in the way of this transparency.

When a piece of software is “closed source,” that means that both users and the wider public are unable to personally inspect the software’s code because it is proprietary to a given organisation. Microsoft’s Windows operating system is a good example of closed source software.

When a solution is closed source and proprietary to an external provider, it can be difficult to establish exactly what happens to the data you put into it, where that data goes, and how it is used. Could the data end up on an insecure server somewhere? Could it be used to further train the AI model against your data subjects’ wishes? There may not be a way for you, as the average user, to tell.

We’re not accusing any AI model or software of this behaviour, of course. But without having access to the code that runs the software, organisations like yours have little way of knowing what is truly happening under the bonnet.

The second issue is that of AI’s renowned “black box problem.” A lot of deep learning systems rely on swathes of training data and inferences that have now become so complex that even their creators don’t understand why they give some of the answers that they do.

Understandably, both issues present a significant challenge for those trying to be as transparent as possible about how personal data is used.

2. Follow the Rules Around Automated Decision Making

GDPR also contains stringent rules about automated decision making.

Individuals covered by GDPR have a right to opt out of solely automated decision making - i.e., where data controllers make significant decisions about individuals purely using an automatic programme or algorithm. Individuals also have a right to ask a human to reassess any decision solely made through automation. This remains the case whether AI plays a part in that decision process or not.

Additionally, our readers in the EU should also be aware of the new EU AI Act. This effectively bans the use of AI tools to impose “social scoring” on individuals or to identify people in real time using biometric data.

If you are considering creating a system that makes significant decisions about people’s lives, there are a few things you should bear in mind.

Firstly, identify the bare minimum data points that a human would need in order to make that decision about an individual case. This should be the absolute maximum data that you feed into your AI decision-making solution. If you give your AI solution more information than it is likely to need, you risk overexposing individuals’ data, you risk introducing bias into the AI model, and you risk giving the tool unnecessary work to do, which carries its own energy costs.

Secondly, you need to consider how your solution is going to respect the wishes of those who opt out of automated decision making. How you achieve this is going to depend heavily on what the solution does and how it works, but a way of excluding data subjects from automatic decisions should always be built in from the outset.

Above all, always keep data subjects informed about the use of their data, tell them about your use of automated processing, and give them clear ways to opt out or to challenge any automated decision. Schedule in regular checks to ensure that your decision-making tools are working as they should be too - especially when your AI tools use machine learning to pick up new things and adapt their judgement over time.
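As a concrete illustration of the second and third points, here is a minimal Python sketch of how an opt-out check might sit in front of an automated decision. It is purely hypothetical - the DecisionRequest structure, the model.predict call and the appeal link are stand-ins for whatever your own solution uses - but it shows the general shape: check the data subject’s recorded preference first, route opted-out cases to a human reviewer, and always attach a clear route to challenge the outcome.

from dataclasses import dataclass

@dataclass
class DecisionRequest:
    subject_id: str
    features: dict   # only the minimum data points a human reviewer would need
    opted_out: bool  # the subject's recorded preference on automated decision making

def route_decision(request: DecisionRequest, model) -> dict:
    """Route a significant decision to a human reviewer when the data subject
    has opted out of solely automated decision making."""
    if request.opted_out:
        # Respect the opt-out: queue the case for human review rather than scoring it automatically.
        return {"subject_id": request.subject_id, "route": "human_review", "automated": False}

    score = model.predict(request.features)  # hypothetical model interface
    return {
        "subject_id": request.subject_id,
        "route": "automated",
        "automated": True,
        "score": score,
        # Always tell the subject how to challenge the decision.
        "challenge_link": "https://example.com/appeal",
    }

However the opt-out is implemented, the key design choice is that it is checked before any automated scoring happens, not retrofitted afterwards.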

Essential Reading from ICO: Rights related to automated decision making including profiling

3. Less is More: Embrace Data Minimisation

Data minimisation is where an organisation collects the bare minimum amount of personal data they need in order to function, and it’s wise data privacy practice. After all, minimising the amount of data you hold similarly minimises your data exposure risk and minimises data storage costs too.

You might also want to adopt a related concept: purpose limitation. That’s where personal data is only collected for specified, explicit, and legitimate purposes and never processed in ways incompatible with those purposes.

So where does AI come into this? Again, it depends on what the AI is tasked with doing. For example, say you’re developing an AI solution designed to monitor a video feed and flag errors on an assembly line, though not to identify those responsible. It simply doesn’t make sense to store vast amounts of largely repetitive video data: doing so would sit well outside the scope of the application and would introduce privacy concerns for workers and visitors in the vicinity.

It would respect individuals’ privacy far more to store and analyse video only while an error is actually occurring, with measures in place to obscure any personally identifying images of team members captured in that segment of video.
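As a rough sketch of that idea, the hypothetical Python below keeps only a short rolling buffer of frames in memory and persists footage only while an error event is actually being detected, obscuring faces before anything is written out. The detect_error and blur_faces functions are placeholders for whatever detection and anonymisation components a real system would use.

import collections

def monitor_feed(frames, detect_error, blur_faces, buffer_seconds=5, fps=25):
    """Hold a short, in-memory rolling buffer; persist an anonymised clip only
    while an error event is detected, and discard everything else."""
    buffer = collections.deque(maxlen=buffer_seconds * fps)  # never written to disk
    for frame in frames:
        buffer.append(frame)
        if detect_error(frame):                      # hypothetical error detector
            clip = [blur_faces(f) for f in buffer]   # hypothetical face-obscuring step
            yield clip                               # hand only the anonymised clip on for storage
            buffer.clear()

Everything outside those short, anonymised clips simply falls out of the buffer and is never stored, which keeps both the privacy exposure and the storage bill small.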

It’s also worth bearing in mind that when an AI model has a smaller amount of purposeful, clean data to work through in order to formulate a response, this can have a positive impact on the model’s performance and robustness.

4. Build in Anonymity, Build Out Bias

If personal details aren’t relevant to data processing or storage, then keeping that data completely anonymised is great data protection practice. After all, if personal data isn’t present, it can’t be breached or misused. 

But anonymising data has another benefit too. When identifying characteristics (such as name, gender, ethnicity, sexuality and geography) are completely absent from a system, it greatly reduces the potential for bias towards or against certain individuals or groups. We’re all aware of how humans can bring their own biases into a process - but without careful instruction and training to the contrary, AI can introduce biases too.
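To illustrate what removing those characteristics can look like in practice, here is a small, hypothetical Python sketch that strips direct identifiers from a record before it is passed to any model or analysis. Note that replacing an identifier with a salted hash is strictly pseudonymisation rather than full anonymisation under GDPR, but it shows the basic pattern of removing identifying characteristics at the edge of the system.

import hashlib

IDENTIFYING_FIELDS = {"name", "email", "gender", "ethnicity", "postcode", "date_of_birth"}

def pseudonymise_record(record: dict, salt: str) -> dict:
    """Drop direct identifiers and replace the record key with a salted hash,
    so downstream processing never sees who the data is about."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    # The salted hash lets you link or de-duplicate records without keeping the raw identifier.
    cleaned["subject_ref"] = hashlib.sha256((salt + record.get("email", "")).encode()).hexdigest()
    return cleaned

Stripping the obvious identifiers is only part of the picture, though, as the following example shows.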

In an older, well-documented case, Amazon developed an ML recruiting tool to review job applicants’ CVs and spit out the best few candidates for each role in a completely objective, neutral way. However, the tool was trained using CVs submitted to the company over a 10-year period - most of which were from male candidates due to the male-dominated nature of the tech industry. The system therefore ended up “teaching itself” to favour male candidates over female ones.

Therefore, measures need to be built into systems to eradicate bias - and to build in total anonymity if the scope of the project allows.

For example, Raiven is building a real-time, AI/ML-powered health and safety monitoring tool for a leading corporate client, which incorporates data from video streams. In order to respect anonymity, we’ve built in layers of abstraction so that certain actions get flagged as potentially desirable or undesirable without feeding in any data that identifies an individual. This built-in anonymity eliminates possible privacy concerns around storing people’s physical likenesses - but it also helps to eradicate the possibility of the system picking up any biases along the way.

Care also needs to be taken around what AI tools are allowed to infer about data subjects. Even with a few seemingly innocuous data points, a solution may be able to deduce highly personal things like gender, medical conditions, or sexual orientation, simply through its incredible pattern-matching prowess!

Bias can also be built into AI tools on purpose, as evidenced by Google Gemini’s well-intentioned “over-diversifying” of images it generated from prompts where a degree of historical accuracy was expected.

In our view, AI tools need to be constructed with the maximum amount of anonymity and with unbiased neutrality built in from the outset.

 

5. Keep Your Data Lean and Local

AI tools are able to receive, process, and create new data at breakneck speeds, making it essential that any organisation using AI carefully considers the practicalities of storing that data.

Keeping your data minimised, sanitised, and process-specific obviously reduces the amount of space it takes up on disk, which reduces storage costs (and environmental costs) in and of itself.

However, there’s another factor to consider here – transfer costs. Transferring data from one location to another uses energy and incurs cost. Transferring data, especially over public networks, can also introduce cyber and privacy risks.

With this in mind, aim to keep any data and computation as local as possible. Does a piece of data really need to be transferred halfway across the country to be computed and then returned? Or can the whole process happen on-site?

Also bear in mind that AI requires a lot more computational power than standard computing, so any hardware that is tasked with on-site AI computing will need to be fit for purpose.

For example, within some of the solutions we develop, we are able to plug an AI-ready computational device directly into a camera or sensor, so the data generated doesn’t need to travel through miles of cable in order to be computed. The needed computing all happens right there before the results of that computation are moved on to where they need to go. This keeps data risk and transfer costs to an absolute minimum.
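As a simple illustration of that edge-first pattern, the hypothetical Python below runs inference on the device next to the camera and sends on only a tiny, non-personal summary of the result. The model object and its output fields are placeholders; the point is that raw frames never leave the site - only a few bytes of derived information do.

import json
import urllib.request

def process_frame_locally(frame, model) -> dict:
    """Run inference right next to the sensor; only the small, derived result leaves the site."""
    result = model.predict(frame)  # hypothetical on-device model
    return {"event": result.label, "confidence": float(result.confidence)}

def report(result: dict, endpoint: str) -> None:
    """Send only the summarised result - a handful of bytes - never the raw video frame."""
    payload = json.dumps(result).encode()
    request = urllib.request.Request(endpoint, data=payload,
                                     headers={"Content-Type": "application/json"})
    urllib.request.urlopen(request)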

 

6. Be Aware of AI-Specific Data Privacy Attacks

Many of us are aware of attacks on people’s private data, such as social engineering. But did you know there are AI-specific privacy attacks that can be used to uncover personally identifiable information from an AI-powered system?

In membership inference attacks, hackers probe an AI model using previously obtained personally identifying data about a target individual. Their aim is to work out whether that individual’s data was part of the AI’s training data or not. This could let hackers know whether an individual had interacted with a particular service during the time the training data was being amassed.
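For a feel of how simple the core idea can be, here is a deliberately simplified, hypothetical Python sketch of one common membership inference heuristic: models are often noticeably more confident on examples they were trained on, so an unusually high confidence for the correct label is treated as a signal that the record may have been in the training data. Real attacks are more sophisticated, but the principle is the same.

def likely_in_training_set(model, features, true_label, threshold=0.95) -> bool:
    """A simplified membership inference heuristic: flag a record as a probable
    training-set member if the model is unusually confident about its true label."""
    probabilities = model.predict_proba([features])[0]  # assumes an sklearn-style interface
    return probabilities[true_label] >= threshold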

Another type of attack is a model inversion attack, where criminals (armed with some initial identifying data about their target/s) aim to probe an AI model to infer and extract personal information about those individuals within its dataset.

However, there is an important caveat here: both of these attacks involve the criminals already having some personally identifying information about the individuals they’re targeting, and both require attackers to gain access to the AI model itself. This makes a strong case for data privacy and access control best practices.

7. Document All Data Movement, Storage, and Use

The ICO make an excellent point about recording what you do with the data under your care:

ML systems require large sets of training and testing data to be copied and imported from their original context of processing, shared and stored in a variety of formats and places, including with third parties. This can make them more difficult to keep track of and manage.

Your technical teams should record and document all movements and storing of personal data from one location to another. This will help you apply appropriate security risk controls and monitor their effectiveness. Clear audit trails are also necessary to satisfy accountability and documentation requirements.
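A minimal, hypothetical sketch of such an audit trail might look like the Python below: one appended row per movement or copy of personal data, recording where it came from, where it went, why, and on what lawful basis. In practice you would want append-only, access-controlled storage rather than a loose CSV file, but the fields are the important part.

import csv
import datetime

AUDIT_LOG = "data_movement_log.csv"  # hypothetical location - use append-only, access-controlled storage in practice

def log_data_movement(dataset: str, source: str, destination: str,
                      purpose: str, lawful_basis: str, actor: str) -> None:
    """Append one auditable row for every movement or copy of personal data."""
    with open(AUDIT_LOG, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.datetime.now(datetime.timezone.utc).isoformat(),
            dataset, source, destination, purpose, lawful_basis, actor,
        ])

# Example: a training extract copied to a third-party labelling service.
log_data_movement("customer_support_tickets_2024", "on-prem data lake",
                  "labelling-vendor cloud bucket", "model training",
                  "legitimate interests", "data.engineer@example.com")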

You may also find it enlightening to interrogate your technical supply chains, especially those which directly interact with sensitive data and AI components.

In Conclusion

The best way to ensure the most stringent control over data privacy within an IT system is to have it custom built. This way, you have total visibility into its internal workings, you are less beholden to external supply chain fluctuations, and you’re not locked into a particular vendor’s way of doing things.

So if your organisation is exploring its custom, privacy-respecting, ethical AI options, book a free consultation with the Raiven team today!

Related reading


The Future is Federated

What is Federated Learning?

In today's world, the importance of data privacy and security cannot be overstated. With the exponential growth of connected devices and the ever-increasing volumes of data they generate, traditional centralised machine learning approaches face significant challenges. Enter federated learning—a revolutionary paradigm that promises to reshape the landscape of artificial intelligence (AI) by enabling collaborative model training without compromising data privacy.

Federated learning is a decentralised approach where multiple devices or servers collaboratively train a shared model while keeping the data localised. This innovative technique allows organisations and individuals to harness the collective intelligence of distributed data sources without the need to transfer sensitive information to a central server. Imagine hospitals around the world collaboratively training a medical AI model on patient data without ever sharing the sensitive information. The potential for breakthroughs in healthcare, finance, and beyond is immense.

But how exactly does federated learning work, and what makes it such a game-changer in the realm of AI? Let’s delve into the fundamentals of federated learning and explore its numerous benefits, whilst addressing the challenges that come with this groundbreaking technology.

Definition

Federated learning is a decentralised machine learning approach where multiple devices or servers collaboratively train a shared model without exchanging their local data. Instead of sending raw data to a central server, each device computes model updates (gradients) based on its local data and then sends these updates to a central server. The server aggregates these updates to improve the global model, which is then shared back with the devices. This process continues iteratively, allowing the model to learn from distributed data while preserving privacy and reducing data transfer.

In essence, federated learning enables collaborative learning while keeping data localised, ensuring data privacy and security.

How it works

Data Stays Local

In FL, data remains on individual devices instead of being centralised. This means your personal data never leaves your device, maintaining privacy and security.

Local Training

Each device trains a copy of the global model using its local data. For instance, your smartphone might improve its predictive text model based on your messages, without sharing any content.

Sharing Model Updates

Devices send only model updates (gradients) back to a central server, not the raw data. This ensures privacy while still contributing to the model's improvement.

Aggregation

The central server aggregates updates from all devices to refine the global model. Techniques like Federated Averaging combine these updates to enhance the model.

Iteration

The improved model is sent back to devices, and the process repeats. This iterative cycle allows the model to get better while keeping data private. 
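Put together, the whole cycle amounts to a surprisingly small loop. The hypothetical Python sketch below uses NumPy and assumes each client’s weights are a single flattened parameter vector, with client.local_update standing in for local training on data that never leaves the device; the server then performs Federated Averaging, weighting each update by how much data the client holds.

import numpy as np

def federated_averaging(global_weights: np.ndarray, clients, rounds: int = 10) -> np.ndarray:
    """Each round: every client trains on its own local data and returns only
    updated weights; the server averages them, weighted by dataset size."""
    for _ in range(rounds):
        updates, sizes = [], []
        for client in clients:
            # Hypothetical client method: train a copy of the global model locally
            # and return (new_weights, number_of_local_samples). Raw data stays put.
            new_weights, n_samples = client.local_update(global_weights)
            updates.append(new_weights)
            sizes.append(n_samples)
        total = sum(sizes)
        # Federated Averaging: combine the updates, giving more weight to clients with more data.
        global_weights = sum(w * (n / total) for w, n in zip(updates, sizes))
    return global_weights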

Benefits

Data Protection

Federated learning offers significant advantages in terms of data protection and privacy. By keeping data on local devices, FL ensures that sensitive information never leaves its source. This decentralised approach means that personal and proprietary data remains secure, minimising the risk of data breaches and unauthorised access. It also supports compliance with stringent data protection regulations, such as GDPR, by ensuring that raw data never leaves the device, thereby preserving user privacy.

Limiting Data Transfer

Another key benefit of federated learning is the reduction in data transfer. Traditional machine learning methods require transferring large amounts of data to a central server, which can be costly and time-consuming. In contrast, FL only sends model updates, significantly reducing the bandwidth and computational resources needed for data transmission. This efficiency makes it particularly suitable for edge devices and IoT applications, where network connectivity may be limited or expensive.

Improved models and collaboration

Federated learning enables the creation of improved models through collaborative efforts. By leveraging the diverse data distributed across various devices, FL can capture a wider range of patterns and behaviours than a single centralised dataset could. This leads to more robust and generalised models. Moreover, FL fosters collaboration between different organisations and entities, allowing them to collectively train AI models without sharing sensitive data, therefore accelerating innovation and advancements across industries.

Use cases

SAFERAI

At Raiven we are using federated learning to transform the manufacturing industry, particularly in health and safety risk prediction, through initiatives like SAFER AI (Safety Advancing Federated Estimation of Risk using Artificial Intelligence). We collaborate with P&G to train a shared AI model that predicts risk in a manufacturing environment, improving workplace safety without sharing sensitive operational data. Other companies can train the model on their own local safety data, then send model updates to a central server. These updates are aggregated to enhance the global model, ensuring robust predictions across diverse environments. This approach enables companies to benefit from collective insights, enhancing safety standards while maintaining data privacy and protecting proprietary information. SAFER AI exemplifies how federated learning fosters a safer and more secure industrial landscape.

Healthcare  

Federated learning can revolutionise healthcare by enabling collaborative AI model development without compromising patient privacy. Instead of centralising patient data, each hospital could train a shared model locally and send only model updates to a central server. This ensures sensitive data remains within the healthcare facility, complying with privacy regulations. For example, hospitals worldwide can collectively improve a cancer detection model by sharing updates, not raw data. This approach leads to more accurate, generalised models that benefit from diverse datasets, while promoting data equity and enhancing patient outcomes in a secure and effective manner.

Predictive Maintenance

In industries such as manufacturing, predictive maintenance is crucial for minimising downtime and extending the lifespan of machinery. Federated learning can enhance predictive maintenance by enabling multiple factories to collaboratively train machine learning models without sharing sensitive operational data. Each factory collects data from sensors on its equipment and trains a local machine learning model, and these local models are then aggregated into a global one using federated learning techniques. This ensures data privacy while leveraging data from multiple sources to make more accurate predictions, leading to fewer unexpected equipment failures, lower maintenance costs, and increased operational efficiency.

Conclusion

Federated learning is a transformative approach that combines the power of collaborative AI with the paramount need for data privacy. By keeping data localised and only sharing model updates, FL ensures sensitive information remains secure while harnessing the collective intelligence of distributed datasets. This innovative method is driving advancements in various fields, from healthcare to manufacturing, enabling the development of robust and generalised models. As we navigate an increasingly data-driven world, federated learning stands out as a promising solution that balances privacy, efficiency, and collaborative potential. Embracing federated learning is a step towards a more secure, equitable, and innovative future in artificial intelligence.

Your future awaits

Ready to start your AI journey?

No matter your technological know-how, we’re here to help. Send us a message or book a free consultation call today.