Jak dbać o prywatność i chronić dane w dobie AI? Poradnik dla początkujących i zaawansowanych

For years, we’ve known that the possibility of promotion and communication on social media comes at the price of losing control over our private data, profiling, and social polarization. What are we sacrificing in exchange for the increased efficiency and complex analysis of content creation made possible thanks to generative AI? Can we secure these tools so they don’t become a new area of risk for NGOs?

Changing threats

A Microsoft study from the beginning of 2025 on cognitive loads and working with generative AI tools showed that, in most cases, workers (from different fields and professions) used less energy on critical thinking. This surely isn’t particularly shocking. It is, however, worth thinking about what this means for the quality of work, but also its security. What are the consequences of slowly switching to autopilot?

Threats, which prey on our lulled vigilance, are not the only risks that come with working with readily available and easy-to-use AI. The most significant potential risks still depend on what data is used to train the AI and how secure the tools we use to access data and generate content are.

Here are the most significant threats:

Cheating the AI system: A defining feature of gen AI solutions is their vulnerability to manipulation through prompts (known as prompt injection). Attackers design prompts to cheat the system or a specific tool into performing malicious actions or gaining access to the AI model and data. Relying on popular AI models, e.g., when creating your own versions of chatbots, may be vulnerable to the same flaws.
Holes in data security: AI tools may be susceptible to attacks, leading to leaks of sensitive data. “Model extraction” attacks are made to extract information on how the models work and about the data used to train them. A successful attack of this kind, which would result in a leak of personal data, poses technical and legal risks for companies and organizations (related to data protection and GDPR regulations).
Sharing sensitive data with AI models: The ease with which one can get multiple tasks done through chatting with AI makes it so that we share more information with it faster. Both individuals and organizations need to be mindful of what data they share with AI models. In 2023, for instance, there was a major incident when Samsung employees accidentally entered confidential data into ChatGPT.
Unauthorized access: Like all cloud tools, the information we store in them (chat histories, attachments), as well as what they store about us, can leak if we lose control of our account.

Besides the risks which result directly from how the tools work, it’s worth remembering the risks which come from the use of generative AI results:

Inaccurate or biased results: AI models trained on incomplete or biased data can then generate incorrect or discriminatory results. The risk of using or disseminating this information for companies and organizations is significant, unless it’s verified and corrected by humans.
Copyright infringement: Although content created by AI, without significant modifications by humans, is not considered a work and doesn’t automatically gain copyright protection, it may still infringe intellectual property rights. If an AI product (trained on real works) is too similar to protected works, the publication of the product may infringe upon the rights of the authors or property rights holders. Therefore, the publication of such products requires human supervision before being shared. In cases of accidental plagiarism (when AI-generated text or graphics are too similar to an existing work), the copyright of the original creator may be infringed. In such a case, the responsibility lies with the person who publishes the product.
Using AI for social engineering: Generative AI tools can be used to create convincing phishing e-mails or social engineering attacks. The latest GPT-4o model can generate images with text that are as accurate as a fake ID.

The need to update the working model

For the past two years, privacy has been the primary concern when selecting AI tools and models. Early versions of these systems saved your chats and used them to improve their models. There was no option to decline, nor were there any alternatives. This has changed dramatically, as I describe below in the section on tool settings. In addition to learning how to use the tools better, we also need to know how to work with data safely.

Both you and your team need clear rules about the safety of using Gen AI tools, the safe use of Gen AI tools, and improving your skills in working with them. The ease of working with conversational AI and the vast range of tools available make it difficult to control whether everyone is using them as safely as they should be. So, in addition to creating a policy for working with generative AI in your team, it’s also worth investing time in practical skills development (e.g., data anonymization, regardless of the tools used) and building mutual trust (to keep each other informed about how we plan to use generative AI tools in our individual or team work).

Minimizing data and tools

Using multiple tools simultaneously, each slightly better than the others in specific tasks, has its benefits, but it also significantly increases the risks concerning security and privacy, as I mentioned earlier.

It’s worth limiting the number of tools used by your team or in your organization, as well as the data being processed, to make it easier to keep everything in control. Minimization involves focusing on tasks that increase your productivity thanks to AI, while limiting the risks associated with tools that don’t offer such benefits.

Data anonymization

Data anonymization is a process of removing personal data and other sensitive information from a dataset so that it cannot be linked to specific individuals. This is a key skill in the age of generative AI tools, which excel at natural language processing and processing large amounts of data.

Anonymization can reduce the risk of data leaks and help you comply with privacy regulations. Depending on the type and sensitivity of the data, different anonymization methods can be used. Here are some common techniques:

Pseudonymization – replacing sensitive data with fake identifiers (pseudonyms), making it harder to identify specific individuals. The structure of the data remains the same, which allows it to continue to be analyzed. This process can be carried out in Excel (use the RANDBETWEEN or RAND functions to generate random IDs) and Word (use the Find and Replace function (Ctrl+H) to replace names en masse).
Data masking – this allows you to hide sensitive data using random characters, symbols, or other substituted values. The data format remains unchanged, but the data becomes illegible. Hiding parts of information, such as a social security number, can also be achieved using Excel functions (use the TEXT and REPT or LEFT/RIGHT functions).
Generalizing – decreasing the level of specificity of the data, e.g., by replacing exact age with an age range.
Data swapping – shuffling values between records in a dataset, making it difficult to assign data to specific individuals.
Adding noise – introducing random changes to the data, which reduces its accuracy and makes it difficult to identify individual entries.

Joint result auditing

In small organizations, sharing best practices can positively bolster the work of the entire team and protect us from making mistakes. Generative AI tools produce extremely varied results and can change (e.g., as a result of changes in the model itself). What’s worse, they can operate in a way that is undetectable to co-workers or recipients (e.g., meeting transcription programs that do not inform attendees in online meetings about their presence). It’s worth taking the time to come together and regularly audit how such tools are used to:

Strengthen transparency on the use of AI tools within the team,
Help each other avoid risky usage and reinforce safe practices,
Improve each other’s work, e.g., by exchanging better instructions (prompts).

This approach combines risk management with improving our teams’ AI skills.

Privacy settings in the tools you use online

Cloud-based versions (ChatGPT, Gemini, Perplexity) offer some form of private mode, which is designed so it doesn’t use the data entered to train the AI model. Corporate versions of these products have begun to offer additional security guarantees and regulatory requirements. It’s also becoming easier to run generative AI models locally on your own computer or, in the case of companies and organizations, on dedicated cloud servers such as Microsoft Azure or Google Vertex.

Basic security measures in ChatGPT

ChatGPT offers the option of temporary conversations that are not saved in your history and are not used to train models. To enable this, go to the top left corner of ChatGPT and select “Temporary Chat.”

You can also completely disable the option to train its model using your data. However, this setting change will still save your history and give OpenAI temporary access to your memory. To do this, go to your profile icon on the ChatGPT website and select Settings > Data Control, then disable “Improve the model for everyone.” While this option is disabled, your new conversations will not be used to train ChatGPT models.

If you need a guarantee of data processing security and the ability to disable AI models on our data by default, these options are only available with an Enterprise account ($29/month per person). In this version, you can also better secure your account and control access for employees.

It’s also worth paying attention to additional security settings, such as two-factor authentication, in the settings for all account types.

Basic security features in Perplexity

You’ll find the exact same options when using Perplexity. To enable incognito mode, click Settings in the lower left corner.

To disable model training on your data, go to Settings > Preferences and disable Data Retention. Perplexity also offers an Enterprise Pro version, which provides companies and organizations with more granular model settings and account security.

Basic security features in Google Gemini

Unlike OpenAI ChatGPT, Google offers limited options for managing personal data with Gemini. Most privacy-related settings are linked to your parent Google account, which in turn is connected to all Google services such as Chrome, YouTube, etc. By clicking on your profile in the upper right corner, you can disable and clear your history, meaning your Gemini app activity.

Do generative AI tool providers respect user privacy? Can they be trusted?

As with any commercial tool that operates in the cloud and whose terms of use we have to agree to, it’s essential to understand its provisions and assess the possible risks. It’s worth distinguishing between the different levels of security resulting from how the tool works, as specified in the terms of use (i.e., Terms of Service or User Agreements), general regulations, and our own rights.

Settings, i.e., options available to the tool user and the ability to manage privacy and security (as described above).
Terms of service for AI services and tools, which specify, for example, the scope of processing of our data.
Regulations that protect our privacy, e.g., in terms of consent to the processing of our personal data.

Take, for example, information about the Google Gemini service, where we find limited options for disabling our training data, the need to consent to the processing of data on content from other tools connected to it for the purpose of providing services (but not for training the model), and declarations of compliance with GDPR, in the same way as the entire Google Workspace service (i.e., Google Drive and Google Docs), which concern, among other things, the company’s obligation to inform us about personal data leaks, etc.

Zestawienie opcji prywatności i ochrony danych w usłudze Google Gemini – informacje o tym, jakie dane są dostępne dla modelu, jak są używane oraz jak są chronione. — Overview of privacy and data protection options in Google Gemini – information about what data is available to the model, how it is used, and how it is protected.

When choosing an AI service for yourself or your organization, it’s worth taking a close look at all three of these levels. But that’s not all. As I mentioned when discussing the differences between ChatGPT and Perplexity accounts, security and data guarantees are sometimes only provided for corporate accounts, and not necessarily for every paid account.

Control over artificial intelligence with a locally running model—on a computer

Locally-run (running on our own equipment, a computer or server) large language models (LLM) can significantly raise our level of privacy and data safety. By running these models within your own environment or private cloud, confidential data remains under your (as an individual), your company or organization’s control. Not all models can be run locally, on your own servers.

If you want to run a local model on your own computer, you only need a simple configuration (e.g., using programs like jan.ai, as shown in the illustration below, LMstudio, or Olama) and a model that fits within your computer’s resources. The main limitation here would be RAM. For instance, 32 GB of RAM would allow several such models to run, like Google Gamma, DeepSeek R1, Llama3, or Mistral.

If you have access to services such as Microsoft Azure or Google Vertex, you can not only run most models, including those from OpenAI, i.e., GPT, but also relatively easily build your own tools with their help. You still won’t need any specialized knowledge for their configuration and will be able to create, for example, your own chatbot.

Can artificial intelligence, privacy, and data security go hand in hand? A summary

The safe use of generative AI requires deliberate measures, clear rules, and the constant improvement of privacy and data protection skills. We can’t yet rely solely on legal regulations and the goodwill of suppliers of often very new tools. It is worth regularly auditing the tools used, as well as the results of working with them, minimizing the amount of information processed, and becoming more familiar with anonymization techniques. The level of security depends primarily on our vigilance.

How to protect your privacy and data in the age of generative artificial intelligence: A guide