How to safely use generative AI?


Do you use tools based on generative artificial intelligence but aren’t sure whether you are doing it safely? Do you want to make sure you aren’t violating copyright law? Can you legally protect the work you create in Midjourney, DALL-E, and the like? Are you worried that the data you feed into these tools might leak? We have all been asking ourselves these questions lately. Here are some tips and answers to help you work with Gen AI tools, decide what to share with them, and make the best use of what they produce.

Before we move on to the tips, divided into the areas of copyright, privacy protection, and data security, let’s sort out what we are dealing with when we work with generative artificial intelligence:

  • Dataset: A collection of data used to train AI models (articles, websites, books, paintings, and photos, but also the data we input into the tools). More often than not, these materials constitute works within the meaning of copyright law.
  • LLM (Large Language Model): A language model that performs natural language processing tasks, taking input data and generating responses (output data).
  • Generative AI apps: We most often use these models through tools and applications (such as ChatGPT or Midjourney) whose Terms of Service address, for example, data protection or copyright issues.
  • Prompt: The commonly accepted term for the commands and instructions given to generative AI tools. Prompts can also include attached images, texts, and data.
  • Output: The results obtained from generative AI tools, such as text, code, video, and graphics.
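To see how these terms fit together in practice, here is a minimal sketch of a prompt being sent to an LLM and an output coming back. It assumes the OpenAI Python SDK and an API key in the environment; the model name and prompt text are illustrative only, not a recommendation:

```python
# A minimal sketch, assuming the OpenAI Python SDK (`pip install openai`)
# and an OPENAI_API_KEY environment variable. Model name and prompt text
# are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = "Explain in two sentences what a dataset is."  # the prompt (input)

response = client.chat.completions.create(
    model="gpt-4o",  # the LLM behind the app
    messages=[{"role": "user", "content": prompt}],
)

output = response.choices[0].message.content  # the output
print(output)
```

Note that everything placed in the prompt, attachments included, is sent to the provider’s servers, which is exactly why the rest of this article looks so closely at what goes in and what comes out.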

In the definitions above, I pay special attention to the points at which copyrighted works and data enter our work with generative AI. This helps us know when and where to look closely at legal and data security issues. Keep in mind that challenges arise on both sides:

  • how the tools are trained and what data they are trained on,
  • the instructions we give and the results we receive.

AI content generation accuracy and error liability

Practically all providers of generative AI solutions warn that their tools can produce incorrect results. In their terms of service, or in messages shown after logging in to their sites, we will often find a note stating that the final assessment of results lies with the user, and that the responsibility for sharing false information is theirs. Although these tools are constantly improving, you should not expect these companies to take more responsibility for the content they generate than they do now.

For example, OpenAI, the company behind the most popular Gen AI model and the chatbot ChatGPT, puts it this way in its terms of service:

When you use our Services, you understand and agree: Output may not always be accurate. You should not rely on Output from our Services as a sole source of truth or factual information or as a substitute for professional advice. You must evaluate Output for accuracy and appropriateness for your use case, including using human review as appropriate, before using or sharing Output from the Services.

It’s worth noting that these errors may result not only from so-called hallucinations but also from mistakes in the training material, a lack of access to up-to-date data, or imprecise instructions. Awareness of our (human) responsibility for the use of generative AI solutions will also be crucial in the areas that follow.

Copyrights vs. generative artificial intelligence

This is likely the most uncertain area of the everyday use of generative AI tools. On the one hand, we have to take into account the terms of use of AI applications; on the other, current legal regulations (which did not necessarily foresee technologies such as generative AI); and on top of that, the changes that may be brought by the first court cases against such tools and by new legislation such as the EU AI Act.

The United States Copyright Office’s position is that “AI technology […] generated material is not the product of human authorship . . . [and therefore] not protected by copyright.” In practice, however, the authorship and ownership of works created by AI are attributed to the users creating the prompts. It is also on them to check if an image infringes on a third party’s copyrights (e.g., features a logo, images of people, pieces of copyrighted works) before sharing it publicly, especially in a commercial capacity. The terms and conditions of most generative AI tools (which largely originated in the US) are written in a similar vein.

The AI Act, adopted in the European Union in early 2024, compels the providers of AI tools to disclose information about the data used to train their models and its compliance with copyright law.

A team from Stanford University assessed the most popular generative AI models’ level of compliance with the AI Act in 2023. In their comparison, most models scored zero on the copyright criterion. We can expect improvement in this area in 2024, but it’s worth checking for up-to-date information on whether the AI tool you’re using (especially for professional purposes) is adapting to the new regulations in force in the EU.

AI-generated images vs. websites’ terms of service

In most AI tools’ terms of service, the user is defined as the owner of all input (prompts) and the holder of rights to all generated content received (output). In practice, this also means that the user is responsible for any copyright infringements by third parties found in output they publish. The matter is complicated, however, by the promise of several generative AI model providers (OpenAI, Google, Anthropic) to protect commercial users in the event of being sued for copyright infringement.

We’ll only find out how this so-called copyright shield works in practice when courts rule on the first copyright infringement cases involving AI-generated output.

However, remember that these protections are guaranteed only to business users (at OpenAI: Enterprise customers, but not Plus subscribers who pay for access to GPT-4). The guarantees are also limited to OpenAI’s own tools and do not extend to third-party applications built on their models.

OpenAI DALL-E and GPT

According to OpenAI, the user is the owner of both the prompt and the result of working with the tool. The user can decide whether or not to publish the results but cannot mislead the public about whether the image or content was created with the help of AI. You also cannot upload photos of people to be edited without their consent, pictures of public figures, or works you don’t have the right to use.

Microsoft Copilot

Microsoft’s image creator is in fact DALL-E 3, which the company made available as part of its Copilot services. As with DALL-E, users keep the rights to generated images, but in the free version of the services, the license allows Microsoft to use your work as it pleases. Microsoft also has a detailed policy for using the tool, which prohibits, among other things, creating content that spreads hate or is of a sexual nature.

Midjourney

This paid generator is available (at least for now) only through Discord. It offers several subscription plans that differ in the number of credits and in access to faster instruction-based image generation. According to Midjourney’s terms of service, generated images can be used commercially, provided community guidelines are followed. Midjourney can use your content for promotional purposes, among other things. Only the Pro and Mega plans, or the additional “Stealth” mode, let you reserve exclusive rights to your works.

Adobe Firefly

Adobe Firefly is the first AI image generator whose maker claims it was trained solely on works the company holds licenses to (the second such commercial generator comes from Getty Images). This is supposed to ensure greater legal safety than other tools of its kind. Firefly is available through Adobe’s website and in Adobe Creative Cloud (formerly Creative Suite) programs: Photoshop, Express, Illustrator, and Stock.

We can use the generated images commercially as long as they were not created with features marked as beta (still in testing). We can also make commercial use of our prompts, but the license specifies that Adobe can use them too; this does not apply to Premium subscribers, who can opt out. Like other tools, Adobe Firefly also specifies content that we cannot upload or create, e.g., images of public figures or of other people without their consent.

[Image: Almost a self-portrait of the author, made with Adobe Firefly]

Stable Diffusion

This image generation model was developed by the start-up Stability AI in 2022. Like other providers, Stability AI recognizes that all rights to generated images belong to the user. The service prohibits creating illegal or harmful content (e.g., content violating personal rights or inciting violence). Stable Diffusion is an open model that you can install (and modify) on your own computer. Thanks to information the company shared about its model early on, we know for a fact that it was trained on copyrighted content, including material behind so-called paywalls, e.g., paid Getty Images stock photos.

Can data protection and AI go hand in hand?

Anyone who would like to use AI tools commercially or within an organization should carefully examine each tool’s terms of service. Tools such as ChatGPT, Google Gemini, and Microsoft Copilot come in several versions: free, paid subscription, and business. These often differ not only in the capabilities of the underlying model but also in their security guarantees, level of data protection, and the terms under which the provider can use the content we input into a given Gen AI app (prompts, attachments, etc.).

When choosing a tool for your team, organization, or company, make sure you know:

  • Who has access to the data you input into the Gen AI app? Only your organization, or the service provider as well? What is the scope of their access to your data?
  • Will the data you input be used to further train the AI model? Is there an option to decline or disable this?
  • What security guarantees does the provider offer while processing your data?
  • Is it possible to sign a data processing agreement covering personal data?

The answers to these questions are crucial for your organization’s security and its compliance with data protection regulations (such as the GDPR in the EU), just as with other cloud service providers (e.g., email). Unlike with services such as email or CRM (Customer Relationship Management) software, however, the fact that generative AI services process our data is not immediately obvious to all users.

At this point in time (April 2024), only a handful of business versions of Gen AI tools from Google, OpenAI, and Microsoft are compliant with European personal data protection regulations. In practice, this means that only after purchasing particular versions of their products, in these cases ChatGPT Team, Microsoft 365 Copilot, or Microsoft Azure AI, will we have the possibility of entering into a data processing contract and a guarantee that confidential organization or company data will remain confidential (i.e., it will not be used to train the model).

Other versions, including paid versions like ChatGPT Plus, do not offer such guarantees. If we input personal data into them, we do so in violation of data protection laws. If we input our organization’s confidential data into them, it may be used to further train the model and be shared when generating answers for other users.

The consulting firm Vischer reviewed the most popular tools, or more precisely, the different versions of their terms of service for individual and commercial clients. Find their comparison here.

The rules of using generative AI at work: Where to start?

Employees often use generative AI tools at their own discretion; organizations rarely have an AI policy or a designated tool. The very nature of services like ChatGPT, where one can converse about anything at all, makes it easy to accidentally input sensitive information. That is why, in addition to checking the terms of service, we should also define rules for using these tools at work.

  • Anonymize personal data before inputting documents into chatbots (see the sketch after this list).
  • Be transparent when using generative AI tools.
  • Clearly communicate to coworkers that you are using Gen AI when creating shared content (emails, presentations, etc.).
  • If you’re using Gen AI based on other people’s content, e.g. when creating automatic transcripts of meetings, remember to obtain consent to record.
  • Before deciding to use generative AI tools on a larger scale when working on a project, with a team, or at a company, research the tools. Find out about any potential issues or flaws. Check which versions (many Gen AI tools offer a choice of language models they use) will be best for what you need to do and how much of the tool can be customized (e.g., fine-tuning).
  • If you plan to use a Gen AI tool for tasks such as data analysis, consider purchasing the business version.
  • If you plan to use Gen AI tools to create publicly available content (e.g., ads), make sure you use tools that were trained on legally acquired data or that provide non-infringement guarantees. Many providers, such as OpenAI, Microsoft, and Anthropic, call these solutions a copyright shield, but they only provide the guarantees for selected business or paid versions.
  • Review your organization’s existing policies and procedures; you may want to update them to better protect user and employee privacy. It may also be worth adding information about how Gen AI may (or may not) be used in certain situations.
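As promised in the first rule above, here is a minimal sketch of what anonymizing text before handing it to a chatbot might look like. The patterns are illustrative assumptions, not a complete catalog of personal data: regexes like these catch structured identifiers (email addresses, phone numbers), while names and addresses would require a dedicated PII-detection library or service:

```python
import re

# Illustrative patterns only -- a real deployment should use a dedicated
# PII-detection library or service. These regexes catch common structured
# identifiers; they will NOT catch names or postal addresses.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def anonymize(text: str) -> str:
    """Replace matched personal data with placeholder tags
    before the text is sent to a Gen AI tool."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(anonymize("Contact Jan at jan.kowalski@example.com or +48 123 456 789."))
# -> Contact Jan at [EMAIL] or [PHONE].
```

If you keep a mapping from each placeholder back to its original value, you can restore the details in the tool’s output afterward, so the personal data itself never leaves your machine.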