HB 2503 - AI training data

Section 1

The definitions in this section apply throughout this chapter unless the context clearly requires otherwise.

  1. "Aggregate consumer information" means information that relates to a group or category of consumers, from which individual consumer identities have been removed, that is not linked or reasonably linkable to any consumer or household, including via a device. "Aggregate consumer information" does not mean one or more individual consumer records that have been deidentified.

  2. "Artificial intelligence" means the use of machine learning and related technologies that use data to train statistical models for the purpose of enabling computer systems to perform tasks normally associated with human intelligence or perception, such as computer vision, speech, or natural language processing, and content generation.

  3. "Developer" means a person, partnership, or corporation primarily engaged in developing or substantially modifying a generative artificial intelligence system intended for commercial distribution or public use. For the purposes of this subsection, "public use" does not include use by an affiliate as defined in RCW 19.146.010. This definition excludes individuals or entities that develop artificial intelligence systems solely for internal use or research purposes or those that utilize third-party artificial intelligence systems via application programming interface without substantial modification. This definition also excludes public entities and tribal nations.

  4. "Generative artificial intelligence" means an artificial intelligence system that generates novel data or content based on a foundation model.

  5. "Security and integrity" means the ability of:

    1. Networks or information systems to detect security incidents that compromise the availability, authenticity, integrity, and confidentiality of stored or transmitted personal information;

    2. Developers, users, or businesses to detect security incidents, resist malicious, deceptive, fraudulent, or illegal actions and to help prosecute those responsible for those actions; and

    3. Developers, users, or businesses to ensure the physical safety of natural persons.

  6. "Substantial modification" means a new version, new release, or substantial update from the developer, or an intentional and deliberate change to a generative artificial intelligence system by a deployer, that materially changes its functionality or performance in a manner that was not reasonably foreseeable to the developer at the time the artificial intelligence system was made publicly available by the developer.

  7. "Synthetic data generation" means a process in which original data are used to create artificial data that have some of the statistical characteristics of the original data.

  8. "Train a generative artificial intelligence system" includes testing, validating, or fine tuning by the developer of the generative artificial intelligence system.

Section 2

  1. On or before January 1, 2027, and before each time thereafter that a generative artificial intelligence system, or a substantial modification to a generative artificial intelligence system, released on or after January 1, 2022, is made publicly available to Washingtonians for use, regardless of whether the terms of that use include compensation, the developer of the system shall post on the developer's internet website documentation regarding the data used by the developer to train the generative artificial intelligence system including, but not limited to, a high-level summary of the datasets used in the development of the generative artificial intelligence system including, but not limited to:

    1. The sources of the datasets;

    2. A general description of how the datasets further the intended purpose of the generative artificial intelligence system;

    3. The number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets;

    4. A high-level description of the types of data points within the datasets;

    5. Whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain;

    6. Whether the datasets were purchased or licensed by the developer;

    7. Whether the datasets include personal information, as defined in RCW 19.373.010;

    8. Whether the datasets include aggregate consumer information;

    9. Whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the generative artificial intelligence system;

    10. The dates the datasets were first trained or the date of the last significant update to the datasets during the development of the generative artificial intelligence system; and

    11. Whether the generative artificial intelligence system used or continuously uses synthetic data generation in its development. A developer may include a description of the functional need or desired purpose of the synthetic data in relation to the intended purpose of the system.

  2. A developer is not required to post documentation regarding the data used to train a generative artificial intelligence system for any of the following:

    1. A generative artificial intelligence system whose sole purpose is to help ensure security and integrity;

    2. A generative artificial intelligence system whose sole purpose is the operation of aircraft in the national airspace; and

    3. A generative artificial intelligence system developed for national security, military, or defense purposes that is made available only to a federal entity.

  3. The requirements in this chapter do not apply to generative artificial intelligence systems that are subject to applicable requirements under the federal food, drug, and cosmetic act, 21 U.S.C. Sec. 301 et seq., as amended.

Section 3

Section 2 of this act shall be construed to require developers to comply based on the generally acknowledged state of the art, which may include, but is not limited to, guidance issued by the national institute of standards and technology, without compromising their own intellectual property rights or trade secrets.

Section 4

The legislature finds that the practices covered by this chapter are matters vitally affecting the public interest for the purpose of applying the consumer protection act, chapter 19.86 RCW. A violation of this chapter is not reasonable in relation to the development and preservation of business and is an unfair or deceptive act in trade or commerce and an unfair method of competition for purposes of applying the consumer protection act, chapter 19.86 RCW.

