An LLM generates text one token at a time. A token can represent a single character, a word, or part of a phrase. To produce a coherent sequence of text, the model predicts the next most likely token to generate. These predictions are based on the preceding words and the probability scores assigned to each potential token.
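As a rough illustration of next-token prediction, the sketch below turns raw scores for a few candidate tokens into probability scores and then picks a token. The prompt, the candidate tokens, and the raw scores are all made up for the example; a real model computes such scores over its entire vocabulary.

```python
import math
import random

def softmax(raw_scores):
    """Convert raw scores into probability scores that sum to 1."""
    peak = max(raw_scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in raw_scores.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

# Hypothetical raw scores for the token that follows "The cat sat on the".
raw_scores = {"mat": 2.0, "sofa": 1.5, "floor": 1.2, "roof": 0.3}
probs = softmax(raw_scores)

# Greedy decoding: always take the most likely token.
most_likely = max(probs, key=probs.get)

# Sampling: draw a token in proportion to its probability score.
rng = random.Random(0)
sampled = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

print(most_likely, sampled)
```

Sampling rather than always taking the top token is what leaves room for several plausible continuations of the same text.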
For example, given the phrase “My favourite tropical fruits are __,” the LLM might start completing the sentence with the tokens “mango,” “lychee,” “papaya,” or “durian,” and each token is given a probability score. When there is a range of different tokens to choose from, SynthID can adjust the probability score of each predicted token, in cases where doing so won’t compromise the quality, accuracy and creativity of the output.
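The idea of adjusting probability scores can be sketched in a few lines. This is a simplified illustration, not SynthID's actual algorithm. The keyed hash, the `strength` parameter, and the entropy threshold are assumptions made for the example, and the probabilities for the fruit tokens are invented.

```python
import hashlib
import math

def keyed_score(key, context, token):
    """Hypothetical keyed pseudo-random score in [0, 1) for a candidate token."""
    payload = f"{key}|{' '.join(context)}|{token}".encode()
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "big") / 2**64

def adjust_scores(probs, key, context, strength=1.0, min_entropy=0.5):
    """Nudge probabilities toward tokens with a high keyed score.

    If the model is nearly certain (low entropy), the scores are returned
    unchanged so the output quality is not affected.
    """
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    if entropy < min_entropy:
        return dict(probs)
    weighted = {
        tok: p * math.exp(strength * keyed_score(key, context, tok))
        for tok, p in probs.items()
    }
    total = sum(weighted.values())
    return {tok: w / total for tok, w in weighted.items()}

# Invented probability scores for the blank in the example sentence.
fruit_probs = {"mango": 0.40, "lychee": 0.25, "papaya": 0.20, "durian": 0.15}
context = ("tropical", "fruits", "are")
adjusted = adjust_scores(fruit_probs, key="demo-key", context=context)
for tok in fruit_probs:
    print(f"{tok}: {fruit_probs[tok]:.3f} -> {adjusted[tok]:.3f}")
```

The entropy check mirrors the condition in the text: scores are only adjusted when several tokens are plausible, so a near-certain prediction is left alone.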
This process is repeated throughout the generated text, so a single sentence might contain ten or more adjusted probability scores, and a page could contain hundreds. The final pattern of scores, combining the model’s word choices with the adjusted probability scores, is considered the watermark.
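To show how a pattern spread across many tokens could later be checked, here is a toy sketch under stated assumptions. It is not SynthID's detector. The "model" simply draws four equally likely candidate tokens at each step, and `keyed_score`, `generate`, `detect`, the key, and the `strength` value are all hypothetical names and settings chosen for the example.

```python
import hashlib
import math
import random

def keyed_score(key, context, token):
    """Hypothetical keyed pseudo-random score in [0, 1)."""
    payload = f"{key}|{' '.join(context)}|{token}".encode()
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "big") / 2**64

def generate(n_tokens, rng, key=None, strength=5.0, window=3):
    """Toy generator: 4 equally likely candidates per step.

    With a key, candidates are re-weighted toward high keyed scores.
    """
    vocab = [f"tok{i}" for i in range(1000)]
    text = []
    for _ in range(n_tokens):
        candidates = rng.sample(vocab, 4)
        context = tuple(text[-window:])
        if key is None:
            weights = [1.0] * len(candidates)
        else:
            weights = [
                math.exp(strength * keyed_score(key, context, tok))
                for tok in candidates
            ]
        text.append(rng.choices(candidates, weights=weights, k=1)[0])
    return text

def detect(text, key, window=3):
    """Mean keyed score over the text; about 0.5 is expected without a watermark."""
    scores = [
        keyed_score(key, tuple(text[max(0, i - window):i]), tok)
        for i, tok in enumerate(text)
    ]
    return sum(scores) / len(scores)

rng = random.Random(0)
watermarked = generate(1000, rng, key="demo-key")
plain = generate(1000, rng)

print("watermarked:", round(detect(watermarked, "demo-key"), 3))
print("plain:      ", round(detect(plain, "demo-key"), 3))
print("wrong key:  ", round(detect(watermarked, "other-key"), 3))
```

No single token reveals anything on its own. The signal only appears when the scores of many tokens are combined, which is why longer passages are easier to check than short ones in this sketch.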