Text Multiplication for Templating
The LangTorch Text class encodes the structure of a text and enables formatting via multiplication. TextTensor multiplication follows the same rules, applied to pairs of entries according to broadcasting rules. For most applications it is enough to use multiplication for string formatting as in Python, where the left text is created from a template and the right from a completion dictionary, so that t1 * t2 behaves like t1.format(**t2):
>> Text("Hi, {name}!") * Text({"name": "Hubert"})
# Outputs: Hi, Hubert!
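For comparison, here is the plain-Python formatting that this multiplication is modelled on. It is a minimal sketch that uses only the built-in str.format and no LangTorch; the variable names are illustrative.

```python
# Plain-Python analogue of the LangTorch example above (no LangTorch needed).
template = "Hi, {name}!"
completion = {"name": "Hubert"}

# Unpack the completion dictionary into keyword arguments for str.format.
greeting = template.format(**completion)
print(greeting)  # Hi, Hubert!
```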
Understanding Multiplication
Text multiplication in LangTorch is a composition of two texts, t1 * t2.
Text Representation: Recall that Text represents content with a list of key-value pairs. Unlike in dictionaries, the same key may be used multiple times (the values being the actual substrings that make up the text). In this section we will assume t2 has only one key-value pair, (key, value).
Intuitively:
- t1 (the left side of the operation) is the text being changed.
- If t2 = (key, value), the right side of the operation (_)*t2 means "replace key with value".
To perform this replacement of key, we look for an entry in t1 whose value matches key and replace it with the new value. If there is no matching value, multiplication acts just like addition by appending (key, value) to the end of t1.
The two ways texts can be composed (replacement and addition):
>> t1_t2 = Text(("","key")) * Text(("key","value"))
>> t1_t2.items()
# Outputs: [("", "value")]
>> t1_t2 = Text(("", "key")) * Text(("key2", "value"))
>> t1_t2.items()
# Outputs: [("", "key"),("key2", "value")]
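The two rules can be mimicked in plain Python on lists of (key, value) tuples. The sketch below is a toy model for building intuition, not LangTorch's implementation. The helper name compose_pair is hypothetical, and replacing only the first matching entry is an assumption, since this section does not say whether every match is replaced.

```python
def compose_pair(left, pair):
    """Toy model of t1 * t2 where t2 is a single (key, value) pair.

    `left` is a list of (key, value) tuples standing in for t1.items().
    Illustration only, not LangTorch's implementation.
    """
    key, value = pair
    for i, (left_key, left_value) in enumerate(left):
        if left_value == key:
            # Replacement: the first entry whose value equals `key` gets the new value.
            return left[:i] + [(left_key, value)] + left[i + 1:]
    # No match: behave like addition and append the pair at the end.
    return left + [pair]


replaced = compose_pair([("", "key")], ("key", "value"))
appended = compose_pair([("", "key")], ("key2", "value"))
print(replaced)  # [('', 'value')]
print(appended)  # [('', 'key'), ('key2', 'value')]
```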
The pairs that make up a Text object can be accessed with .items(). To understand how these two rules allow for templating, see how the parsed t1 has values that match the t2 keys:
from langtorch import Text

t1 = Text("{name} says: {greeting}")
print(t1.items())
# Outputs: [('', 'name'), ('', ' says: '), ('', 'greeting')]
t2 = Text({"greeting": "Hello", "name": "Alice"})
print(t2.items())
# Outputs: [('greeting', 'Hello'), ('name', 'Alice')]
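To see why a parsed template lines up with completion keys, the following sketch reproduces the same pair structure with the standard library's string.Formatter. The helpers parse_template and fill are hypothetical names for this illustration and are not part of LangTorch. For brevity, fill only models the replacement rule and ignores completions that match nothing.

```python
from string import Formatter


def parse_template(template):
    """Split a format string into ('', value) pairs, mirroring the items() output above."""
    pairs = []
    for literal, field, _spec, _conversion in Formatter().parse(template):
        if literal:
            pairs.append(("", literal))  # plain text between placeholders
        if field is not None:
            pairs.append(("", field))    # placeholder name, e.g. 'name'
    return pairs


def fill(pairs, completions):
    """Replace each value that equals a completion key; keep all other values."""
    return [(key, completions.get(value, value)) for key, value in pairs]


t1_items = parse_template("{name} says: {greeting}")
print(t1_items)  # [('', 'name'), ('', ' says: '), ('', 'greeting')]

filled = fill(t1_items, {"greeting": "Hello", "name": "Alice"})
print("".join(value for _, value in filled))  # Alice says: Hello
```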
Combining Multiple Operations: We can chain multiplications, t1 * t2 * t3, to replace or append multiple values. In the example, we also use the completion TextTensors to route different entries to different prompts:
from langtorch import TextTensor

# Two prompts, each needing two completions
conversation_template = TextTensor([["{name} says: {greeting}."],
                                    ["{name} replies: {reply}"]])
shared_completion = TextTensor({"name": "Bob"})
unique_completion = TextTensor([[{"greeting": "Hello"}],
                                [{"reply": "Hi there!"}]])
# shared_completion is broadcast to both rows; unique_completion fills one row each
formatted_conversation = conversation_template * shared_completion * unique_completion
print(formatted_conversation)
# Outputs: [[Bob says: Hello.],
#           [Bob replies: Hi there!]]
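The routing in this example can be imitated with nested lists and str.format. This is a rough sketch, assuming the shared completion is applied to every entry and the per-entry completions are paired up by position. It merges both completions in one step rather than chaining two multiplications, and it does not use LangTorch.

```python
# Toy stand-ins for the tensors above: a 2x1 grid of templates,
# one shared completion, and a 2x1 grid of per-entry completions.
templates = [["{name} says: {greeting}."],
             ["{name} replies: {reply}"]]
shared = {"name": "Bob"}
unique = [[{"greeting": "Hello"}],
          [{"reply": "Hi there!"}]]

# Pair each template with the completion at the same position and
# apply the shared completion everywhere (a hand-written broadcast).
formatted = [
    [template.format(**shared, **completion)
     for template, completion in zip(template_row, completion_row)]
    for template_row, completion_row in zip(templates, unique)
]
print(formatted)  # [['Bob says: Hello.'], ['Bob replies: Hi there!']]
```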
Nested entries: The expressiveness of the (key, value) Text representation comes from the fact that a value can be another structured Text object instead of a single string. This allows tree-like structures that can represent, e.g., chat templates:
chat = Text(
('user', "Hi"),
('assistant', "Hello"),
('user', 'Can you explain {theory} like im five?'),
)
chat *= Text({"theory": "critical theory"})
print(chat.iloc[-1])
# Outputs: 'Can you explain critical theory like im five?'
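A nested structure like this can be pictured as tuples whose values are themselves lists of tuples. The sketch below is a toy model only. It assumes the templated message is stored as a nested list of segments, and replace_nested and flatten are hypothetical helpers, not LangTorch functions.

```python
def replace_nested(entries, key, new_value):
    """Recursively replace values equal to `key` inside nested (key, value) lists."""
    result = []
    for entry_key, value in entries:
        if isinstance(value, list):
            # Descend into the nested text and keep its outer key.
            result.append((entry_key, replace_nested(value, key, new_value)))
        elif value == key:
            result.append((entry_key, new_value))
        else:
            result.append((entry_key, value))
    return result


def flatten(value):
    """Join all leaf strings of a (possibly nested) value into one string."""
    if isinstance(value, str):
        return value
    return "".join(flatten(inner) for _, inner in value)


chat = [
    ("user", "Hi"),
    ("assistant", "Hello"),
    ("user", [("", "Can you explain "), ("", "theory"), ("", " like im five?")]),
]
filled_chat = replace_nested(chat, "theory", "critical theory")
print(flatten(filled_chat[-1][1]))  # Can you explain critical theory like im five?
```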
Wildcard: To make it easier to work with nested key-value pairs, there is also a special "wildcard" entry, (key, value) = (key, "*"). If present in the left text, its "*" value is replaced with the whole content of the right text, nesting that content under the shared new key.
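One way to picture the wildcard is shown below. This is a sketch of one reading of the description above, not LangTorch's implementation. The apply_wildcard helper and the "system" key are hypothetical.

```python
def apply_wildcard(left, right):
    """Toy model: swap each "*" value in `left` for the whole `right` content."""
    return [(key, list(right) if value == "*" else value) for key, value in left]


wrapped = apply_wildcard(
    [("system", "*")],
    [("", "Be brief. "), ("", "Answer in English.")],
)
print(wrapped)  # [('system', [('', 'Be brief. '), ('', 'Answer in English.')])]
```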
Optional Formal Introduction
TextTensor multiplication can be thought of as the operation of a group of text compositions. In this group, the empty string serves as the identity element, and reversing the key and value in each segment gives the inverse element. Multiplying an element by its inverse yields the identity element.
Let's denote the multiplication operation as $*$. For any Text objects $a$, $b$, and $c$, the following properties hold:
- Associativity: $(a * b) * c = a * (b * c)$
- Identity element: $a * e = e * a = a$
- Inverse element: For each $a$, there exists an $a^{-1}$ such that $a * a^{-1} = a^{-1} * a = e$
However, the multiplication operation is not commutative, meaning $a * b \neq b * a$ in general.
The elements of this group are pairs of strings or sequences of such pairs. An empty string is the element $e$. We denote appending strings $s_1$ and $s_2$ as $s_1 s_2$.
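Non-commutativity is easy to check on a toy model of the replace-or-append rule. The sketch below uses plain lists of (key, value) tuples and a hypothetical compose_pair helper. It illustrates the rule and makes no claim about LangTorch's internals.

```python
def compose_pair(left, pair):
    """Toy replace-or-append rule for a single right-hand (key, value) pair."""
    key, value = pair
    for i, (left_key, left_value) in enumerate(left):
        if left_value == key:
            return left[:i] + [(left_key, value)] + left[i + 1:]
    return left + [pair]


a = ("", "key")
b = ("key", "value")

a_times_b = compose_pair([a], b)  # the value "key" matches b's key: replacement
b_times_a = compose_pair([b], a)  # the value "value" does not match "": append
print(a_times_b)  # [('', 'value')]
print(b_times_a)  # [('key', 'value'), ('', 'key')]
```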
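In defining the "text composition" operation, we want the right side of the composition, $(k_2, v_2)$, to intuitively mean "replace $k_2$ with $v_2$" and to act on the entries of the left element. The output is a pair or a sequence of pairs, satisfying:
$$(k_1, v_1) * (k_2, v_2) = \begin{cases} (k_1, v_2) & \text{if } v_1 = k_2 \\ (k_1, v_1), (k_2, v_2) & \text{otherwise} \end{cases}$$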
If the left Text has multiple segments, the element on the right is checked for matches against the segments of the left text from left to right. If a match is found, that segment is replaced; otherwise the right element is appended at the end. If the left element has nested entries, like $(k_1, (k_2, v))$, for purposes of multiplication the matching value would be $v$.
If the right Text object has multiple segments $(b_1, b_2, \dots, b_n)$, the composition composes the left Text $a$ with each segment in sequence:
$$a * (b_1, b_2, \dots, b_n) = (\dots((a * b_1) * b_2) \dots) * b_n$$
With the added restriction that we don't format the same segment twice (which includes not matching a later right-hand segment against a value introduced by an earlier one).
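The sequential rule and the restriction can be sketched as follows. This toy model assumes that a segment already replaced or appended during the current multiplication is skipped when matching later right-hand segments, which is one reading of the restriction. The compose helper is hypothetical and is not LangTorch's implementation.

```python
def compose(left, right):
    """Toy model of left * right for lists of (key, value) pairs.

    Right-hand pairs are applied one at a time, in order. Segments that were
    already replaced or appended in this call are never matched again.
    """
    result = list(left)
    touched = set()  # indices of segments replaced or appended so far
    for key, value in right:
        for i, (left_key, left_value) in enumerate(result):
            if i not in touched and left_value == key:
                result[i] = (left_key, value)  # replacement
                touched.add(i)
                break
        else:
            result.append((key, value))        # no match: append at the end
            touched.add(len(result) - 1)
    return result


template = [("", "name"), ("", " says: "), ("", "greeting")]
print(compose(template, [("greeting", "Hello"), ("name", "Alice")]))
# [('', 'Alice'), ('', ' says: '), ('', 'Hello')]

# The segment replaced by ("a", "b") is not formatted again by ("b", "c").
print(compose([("", "a")], [("a", "b"), ("b", "c")]))
# [('', 'b'), ('b', 'c')]
```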