Text Multiplication for Templating

The LangTorch Text class encodes the structure of texts and enables formatting them via multiplication. TextTensor multiplication follows the same rules, applied to pairs of entries according to broadcasting rules. For most applications it is enough to use multiplication for string formatting as in Python, where the left text is created from a template and the right from a completion dictionary, so that t1 * t2 corresponds to t1.format(**t2) with t2 read as a dictionary.

>> Text("Hi, {name}!") * Text({"name": "Hubert"})
# Outputs: Hi, Hubert!
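
For comparison, the plain-Python formatting that this analogy refers to can be written as below. This is a minimal sketch using only the built-in str.format, not LangTorch; the variable names are illustrative.

```python
# Plain Python analogue of template * completion (no LangTorch involved).
template = "Hi, {name}!"
completion = {"name": "Hubert"}

# Unpack the dictionary so each key fills the placeholder of the same name.
greeting = template.format(**completion)
print(greeting)  # Hi, Hubert!
```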

Understanding Multiplication

Text multiplication in LangTorch is a composition of two texts t1 * t2.

Text Representation: Recall that Text represents content as a list of key-value pairs. Unlike in dictionaries, the same key may be used multiple times (the values being the actual substrings that make up the text). In this section we assume that t2 has only one key-value pair $(\text{key}, \text{value})$.

Intuitively

t1 (the left side of the operation) is the text being changed.
If t2 = (key, value), the right side of the operation (_) * t2 means "replace $\text{key}$ with $\text{value}$".

To perform this replacement, we look for an entry in t1 whose value matches key. If one is found, its value is replaced with the new value. If there is no matching value, multiplication acts just like addition and appends $(\text{key}, \text{value})$ to the end of t1.

The two ways texts can be composed (replacement and addition):

>> t1_t2 = Text(("","key")) * Text(("key","value")) 
>> t1_t2.items()
# Outputs: [("", "value")]
>> t1_t2 = Text(("", "key")) * Text(("key2", "value"))
>> t1_t2.items()
# Outputs: [("", "key"),("key2", "value")]
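
The two rules above can be modelled in a few lines of plain Python. This is a hedged sketch, not LangTorch's implementation: texts are represented as plain lists of (key, value) tuples, the function name compose is hypothetical, and only the first matching entry is replaced (how LangTorch treats repeated placeholders is not covered here).

```python
from typing import List, Tuple

Pair = Tuple[str, str]

def compose(left: List[Pair], right: Pair) -> List[Pair]:
    """Hypothetical model of left * right for a single-pair right text."""
    key, value = right
    for i, (left_key, left_value) in enumerate(left):
        if left_value == key:
            # Replacement: keep the left key, swap in the new value.
            return left[:i] + [(left_key, value)] + left[i + 1:]
    # No match: behave like addition and append the pair.
    return left + [right]

replaced = compose([("", "key")], ("key", "value"))
appended = compose([("", "key")], ("key2", "value"))
print(replaced)  # [('', 'value')]
print(appended)  # [('', 'key'), ('key2', 'value')]
```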

The pairs that make up a Text object can be accessed with its .items() method. To understand how these two rules allow for templating, see how the parsed t1 has values that match the keys of t2:

t1 = Text("{name} says: {greeting}")
print(t1.items())
# Outputs: [('', 'name'), ('', ' says: '), ('', 'greeting')]

t2 = Text({"greeting": "Hello", "name": "Alice"})  
print(t2.items())
# Outputs: [('greeting', 'Hello'), ('name', 'Alice')]
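
To see why the parsed template lines up with the completion keys, the parsing step can be approximated with the standard library. The sketch below is an assumption-laden model rather than LangTorch's parser: parse_template and fill are hypothetical helper names, and string.Formatter is used only because it splits format strings into literal text and field names.

```python
from string import Formatter

def parse_template(template):
    """Split a format string into (key, value) pairs with empty keys."""
    pairs = []
    for literal, field, _spec, _conv in Formatter().parse(template):
        if literal:
            pairs.append(("", literal))   # plain text between placeholders
        if field is not None:
            pairs.append(("", field))     # placeholder name stored as a value
    return pairs

def fill(pairs, completions):
    """Replace every value that matches a completion key, then join the values."""
    return "".join(completions.get(value, value) for _key, value in pairs)

t1_pairs = parse_template("{name} says: {greeting}")
print(t1_pairs)  # [('', 'name'), ('', ' says: '), ('', 'greeting')]
print(fill(t1_pairs, {"greeting": "Hello", "name": "Alice"}))  # Alice says: Hello
```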

Combining Multiple Operations: We can chain multiplications, t1 * t2 * t3, to replace or append multiple values. In the example below, we also use completion TextTensors of different shapes to route different entries to different prompts:

# Two prompts with two completions needed
conversation_template = TextTensor([["{name} says: {greeting}."],
                                    ["{name} replies: {reply}"]])

shared_completion = TextTensor({"name": "Bob"})
unique_completion = TextTensor([[{"greeting": "Hello"}],
                                [{"reply": "Hi there!"}]])

formatted_conversation = conversation_template * shared_completion * unique_completion
print(formatted_conversation)
# Outputs: [[Bob says: Hello.      ],
#           [Bob replies: Hi there!]]
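
The chaining and routing in this example can be imitated with ordinary strings to make the two steps visible. This is only an illustrative sketch in plain Python: KeepMissing is a hypothetical helper that leaves unfilled placeholders in place, and list comprehensions stand in for TextTensor broadcasting.

```python
class KeepMissing(dict):
    """Dictionary that leaves unknown placeholders untouched."""
    def __missing__(self, key):
        return "{" + key + "}"

templates = ["{name} says: {greeting}.", "{name} replies: {reply}"]
shared = {"name": "Bob"}                                  # applied to every row
unique = [{"greeting": "Hello"}, {"reply": "Hi there!"}]  # one completion per row

# First multiplication: the single shared completion is broadcast over all rows.
step1 = [t.format_map(KeepMissing(shared)) for t in templates]
# Second multiplication: completions are paired with rows position by position.
step2 = [t.format_map(KeepMissing(u)) for t, u in zip(step1, unique)]

print(step1)  # ['Bob says: {greeting}.', 'Bob replies: {reply}']
print(step2)  # ['Bob says: Hello.', 'Bob replies: Hi there!']
```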

Nested entries: The (key, value) representation is expressive because a value can itself be a structured Text object rather than a single string. This allows tree-like structures that can represent, for example, chat templates:

chat = Text(
            ('user', "Hi"),
            ('assistant', "Hello"),
            ('user', 'Can you explain {theory} like im five?'),
)
chat *= Text({"theory": "critical theory"})
print(chat.iloc[-1])
# Outputs: 'Can you explain critical theory like im five?'

Wildcard: To make it easier to work with nested key-value pairs, there is also a special "wildcard" entry (key, value) = (key, "*"). If it is present in the left text, multiplication replaces the "*" value with the whole content of the right text, nesting that content under the shared key.
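
One way to picture the wildcard rule is the plain-Python model below. It reflects only a literal reading of the description above and is not taken from LangTorch's code: apply_wildcard is a hypothetical function, and texts are again plain lists of (key, value) tuples.

```python
def apply_wildcard(left, right):
    """Hypothetical model: nest all of `right` under the key of a (key, "*") entry."""
    result = []
    for key, value in left:
        if value == "*":
            # The wildcard value is replaced by the whole right text.
            result.append((key, list(right)))
        else:
            result.append((key, value))
    return result

left = [("system", "Be brief."), ("context", "*")]
right = [("title", "Notes"), ("body", "Some text")]
nested = apply_wildcard(left, right)
print(nested)
# [('system', 'Be brief.'), ('context', [('title', 'Notes'), ('body', 'Some text')])]
```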

Optional Formal Introduction

The TextTensor multiplication operation can be thought of as the operation of a group of text compositions. In this group, the empty string serves as the identity element, and reversing keys and values in each segment creates the inverse element. When an element is multiplied by its inverse, the result is the identity element.

Let's denote the multiplication operation as $\circ$ . For any Text objects, $a$ , $b$ , and $c$ , the following properties hold:

Associativity: $(a \circ b) \circ c = a \circ (b \circ c)$
Identity element: $a \circ \text{empty\_string} = \text{empty\_string} \circ a = a$
Inverse element: For each $a$ , there exists an $a^{-1}$ such that $a \circ a^{-1} = a^{-1} \circ a = \text{empty\_string}$

However, the multiplication operation is not commutative, meaning $a \circ b \neq b \circ a$ in general.

The elements of this group are pairs of strings $(\text{key}, \text{value})$ or sequences of such pairs. An empty string is the element $(\text{""}, \text{""})$ . We denote the concatenation of strings $s_1$ and $s_2$ as $s_1s_2$ .

In defining the "text composition" operation, we want the right side of the composition, $\circ\ (\text{key}, \text{value})$ , to mean intuitively "replace $\text{key}$ with $\text{value}$ " and to act on the entries of the left element. The output is a pair or sequence of pairs, satisfying:

$(\text{key}_1, \text{value}_1) \circ (\text{key}_2, \text{value}_2) = \begin{cases} (\text{key}_1, \text{value}_2), & \text{if } \text{value}_1 = \text{key}_2 \\ (\text{key}_1, \text{value}_1) (\text{key}_2, \text{value}_2), & \text{otherwise} \end{cases}$

If the left Text has multiple segments, the element on the right is checked against the segments of the left text from left to right. If a match is found, that segment is replaced; otherwise the right element is appended at the end. If the left element has nested entries, like $(\text{key}_{1a}, (\text{key}_{1b}, \text{value}_1))$ , then for the purposes of multiplication the matching value is $\text{key}_1 := \text{key}_{1a} + \text{"."} + \text{key}_{1b}$ .

If the right Text object has multiple segments $[s_1, s_2, s_3]$ , the left Text is composed with each segment in sequence:

$((\text{left\_text} \circ s_1) \circ s_2) \circ s_3$

with the added restriction that the same segment is never formatted twice (which includes not matching $s_1$ with $s_2$ ).
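
The sequential rule and its restriction can be sketched in plain Python as follows. This is one possible reading, not LangTorch's implementation: compose_all is a hypothetical name, texts are lists of (key, value) tuples, and the restriction is modelled by letting each original left segment be formatted at most once while appended segments are never matched.

```python
def compose_all(left, right):
    """Hypothetical model of left * right when right has several segments."""
    result = list(left)
    # Only original left segments that are still unformatted may be matched.
    open_slots = set(range(len(left)))
    for key, value in right:
        match = next((i for i in sorted(open_slots) if result[i][1] == key), None)
        if match is None:
            result.append((key, value))    # appended segments are never matched
        else:
            result[match] = (result[match][0], value)
            open_slots.discard(match)      # do not format this segment again
    return result

# The value "b" inserted by the first segment is not replaced by the second one.
chained = compose_all([("", "a")], [("a", "b"), ("b", "c")])
print(chained)  # [('', 'b'), ('b', 'c')]
```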