class langtorch.TextTensor
A TextTensor is a multi-dimensional matrix containing elements of structured text data, each represented as a Text object. It is a specialized subclass of torch.Tensor designed for handling and manipulating textual data within the LangTorch framework.
Initializing and Basic Operations
A TextTensor can be constructed from Python lists or sequences of string-like inputs. Every entry is transformed into a Text object, so each entry must satisfy one of its possible input formats, which include:
- Strings
- Tuples of the form (label, text), representing text keys and values
- Dictionaries of correctly ordered keys and values
- Sequences of any of the above
Note
Strings written with f-string syntax (containing {} characters) will be parsed to enable formatting. Read more about parsing here.
Construction
You can construct a TextTensor, similarly to torch.tensor, from an input sequence that can be shaped into nested lists or a NumPy array:
from langtorch import TextTensor
# 0-d TextTensor
x1 = TextTensor("It's a single text or {prompt} {template}")
## A completion dictionary is also just one entry
x2 = TextTensor([{"key1": "value1", "key2": "value2"}])
# 1-d TextTensor
x3 = TextTensor(["A list", "of texts"])
# 2-d TextTensor, and so on
x4 = TextTensor([["A column of"],
                 [" paragraphs"]])
Note that the first text tensor created in this example contains the special characters {}, which allow prompt and template to be replaced with values from another TextTensor, as if that entry were an f-string.
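For instance, these placeholders can later be filled by multiplying with another TextTensor (see Special Operations below). A minimal sketch, where the dictionary values are purely illustrative:
filled = x1 * TextTensor([{"prompt": "a prompt", "template": "a template"}])
# Each {placeholder} in x1 is substituted with the value stored under the matching key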
TextTensors can also be created with familiar operations:
# A tensor of empty strings
langtorch.zeros(shape)
langtorch.zeros_like(tensor)
# Tensor with identical entries
langtorch.full(shape, text_entry)
langtorch.full_like(other, text_entry)
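For example, a short sketch with a concrete shape and entry (assuming shapes are passed as in the corresponding torch constructors):
import langtorch

# 2x3 tensor of empty strings
empty = langtorch.zeros((2, 3))
# 2x3 tensor where every entry is the same prompt template
prompts = langtorch.full((2, 3), "Summarize: {text}")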
Note
TextTensors are almost never initialized directly from torch.Tensors.
Integer entries are reserved for the order of formatting without keys, e.g. "Place {1} place {0}", which takes two elements.
Indexing and Slicing
Access and modify the contents of a TextTensor using Python's indexing and slicing notation:
tt = TextTensor([["Hello", "World"], ["Torch", "Framework"]])
print(tt[0, 1])  # Outputs: "World"
tt[1][0] = "LangTorch"
print(tt)  # Outputs: [["Hello", "World"],
           #          ["LangTorch", "Framework"]]
To access a single entry from a one-element TextTensor, use the item() method:
tt = TextTensor([["Hello"]])
single_text = tt.item()
print(single_text) # Outputs: "Hello"
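Slicing follows the usual torch.Tensor rules. A short sketch taking a row and a column of the 2x2 tensor from above:
tt = TextTensor([["Hello", "World"], ["LangTorch", "Framework"]])
first_row = tt[0]        # TextTensor(["Hello", "World"])
first_column = tt[:, 0]  # TextTensor(["Hello", "LangTorch"])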
Basic Features
- Supports standard tensor operations listed here.
- Enables TextTensor addition and multiplication, which respectively perform concatenation and prompt formatting. Learn more about how multiplication works here.
- Integrates seamlessly with PyTorch: supports autograd, can be assigned as a Parameter in a Module, and so on.
Special Operations
- Addition (+): Concatenates text entries, paired up according to broadcasting rules.
- Multiplication (*): Formats prompt templates with values, where the entries of the left tensor are formatted with entries from the right TextTensor.
Both operations are inherited from the Text class that represents the tensor entries. To understand better how entries are structured and modified by these operations, see Text.
Methods
reshape(*shape)
Returns a reshaped version of the TextTensor.
squeeze()
Returns a new TextTensor with all the dimensions of size 1 removed.
unsqueeze(dim=0)
Returns a new TextTensor with a dimension of size one inserted at the specified position.
embed(model=...)
Embeds the TextTensor using the specified embedding model.
set_key(keys=None, inplace=False)
Sets the top-level keys for the Text entries in the TextTensor (removing previous keys).
add_key(keys=None, inplace=False)
Adds a new top-level key for the Text entries, preserving the previous structure:
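A minimal sketch (the key name and behavior shown in the comments are illustrative, assuming a single key is broadcast over all entries):
tt = TextTensor(["What is 2+2?", "Name a prime number."])
questions = tt.add_key("question")
# Each entry is now stored under the top-level key "question",
# so it can fill a {question} placeholder in another template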
apply(func)
Applies a function to each Text entry in the TextTensor.
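For example, a sketch assuming the function receives each entry and may return a plain string:
tt = TextTensor(["hello", "world"])
shouted = tt.apply(lambda t: str(t).upper() + "!")
# Entries are transformed independently; note that apply breaks autograd tracing (see Notes)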
sum(dim=None, keepdim=False)
Reduces the tensor over the specified dimensions, optionally keeping the reduced dimensions.
format(**kwargs)
Returns a version of the tensor with entries formatted as if multiplied with substitutions from kwargs.
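A minimal sketch of format, acting like multiplication by a single completion dictionary:
tt = TextTensor(["Hello, {name}!", "Bye, {name}!"])
tt.format(name="World")
# Output: TextTensor(["Hello, World!", "Bye, World!"])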
Examples
Creating a TextTensor with a prompt template and two completion dicts:
tt = TextTensor("Hello, {name}!")
tt * TextTensor([{"name": "World"}, {"name":"Tad"}])
# Output: TextTensor(["Hello, World!", "Hello, Tad!"])
Concatenating TextTensors with broadcasting:
tt1 = TextTensor(["Hello", "Go away"])
tt2 = TextTensor([", World!"])
tt1 + tt2
# Output: TextTensor(["Hello, World!", "Go away, World!"])
Notes
- Operations meant to format prompt templates require the placeholders in the template (left) tensor entries to match the keys in the completion (right) tensor entries.
- Consider setting requires_grad or is_param in the constructor to track operations with autograd (see the sketch below). TextTensors support some operations, like apply, which break autograd tracing.
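A minimal sketch of these constructor flags (keyword names are taken from the note above; exact behavior depends on your LangTorch version):
from langtorch import TextTensor

# Track text operations on this template with autograd
template = TextTensor("You are a {role}. {instruction}", requires_grad=True)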