class langtorch.TextTensor

A TextTensor is a multi-dimensional matrix containing elements of structured text data, each represented as a Text object. It is a specialized subclass of torch.Tensor designed for handling and manipulating textual data within the LangTorch framework.

Initializing and Basic Operations

A TextTensor can be constructed from Python lists or sequences of string-like inputs. Every entry is converted into a Text object, so each entry must match one of the input formats that Text accepts, including:

  • Strings
  • Tuples of the form (label, text) representing text keys and values
  • Dictionaries of correctly ordered keys and values
  • Sequences of any of the above

Note

Strings written with the f-string syntax (containing {} characters) will be parsed to enable formatting. Read more about parsing here.
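To get a feel for what parsing a string with {} placeholders involves, the sketch below uses Python's standard string.Formatter to list the named fields in a template. This is only an analogy built on the standard library, not LangTorch's own parser, and the helper name placeholder_names is hypothetical.

```python
from string import Formatter


def placeholder_names(template):
    """Return the field names found inside {} in a format-style string.

    Hypothetical helper for illustration; LangTorch's parser may differ.
    """
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    # field_name is None for literal text that is not followed by a placeholder.
    return [field for _, field, _, _ in Formatter().parse(template)
            if field is not None]


names = placeholder_names("It's a single text or {prompt} {template}")
print(names)  # ['prompt', 'template']
```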

Construction

A TextTensor is constructed much like torch.tensor: the input can be a nested list or a NumPy array, and its nesting determines the shape:

from langtorch import TextTensor

# 0-d TextTensor
x1 = TextTensor("It's a single text or {prompt} {template}")
# A completion dictionary is also just one entry
x2 = TextTensor([{"key1": "value1", "key2": "value2"}])

# 1-d TextTensor
x3 = TextTensor(["A list", "of texts"])

# 2-d TextTensor, and so on
x4 = TextTensor([["A column of"], 
                [" paragraphs"]])

Note that the first TextTensor created in this example contains the special {} characters, which allow prompt and template to be replaced with values from another TextTensor, as if that entry were an f-string.

TextTensors can also be created with familiar operations:

# A tensor of empty strings
langtorch.zeros(shape)
langtorch.zeros_like(tensor)

# Tensor with identical entries
langtorch.full(shape, text_entry)
langtorch.full_like(other, text_entry)

Note

TextTensors almost never get initialized from torch.Tensors directly.

Integer entries are reserved for positional formatting without keys, e.g. "Place {1} place {0}", which takes two elements.
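The positional convention mirrors Python's own str.format, where integer fields select arguments by position. The snippet below shows that plain-Python behaviour as an analogy; it does not call LangTorch, and the values "first" and "second" are made up for illustration.

```python
template = "Place {1} place {0}"

# The template references indices 0 and 1, so two positional values are needed.
result = template.format("first", "second")
print(result)  # Place second place first
```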

Indexing and Slicing

Access and modify the contents of a TextTensor using Python's indexing and slicing notation:

tt = TextTensor([["Hello", "World"], ["Torch", "Framework"]])
print(tt[0, 1])  # Outputs: "World"
tt[1, 0] = "LangTorch"
print(tt)  # Outputs: [["Hello", "World"],
#                      ["LangTorch", "Framework"]]

To access a single entry from a one-element TextTensor use the item() method:

tt = TextTensor([["Hello"]])
single_text = tt.item()
print(single_text)  # Outputs: "Hello"

Basic Features

  • Supports standard tensor operations listed here.
  • Enables TextTensor addition and multiplication, which respectively perform concatenation and prompt formatting. Learn more about how multiplication works here.
  • Integrates with PyTorch: a TextTensor supports autograd, can be assigned as a Parameter in a Module, and so on.

Special Operations

  • Addition (+): Concatenates text entries, paired up according to broadcasting rules.
  • Multiplication (*): Formats prompt templates with values, where the left tensor entries are formatted with entries from the right TextTensor.

Both operations are inherited from the Text class that represents the tensor entries. To understand better how entries are structured and modified with both operations, see Text.
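The pairing behaviour of the two operations can be sketched in plain Python for the 1-d case. This is an illustrative model under simplifying assumptions (flat lists of strings and dicts, str.format for template filling), not LangTorch's implementation; the names broadcast_pairs, add and mul are hypothetical.

```python
def broadcast_pairs(left, right):
    """Pair up two flat lists, repeating a length-1 list to match the other."""
    if len(left) == len(right):
        return list(zip(left, right))
    if len(left) == 1:
        return [(left[0], r) for r in right]
    if len(right) == 1:
        return [(l, right[0]) for l in left]
    raise ValueError(f"cannot broadcast lengths {len(left)} and {len(right)}")


def add(left, right):
    # Addition: concatenate each pair of texts.
    return [l + r for l, r in broadcast_pairs(left, right)]


def mul(templates, completions):
    # Multiplication: fill each template (left) with a completion dict (right).
    return [t.format(**c) for t, c in broadcast_pairs(templates, completions)]


added = add(["Hello", "Go away"], [", World!"])
formatted = mul(["Hello, {name}!"], [{"name": "World"}, {"name": "Tad"}])
print(added)      # ['Hello, World!', 'Go away, World!']
print(formatted)  # ['Hello, World!', 'Hello, Tad!']
```

In both cases the length-1 operand is repeated to match the other one, which is the 1-d special case of the broadcasting rules mentioned above.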

Methods

reshape(*shape)

Returns a reshaped version of the TextTensor.

squeeze()

Returns a new TextTensor with all the dimensions of size 1 removed.

unsqueeze(dim=0)

Returns a new TextTensor with a dimension of size one inserted at the specified position.
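The effect of reshape, squeeze and unsqueeze on the shape can be modelled with nested Python lists. The sketch below is a plain-Python illustration of the shape semantics only, assuming regular, non-empty nesting; it does not use LangTorch, and all helper names are hypothetical.

```python
from math import prod


def shape_of(nested):
    """Shape of a regular, non-empty nested list; a bare string is 0-d."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def flatten(nested):
    if not isinstance(nested, list):
        return [nested]
    return [item for sub in nested for item in flatten(sub)]


def _build(flat, shape):
    # Rebuild nesting from a flat list, outermost dimension first.
    if not shape:
        return flat[0]
    step = len(flat) // shape[0]
    return [_build(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def reshape(nested, shape):
    flat = flatten(nested)
    if prod(shape) != len(flat):
        raise ValueError(f"cannot reshape {len(flat)} entries into {shape}")
    return _build(flat, tuple(shape))


def squeeze(nested):
    # Drop every dimension of size 1.
    return reshape(nested, tuple(n for n in shape_of(nested) if n != 1))


def unsqueeze0(nested):
    # Insert a new dimension of size one at position 0.
    return [nested]


column = [["A column of"], [" paragraphs"]]
print(shape_of(column))                       # (2, 1)
print(shape_of(squeeze(column)))              # (2,)
print(shape_of(unsqueeze0(squeeze(column))))  # (1, 2)
print(reshape(column, (1, 2)))                # [['A column of', ' paragraphs']]
```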

embed(model=$default)

Embeds the TextTensor using the specified embedding model.

set_key(keys=None, inplace=False)

Sets the top-level keys for the Text entries in the TextTensor (removing previous keys).

add_key(keys=None, inplace=False)

Adds a new top-level key to the Text entries, preserving the previous structure.

apply(func)

Applies a function to each Text entry in the TextTensor.
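Entry-wise application can be pictured as a recursive map over nested lists. The following is a plain-Python sketch of that idea with plain strings standing in for Text entries; the function name apply_entrywise is hypothetical and this is not LangTorch's code.

```python
def apply_entrywise(nested, func):
    """Apply func to every leaf of a nested list, keeping the nesting."""
    if isinstance(nested, list):
        return [apply_entrywise(item, func) for item in nested]
    return func(nested)


upper = apply_entrywise([["Hello", "World"], ["Torch", "Framework"]], str.upper)
print(upper)  # [['HELLO', 'WORLD'], ['TORCH', 'FRAMEWORK']]
```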

sum(dim=None, keepdim=False)

Reduces the TextTensor over the specified dimensions, optionally keeping the reduced dimensions.
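Since addition of entries is concatenation, a natural reading is that summing over a dimension joins the entries along it. The sketch below models that reading for a 2-d list of strings; this interpretation is an assumption rather than something stated here, keepdim is left out, and the name text_sum is hypothetical.

```python
def text_sum(rows, dim=None):
    """Reduce a 2-d list of strings by concatenation (assumed meaning of sum)."""
    if dim is None:
        # Reduce over every dimension: one string with all entries joined.
        return "".join("".join(row) for row in rows)
    if dim == 0:
        # Join down each column.
        return ["".join(col) for col in zip(*rows)]
    if dim == 1:
        # Join across each row.
        return ["".join(row) for row in rows]
    raise ValueError(f"dim must be None, 0 or 1, got {dim}")


grid = [["a", "b"], ["c", "d"]]
print(text_sum(grid))         # abcd
print(text_sum(grid, dim=0))  # ['ac', 'bd']
print(text_sum(grid, dim=1))  # ['ab', 'cd']
```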

format(**kwargs)

Returns a version of the tensor with formatted entries as if multiplied with substitutions from kwargs.

Examples

Formatting a prompt template with two completion dicts:

tt = TextTensor("Hello, {name}!")
tt * TextTensor([{"name": "World"}, {"name":"Tad"}])
# Output: TextTensor(["Hello, World!", "Hello, Tad!"])

Concatenating TextTensors with broadcasting:

tt1 = TextTensor(["Hello", "Go away"])
tt2 = TextTensor([", World!"])
tt1 + tt2
# Output: TextTensor(["Hello, World!", "Go away, World!"])

Notes

  • Operations meant to format prompt templates require accurate matching between the entries of the template (left) and completion (right) tensors.
  • Consider setting requires_grad or is_param at initialization to track operations with autograd.
  • Some supported operations, such as apply, break autograd tracing.