TextTensor Attributes
Each TextTensor has 6 main attributes, two for each representation: texts (`content` and `ttype`), embeddings (`embedding` and `embedding_model`) and tokens (`tokens` and `tokenizer`).
- The `content` attribute is a numpy array with the textual entries. The `ttype` is the "text type" of these entries, which is any subclass of the `Text` class.
- The `embedding` attribute holds a `torch.Tensor` representation of the text entries, computed using the model specified in `embedding_model` (the name of an OpenAI model or a local Module).
- `tokens` are the `torch.Tensor` tokenized text entries from `content`. While embeddings are by default computed with the "text-embedding-3-small" model, tokenizing content requires the `tokenizer` attribute to be set to a tokenizer from the Transformers library.
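A minimal sketch of inspecting these attributes on a freshly created TextTensor. It assumes `TextTensor` is importable from the top-level `langtorch` package; the comments restate the defaults described above rather than the exact runtime values.

```python
# Hedged sketch: inspect the six attributes on a new TextTensor.
# Assumes TextTensor is importable from the top-level langtorch package.
from langtorch import TextTensor

tt = TextTensor(["Hello, world!", "LangTorch tensors hold text."])

print(tt.content)          # numpy array with the textual entries
print(tt.ttype)            # the Text subclass used for the entries
print(tt.embedding_model)  # model used to compute embeddings,
                           # by default "text-embedding-3-small"
print(tt.embedding)        # not computed at initialization (see below)
print(tt.tokens)           # not computed at initialization (see below)
print(tt.tokenizer)        # must be set to a Transformers tokenizer
                           # before tokenizing
```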
Setting attributes
The `content` is always set, as it is what a TextTensor is initialized with.
The embedding and tokens are not calculated upon initialization to save costs. They are computed automatically right before an operation that requires them, or can be computed manually with `.embed()` and `.tokenize()` respectively.
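A hedged example of triggering both computations by hand; the tokenizer checkpoint is an illustrative choice, and assigning the tokenizer directly to the attribute is an assumption based on the description above.

```python
# Hedged sketch of computing the lazy representations manually.
# The tokenizer checkpoint below is illustrative, not a LangTorch default.
from langtorch import TextTensor
from transformers import AutoTokenizer

tt = TextTensor(["Paris is the capital of France."])

tt.embed()     # fills tt.embedding using the model named in tt.embedding_model

tt.tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tt.tokenize()  # fills tt.tokens using the assigned Transformers tokenizer
```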
TextTensor subclasses and Text attributes
By default, a TextTensor is a tensor with entries of type `Text`. Using TextTensor subclasses that set `ttype` to a subclass of `Text` brings several benefits, for example:
- A `Text` subclass can add a custom `__str__` method, e.g. `langtorch.Markdown`, which formats entries to a string differently (this is done e.g. when passing them to an LLM).
- It can set the `allowed_keys` attribute to require certain keys, e.g. `langtorch.Chat`, which guarantees a correct chat history representation with only system, user or assistant keys.
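As a rough illustration of this pattern, the sketch below defines a custom `Text` subclass with its own `__str__` and a TextTensor subclass that uses it as its `ttype`. The class names are hypothetical, and the import paths and subclassing hooks are assumptions rather than the documented langtorch API.

```python
# Hedged sketch: a custom Text subclass plus a TextTensor subclass using it.
# Class names are hypothetical; the subclassing hooks are assumptions.
from langtorch import Text, TextTensor


class BulletText(Text):
    """Hypothetical Text subclass that renders entries as bullet points."""

    def __str__(self):
        # Custom string formatting, applied e.g. when an entry is passed to an LLM.
        return "- " + super().__str__()


class BulletTensor(TextTensor):
    # Entries of this tensor use the custom text type above.
    ttype = BulletText


tt = BulletTensor(["first point", "second point"])
```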