Quickstart Guide: Dive into LangTorch

To go through this guide interactively, we recommend the Colab notebook version.


Installation

To install LangTorch using pip:

pip install langtorch

To use the OpenAI API as our LLM provider, we need to set the OPENAI_API_KEY environment variable. You can find your API key on platform.openai.com.

import os

os.environ["OPENAI_API_KEY"] = "your_api_key" # Replace with your actual OpenAI API key

1. Perform multiple LLM calls with TextTensors

from langtorch import TextTensor  # holds texts instead of weights and supports tensor operations
from langtorch import TextModule  # torch.nn modules that work on TextTensors, performing prompt templating and LLM calls

TextTensors are designed to streamline working with many pieces of text and performing parallel LLM calls. langtorch.TextTensor is a subclass of PyTorch's torch.Tensor that:

  • Holds text entries instead of numerical weights.

  • Special Structure: TextTensor entries can represent chunked documents, prompt templates, completion dictionaries, chat histories, and more.

  • Represents Geometrically: TextTensors have a shape and can be modified with PyTorch functions (reshape, stack, etc.); see the short sketch below.
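
As a minimal sketch of the last point (no API calls involved; the example strings are ours), a TextTensor can be reshaped and stacked just like a numerical tensor:

import torch
from langtorch import TextTensor

tt = TextTensor(["alpha", "beta", "gamma", "delta"])   # shape: torch.Size([4])
print(tt.reshape(2, 2))                 # reshape works as in PyTorch
print(torch.stack([tt, tt]).shape)      # torch.Size([2, 4])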


In this example, we will create tensors holding prompt templates, fill them with a tensor of completion dictionaries, and send them to the OpenAI API.

prompt_tensor = TextTensor([["Is this an email address? {input_field}"],  
                            ["Is this a valid web link? {input_field}"]])  

# Adding TextTensors appends their content according to broadcasting rules  
prompt_tensor += " (Answer 'Yes' or 'No')"  
print(prompt_tensor)  
print("Shape =", prompt_tensor.shape)
Output:
[[Is this an email address? input_field (Answer 'Yes' or 'No')]
 [Is this a valid web link? input_field (Answer 'Yes' or 'No')]]
Shape = torch.Size([2, 1])

TextModules are torch.nn.Modules that work on TextTensors:

  • Tensor of Prompts: They hold a tensor of prompts instead of numerical weights.

  • Input Handling: They accept TextTensors as input, which are used to format the prompt tensor.

  • Formatting and Broadcasting: This allows formatting multiple prompts on multiple completions, controlling which prompt gets which input through broadcasting rules.

  • Activation Function: Just as most torch layers end with an activation function, TextModules end with an LLM call as their activation.

In this example, we will create a TextModule that ends in a call to an OpenAI model. This module can now execute both tasks in parallel on as many inputs as we'd like:

tasks_module = TextModule(prompt_tensor, activation="gpt-3.5-turbo")

input_completions = TextTensor([{"input_field": "contact@langtorch.org"}, {"input_field": "https://langtorch.org" }])

# The first row of the output contains answers to "Is this an email address?", the second to "Is this a valid web link?"
# Columns correspond to the two input completions

print(tasks_module(input_completions))
Output:
[[Yes   No ]
 [No    Yes]]

Comparison with the OpenAI Package

The TextModule above both formats the prompts and sends them to the OpenAI activation (langtorch.OpenAI). Let's compare LangTorch to the OpenAI package.

First, we'll separate the formatting and API steps.

A core feature of TextTensors is that they allow us to easily format several prompts on several inputs.

LangTorch achieves this by defining the product of two TextTensors, text1*text2, as an operation akin to text1.format(**text2). As shown below, this is what happens inside a TextModule before the activation is applied:

# Using TextModule  
tasks_module = TextModule(prompt_tensor)  
prompts = tasks_module(input_completions)  

# Equivalently, using "TextTensor multiplication"  
prompts = prompt_tensor*input_completions  
print(prompts)
Output:
[[Is this an email address? contact@langtorch.org (Answer 'Yes' or 'No')   Is this an email address? https://langtorch.org (Answer 'Yes' or 'No')]
 [Is this a valid web link? contact@langtorch.org (Answer 'Yes' or 'No')   Is this a valid web link? https://langtorch.org (Answer 'Yes' or 'No')]]

The code above introduces the multiplication operation (used in TextModules), which acts like a more powerful format operation and enables many of the features of TextTensors. Broadcasting decides how prompts and inputs are paired, as the sketch below shows; for a more in-depth look, see TextTensor Multiplication.
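
A small sketch of this pairing (assuming standard PyTorch broadcasting rules apply to TextTensors; the example strings are ours): tensors of matching shapes are paired element-wise, while a column times a row yields every prompt-input combination, as in the 2x2 grid above.

templates = TextTensor(["Summarize: {}", "Translate to French: {}"])   # shape (2,)
inputs = TextTensor(["a long article...", "good morning"])             # shape (2,)

print((templates * inputs).shape)                 # torch.Size([2]): each prompt paired with the matching input
print((templates.reshape(2, 1) * inputs).shape)   # torch.Size([2, 2]): every prompt formatted on every input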

We can send the formatted prompts to the OpenAI API by creating a langtorch.OpenAI module (the "activation") and compare speed between three API use cases:

import openai  
import langtorch  
import time  

langtorch_api = langtorch.OpenAI("gpt-3.5-turbo", system_message="You are a helpful assistant.", max_token=1, T=0.)  
openai_api = openai.OpenAI()  

# OpenAI package  
start = time.time()  
responses = []  
for prompt in prompts.flat:  
    responses.append(openai_api.chat.completions.create(  
        model="gpt-3.5-turbo",  
        messages=[{"role": "system", "content": "You are a helpful assistant."},  
                  {"role": "user", "content": prompt}],  
        max_tokens=1,  
        temperature=0.  
    ))  
print(f"1.\n{str(responses)[:125]}...")  
print(f" OpenAI loop time taken: {time.time() - start:.2f} seconds")  

# LangTorch  
start = time.time()  
responses = langtorch_api(prompts)  
print(f"2.\n{responses}")  
print(f" LangTorch time taken: {time.time() - start:.2f} seconds")  

# LangTorch on repeated requests  
start = time.time()  
responses = langtorch_api(prompts)  
print(f"3.\n{responses}")  
print(f" LangTorch on repeated requests time taken: {time.time() - start:.2f} seconds")
Output:
1.
[ChatCompletion(id='chatcmpl-9WkJfJJ8jPGXivDlQfPT35UZS5q2K', choices=[Choice(finish_reason='length', index=0, logprobs=None, ...
 OpenAI loop time taken: 2.18 seconds
2.
[[Yes   No ]
 [No    Yes]]
 LangTorch time taken: 0.84 seconds
3.
[[Yes   No ]
 [No    Yes]]
 LangTorch on repeated requests time taken: 0.05 seconds

The OpenAI activation in LangTorch (langtorch.OpenAI) isn't just a wrapper around the OpenAI package. The observed speed-up comes from the fact that the LangTorch implementation:

  • Sends API calls in parallel, allowing multiple completions to be generated much faster than calling the OpenAI chat completion endpoint in sequence.
  • Saves on tokens and speeds up subsequent calls by caching API results, especially for embeddings and when the temperature is set to zero.
  • Optimizes the requested API calls by removing duplicate requests.

langtorch.OpenAI also:

  • Operates directly on TextTensors, returning responses in the same shape as the input.
  • Handles API errors and retries failed requests by default.

2. Chained Calls and Simple Zero-Shot Chain of Thought

LangTorch integrates seamlessly with torch, allowing you to easily chain TextModules using torch.nn.Sequential. This can be used to chain multiple LLM calls or additional prompting methods. A simplified example is a zero-shot Chain of Thought, for which we can create a reusable TextModule:

CoT = TextModule("{*} Let's think step by step.")

A {} in a prompt template is a positional placeholder that takes one input entry. For our Chain-of-Thought module we use the placeholder {*}, a "wildcard" key that inserts all the input entries in its place.
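
Since this module has no LLM activation yet, calling it simply returns the formatted prompts, which is a quick way to check a template (a small sketch; the input string is ours):

print(CoT(TextTensor("What is the cube root of 1728?")))
# Should print something like: What is the cube root of 1728? Let's think step by step.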

Now we chain these with torch.nn.Sequential:

import torch  
calculate = TextModule("Calculate the following: {} = ?")  

calculate_w_CoT = torch.nn.Sequential(  
    calculate,  
    CoT,  
    langtorch.OpenAI("gpt-3.5-turbo") ,  
    # You can add sequential calls here  
)  

input_tensor = TextTensor(["170*32", "123*45/10", "2**10*5"])  
output_tensor = calculate_w_CoT(input_tensor)  
output_tensor.view(1,-1) # We use torch methods to reshape TextTensors and view entries in columns
Output:
[[To calculate 170 multiplied by 32, we can use↵   Step 1: Multiply 123 by 45         First, calculate 2**10 (2 to the power of 10):
   long multiplication:                            123 * 45 = 5535                    2**10 = 1024                                   

       170                                         Step 2: Divide the result by 10    Next, multiply the result by 5:                
  x     32                                         5535 / 10 = 553.5                  1024 * 5 = 5120                                
  _________                                                                                                                          
       340   (170 * 2)                             Therefore, 123 * 45 / 10 = 553.5   Therefore, 2**10*5 = 5120.                     
  +  5100   (170 * 30)                                                                                                               
  _________                                                                                                                          
      5440                                                                                                                           

  Therefore, 170 multiplied by 32 equals 5440.                                                                                      ]]

Ensemble / Self-Consistency

Representing texts geometrically in a matrix or tensor allows for creating meaningful structures. Methods like ensemble voting and self-consistency involve generating multiple completions for the same task, which is easily represented by adding a tensor dimension.

In this example, we build a module that creates multiple Chain-of-Thought answers for each input. The completions become separate TextTensor entries, which we combine using a "linear layer" and marginalize over, improving overall performance (see Wang et al., 2022).

calculate = TextModule("Calculate the following: {} = ? Let's think step by step.")  

ensemble_llm = langtorch.OpenAI("gpt-3.5-turbo", T=1.4, n=3)  # 3 completions per input with high temperature  

combine_answers = langtorch.Linear([[ f"\nAnswer {i}: " for i in [1,2,3] ]]) # Here we use properties of matrix multiplication:  
# Linear uses matmul, where row_of_labels @ column_of_completions == one long entry with labeled completions  

choose = TextModule("Select from these reasoning paths the most consistent final answer: {}")  

llm = langtorch.OpenAI("gpt-3.5-turbo", T=0)  

self_consistent_calculate = torch.nn.Sequential(  
    calculate,  
    ensemble_llm,  
    combine_answers,  
    choose,  
    llm  
)

input_tensor = TextTensor("171*33")

print(self_consistent_calculate(input_tensor))
Output:
[The most consistent final answer is: 171 * 33 = 5643. This is the answer provided in both Answer 2 and Answer 3, which break down the multi
  plication process into steps and arrive at the same final result.                                                                           ]

Saving the results of repeated runs lets us see accuracy increase from 25% (using calculate) to well over 50% (using self_consistent_calculate) on this input.

3. Automatic TextTensor Embeddings for Building Retrievers

TextTensors offer a straightforward way to work with embeddings. Every TextTensor can generate its own embeddings -- held in a torch tensor that preserves their shape. Moreover, TextTensors automatically act as their embeddings when passed to torch functions like cosine similarity.

These representations (available under the .embedding attribute) are created automatically right before they are needed, using a set embedding model (default is OpenAI's text-embedding-3-small).

import torch  

tensor1 = TextTensor([[["Yes"],  
                       ["No"]]])  
tensor2 = TextTensor(["Yeah", "Nope", "Yup", "Non"])  

torch.cosine_similarity(tensor1, tensor2)
Output:
tensor([[[0.6923, 0.6644, 0.6318, 0.5749],
         [0.5458, 0.7727, 0.5386, 0.7036]]])

We can change the embedding model, embed with .embed(), and access the embedding tensor under .embedding:

# To change embedding model and embed
tensor1.embedding_model = "text-embedding-3-large"
tensor1.embed()
# To access the embedding tensor
tensor1.embedding
Output:
tensor([[[[-0.0338,  0.0298, -0.0105,  ..., -0.0194, -0.0076,  0.0153]],
         [[-0.0281,  0.0073, -0.0121,  ..., -0.0071,  0.0094,  0.0090]]]])

Working with embeddings and documents (parsing, chunking and indexing)

To enable these functionalities, TextTensor entries aren't just strings but structured Text objects, which can be created from f-string templates, dictionaries, and markup documents, and are represented as a sequence of (label, text) pairs.

For the next task we need chunked text data. We can use the above fact to conveniently manipulate markdown files -- in this example, a paper on the abilities of language models. Download the markdown file from here:

> wget https://raw.githubusercontent.com/adamsobieszek/langtorch/main/src/langtorch/conf/paper.md

We can create a tensor with each markdown block in a separate entry simply with:

paper = TextTensor.from_file("paper.md")

print(paper[:3],"\n (...)")
print(f"shape = {paper.shape}")
Output:
[# Playing Games with Ais: The Limits of GPT-3 and Similar Large Language Models
  Adam Sobieszek & Tadeusz Price, 2022                                           
  ## Abstract                                                                    ] 
  (...)
shape = torch.Size([80])

Since the text contains headers and other block types, we need to extract only the paragraphs. This is where the structured entries become useful, as LangTorch provides loc and iloc accessors for Text entries and tensors:

# Select paragraphs
paragraphs = paper.loc["Para"]  
# Remove empty entries  
paragraphs = paragraphs[paragraphs!=""]  
print(paragraphs[:].apply(lambda x: x[:40] + "..."))
Output:
[This article contributes to the debate a...
  These are questions put to Salvador Dali...
  We take issue with some methodological u...
  We’ll show some situations in which reve...
  In the second part of this paper, we pro...
  In their paper, Floridi and Chiriatti pr...
  The logic of Floridi and Chiriatti’s rev...
  The most mature theory quantifying such ...
...

Build Custom Retriever and RAG modules

For more complex modules we can subclass TextModule and, as in PyTorch, define our own __init__ and forward methods.

Because TextTensors can automatically act as a tensor of their embeddings, we can very compactly implement, for example, a retriever, which for an input query finds the k entries with the highest cosine similarity among the documents it holds:

class Retriever(TextModule):  
    def __init__(self, documents: TextTensor):  
        super().__init__()  
        self.documents = TextTensor(documents).view(-1)  

    def forward(self, query: TextTensor, k: int = 5):  
        # TextTensors act as their embeddings inside torch.cosine_similarity  
        cos_sim = torch.cosine_similarity(self.documents, query)  
        return self.documents[cos_sim.topk(k).indices]  # the k most similar documents

retriever = Retriever(paragraphs)  
query = TextTensor("What's the relationship between prediction and compression?")  
retriever(query)
Output:
[Recall how a language model during training must compress an untenable number of conditional probabilities. The only wa
  to do this successfully is to pick up on the regularities in language (as pioneered by Shannon 1948). Why do we claim   
  that learning to predict words, as GPT does, can be treated as compressing some information? Let’s assume we’ve         
  calculated the conditional probability distribution given only the previous word of all English words. Consider, that   
  such a language model can either be used as a (Markavion) language generator or, following Shannon, be used for an      
  efficient compression of English texts. Continuing this duality, it has been shown, that if a language model such as GPT
  would be perfectly trained it can be used to optimally compress any English text (using arithmetic coding on its        
  predicted probabilities; Shmilovici et al., 2009). Thus the relationship between prediction and compression is that     
  training a language generator is equivalent to training a compressor, and a compressor must know something about the    
  regularities present in its domain (as formalized in AIXI theory; Mahoney 2006). To make good predictions it is not     
  enough to compress information about what words to use to remain grammatical (to have a syntactical capacity), but also 
...  ]

Note how the implementation didn't require any operations we wouldn't find in regular PyTorch. One goal of LangTorch is to give developers control over these lower-level operations while letting them write compact code without a multitude of classes. For this reason, implementations such as the retriever above are not pre-defined classes in the main package.

We can now compose this module with a TextModule that makes LLM calls to get a custom Retrieval-Augmented Generation (RAG) pipeline:

class RAG(TextModule):  
    def __init__(self, documents: TextTensor, *args, **kwargs):  
        super().__init__(*args, **kwargs)  
        self.retriever = Retriever(documents)  

    def forward(self, user_message: TextTensor, k: int = 5):  
        retrieved_context = self.retriever(user_message, k) + "\n"  
        user_message = user_message + "\nCONTEXT:\n" + retrieved_context.sum()  
        return super().forward(user_message)

rag_chat = RAG(paragraphs,  
               prompt="Use the context to answer the following user query: ",  
               activation="gpt-3.5-turbo")  
assistant_response = rag_chat(query)  
print(assistant_response.reshape(1,1))
Output:
[[The relationship between prediction and compression is that training a language generator, such as GPT, is equivalent to training a compres↵
  sor. This is because in order to make accurate predictions, the model must compress information about the regularities present in the langu↵ 
  age it is trained on. This compression of information allows for generalization, which in turn leads to the ability to operate on novel inp↵ 
  uts and create novel outputs. So, prediction leads to compression, which leads to generalization, and ultimately to computer intelligence.  ]]

With only small modifications to the retriever, this module could also perform batched inference, handling multiple simultaneous queries without much additional latency (see the sketch below). Note that prompt and activation are arguments inherited from TextModule; they take effect through the super().forward() call.
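
As a minimal sketch of one such modification (not part of the package; it assumes the broadcasting behavior of torch.cosine_similarity on TextTensors shown earlier and standard integer-tensor indexing, and the second query string is ours), the retriever could compare a whole column of queries against the documents at once:

class BatchedRetriever(Retriever):  
    def forward(self, queries: TextTensor, k: int = 5):  
        # A column of queries broadcast against the row of documents gives  
        # one row of similarities per query, shape (n_queries, n_documents)  
        cos_sim = torch.cosine_similarity(queries.view(-1, 1), self.documents)  
        top_idx = cos_sim.topk(k, dim=-1).indices  
        # Stack the k most similar documents for each query into one TextTensor  
        return torch.stack([self.documents[row] for row in top_idx])  

batched_retriever = BatchedRetriever(paragraphs)  
queries = TextTensor(["What's the relationship between prediction and compression?",  
                      "Why does compression lead to generalization?"])  
print(batched_retriever(queries).shape)  # expected: torch.Size([2, 5])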

We are excited to see what you will build with LangTorch. If you want to share examples or have any questions, feel free to ask on our Discord. In the likely event of encountering a bug, report it on Discord or on the GitHub repo and we will fix it ASAP.