How to initialize an LLMP job

When working with large language models in programming projects, getting reliable outputs from the model is a common challenge. By carefully crafting prompts for different tasks, generating and selecting few-shot examples, and tuning the temperature and top_k parameters, it is possible to get good results. However, this process is time-consuming and requires a lot of manual work. During development it is a hurdle that forces you out of the flow state and contradicts the idea of fast iteration.

Long- or short-form text output may be sufficient when an LLM is used for text generation and creative tasks, but programming tasks need output in a structured, reliable format. To integrate LLM generation tasks into a programming project without leaving the development flow state, we want to reduce the time spent on prompt engineering as much as possible. An LLMP unit of work (a Job) is therefore reduced to the minimal effort needed to define the task: defining its input model and its output model. The initialization event then handles example generation and instruction generation, and runs an optimization process to craft a reliable prompt from them. Each Job is stored under the default or a custom job directory and can be reused within your project by referencing the job id or the (optional) job name.

To initialize a Job there are several options, which are presented in the following notebooks:

Initialize a Job by defining the input and output model
Initialize a Job

Define Input and Output Model using Pydantic

For our example we will define a Job for a simple labeling task in which we want to determine the genre of a book. To define the possible labels we will use the Literal type from the typing module of the standard library. To define the input and output models we will use Pydantic.

from typing import Literal
from pydantic import BaseModel
from llmp.services.program import Program


class InputObject(BaseModel):
    book_title: str
    book_author: str
    release_year: int
    
class OutputObject(BaseModel):
    genre: Literal["fiction", "non-fiction", "fantasy", "sci-fi", "romance", "thriller", "horror", "other"]


# Initialize a job
program = Program("Book to Genre", input_model=InputObject, output_model=OutputObject)

# load a job
# program = Program("Book to Genre")

input_data={
    "book_title": "The Lord of the Rings",
    "book_author": "J. R. R. Tolkien",
    "release_year": 1954
}
program(input_data=input_data)

    ---------------------------------------------------------------------------

    MaxRetriesError                           Traceback (most recent call last)

    Cell In[4], line 6
          1 input_data={
          2     "book_title": "The Lord of the Rings",
          3     "book_author": "J. R. R. Tolkien",
          4     "release_year": 1954
          5 }
    ----> 6 program(input_data=input_data)


    File ~\Codes\LLMP\src\llmp\services\program.py:110, in Program.__call__(self, input_data, auto_optimize, log_action, return_metrics, **kwargs)
        107     is_first_run = True
        108     generator_type = "consensus"
    --> 110 output, run_metrics = self.job_manager.generate_output(
        111     self.job, input_data, generator_type=generator_type, return_metrics=return_metrics, **kwargs
        112 )
        114 if is_first_run:
        115     self.job_manager.optimize_job(self.job, mode="all")


    File ~\Codes\LLMP\src\llmp\services\job_manager.py:81, in JobManager.generate_output(self, job, input_data, generator_type, **kwargs)
         79 """Generate output for a specific input."""
         80 generator = load_generator_cls(generator_type=generator_type)(job, **kwargs)
    ---> 81 result, run_metrics = generator.generate(input_data, **kwargs)
         82 event_metric = {
         83     "verification_type": generator.verification_type,
         84     **run_metrics,
         85     **kwargs
         86 }
         87 job.log_generation(input_data, result, event_metric)


    File ~\Codes\LLMP\src\llmp\components\generator\simple.py:35, in Generator.generate(self, input_data, **kwargs)
         25 """Generate an output based on the job and input data.
         26 
         27 loads the engine from the job and runs it with the input data.
       (...)
         31     **kwargs: any -  passed to engine.run() method
         32 """
         34 engine = load_engine_from_job(self.job, self._job_settings, **self._engine_kwargs)
    ---> 35 output, run_metrics = engine.run(input_data, **kwargs)
         36 return output, run_metrics


    File ~\AppData\Local\pypoetry\Cache\virtualenvs\llmp-l_UTfyBq-py3.11\Lib\site-packages\structgenie\engine\genie.py:61, in StructEngine.run(self, inputs, **kwargs)
         59 e = MaxRetriesError(f"exceeded max retries: {self.max_retries}")
         60 self._log_error(e)
    ---> 61 raise e


    MaxRetriesError: MaxRetriesError(exceeded max retries: 4)
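The traceback above shows that a call can fail when the underlying engine exhausts its retries. A minimal sketch of guarding such a call follows. `run_or_none` and `failing_program` are hypothetical helpers introduced only for illustration. The sketch catches the broad `Exception` because the import path of `MaxRetriesError` is not shown in this notebook, and a stand-in callable replaces a real `Program` so that the snippet runs without a model.

```python
from typing import Any, Callable, Optional


def run_or_none(program: Callable[..., Any], input_data: dict) -> Optional[Any]:
    """Call the program and return None instead of raising on failure."""
    try:
        return program(input_data=input_data)
    except Exception as error:  # assumption: narrow this once the exception's import path is known
        print(f"Generation failed: {type(error).__name__}: {error}")
        return None


def failing_program(input_data: dict) -> Any:
    """Hypothetical stand-in that fails the way the traceback above does."""
    raise RuntimeError("exceeded max retries: 4")


result = run_or_none(failing_program, {"book_title": "The Lord of the Rings"})
print(result)
```

With a real `Program` instance in place of `failing_program`, the caller can decide whether to retry, skip the input, or fall back to a default label when `None` comes back.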

Besides defining an Enum class, we can also set the options via Field or use the Literal type, as the output models in this notebook do.
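To make the comparison concrete, the sketch below declares the same genre labels once as a Literal type and once as an Enum class, using only the standard library. The names `GenreLiteral` and `GenreEnum` are assumptions made for illustration and do not appear in LLMP. The Field variant is left out because this notebook does not show how LLMP reads options from it.

```python
from enum import Enum
from typing import Literal, get_args

# Option 1: the labels as a Literal type, as used in this notebook.
GenreLiteral = Literal[
    "fiction", "non-fiction", "fantasy", "sci-fi",
    "romance", "thriller", "horror", "other",
]


# Option 2: the same labels as an Enum class.
class GenreEnum(str, Enum):
    FICTION = "fiction"
    NON_FICTION = "non-fiction"
    FANTASY = "fantasy"
    SCI_FI = "sci-fi"
    ROMANCE = "romance"
    THRILLER = "thriller"
    HORROR = "horror"
    OTHER = "other"


literal_options = list(get_args(GenreLiteral))
enum_options = [member.value for member in GenreEnum]

# Both declarations describe the same set of allowed values.
print(literal_options == enum_options)
```

Either `GenreLiteral` or `GenreEnum` could then serve as the annotation of the `genre` field in a Pydantic model.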

Let's define a new program

# Imports are repeated here so that the cell also runs in a fresh kernel
from typing import Literal, Optional

from pydantic import BaseModel
from llmp.services.program import Program


class InputObject(BaseModel):
    book_title: str
    book_author: str
    release_year: int


class OutputObject(BaseModel):
    genre: Literal["fiction", "non-fiction", "fantasy", "sci-fi", "romance", "thriller", "horror", "other"]
    has_sequal: bool
    sequal_name: Optional[str] = "None"


# Initialize a job
program = Program("Book to Genre/Sequal", input_model=InputObject, output_model=OutputObject)

# load a job
#program = Program("Book to Genre/Sequal")

program.job.generation_log

    [{'event_id': '6eee28be26ef44b68a7085418e8964e7',
      'input': {'book_title': 'The Bible',
       'book_author': 'Johannes Gutenberg',
       'release_year': 1450},
      'output': {'genre': 'non-fiction',
       'has_sequal': False,
       'sequal_name': 'None'}},
     {'event_id': '9a1109c8b8fd46a0a2195a8c2dce0d63',
      'input': {'book_title': 'Harry Potter',
       'book_author': 'J. K. Rowling',
       'release_year': 1997},
      'output': {'genre': 'fantasy',
       'has_sequal': True,
       'sequal_name': 'Harry Potter and the Chamber of Secrets'}}]
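The log entries can be processed like any other list of dictionaries. The sketch below assumes that each entry is a plain dict with `input` and `output` keys, as in the printed output above, and works on a copy of those two entries (event ids omitted) so that it runs without a stored Job. The names `pairs` and `genre_counts` are illustrative.

```python
from collections import Counter

# Entries copied from the generation log output above (event ids omitted).
generation_log = [
    {
        "input": {"book_title": "The Bible", "book_author": "Johannes Gutenberg", "release_year": 1450},
        "output": {"genre": "non-fiction", "has_sequal": False, "sequal_name": "None"},
    },
    {
        "input": {"book_title": "Harry Potter", "book_author": "J. K. Rowling", "release_year": 1997},
        "output": {
            "genre": "fantasy",
            "has_sequal": True,
            "sequal_name": "Harry Potter and the Chamber of Secrets",
        },
    },
]

# Pair each logged title with the genre the model assigned to it.
pairs = [(entry["input"]["book_title"], entry["output"]["genre"]) for entry in generation_log]

# Count how often each genre was assigned.
genre_counts = Counter(genre for _, genre in pairs)

print(pairs)
print(genre_counts)
```

A summary like this is a quick way to spot skewed or unexpected labels before relying on a Job in a larger pipeline.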

input_data={
    "book_title": "Harry Potter",
    "book_author": "J. K. Rowling",
    "release_year": 1997
}
result = program(input_data=input_data)


if result.has_sequal:
    print(f"The book {result.sequal_name} is a sequel to {input_data['book_title']}")
else:
    print(f"The book {input_data['book_title']} has no sequel")

    The book Harry Potter and the Chamber of Secrets is a sequel to Harry Potter
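One detail of the output model is worth noting: `sequal_name` defaults to the string `"None"` rather than the value `None`, so a plain truthiness check on that field would always pass. The sketch below shows one possible way to normalize it. `sequel_title` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins shaped like the logged outputs, used so the snippet runs without a model.

```python
from types import SimpleNamespace
from typing import Optional


def sequel_title(result) -> Optional[str]:
    """Return the sequel title, or None when the result reports no sequel."""
    if not result.has_sequal or result.sequal_name in (None, "None"):
        return None
    return result.sequal_name


# Hypothetical stand-ins shaped like the logged outputs above.
with_sequel = SimpleNamespace(
    has_sequal=True, sequal_name="Harry Potter and the Chamber of Secrets"
)
without_sequel = SimpleNamespace(has_sequal=False, sequal_name="None")

print(sequel_title(with_sequel))
print(sequel_title(without_sequel))
print(bool(without_sequel.sequal_name))  # the placeholder string is truthy
```

Checking `has_sequal` first, as the cell above does, avoids the problem; the helper only adds a second safeguard for results where the flag and the name disagree.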