DSPy MIPROv2 Configuration Explained
Background
Understanding what happens inside DSPy optimizers can be challenging, especially when dealing with the optimizer’s complex Constructor and Compiler parameters and their interactions.
This guide will help clarify the intricacies of MIPROv2 and enable you to configure parameters with predictable outcomes.
Based on DSPy Source Version: 3.0.0
MIPROv2 Key Concepts
The objective of the MIPROv2 Optimizer is to improve your DSPy Program, which ultimately means improving the final prompt and getting better results from an LLM. In over-simplified terms, it is prompt improvement.
There is some terminology to understand that helps clarify concepts in DSPy. Let's review these first before walking through the MIPROv2 steps and related parameters.
Prompt components
It is crucial to understand that the DSPy framework distinguishes between instructions and the prompt that gets sent to an LLM.
Prompt: DSPy generates a system prompt from several components for you. You don’t write prompts when using DSPy.
Instructions: Instructions are the natural language descriptions in your DSPy Program.
DSPy will include the natural language descriptions from within the DSPy modules you use, such as Signatures, Predict, ChainOfThought, etc., in the generated prompt output. DSPy also includes information about their structure.
You should be familiar with this, but in case you are not, here is a short DSPy Program example; the instructions are the docstring and field descriptions, and the code provides the structure:
```python
import dspy

class MySignature(dspy.Signature):
    """Classify if the sentence is grammatically correct (1) or not (0)."""
    sentence = dspy.InputField()
    label = dspy.OutputField(desc="1 if correct, 0 if incorrect")

class MyClassifier(dspy.Module):
    def __init__(self):
        super().__init__()
        self.predict = dspy.Predict(MySignature)

    def forward(self, sentence):
        return self.predict(sentence=sentence)
```
MIPROv2 will generate variations of the instructions; these are referred to as "instruction candidates". The optimizer evaluates these instruction candidates to find the most effective one for your DSPy Program's objective.
Demo Examples: DSPy may bundle input->output examples in the final prompt.
Examples show an LLM correct input->output pairs to model when responding to a request. In DSPy these appear under several terms, including: few-shots, labeled demos, bootstrap demos, or demo sets. There are some subtle differences between these, which will be covered below.
MIPROv2 tries to find the best examples to include (aka bootstrap) alongside the instructions through its multi-step optimization process.
A final comprehensive system prompt generated by MIPROv2, to be used in production, includes the most effective instructions and demo examples MIPROv2 can identify.
Training
DSPy Optimizers often use language from traditional machine learning model training. This might be confusing to some, so let's clarify what some training terms mean for MIPROv2.
Training Set: This is usually a large collection of data that has correct input->output pairs. In Machine Learning these would be used to train a model. In MIPROv2 a training set is used to derive the Demo Examples mentioned above. "Training" in this case is more like guiding an LLM; no model is actually trained in the traditional sense.
Validation Set: This is another collection of data just like the Training Set, that also has correct input and output pairs. It is used to test your DSPy Program at multiple stages in the optimization process. Unlike the Training Set, only the input, often a question for an LLM, is sent to the LLM and the LLM tries to answer. The result is evaluated against the known answer. This is how your DSPy Program can be scored for accuracy.
Metric: a function that compares a response from an LLM to the expected answer from the Validation Set. The metric defines the rules for determining whether the LLM's response was accurate.
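To make these terms concrete, here is a minimal sketch (the example rows and the `accuracy_metric` name are illustrative, matching the classifier example above) of a tiny training set, validation set, and metric in DSPy:

```python
import dspy

# A few rows with known correct outputs. `.with_inputs()` marks which
# fields are inputs; the remaining fields are treated as labels.
trainset = [
    dspy.Example(sentence="She go to school.", label="0").with_inputs("sentence"),
    dspy.Example(sentence="The cat sleeps on the mat.", label="1").with_inputs("sentence"),
]
valset = [
    dspy.Example(sentence="He run very fast.", label="0").with_inputs("sentence"),
]

def accuracy_metric(example, prediction, trace=None):
    # Compare the LLM's predicted label against the known answer.
    return example.label == prediction.label
```

A metric may also return a float score instead of a boolean, which is where `metric_threshold` (covered below) comes into play.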
How MIPROv2 Works: The 5-Step Process
Now let's walk through how MIPROv2 works, reviewing all the parameters that influence its behavior and how to tune them. There is a good amount of complexity to the steps, so take them one at a time. For your convenience I have also prepared a companion page for this guide that contains all the parameters of the Constructor and the Compiler in tables with descriptions.
⭐👉 Open this other page in another browser tab MIPROv2 Properties 👈⭐
The page contains tables of the Constructor and Compiler parameters.
🧠 It is recommended you keep it open side by side with this guide!
Steps Summary
- Demo Set Selection - Creates labeled and bootstrapped demonstration sets from a training set
- Instruction Generation - Generates instruction variations using multiple inputs
- Search Space Creation - Pairs instruction candidates with demonstration sets to create solution candidates
- Bayesian Optimization - Intelligently tests instruction and demo set combinations
- Final Selection - Returns the highest-scoring program
In a nutshell, what MIPROv2 is going to do is create several variations of your instructions and examples for the LLM, then it's going to run through a smart evaluation process of those variations to find the most effective one. There are many parameters to control how this is done.
Just so you have a sense of what implementing MIPROv2 generally looks like in code, here is a very minimal sample using the constructor and compiler. Some minor parts like declaring LLM models, datasets, and metrics are left out for the sake of focusing on the parameters of the constructor and compiler.
```python
# Create the MIPROv2 optimizer
optimizer = dspy.MIPROv2(
    num_candidates=5,          # Generate 5 instruction variations and 5 demo sets
    num_trials=10,             # Run 10 optimization trials
    max_bootstrapped_demos=4,  # Include up to 4 successful examples per demo set
    max_labeled_demos=4,       # Include up to 4 random examples per demo set
    metric=accuracy_metric,    # Function to score LLM responses
)

# Compile your program with the optimizer
optimized_program = optimizer.compile(
    student=MyClassifier(),    # Your DSPy program to optimize
    trainset=training_data,    # Data for selecting demo examples
    valset=validation_data,    # Data for evaluating combinations
    teacher=teacher_model,     # Optional: higher-quality model for evaluations
)

# Save optimized program for future use
optimized_program.save("optimized.json")
```
On to the details…
Steps Details
1. Demo Set Selection:
The optimizer creates multiple demonstration sets to evaluate alongside instructions. These sets are a mix of two types of examples:
- Labeled Demos: Up to `max_labeled_demos` examples are selected from your `trainset`. These are randomly selected examples that do not pass through an LLM evaluation.
- Bootstrapped Demos: Up to `max_bootstrapped_demos` examples are selected from your `trainset`, with the additional requirement that your DSPy program successfully processes them when sent to an LLM (as verified by the `metric`).
Labeled demos and bootstrap demos are both taken from the training set, to be presented as examples in a final prompt. Labeled demos are selected without going through an LLM evaluation, whereas bootstrap demos are rows that an LLM handles successfully when run through your DSPy program.
This dual selection process is repeated to create multiple sets of demonstrations that will be used in subsequent optimization steps. MIPROv2 ultimately tries to find the best example set to include with instructions.
You can choose how many labeled and bootstrapped demos to include in each set, from 0 to as many as you like. The default is 4 of each.
Constructor Parameters affecting this step:
- `num_candidates` (number of different demo sets to create; also used as the number of instruction variations to generate in step 2)
- `max_bootstrapped_demos` (max number of training examples to select that the student program can successfully process, per demo set)
- `max_labeled_demos` (max number of training examples to select without evaluation, per demo set)
- `metric` (scoring function that determines if an LLM output is "correct", used for selecting bootstrap demos and in final evaluation)
- `metric_threshold` (metric threshold for bootstrap demo selection if your `metric` returns a float score)
Compiler Parameters affecting this step:
- `student` (your DSPy program being optimized)
- `trainset` (the training dataset you provide for selecting labeled and bootstrapped demo examples)
- `teacher` (an optional, usually stronger, model to run the bootstrap demo evaluations through)
- `max_bootstrapped_demos` (can be used to override the constructor setting)
- `max_labeled_demos` (can be used to override the constructor setting)
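As a hedged sketch of how these compile-time knobs might be used together: a stronger model can back the teacher used for bootstrapping while the demo counts are overridden. The model name is a placeholder, and `set_lm` on the module is an assumption about your DSPy setup, not something taken from this guide.

```python
# Hypothetical example: a stronger LM backs the teacher program used for bootstrapping.
teacher_program = MyClassifier()
teacher_program.set_lm(dspy.LM("openai/gpt-4o"))  # placeholder model name

optimized_program = optimizer.compile(
    student=MyClassifier(),
    trainset=training_data,
    valset=validation_data,
    teacher=teacher_program,   # bootstrap demos are generated via the stronger model
    max_bootstrapped_demos=2,  # overrides the constructor setting
    max_labeled_demos=2,       # overrides the constructor setting
)
```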
2. Instruction Generation:
MIPROv2 uses information from your DSPy program, the training dataset, the collected demonstrations, and DSPy "tips" to generate alternative versions of your DSPy Program's instructions. These alternatives are then evaluated for how effective they are on your specific task and dataset.
Constructor Parameters affecting this step:
- `num_candidates` (number of instruction variations to generate; the same value also sets the number of demo sets above)
Compiler Parameters affecting this step:
- `student` (your DSPy program being optimized, containing the baseline instructions)
- `trainset` (the training dataset you provide)
- `program_aware_proposer` (enables inclusion of the `student` program structure)
- `data_aware_proposer` (enables LLM analysis of trainset characteristics for instruction generation; by default 10 samples are used for the analysis)
- `tip_aware_proposer` (enables inclusion of randomly selected tips, e.g., "be creative", "be concise")
- `fewshot_aware_proposer` (enables inclusion of the bootstrap demos from step 1 in the instruction generation process. Not the same as bootstrapping into the final prompt)
As you can see there are several options to add context to the instruction candidate generation process. It is up to you to experiment with which of these to include.
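For example, a compile call that turns individual proposers on or off might look like the following sketch, which uses the flags listed above; which mix works best depends on your task.

```python
optimized_program = optimizer.compile(
    student=MyClassifier(),
    trainset=training_data,
    valset=validation_data,
    program_aware_proposer=True,   # include the student program's structure
    data_aware_proposer=True,      # let an LLM summarize trainset characteristics
    tip_aware_proposer=True,       # add a randomly selected tip ("be concise", etc.)
    fewshot_aware_proposer=False,  # skip bootstrap demos during instruction generation
)
```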
3. Search Space Creation:
The optimizer creates a comprehensive search space by pairing each instruction candidate with each demonstration set (containing labeled and bootstrapped demos). This creates all possible combinations of instructions and demo sets that could be tested.
Constructor Parameters affecting this step:
- `num_candidates` (determines the number of demo sets and instruction candidates)
It is crucial to be aware of the combinatorial effect of setting `num_candidates`.
The `num_candidates` parameter determines both the number of instruction variations and the number of demo sets that will be generated, which are then paired to create the full search space. For example, if `num_candidates=5`, MIPROv2 will generate 5 instruction variations and 5 demo sets, creating a search space of 25 possible combinations (5 x 5 = 25 unique pairs) to evaluate during optimization. Not all combinations are tested, but these numbers can still have a significant effect on the total runs in the optimization process.
4. Bayesian Optimization:
The optimizer uses an intelligent search strategy to efficiently test combinations from the search space. It builds a predictive model to guess which combinations might work best, then tests the most promising ones in “trials” (each trial evaluates one specific combination of instruction candidates and demonstration sets). After each trial, it updates its predictions based on the results.
Steps to this process:
- Initialize a surrogate model (e.g., Gaussian Process) to predict `metric` scores for untested combinations in the search space.
- For each trial, select a specific combination (instruction candidate + demo set) from the search space based on the surrogate model's predictions.
- Evaluate the selected combination on the `valset` (or a `minibatch` subset of size `minibatch_size`) by running the student program on each row and computing the aggregate `metric` score.
- Update the surrogate model with the trial score, refining its predictions for the remaining untested combinations in the search space.
- When using minibatches, periodically perform a full evaluation on the most promising candidate (based on average minibatch performance).
Constructor Parameters affecting this step:
- `num_trials` (how many optimization trials to run)
- `metric` (how to score individual runs within each trial)
- `task_model` (the LLM model for running the optimization trials; defaults to `dspy.settings.lm` if set)
Compiler Parameters affecting this step:
- `valset` (the validation dataset used for evaluating combinations from the search space)
- `num_trials` (number of optimization trials to run)
- `minibatch` (use minibatches to evaluate candidates)
- `minibatch_size` (how many valset examples to use per trial)
- `minibatch_full_eval_steps` (how often to run a full evaluation)
Note: `num_trials` is required when not using `auto` mode. Auto mode uses a fancy formula to set the number of trials to run.
Whatever the trial count ends up being, there is a combinatorial effect to be aware of when combining it with the valset size, since by default each trial runs the entire valset. If your valset is large, that may not be desirable.
To provide more control over the total optimization run size, there is the `minibatch` option to limit the number of test runs per trial. `minibatch_full_eval_steps` then provides control over how frequently to run a full evaluation to more deeply test progress.
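For instance, a compile call that caps evaluation cost with minibatches might look like this sketch; the specific numbers are arbitrary illustrations, not recommendations.

```python
optimized_program = optimizer.compile(
    student=MyClassifier(),
    trainset=training_data,
    valset=validation_data,        # e.g., a few hundred rows
    num_trials=20,                 # required when auto mode is not used
    minibatch=True,                # evaluate each trial on a subset of valset
    minibatch_size=25,             # valset rows used per trial
    minibatch_full_eval_steps=10,  # run a full valset evaluation every 10 trials
)
```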
Important numbers to consider that add up to total optimization LLM run counts:
- `num_candidates`
- `num_trials`
- `valset` size
- `minibatch` true/false
- `minibatch_size` if using `minibatch`
As of the newly released 3.0.0, there is no longer a warning about the size of the optimization run. Be aware of how many LLM calls are adding up, especially if you are using expensive models.
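Here is a rough back-of-envelope sketch (an illustrative formula, not DSPy's exact accounting) of how these settings multiply into evaluation calls:

```python
num_trials = 20                # optimization trials
valset_size = 300              # rows in the validation set
minibatch = True
minibatch_size = 25
minibatch_full_eval_steps = 10

# Evaluation calls made during the trials themselves:
evals_per_trial = minibatch_size if minibatch else valset_size
full_evals = (num_trials // minibatch_full_eval_steps) if minibatch else 0
approx_eval_calls = num_trials * evals_per_trial + full_evals * valset_size
print(approx_eval_calls)  # 20 * 25 + 2 * 300 = 1100

# On top of this come the bootstrapping and instruction-proposal calls,
# which grow with num_candidates and the trainset size.
```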
5. Final Selection:
After all trials are complete, the optimizer returns the program that achieved the highest score. When minibatching is used, this is not based on the minibatch scores directly. The final program selected is the one that scored highest during one of the full evaluations.
This is a little out of scope of this guide, but worth the reminder: remember to save the optimization results!
f"optimized.json") optimized_program.save(
Conclusion
MIPROv2 is considered a very powerful prompt optimizer, but it has a lot of potential complexity. Engineers do not need to make use of all the potential variables and tools available from MIPROv2, but hopefully this rundown helps bring clarity to the options available and how to control them.
This was considerable work: investigating the details of MIPROv2, confirming the parameters, and writing it up in the clearest way I could. If you find any errors or confusing language, please reach out on socials (LinkedIn or X) or submit a correction to the GitHub issues here.