Action Engine
What is the Action Engine?
The Action Engine module is responsible for transforming natural language instructions into Actions and executing them.
The Action Engine module contains three sub-engines:
- Navigation Engine: Generates and executes Selenium code to perform an action on a web page
- Python Engine: Generates and executes code for tasks that do not involve navigating or interacting with a web page, such as extracting information
- Navigation Control: Performs frequently required navigation tasks without needing to make any extra LLM calls. So far we cover: scroll up, scroll down & wait
Getting started with the Action Engine
In our agentic workflow, the agent first gets the next instruction and the name of the next sub-engine to be used from the World Model.
This information is then sent to the Action Engine through the dispatch_instruction method.
We can test out this method directly with the Action Engine:
from lavague.drivers.selenium import SeleniumDriver
from lavague.core import ActionEngine
selenium_driver = SeleniumDriver(headless=False, url="https://huggingface.co/")
action_engine = ActionEngine(
    driver=selenium_driver,
)
# Dispatch an instruction to the Navigation Engine
engine_name = "Navigation Engine"
instruction = "Click on the Models button in the top menu"
# Execute the instruction and get the output if applicable
output = action_engine.dispatch_instruction(engine_name, instruction)
The dispatch_instruction method calls the execute_instruction method of the relevant sub-engine.
In the case of the Navigation and Python engines, this method first:
- Performs RAG. This is the point at which we use the Action Engine's embedding model
- Queries the LLM with our prompt template to generate the code needed to perform our action
- Runs the extractor, or cleaning function, to keep only the code from the LLM response
And in all cases, it then:
- Executes the action using the Action Engine's driver
- Logs information about the action process with the Engine's logger
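The steps above can be sketched end to end with toy stand-ins. This is a minimal illustration of the order of operations only, not LaVague's implementation: `retrieve_context`, `build_prompt`, `fake_llm`, `extract_code`, `FakeDriver` and `execute_instruction_sketch` are all hypothetical names, the "RAG" step is a simple word-overlap ranking, and the "LLM" returns a canned response.

```python
import re
from typing import Callable, Dict, List, Optional

FENCE = "`" * 3  # three backticks, built this way so this listing stays readable


def tokens(text: str) -> set:
    """Lowercased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve_context(instruction: str, page_chunks: List[str], top_k: int = 1) -> List[str]:
    """Toy stand-in for RAG: rank page chunks by words shared with the instruction."""
    wanted = tokens(instruction)
    ranked = sorted(page_chunks, key=lambda chunk: len(wanted & tokens(chunk)), reverse=True)
    return ranked[:top_k]


def build_prompt(instruction: str, context: List[str]) -> str:
    """Toy prompt template combining the retrieved context and the instruction."""
    return "Context:\n" + "\n".join(context) + "\nInstruction: " + instruction


def fake_llm(prompt: str) -> str:
    """Stand-in for the LLM: returns prose wrapped around a fenced code block."""
    return f"Here is the code:\n{FENCE}python\ndriver.click('models')\n{FENCE}\nDone."


def extract_code(response: str) -> str:
    """Toy extractor: keep only the contents of the first fenced block."""
    match = re.search(FENCE + r"(?:python)?\n(.*?)" + FENCE, response, re.DOTALL)
    return match.group(1).strip() if match else response.strip()


class FakeDriver:
    """Records calls instead of controlling a real browser."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def click(self, target: str) -> None:
        self.calls.append(("click", target))


def execute_instruction_sketch(
    instruction: str,
    page_chunks: List[str],
    driver: FakeDriver,
    llm: Callable[[str], str] = fake_llm,
    log: Optional[List[dict]] = None,
) -> Dict[str, object]:
    context = retrieve_context(instruction, page_chunks)  # 1. RAG
    response = llm(build_prompt(instruction, context))    # 2. LLM query
    code = extract_code(response)                         # 3. extractor
    try:
        exec(code, {"driver": driver})                    # 4. execute with the driver
        success = True
    except Exception:
        success = False
    result = {"instruction": instruction, "code": code, "success": success}
    if log is not None:
        log.append(result)                                # 5. log
    return result


driver = FakeDriver()
log: List[dict] = []
result = execute_instruction_sketch(
    "Click on the Models button in the top menu",
    ['<a href="/datasets">Datasets</a>', '<a href="/models">Models</a>'],
    driver,
    log=log,
)
print(result["code"], result["success"], driver.calls)
```

Keeping each step as its own function mirrors how the retriever, prompt template and extractor can be swapped independently through the optional arguments.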
The dispatch_instruction method returns an ActionResult object, which contains:
- instruction: the instruction an action was generated for
- code: the generated code for the action
- success: whether the code was executed successfully or not
- output: the output of executing the action where relevant, such as the result of a knowledge retrieval action using the Python Engine
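As a rough illustration of how calling code might consume these four fields, here is a small sketch. `ActionResultSketch` is a hypothetical stand-in dataclass that only mirrors the field names listed above; it is not the library's ActionResult class, and `describe` is an invented helper.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ActionResultSketch:
    """Hypothetical stand-in mirroring the four fields listed above."""

    instruction: str
    code: str
    success: bool
    output: Optional[Any] = None


def describe(result: ActionResultSketch) -> str:
    """Summarize a result, surfacing the output only when there is one."""
    if not result.success:
        return f"FAILED: {result.instruction}"
    if result.output is None:
        return f"OK: {result.instruction}"
    return f"OK: {result.instruction} -> {result.output}"


# A navigation action typically has no output; a retrieval action does.
nav = ActionResultSketch("Click on the Models button", "# generated code", True)
lookup = ActionResultSketch("Extract the page title", "# generated code", True, "Models")
print(describe(nav))
print(describe(lookup))
```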
Note that the Navigation Control skips the RAG and LLM query steps by using pre-defined code for common navigation tasks.
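One way to picture this is a lookup table from instruction to ready-made code. The sketch below is an assumption about the general idea, not the Navigation Control's actual code: `FakeDriver`, `make_controls` and `run_control` are hypothetical, and the JavaScript snippets are illustrative. `execute_script` is named after the Selenium WebDriver method, but here it only records the script.

```python
import time
from typing import Callable, Dict, List


class FakeDriver:
    """Records scripts instead of running them in a browser."""

    def __init__(self) -> None:
        self.scripts: List[str] = []

    def execute_script(self, script: str) -> None:
        self.scripts.append(script)


def make_controls(driver: FakeDriver, wait_seconds: float = 0.01) -> Dict[str, Callable[[], None]]:
    """Map each supported instruction to pre-defined code (no RAG, no LLM call)."""
    return {
        "scroll up": lambda: driver.execute_script("window.scrollBy(0, -window.innerHeight);"),
        "scroll down": lambda: driver.execute_script("window.scrollBy(0, window.innerHeight);"),
        "wait": lambda: time.sleep(wait_seconds),
    }


def run_control(controls: Dict[str, Callable[[], None]], instruction: str) -> bool:
    """Run the pre-defined action for an instruction; return False if unsupported."""
    action = controls.get(instruction.strip().lower())
    if action is None:
        return False
    action()
    return True


driver = FakeDriver()
controls = make_controls(driver)
handled = [run_control(controls, name) for name in ("Scroll down", "wait", "zoom in")]
print(handled, driver.scripts)
```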
Customizing the Action Engine
Action Engine from Context
You can customize the LLM and embedding model used by the Action Engine by instantiating it with the ActionEngine.from_context method and passing it a Context.
Details on how to do this are provided in our customization guide.
Action Engine optional parameters
There are also numerous optional arguments that can be passed to the Action Engine alongside the required driver argument, which you can view here:
Optional arguments
Parameter | Description |
---|---|
navigation_engine | The Navigation Engine to be used |
python_engine | The Python Engine to be used |
navigation_control | The Navigation Control to be used |
llm | The LLM to be used. By default, this is OpenAI's gpt-4o. You can pass any llama_index.llms LLM object. |
embedding | The embedding model to be used. By default, this is OpenAI's text-embedding-3-large. You can pass any llama_index.embeddings embedding model object. |
retriever | The Retriever to be used by the Navigation Engine for RAG - by default we use the OpsmSplitRetriever. |
prompt_template | The prompt template to be used by the Navigation Engine to query the LLM. You can view our default Navigation Engine prompt template here. |
extractor | The cleaning function to run on the LLM response before executing the generated code. You can view our default extractor here. |
time_between_actions | The time in seconds (a float) to wait between actions. This is useful when you want to enforce a delay that gives elements more time to load. By default, this is 1.5 seconds |
n_attempts | The number of attempts the Navigation Engine makes to successfully perform an action. By default, this is 5 |
logger | The AgentLogger instance used to log information about the Action Engine |
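To make the n_attempts and time_between_actions parameters more concrete, here is a minimal sketch of how a retry limit and a pause between actions could combine. It is an assumption for illustration only: `run_with_attempts` is a hypothetical helper, and the exact point at which the library waits or retries may differ.

```python
import time
from typing import Callable, List


def run_with_attempts(
    action: Callable[[], None],
    n_attempts: int = 5,
    time_between_actions: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try an action up to n_attempts times; pause once it has succeeded."""
    for _ in range(n_attempts):
        try:
            action()
        except Exception:
            continue  # this attempt failed, try again if attempts remain
        sleep(time_between_actions)  # assumed: give the page time to load
        return True
    return False


# A flaky action that fails twice before succeeding.
attempts: List[int] = []


def flaky_action() -> None:
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("element not found yet")


pauses: List[float] = []  # record pauses instead of really sleeping
succeeded = run_with_attempts(flaky_action, sleep=pauses.append)
print(succeeded, len(attempts), pauses)
```

Injecting the `sleep` function keeps the sketch quick to run and easy to check without waiting in real time.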
More info
For guidance on using the Agent Logger module to get more information about our Action Engine, such as viewing the generated code, see our Agent Logger guide.
For more details about the Navigation Engine, see our Navigation Engine module guide.
Display mode
A final thing to note is the Action Engine's set_display method.
This method is used by the Web Agent when its run method is passed display=True:
agent.run(objective, display=True)
When display is set to True, the Action Engine displays images that update in real time as actions are performed in the browser. This is useful when we run the driver in headless mode but still want to see the effect of our actions.
We can also use this method directly on the Action Engine with the following code:
action_engine.set_display(True)