Drivers
Drivers are interfaces for interacting with web browsers autonomously.
Our Driver modules are used to perform actions in web browsers and to retrieve information from the current web page.
We have three Driver options:
- ✅ SeleniumDriver: the Action Engine will generate and execute code using Selenium. We use the SeleniumDriver by default and it has the most complete feature set.
- 🎭 PlaywrightDriver: the Action Engine will generate and execute code using Playwright
- 🖥️ Chrome extension: when used alongside the DriverServer to generate code, the Chrome extension will execute JavaScript
Driver feature support
Not all agent features are supported by all drivers. Selenium is our preferred option since it currently supports the widest range of features. We welcome community contributions to help us increase support coverage for different drivers.
Feature | Selenium | Playwright | Chrome Extension |
---|---|---|---|
Headless agents | ✅ | ⏳ | N/A |
Handle iframes | ✅ | ✅ | ❌ |
Open several tabs | ✅ | ⏳ | ✅ |
Highlight elements | ✅ | ✅ | ✅ |
Remote driver (Browserbase) | ✅ | ❌ | ❌ |
✅ supported
⏳ coming soon
❌ not supported
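The support matrix above can also be written down as data, which makes it easy to check which drivers fit a given set of requirements. The following is only a minimal sketch: `FEATURE_SUPPORT`, `drivers_supporting` and the short feature keys are hypothetical names chosen for illustration and are not part of LaVague. Only ✅ entries are treated as supported.

```python
from __future__ import annotations

# Hypothetical encoding of the feature table above; not part of the LaVague API.
# Only features marked ✅ are listed; ⏳, ❌ and N/A all count as unsupported.
FEATURE_SUPPORT = {
    "selenium": {"headless", "iframes", "tabs", "highlight", "remote"},
    "playwright": {"iframes", "highlight"},
    "chrome_extension": {"tabs", "highlight"},
}


def drivers_supporting(*features: str) -> list[str]:
    """Return the drivers that support every requested feature."""
    required = set(features)
    return [
        name
        for name, supported in FEATURE_SUPPORT.items()
        if required <= supported  # subset check: all required features present
    ]


print(drivers_supporting("iframes", "highlight"))
print(drivers_supporting("remote"))
```

If a new row or a newly supported feature is added to the table, only the dictionary needs updating.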
Selenium Driver
The Selenium Driver is installed by default when you install `lavague`.
You then need to initialize the Driver and pass it to your Action Engine with the following code:
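from lavague.core import ActionEngine
from lavague.drivers.selenium import SeleniumDriver
# Initialize the driver and hand it to the Action Engine
selenium_driver = SeleniumDriver()
action_engine = ActionEngine(selenium_driver)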
You can then carry on using LaVague to perform tasks with our Web Agents.
from lavague.core import WorldModel, ActionEngine
from lavague.core.agents import WebAgent
world_model = WorldModel()
agent = WebAgent(world_model, action_engine)
agent.get("https://huggingface.co/docs")
result = agent.run("Go on the quicktour of PEFT")
Playwright Driver
If you prefer to use our Playwright driver, you will first need to install the Playwright driver package:
pip install lavague.drivers.playwright
Then you can initialize your PlaywrightDriver and pass it to the Action Engine with the following code:
from lavague.drivers.playwright import PlaywrightDriver
playwright_driver = PlaywrightDriver()
action_engine = ActionEngine(playwright_driver)
You can then use LaVague as usual (see the final example of the Selenium Driver section).
Playwright Driver limitations
The Playwright Driver is not compatible with:
- running LaVague in Google Colab or other notebooks
- the Gradio demo launched via the `agent.demo()` method
This is due to compatibility issues with the Playwright async API.
If you want us to work on implementing a fix by supporting the Playwright sync API, please open a feature request on GitHub so we can gauge interest.
Optional arguments
The drivers accept the following optional arguments:
Parameter | Description |
---|---|
url | A string argument which can be used to pass a URL directly to the driver, instead of providing it via the Agent |
get_sync_playwright_page (Playwright only) | A function that overrides the Playwright driver's default initialization with a custom one, for example to accept custom driver options |
get_selenium_driver (Selenium only) | A function that overrides the Selenium driver's default initialization with a custom one, for example to accept custom driver options |
headless | Boolean value to set headless or non-headless mode |
width | Integer value for width of the driver's browser window in pixels |
height | Integer value for height of the driver's browser window in pixels |
user_data_dir | Path to your Chrome profile directory. If left empty, Chrome starts a fresh session every time. If provided, Chrome starts with your profile's settings and data, which can help avoid bot protections |
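As a rough illustration of how the options in this table fit together, the sketch below collects them into a dictionary of keyword arguments that could then be passed to `SeleniumDriver`. The parameter names come from the table above; the helper names `build_driver_options` and `make_selenium_driver` and the example values are assumptions made for illustration, not part of LaVague.

```python
from __future__ import annotations

from typing import Any, Optional


def build_driver_options(
    url: Optional[str] = None,
    headless: bool = True,  # the docs describe headless as the default mode
    width: Optional[int] = None,
    height: Optional[int] = None,
    user_data_dir: Optional[str] = None,
) -> dict[str, Any]:
    """Collect driver options, leaving out any that were not set."""
    options = {
        "url": url,
        "headless": headless,
        "width": width,
        "height": height,
        "user_data_dir": user_data_dir,
    }
    return {key: value for key, value in options.items() if value is not None}


def make_selenium_driver(**options: Any):
    """Hypothetical wrapper that forwards the options to SeleniumDriver."""
    # Imported lazily so build_driver_options works without LaVague installed.
    from lavague.drivers.selenium import SeleniumDriver

    return SeleniumDriver(**options)


options = build_driver_options(headless=False, width=1280, height=800)
print(options)
# driver = make_selenium_driver(**options)  # requires lavague to be installed
```

Keeping the options in one dictionary makes it straightforward to reuse the same settings across several driver instances.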
Plugging in an existing browser session
By default, the driver starts with a blank user session, but you can change this and other settings with the following arguments:
Headless vs non-headless mode
By default, the drivers start in headless mode. This means they do not open a browser window on your machine in which you can watch the driver's actions in real time. This can be useful when running LaVague in an environment where you don't have access to a browser.
You can switch to non-headless mode by initializing your driver with the `headless=False` argument:
driver = SeleniumDriver(headless=False)
When using `headless` mode, you can activate a `display` mode that shows screenshots of the agent's progress, updated in real time, by passing `display=True` to `agent.run`: `agent.run("Print out the name of this week's top trending model", display=True)`. This can be useful when testing LaVague in an environment like Google Colab, for example.
Plugging in an existing browser session
By using the Driver's `user_data_dir` argument, you can leverage your default browser settings, cookies, etc. This can remove the need to log in to sites where you are already logged in, which can help you avoid issues with bot protections during login interactions.
To use your existing Chrome profile, you need to locate the profile path on your operating system. Here are the default locations for Windows, Linux, and OSX:
- Windows: `C:\Users\<YourUsername>\AppData\Local\Google\Chrome\User Data`
- Linux: `/home/<YourUsername>/.config/google-chrome`
- OSX: `/Users/<YourUsername>/Library/Application Support/Google/Chrome`
You can then pass this path via the `user_data_dir` argument.
from lavague.drivers.selenium import SeleniumDriver
driver = SeleniumDriver(headless=False, user_data_dir="C:/Users/YourUsername/AppData/Local/Google/Chrome/User Data")
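To avoid hard-coding the profile path, the default locations listed above can be derived from the current operating system. This is a minimal sketch that assumes Chrome uses its default profile location and that the home directory is the standard one for the user; `default_chrome_profile_dir` is a hypothetical helper, not part of LaVague.

```python
from __future__ import annotations

import platform
from pathlib import Path
from typing import Optional


def default_chrome_profile_dir(
    system: Optional[str] = None, home: Optional[Path] = None
) -> Path:
    """Return the default Chrome profile directory listed above for an OS."""
    # platform.system() returns "Windows", "Linux" or "Darwin" (macOS/OSX).
    system = system or platform.system()
    home = home or Path.home()
    if system == "Windows":
        return home / "AppData" / "Local" / "Google" / "Chrome" / "User Data"
    if system == "Linux":
        return home / ".config" / "google-chrome"
    if system == "Darwin":
        return home / "Library" / "Application Support" / "Google" / "Chrome"
    raise ValueError(f"Unsupported operating system: {system}")


# Explicit arguments are used here so the example behaves the same everywhere;
# call default_chrome_profile_dir() with no arguments to detect the current OS.
profile_dir = default_chrome_profile_dir("Linux", Path("/home/alice"))
print(profile_dir)
# driver = SeleniumDriver(headless=False, user_data_dir=str(profile_dir))
```

If Chrome is installed in a non-standard way, the returned path may not exist, so it is worth checking `profile_dir.exists()` before passing it to the driver.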
Using a remote driver (Browserbase)
The `SeleniumDriver` supports running remote drivers using Browserbase.
- Create an account on Browserbase
- Get your `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID`. You can optionally set them as environment variables.
- Create a remote driver
from lavague.drivers.selenium import SeleniumDriver
from lavague.drivers.selenium import BrowserbaseRemoteConnection
browserbase_connection = BrowserbaseRemoteConnection('http://connect.browserbase.com/webdriver', api_key="your_key", project_id="your_project_id")
- Create and run an agent
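from lavague.core import WorldModel, ActionEngine
from lavague.core.agents import WebAgent
# The agent is created and run inside the driver's context manager
with SeleniumDriver(remote_connection=browserbase_connection) as selenium_driver:
    world_model = WorldModel()
    action_engine = ActionEngine(selenium_driver)
    agent = WebAgent(world_model, action_engine)
    agent.get("https://wikipedia.org")
    agent.run("Search for AI")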
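The steps above mention that the Browserbase credentials can be stored in environment variables. Whether `BrowserbaseRemoteConnection` picks them up on its own is not shown here, so the sketch below reads them explicitly and fails early with a clear message if one is missing. `browserbase_credentials` is a hypothetical helper, not part of LaVague.

```python
from __future__ import annotations

import os
from typing import Mapping

REQUIRED_VARS = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")


def browserbase_credentials(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read the Browserbase API key and project ID from environment variables."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return {
        "api_key": env["BROWSERBASE_API_KEY"],
        "project_id": env["BROWSERBASE_PROJECT_ID"],
    }


# Example with an explicit mapping; call browserbase_credentials() with no
# arguments to read the real environment instead.
creds = browserbase_credentials(
    {"BROWSERBASE_API_KEY": "your_key", "BROWSERBASE_PROJECT_ID": "your_project_id"}
)
print(creds)
# browserbase_connection = BrowserbaseRemoteConnection(
#     "http://connect.browserbase.com/webdriver", **creds
# )
```

Reading the values this way keeps the key out of the source code, which matters if the script is shared or committed to a repository.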