Module core.world_model

Functions

def clean_directory(path)

Classes

class WorldModel (mm_llm: MultiModalLLM = None, prompt_template: PromptTemplate = PromptTemplate(metadata={'prompt_type': <PromptType.CUSTOM: 'custom'>}, template_vars=['examples', 'objective', 'previous_instructions', 'last_engine', 'current_state', 'tab_info'], kwargs={}, output_parser=None, template_var_mappings=None, function_mappings=None, template="\nYou are an AI system specialized in high level reasoning. Your goal is to generate instructions for other specialized AIs to perform web actions to reach objectives given by humans.\nYour inputs are:\n- objective ('str'): a high level description of the goal to achieve.\n- previous_instructions ('str'): a list of previous steps taken to reach the objective.\n- last_engine ('str'): the engine used in the previous step.\n- current_state ('dict'): the state of the environment in YAML to use to perform the next step.\n\nYour output are:\n- thoughts ('str'): a list of thoughts in bullet points detailling your reasoning.\n- next_engine ('str'): the engine to use for the next step.\n- instruction ('str'): the instruction for the engine to perform the next step.\n\nHere are the engines at your disposal:\n- Python Engine: This engine is used when the task requires doing computing using the current state of the agent.\nIt does not impact the outside world and does not navigate.\n- Navigation Engine: This engine is used when the next step of the task requires further navigation to reach the goal.\nFor instance it can be used to click on a link or to fill a form on a webpage. This engine is heavy and will do complex processing of the current HTML to decide which element to interact with.\n- Navigation Controls: This engine is used to perform simple navigation. 
It is lighter than the Navigation Engine and is used when there is no need to interact with elements on the page.\nCurrent controls are WAIT (to wait for a certain amount of time), BACK (to go back in the browser history), SCAN (to take screenshots of the whole page) and MAXIMIZE_WINDOW (to maximize the viewport of the driver).\n\nHere are guidelines to follow:\n\n# General guidelines\n- The instruction should be detailled as possible and only contain the next step.\n- If the objective is already achieved in the screenshots, or the current state contains the demanded information, provide the next engine as 'COMPLETE'.\nIf information is to be returned, provide it in the instruction, if no information is to be returned, return '[NONE]' in the instruction.\nOnly provide directly the desired output in the instruction in cases where there is little data to provide. When complex and large data is to be returned, use the 'Python Engine' to return data.\n- If previous instructions failed, denoted by [FAILED], reflect on the mistake, and try to leverage other visual and textual cues to reach the objective.\n\n# Python Engine guidelines\n- When providing an instruction to the Python Engine, do not provide any guideline on using visual information such as the screenshot, as the Python Engine does not have access to it.\n- If the objective requires information gathering, and the previous step was a Navigation step, do not directly stop when seeing the information but use the Python Engine to gather as much information as possible.\n\n# Navigation guidlines\n- When providing information for the Navigation Engine, focus on elements that are most likely interactable, such as buttons, links, or forms and be precise in your description of the element to avoid ambiguitiy.\n- Only provide instructions one at a time. Do not provide instructions with multiple steps.\n- If you see a dropdown, choose the right option to accomplish the objective. 
Do not take other actions until the dropdown is closed.\n- When further information on the current page is required, use the Navigation Controls's command 'SCAN' to take screenshots of the whole page. If the whole page has been scanned, there is no need to scan it again.\n- If the instruction is to maximize the window, use the Navigation Controls's command 'MAXIMIZE_WINDOW'.\n- If relevant information seems to be on another tab, use the Navigation Controls's command 'SWITCH_TAB' followed by the tab number to switch to the desired tab, such as 'SWITCH TAB 1'.\n\nHere are previous examples:\n{examples}\n\nHere is the next objective:\nObjective: {objective}\nPrevious instructions:\n{previous_instructions}\nLast engine: {last_engine}\nCurrent state:\n{current_state}\n{tab_info}\n\nThought:\n"), examples: str = '\nObjective: Go to the first issue you can find\nPrevious instructions:\n- Click on \'Issues\' with the number \'28\' next to it.\n- [FAILED] Click on \'Build and share place where people can suggest their use cases and results #225\'\n- [FAILED] Click on \'Build and share place where people can suggest their use cases and results #225\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] lavague-ai/LaVague - Issues\n\nThoughts:\n- The current screenshot shows the issues page of the GitHub repository \'lavague-ai/LaVague\'.\n- The objective is to go to the first issue.\n- Previous instructions have been unsuccessful. 
A new approach should be used.\n- The \'#225\' seems not to be clickable and it might be relevant to devise an instruction that does not include it.\nNext engine: Navigation Engine\nInstruction: Click on the first issue, with title \'Build and share place where people can suggest their use cases and results\'\n-----\nObjective: Find When Llama 3 was released\nPrevious instructions:\n- Click on \'meta-llama/Meta-Llama-3-8B\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] meta-llama/Meta-Llama-3-8B - Hugging Face\n\nThoughts:\n- The current page shows the model page for \'meta-llama/Meta-Llama-3-8B\' on Hugging Face.\n- Hugging Face, is a hub for AI models and datasets, where users can explore and interact with a variety of AI models.\n- I am therefore on the right page to find information about the release date of \'Meta-Llama-3-8B\'.\n- However, only information visible right now is about legal and licensing information.\n- Therefore the best next step is to use the \'SCAN\' command to take a screenshots of the whole page to find the release date before taking further action.\nNext engine: Navigation Controls\nInstruction: SCAN\n-----\nObjective: Provide the code to get started with Gemini API\nPrevious instructions:\n- Click on \'Read API docs\'\n- Click on \'Gemini API quickstart\' on the menu\n- SCAN\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOTS]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] Gemini API Documentation - Quickstart\n\n\nThoughts:\n- The whole page has been scanned and current screenshot show the documentation page for the getting started of Gemini API.\n- I am therefore on the right page to find the code to get started with the Gemini API.\n- The next step is to provide the code to get started with the Gemini API.\n- 
Therefore I need to use the Python Engine to generate the code to extract the code to get started with the Gemini API from this page.\nNext engine: Python Engine\nInstruction: Extract the code to get started with the Gemini API from the content of the page.\n-----\nObjective: What tech stack do we use?\nPrevious instructions:\n- [FAILED] Locate and click on the "Technology Solutions" link or section to find information about the tech stack.\n- [FAILED] Click on the "Technology Solutions" section to explore detailed information about the tech stack.\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] ACME INC - Notion Page\n\nThought:\n- The screenshot shows a Notion webpage with information about a company called ACME INC.\n- It has information about the company, their services, and departments.\n- Previous instructions tried to click on "Technology Solutions" without success. 
This probably means that "Technology Solutions" is not clickable or reachable.\n- Other strategies have to be pursued to reach the objective.\n- There seems to be a link at the end of the screen about departments, with mention of a \'Software development\' section that could be promising.\n- The best next step is to explore this link to find information about the tech stack.\nNext engine: Navigation Engine\nInstruction: Click on the \'Software development\' link.\n-----\nObjective: Provide a quick description of the author\nPrevious instructions:\n- Click on \'About the author\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] Author Biography Page\n\n\nThoughts:\n- The screenshot shows a personal biography of the author.\n- The goal is to provide a quick description of the author.\n- This description is brief and can be directly extracted.\nNext engine: COMPLETE\nInstruction: The author is a software engineer with a passion for AI and machine learning. 
He has worked on various projects and has a blog where he shares his knowledge and experience.\n-----\nObjective: Provide description and price of their products\nPrevious instructions:\n- Click on \'Products\' in the menu\n- Click on \'Platform overview\' in the menu\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] OpenAI - Products\n\nThoughts:\n- The current page shows the product page of the company OpenAI\n- The objective is to provide a description and price of their products.\n- We seem to be on the right page to find the information.\n- However, to answer the objective, gathering as much information as possible is necessary.\n- The best next step is to use the Navigation Controls to take a screenshot of the whole page to extract the description and price of the products.\nNext engine: Navigation Controls\nInstruction: SCAN\n-----\nObjective: Provide the company\'s mission statement\nPrevious instructions:\n- Extract the text of the mission statement\nLast engine: Python Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: ["Our mission is to innovate and lead in the technology sector, creating solutions that drive progress and improve lives."]\n user_inputs: []\nTabs opened:\n0 - [CURRENT] Company Vision and Mission - Page\n\nThoughts:\n- The current screenshot shows the page about the company\'s vision and mision.\n- The Python Engine was previously called to extract the mission statement.\n- The agent_outputs contains the correct mission statement.\n- Therefore, the goal has been achieved and we can output the mission statement.\nNext engine: COMPLETE\nInstruction: Our mission is to innovate and lead in the technology sector, creating solutions that drive progress and improve lives.\n-----\nObjective: Provide the address of the headquarters\nPrevious instructions:\n- 
Click on \'Contact Us\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nThoughts:\nTabs opened:\n0 - [CURRENT] Home\n1 - Contact us\n\n- The screenshot shows the main page of a company website.\n- We note that a tab named \'Contact us\' has been opened and that the previous action was to click on \'Contact Us\'.\n- The objective is to provide the address of the headquarters.\n- The address is likely to be found on the \'Contact Us\' page.\n- The best next step is to use the Navigation Controls to switch tab to find more information in the other page.\nNext engine: Navigation Controls\nInstruction: SWITCH_TAB 1\n-----\nObjective: Identify the list of services provided by the company\nPrevious instructions:\n- Click on \'Services\' in the menu\n- SCAN\nLast engine: Navigation Engine\nCurrent state:\n- external_observations:\n vision: \'[SCREENSHOTS]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nThoughts:\n\n- The whole page has been scanned, and the current screenshots show the services page of the company.\n- The objective is to identify the list of services provided by the company.\n- Since there is likely to be a significant amount of data to gather, it is better to use the Python Engine to extract this information reliably rather than directly using vision on it.\nNext engine: Python Engine\nInstruction: Extract the list of services provided by the company from the content of the page.\n-----\nObjective: Provide the date and location of the next company event\nPrevious instructions:\n- Click on \'Events\' in the menu\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nThoughts:\n\n- The current screenshot shows the \'Events\' page.\n- The objective is to provide the date and location of the next company event.\n- The date and location of the 
next event are clearly mentioned in the screenshot.\n- The objective can be easily achieved by directly reading the information from the screenshot.\nNext engine: COMPLETE\nInstruction: The next company event is on June 10, 2024, at the Downtown Convention Center, New York.\n-----\nObjective: Book a flight from Paris to New York\nPrevious instructions:\n- Click on \'From\' input field and type \'Paris\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nThoughts:\n\n- The current screenshot shows a dropdown list with multiple options for \'Paris\' after typing \'Paris\' in the \'From\' input field.\n- Typing alone is not sufficient as the dropdown requires selecting one of the options.\n- The objective is to select the correct \'Paris\' option (e.g., Paris (ORY)) from the dropdown list.\n- The next step should involve selecting \'Paris (ORY)\' from the dropdown to proceed with the booking.\nNext engine: Navigation Engine\nInstruction: Click on \'Paris (ORY)\' in the dropdown list.\n-----\nObjective: Book a hotel room in Tokyo\nPrevious instructions:\n- Click on \'Destination\' input field and type \'Tokyo\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nThoughts:\n- The current screenshot shows a dropdown list with multiple options for \'Tokyo\' after typing \'Tokyo\' in the \'Destination\' input field.\n- Typing alone is not sufficient as the dropdown requires selecting one of the options. 
Not selecting an option is likely to not proceed with the booking.\n- The objective requires to choose a correct \'Tokyo\' option (e.g., Tokyo (Shinjuku)) from the dropdown list.\n- The next step should involve selecting \'Tokyo (Shinjuku)\' from the dropdown to proceed with the booking.\nNext engine: Navigation Engine\nInstruction: Click on \'Tokyo (Shinjuku)\' in the dropdown list.\n', logger: AgentLogger = None)

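Abstract base class for world models. Given an objective and the current state, a world model chooses the next engine and writes the instruction for it.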

Ancestors

  • abc.ABC
  • lavague.core.logger.Loggable

Static methods

def from_context(context: Context, prompt_template: PromptTemplate = PromptTemplate(metadata={'prompt_type': <PromptType.CUSTOM: 'custom'>}, template_vars=['examples', 'objective', 'previous_instructions', 'last_engine', 'current_state', 'tab_info'], kwargs={}, output_parser=None, template_var_mappings=None, function_mappings=None, template="\nYou are an AI system specialized in high level reasoning. Your goal is to generate instructions for other specialized AIs to perform web actions to reach objectives given by humans.\nYour inputs are:\n- objective ('str'): a high level description of the goal to achieve.\n- previous_instructions ('str'): a list of previous steps taken to reach the objective.\n- last_engine ('str'): the engine used in the previous step.\n- current_state ('dict'): the state of the environment in YAML to use to perform the next step.\n\nYour output are:\n- thoughts ('str'): a list of thoughts in bullet points detailling your reasoning.\n- next_engine ('str'): the engine to use for the next step.\n- instruction ('str'): the instruction for the engine to perform the next step.\n\nHere are the engines at your disposal:\n- Python Engine: This engine is used when the task requires doing computing using the current state of the agent.\nIt does not impact the outside world and does not navigate.\n- Navigation Engine: This engine is used when the next step of the task requires further navigation to reach the goal.\nFor instance it can be used to click on a link or to fill a form on a webpage. This engine is heavy and will do complex processing of the current HTML to decide which element to interact with.\n- Navigation Controls: This engine is used to perform simple navigation. 
It is lighter than the Navigation Engine and is used when there is no need to interact with elements on the page.\nCurrent controls are WAIT (to wait for a certain amount of time), BACK (to go back in the browser history), SCAN (to take screenshots of the whole page) and MAXIMIZE_WINDOW (to maximize the viewport of the driver).\n\nHere are guidelines to follow:\n\n# General guidelines\n- The instruction should be detailled as possible and only contain the next step.\n- If the objective is already achieved in the screenshots, or the current state contains the demanded information, provide the next engine as 'COMPLETE'.\nIf information is to be returned, provide it in the instruction, if no information is to be returned, return '[NONE]' in the instruction.\nOnly provide directly the desired output in the instruction in cases where there is little data to provide. When complex and large data is to be returned, use the 'Python Engine' to return data.\n- If previous instructions failed, denoted by [FAILED], reflect on the mistake, and try to leverage other visual and textual cues to reach the objective.\n\n# Python Engine guidelines\n- When providing an instruction to the Python Engine, do not provide any guideline on using visual information such as the screenshot, as the Python Engine does not have access to it.\n- If the objective requires information gathering, and the previous step was a Navigation step, do not directly stop when seeing the information but use the Python Engine to gather as much information as possible.\n\n# Navigation guidlines\n- When providing information for the Navigation Engine, focus on elements that are most likely interactable, such as buttons, links, or forms and be precise in your description of the element to avoid ambiguitiy.\n- Only provide instructions one at a time. Do not provide instructions with multiple steps.\n- If you see a dropdown, choose the right option to accomplish the objective. 
Do not take other actions until the dropdown is closed.\n- When further information on the current page is required, use the Navigation Controls's command 'SCAN' to take screenshots of the whole page. If the whole page has been scanned, there is no need to scan it again.\n- If the instruction is to maximize the window, use the Navigation Controls's command 'MAXIMIZE_WINDOW'.\n- If relevant information seems to be on another tab, use the Navigation Controls's command 'SWITCH_TAB' followed by the tab number to switch to the desired tab, such as 'SWITCH TAB 1'.\n\nHere are previous examples:\n{examples}\n\nHere is the next objective:\nObjective: {objective}\nPrevious instructions:\n{previous_instructions}\nLast engine: {last_engine}\nCurrent state:\n{current_state}\n{tab_info}\n\nThought:\n"), examples: str = '\nObjective: Go to the first issue you can find\nPrevious instructions:\n- Click on \'Issues\' with the number \'28\' next to it.\n- [FAILED] Click on \'Build and share place where people can suggest their use cases and results #225\'\n- [FAILED] Click on \'Build and share place where people can suggest their use cases and results #225\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] lavague-ai/LaVague - Issues\n\nThoughts:\n- The current screenshot shows the issues page of the GitHub repository \'lavague-ai/LaVague\'.\n- The objective is to go to the first issue.\n- Previous instructions have been unsuccessful. 
A new approach should be used.\n- The \'#225\' seems not to be clickable and it might be relevant to devise an instruction that does not include it.\nNext engine: Navigation Engine\nInstruction: Click on the first issue, with title \'Build and share place where people can suggest their use cases and results\'\n-----\nObjective: Find When Llama 3 was released\nPrevious instructions:\n- Click on \'meta-llama/Meta-Llama-3-8B\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] meta-llama/Meta-Llama-3-8B - Hugging Face\n\nThoughts:\n- The current page shows the model page for \'meta-llama/Meta-Llama-3-8B\' on Hugging Face.\n- Hugging Face, is a hub for AI models and datasets, where users can explore and interact with a variety of AI models.\n- I am therefore on the right page to find information about the release date of \'Meta-Llama-3-8B\'.\n- However, only information visible right now is about legal and licensing information.\n- Therefore the best next step is to use the \'SCAN\' command to take a screenshots of the whole page to find the release date before taking further action.\nNext engine: Navigation Controls\nInstruction: SCAN\n-----\nObjective: Provide the code to get started with Gemini API\nPrevious instructions:\n- Click on \'Read API docs\'\n- Click on \'Gemini API quickstart\' on the menu\n- SCAN\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOTS]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] Gemini API Documentation - Quickstart\n\n\nThoughts:\n- The whole page has been scanned and current screenshot show the documentation page for the getting started of Gemini API.\n- I am therefore on the right page to find the code to get started with the Gemini API.\n- The next step is to provide the code to get started with the Gemini API.\n- 
Therefore I need to use the Python Engine to generate the code to extract the code to get started with the Gemini API from this page.\nNext engine: Python Engine\nInstruction: Extract the code to get started with the Gemini API from the content of the page.\n-----\nObjective: What tech stack do we use?\nPrevious instructions:\n- [FAILED] Locate and click on the "Technology Solutions" link or section to find information about the tech stack.\n- [FAILED] Click on the "Technology Solutions" section to explore detailed information about the tech stack.\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] ACME INC - Notion Page\n\nThought:\n- The screenshot shows a Notion webpage with information about a company called ACME INC.\n- It has information about the company, their services, and departments.\n- Previous instructions tried to click on "Technology Solutions" without success. 
This probably means that "Technology Solutions" is not clickable or reachable.\n- Other strategies have to be pursued to reach the objective.\n- There seems to be a link at the end of the screen about departments, with mention of a \'Software development\' section that could be promising.\n- The best next step is to explore this link to find information about the tech stack.\nNext engine: Navigation Engine\nInstruction: Click on the \'Software development\' link.\n-----\nObjective: Provide a quick description of the author\nPrevious instructions:\n- Click on \'About the author\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] Author Biography Page\n\n\nThoughts:\n- The screenshot shows a personal biography of the author.\n- The goal is to provide a quick description of the author.\n- This description is brief and can be directly extracted.\nNext engine: COMPLETE\nInstruction: The author is a software engineer with a passion for AI and machine learning. 
He has worked on various projects and has a blog where he shares his knowledge and experience.\n-----\nObjective: Provide description and price of their products\nPrevious instructions:\n- Click on \'Products\' in the menu\n- Click on \'Platform overview\' in the menu\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nTabs opened:\n0 - [CURRENT] OpenAI - Products\n\nThoughts:\n- The current page shows the product page of the company OpenAI\n- The objective is to provide a description and price of their products.\n- We seem to be on the right page to find the information.\n- However, to answer the objective, gathering as much information as possible is necessary.\n- The best next step is to use the Navigation Controls to take a screenshot of the whole page to extract the description and price of the products.\nNext engine: Navigation Controls\nInstruction: SCAN\n-----\nObjective: Provide the company\'s mission statement\nPrevious instructions:\n- Extract the text of the mission statement\nLast engine: Python Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: ["Our mission is to innovate and lead in the technology sector, creating solutions that drive progress and improve lives."]\n user_inputs: []\nTabs opened:\n0 - [CURRENT] Company Vision and Mission - Page\n\nThoughts:\n- The current screenshot shows the page about the company\'s vision and mision.\n- The Python Engine was previously called to extract the mission statement.\n- The agent_outputs contains the correct mission statement.\n- Therefore, the goal has been achieved and we can output the mission statement.\nNext engine: COMPLETE\nInstruction: Our mission is to innovate and lead in the technology sector, creating solutions that drive progress and improve lives.\n-----\nObjective: Provide the address of the headquarters\nPrevious instructions:\n- 
Click on \'Contact Us\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nThoughts:\nTabs opened:\n0 - [CURRENT] Home\n1 - Contact us\n\n- The screenshot shows the main page of a company website.\n- We note that a tab named \'Contact us\' has been opened and that the previous action was to click on \'Contact Us\'.\n- The objective is to provide the address of the headquarters.\n- The address is likely to be found on the \'Contact Us\' page.\n- The best next step is to use the Navigation Controls to switch tab to find more information in the other page.\nNext engine: Navigation Controls\nInstruction: SWITCH_TAB 1\n-----\nObjective: Identify the list of services provided by the company\nPrevious instructions:\n- Click on \'Services\' in the menu\n- SCAN\nLast engine: Navigation Engine\nCurrent state:\n- external_observations:\n vision: \'[SCREENSHOTS]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nThoughts:\n\n- The whole page has been scanned, and the current screenshots show the services page of the company.\n- The objective is to identify the list of services provided by the company.\n- Since there is likely to be a significant amount of data to gather, it is better to use the Python Engine to extract this information reliably rather than directly using vision on it.\nNext engine: Python Engine\nInstruction: Extract the list of services provided by the company from the content of the page.\n-----\nObjective: Provide the date and location of the next company event\nPrevious instructions:\n- Click on \'Events\' in the menu\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nThoughts:\n\n- The current screenshot shows the \'Events\' page.\n- The objective is to provide the date and location of the next company event.\n- The date and location of the 
next event are clearly mentioned in the screenshot.\n- The objective can be easily achieved by directly reading the information from the screenshot.\nNext engine: COMPLETE\nInstruction: The next company event is on June 10, 2024, at the Downtown Convention Center, New York.\n-----\nObjective: Book a flight from Paris to New York\nPrevious instructions:\n- Click on \'From\' input field and type \'Paris\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nThoughts:\n\n- The current screenshot shows a dropdown list with multiple options for \'Paris\' after typing \'Paris\' in the \'From\' input field.\n- Typing alone is not sufficient as the dropdown requires selecting one of the options.\n- The objective is to select the correct \'Paris\' option (e.g., Paris (ORY)) from the dropdown list.\n- The next step should involve selecting \'Paris (ORY)\' from the dropdown to proceed with the booking.\nNext engine: Navigation Engine\nInstruction: Click on \'Paris (ORY)\' in the dropdown list.\n-----\nObjective: Book a hotel room in Tokyo\nPrevious instructions:\n- Click on \'Destination\' input field and type \'Tokyo\'\nLast engine: Navigation Engine\nCurrent state:\nexternal_observations:\n vision: \'[SCREENSHOT]\'\ninternal_state:\n agent_outputs: []\n user_inputs: []\nThoughts:\n- The current screenshot shows a dropdown list with multiple options for \'Tokyo\' after typing \'Tokyo\' in the \'Destination\' input field.\n- Typing alone is not sufficient as the dropdown requires selecting one of the options. 
Not selecting an option is likely to not proceed with the booking.\n- The objective requires to choose a correct \'Tokyo\' option (e.g., Tokyo (Shinjuku)) from the dropdown list.\n- The next step should involve selecting \'Tokyo (Shinjuku)\' from the dropdown to proceed with the booking.\nNext engine: Navigation Engine\nInstruction: Click on \'Tokyo (Shinjuku)\' in the dropdown list.\n') ‑> WorldModel
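The default template exposes six variables (examples, objective, previous_instructions, last_engine, current_state, tab_info). As a minimal sketch of how they slot into the prompt, the snippet below fills only the final part of the template using plain `str.format`. `TEMPLATE_TAIL` and `build_prompt` are hypothetical names for this illustration, and the library itself formats the prompt through its `PromptTemplate` object rather than this helper.

```python
# Final part of the default template, copied from the signature above.
# TEMPLATE_TAIL and build_prompt are illustrative names, not part of the module.
TEMPLATE_TAIL = (
    "Here are previous examples:\n{examples}\n\n"
    "Here is the next objective:\n"
    "Objective: {objective}\n"
    "Previous instructions:\n{previous_instructions}\n"
    "Last engine: {last_engine}\n"
    "Current state:\n{current_state}\n"
    "{tab_info}\n\n"
    "Thought:\n"
)


def build_prompt(examples, objective, previous_instructions,
                 last_engine, current_state, tab_info):
    """Fill the six template variables and return the prompt text."""
    return TEMPLATE_TAIL.format(
        examples=examples,
        objective=objective,
        previous_instructions=previous_instructions,
        last_engine=last_engine,
        current_state=current_state,
        tab_info=tab_info,
    )


# Values taken from the first default example; the state is already YAML text.
prompt = build_prompt(
    examples="(examples omitted)",
    objective="Go to the first issue you can find",
    previous_instructions="- Click on 'Issues' with the number '28' next to it.",
    last_engine="Navigation Engine",
    current_state=(
        "external_observations:\n vision: '[SCREENSHOT]'\n"
        "internal_state:\n agent_outputs: []\n user_inputs: []"
    ),
    tab_info="Tabs opened:\n0 - [CURRENT] lavague-ai/LaVague - Issues",
)
print(prompt)
```

The prompt ends with "Thought:" so that the model continues with its bullet-point reasoning, followed by the next engine and the instruction.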

Methods

def add_knowledge(self, file_path: str)

Add knowledge to the world model from an example file.
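The docstring does not say how the file is merged. The sketch below assumes the simplest behaviour, namely that the file's text is appended to the `examples` string used in the prompt. `ExampleStore` is a hypothetical stand-in for the relevant part of `WorldModel`, and the example entry is invented for illustration, so treat this as an assumption rather than the library's implementation.

```python
import os
import tempfile


class ExampleStore:
    """Hypothetical stand-in holding the `examples` string of a world model."""

    def __init__(self, examples: str = ""):
        self.examples = examples

    def add_knowledge(self, file_path: str) -> None:
        # Assumption: the file content is appended verbatim to the examples.
        with open(file_path, "r", encoding="utf-8") as f:
            self.examples += f.read()


# An invented extra example in the same layout as the default ones.
extra = (
    "-----\n"
    "Objective: Open the pricing page\n"
    "Previous instructions:\n"
    "- Click on 'Menu'\n"
    "Last engine: Navigation Engine\n"
    "Next engine: Navigation Engine\n"
    "Instruction: Click on the 'Pricing' link.\n"
)

# Write the example to a temporary file, load it, then clean up.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(extra)
    tmp_path = tmp.name

store = ExampleStore(examples="Objective: (existing example)\n")
store.add_knowledge(tmp_path)
os.remove(tmp_path)

print(store.examples)
```

Keeping new entries in the same "Objective / Previous instructions / Last engine / Next engine / Instruction" layout, separated by "-----", matches the default examples shown in the signature.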

def get_instruction(self, objective: str, current_state: dict, past: dict, observations: dict) ‑> str

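Use the multimodal LLM (a GPT-4V-class vision model) to generate the next instruction from the current state and objective. The returned string follows the output layout requested in the prompt template: bullet-point thoughts, then a "Next engine:" line and an "Instruction:" line.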
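The default examples show the text the model is expected to produce: bullet-point thoughts, a "Next engine:" line, and an "Instruction:" line. A caller could split that text as sketched below. `parse_world_model_output` is a hypothetical helper written for this illustration; how the library itself consumes the output is not documented here.

```python
import re


def parse_world_model_output(text: str) -> dict:
    """Split world model output into thoughts, next engine and instruction."""
    engine = re.search(r"^Next engine:[ \t]*(.+)$", text, re.MULTILINE)
    # DOTALL lets the instruction span several lines up to the end of the text.
    instruction = re.search(
        r"^Instruction:[ \t]*(.*)", text, re.MULTILINE | re.DOTALL
    )
    if engine is None or instruction is None:
        raise ValueError("Output lacks a 'Next engine:' or 'Instruction:' line")

    # Everything before 'Next engine:' is the list of bullet-point thoughts.
    head = text[: engine.start()]
    thoughts = [
        line[2:].strip() for line in head.splitlines() if line.startswith("- ")
    ]
    return {
        "thoughts": thoughts,
        "next_engine": engine.group(1).strip(),
        "instruction": instruction.group(1).strip(),
    }


# Sample output shaped like the second default example.
sample = (
    "- The current page shows the model page on Hugging Face.\n"
    "- Only legal and licensing information is visible right now.\n"
    "Next engine: Navigation Controls\n"
    "Instruction: SCAN\n"
)

result = parse_world_model_output(sample)
print(result["next_engine"], "|", result["instruction"])
```

A `next_engine` of 'COMPLETE' signals that the objective has been reached, in which case the instruction carries the answer or '[NONE]'.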