Automated job application
Automatic form filling with LaVague
This notebook shows how to combine Hugging Face's Idefics2-8b model with LaVague to create an agent that takes a CV, performs OCR, and applies to jobs automatically by filling in a candidate form.
You can see it in action below:
You can run this example directly with a CLI script available here.
This notebook goes through the example step by step in more detail.
Pre-requisites
Note: We use OpenAI's models for the World Model (vision) and the Action Engine (LLM). You will need to set the OPENAI_API_KEY variable in your local environment with a valid API key for this example to work.
If you don't have an OpenAI API key, please get one here.
You will also need a Hugging Face token (HF_TOKEN) to call the Inference API; you can get one here.
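If you run the notebook locally, a quick sanity check like the following can save a failed run later. This is a minimal sketch that only verifies the keys are visible to Python; on Colab the secrets are instead read via google.colab.userdata, as shown in the setup cell below.

import os

# Fail fast if a required key is missing from the environment (local runs only)
for key in ("OPENAI_API_KEY", "HF_TOKEN"):
    assert os.getenv(key), f"{key} is not set in the environment"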
Installation
For this example, we will use the Hugging Face Inference API, the OpenAI API, and LaVague.
pip install text-generation lavague
We will need to set our OpenAI API key. If you are running this notebook in Google Colab, you can provide it through a Colab secret (see the key icon on the left-hand side of the notebook) named 'OPENAI_API_KEY'; the cell below then converts it to an environment variable with the same name.
import os

# Check if running in Google Colab
try:
    from google.colab import userdata
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

# Read secrets from Colab secrets when available, else from environment variables
if IN_COLAB:
    fetch_secret = userdata.get
else:
    fetch_secret = os.getenv

openai_api_key = fetch_secret("OPENAI_API_KEY")
if openai_api_key is None:
    raise ValueError("OPENAI_API_KEY is not set")
os.environ["OPENAI_API_KEY"] = openai_api_key
OCR with Hugging Face Idefics2-8b
Here we use the Hugging Face Inference API to leverage Idefics2-8b, an open-source multimodal LLM, to extract the candidate's details from her resume.
from text_generation import Client
import os
BASE_URL = "https://api-inference.huggingface.co/models/"
BASE_MODEL= "HuggingFaceM4/idefics2-8b"
SYSTEM_PROMPT = "System: The following is a conversation between Idefics2, a highly knowledgeable and intelligent visual AI assistant created by Hugging Face, referred to as Assistant, and a human user called User. In the following interactions, User and Assistant will converse in natural language, and Assistant will do its best to answer Userβs questions. Assistant has the ability to perceive images and reason about them, but it cannot generate images. Assistant was built to be respectful, polite and inclusive. It knows a lot, and always tells the truth. When prompted with an image, it does not make up facts.<end_of_utterance>\nAssistant: Hello, I'm Idefics2, Huggingface's latest multimodal assistant. How can I help you?<end_of_utterance>\n"
class HuggingFaceMMLLM:
    def __init__(self, hf_api_key=None, model=BASE_MODEL, base_url=BASE_URL):
        if hf_api_key is None:
            hf_api_key = fetch_secret("HF_TOKEN")
        if hf_api_key is None:
            raise ValueError("HF_TOKEN is not set")
        api_url = base_url + model
        self.client = Client(
            base_url=api_url,
            headers={"x-use-cache": "0", "Authorization": f"Bearer {hf_api_key}"},
        )

    def upload_image(self, file_path, cloudinary_config=None):
        # The Inference API only accepts images referenced by URL, so local
        # files are first uploaded to Cloudinary to obtain one
        import cloudinary
        import cloudinary.uploader

        if cloudinary_config is None:
            cloudinary_config = {
                "cloud_name": fetch_secret("CLOUDINARY_CLOUD_NAME"),
                "api_key": fetch_secret("CLOUDINARY_API_KEY"),
                "api_secret": fetch_secret("CLOUDINARY_API_SECRET"),
            }
        if None in cloudinary_config.values():
            raise ValueError(
                "CLOUDINARY_CLOUD_NAME, CLOUDINARY_API_KEY, or CLOUDINARY_API_SECRET is not set"
            )
        cloudinary.config(**cloudinary_config)
        img_url = cloudinary.uploader.upload(file_path)["url"]
        return img_url

    def complete(self, query, file_path=None, url=None):
        if file_path is None and url is None:
            raise ValueError("Either file_path or url must be provided")
        generation_args = {
            "max_new_tokens": 512,
            "repetition_penalty": 1.1,
            "do_sample": False,
        }
        if file_path:
            img_url = self.upload_image(file_path)
        else:
            img_url = url
        # Idefics2 expects the image embedded in the prompt as a markdown image link
        prompt_with_image = SYSTEM_PROMPT + f"User:![]({img_url}) {query}<end_of_utterance>\nAssistant:"
        generated_text = self.client.generate(prompt=prompt_with_image, **generation_args).generated_text
        return generated_text
We can see the resume used below:
from IPython.display import display, Image
url = "https://d25zcttzf44i59.cloudfront.net/minimalist-resume-template.png"
display(Image(url))
We can now extract the person's details using Idefics2-8b:
hf_mm_llm = HuggingFaceMMLLM()
query = "Extract name, email, phone number, current company, a summary of experience, and a summary of education from this cv. Provide your output in YAML format."
user_data = hf_mm_llm.complete(query=query, url=url)
print(user_data)
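Since we asked Idefics2 for YAML, the raw text can optionally be parsed into a Python dict. This is a minimal, best-effort sketch: the model may wrap the YAML in code fences or add prose, so parsing is not guaranteed to succeed.

import yaml

# Strip common markdown code fences before parsing (best-effort; the model
# is not guaranteed to emit clean YAML)
cleaned = user_data.strip().removeprefix("```yaml").removesuffix("```").strip()
parsed = yaml.safe_load(cleaned)
print(parsed)

Note that the agent below consumes the raw user_data string directly, so this step is optional.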
Our class defined above takes a URL as input. This is because the Hugging Face Inference API only accepts images referenced by URL, embedded directly in the prompt:

prompt_with_image = SYSTEM_PROMPT + f"User:![]({img_url}) {query}<end_of_utterance>\nAssistant:"

As a workaround for local files, the upload_image method uploads the image to Cloudinary and then passes the URL of the uploaded image to the Hugging Face API. See https://cloudinary.com/documentation/image_upload_api_reference for more information on getting your credentials.
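For example, to run OCR on a local copy of the resume instead of a hosted URL, you could do the following. This is a sketch: 'my_resume.png' is a placeholder path, the cloudinary package must be installed (pip install cloudinary), and the three Cloudinary secrets must be set.

# 'my_resume.png' is a placeholder -- replace it with your own file.
# Requires CLOUDINARY_CLOUD_NAME, CLOUDINARY_API_KEY and CLOUDINARY_API_SECRET.
local_user_data = hf_mm_llm.complete(query=query, file_path="my_resume.png")
print(local_user_data)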
AI Web Agent building with LaVague
Here we will define and run an agent using LaVague.
LaVague has two main components:
- A World Model: Module specialized in reasoning, which takes as input the user's objective ("Fill this form") and a screenshot from the web driver, and produces instructions ("Click on the Apply button") for our Action Engine. It is powered by OpenAI GPT-4o.
- An Action Engine: Module specialized in turning instructions from the World Model into Selenium code. It is powered by LlamaIndex to perform RAG on the HTML.
For this demo, we will use a local embedding model (bge-small) and GPT-3.5 for our Action Engine.
pip install llama-index-embeddings-huggingface
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
embedding = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
We can now run the agent using the previously extracted data.
from lavague.drivers.selenium import SeleniumDriver
from lavague.core import ActionEngine, WorldModel
from lavague.core.agents import WebAgent
selenium_driver = SeleniumDriver()
action_engine = ActionEngine(selenium_driver, embedding=embedding)
world_model = WorldModel()
agent = WebAgent(world_model, action_engine)
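If you want to pin the Action Engine explicitly to GPT-3.5 (as mentioned above) rather than relying on LaVague's default OpenAI model, you can pass a llama-index LLM. This is a sketch under the assumption that your LaVague version exposes an llm parameter on ActionEngine, as its customization docs describe:

from llama_index.llms.openai import OpenAI

# Assumption: ActionEngine accepts a llama-index LLM via the `llm` keyword
llm = OpenAI(model="gpt-3.5-turbo")
action_engine = ActionEngine(selenium_driver, llm=llm, embedding=embedding)
agent = WebAgent(world_model, action_engine)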
url = "https://form.jotform.com/241472287797370"
objective = "Fill out this form. Do not provide a cover letter"
agent.get(url)
agent.run(objective, user_data=user_data, display=True)
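Setting display=True is intended to show screenshots of the browser in the notebook as the agent works through the form, which is handy when the browser itself runs headless (for instance in Colab).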