Module core.retrievers

Functions

def filter_for_xpathed_nodes(nodes: List)
def get_default_retriever(driver: BaseDriver, embedding: Optional[BaseEmbedding] = None) -> BaseHtmlRetriever
def get_nodes_text(nodes: List[NodeWithScore]) -> List[str]
def get_trivial_retriever(driver: BaseDriver, embedding: Optional[BaseEmbedding] = None) -> BaseHtmlRetriever
def merge_html_chunks(html_chunks: List[str], separator='\n') -> str

Classes

class BM25HtmlRetriever (top_k=10, xpathed_only=True)

Intended mainly for benchmarks. Avoid it in practice: its performance is not on par with the other retrievers.

Ancestors

Inherited members

class BaseHtmlRetriever

Abstract base class for HTML retrievers. Child retrievers inherit from it and implement retrieve.

Ancestors

  • abc.ABC

Subclasses

Methods

def retrieve(self, query: QueryBundle, html_nodes: List[str], viewport_only=True) -> List[str]

This method must be implemented by each child retriever.
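The contract above can be illustrated with a minimal sketch. SketchBaseRetriever and KeywordRetriever are hypothetical stand-ins, not library classes, and the query is simplified to a plain string rather than a QueryBundle.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchBaseRetriever(ABC):
    """Hypothetical stand-in for BaseHtmlRetriever (not the library class)."""

    @abstractmethod
    def retrieve(self, query: str, html_nodes: List[str], viewport_only: bool = True) -> List[str]:
        """Return the subset of html_nodes relevant to the query."""


class KeywordRetriever(SketchBaseRetriever):
    """Hypothetical child retriever: keeps nodes that contain the query text."""

    def retrieve(self, query: str, html_nodes: List[str], viewport_only: bool = True) -> List[str]:
        # viewport_only is accepted to match the signature but unused in this sketch.
        needle = query.lower()
        return [node for node in html_nodes if needle in node.lower()]
```

Because retrieve is declared abstract, instantiating the base class directly raises TypeError; only children that implement the method can be created.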

class CleanHTMLRetriever (drop_base_64: bool = True, drop_svg: bool = True)

Helper class that provides a standard way to create an ABC using inheritance.

Ancestors

Inherited members

class FromXPathNodesExpansionRetriever (chunk_size: int = 750)

Retriever that first expands the HTML context around interactive elements (chunk_size is measured in characters), then contracts it semantically to at most top_k results (top_k is a number of chunks). Expansion is symmetrical: each step adds the previous and the next sibling. When no siblings remain, the context is extended to the parent node. Interactive chunks that intersect are merged together.
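The expansion step described above could look roughly like the following sketch. It is not the library's implementation: expand_around is a hypothetical helper, it uses the standard library's xml.etree.ElementTree on well-formed markup, it stops as soon as the window reaches chunk_size (so it may overshoot), and it leaves out chunk merging and the semantic contraction.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def expand_around(root: ET.Element, target: ET.Element, chunk_size: int) -> str:
    """Grow a window of sibling elements around target (hypothetical helper)."""
    # ElementTree has no parent pointers, so build a child -> parent map once.
    parent_of: Dict[ET.Element, ET.Element] = {
        child: parent for parent in root.iter() for child in parent
    }

    def render(elements: List[ET.Element]) -> str:
        return "".join(ET.tostring(e, encoding="unicode") for e in elements)

    window = [target]
    while len(render(window)) < chunk_size:
        parent = parent_of.get(window[0])
        if parent is None:
            break  # the window is already the root element
        siblings = list(parent)
        lo = siblings.index(window[0])
        hi = siblings.index(window[-1])
        if lo == 0 and hi == len(siblings) - 1:
            # No sibling left on either side: extend the context to the parent.
            window = [parent]
        else:
            # Symmetrical step: add the previous and the next sibling if present.
            window = siblings[max(lo - 1, 0): hi + 2]
    return render(window)
```

Starting from a single link inside a list item, the window grows to the list item, then to its neighbouring items, then to the whole list, and so on until the character budget is met or the root is reached.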

Ancestors

Methods

def get_expanded_chunks(self, html_chunks: List[str]) -> List[str]
def get_included_xpaths(self, element) -> List[str]

Inherited members

class InteractiveXPathRetriever (driver: BaseDriver)

Helper class that provides a standard way to create an ABC using inheritance.

Ancestors

Methods

def get_html_with_xpath(self, html_content, filter_by_possible_interactions: Optional[PossibleInteractionsByXpath], xpath_prefix='')

Inherited members

class OpsmSplitRetriever (driver: BaseDriver, top_k: int = 5, group_by: int = 10, rank_fields: List[str] = ['element', 'placeholder', 'text', 'name'])

Helper class that provides a standard way to create an ABC using inheritance.

Ancestors

Inherited members

class RetrieversPipeline (*retrievers: BaseHtmlRetriever)

Executes a pipeline of retrievers.
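One plausible reading of a retriever pipeline is sequential chaining, where each stage receives the output of the previous one. The sketch below assumes that behaviour; run_pipeline, drop_svg and keep_matching are hypothetical names, and stages are plain functions rather than BaseHtmlRetriever instances.

```python
from typing import Callable, List

# Hypothetical stage type: (query, html_nodes) -> filtered html_nodes.
Stage = Callable[[str, List[str]], List[str]]


def run_pipeline(query: str, html_nodes: List[str], *stages: Stage) -> List[str]:
    """Apply each stage in order, feeding its output to the next one."""
    for stage in stages:
        html_nodes = stage(query, html_nodes)
    return html_nodes


def drop_svg(query: str, html_nodes: List[str]) -> List[str]:
    # Cleaning stage: discard nodes that contain inline SVG markup.
    return [node for node in html_nodes if "<svg" not in node]


def keep_matching(query: str, html_nodes: List[str]) -> List[str]:
    # Filtering stage: keep nodes that mention the query text.
    return [node for node in html_nodes if query.lower() in node.lower()]
```

Ordering matters under this assumption: placing cheap cleaning stages first reduces the amount of HTML that later, more expensive stages have to rank.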

Ancestors

Class variables

var retrievers : Tuple[BaseHtmlRetriever]

Inherited members

class SemanticRetriever (embedding: Optional[BaseEmbedding], top_k: int = 10, xpathed_only=True)

Semantic retriever that returns up to top_k results (top_k is a number of chunks).

Ancestors

Inherited members

class SyntaxicRetriever (top_k: int = 5, xpathed_only=True)

Syntactic retriever that returns up to top_k results (top_k is a number of chunks).
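As a rough illustration of lexical top_k selection, the sketch below ranks chunks by how many query tokens they share. The scoring function is an assumption chosen for brevity, not the ranking the library uses, and top_k_by_token_overlap is a hypothetical name.

```python
import re
from typing import List


def top_k_by_token_overlap(query: str, chunks: List[str], top_k: int = 5) -> List[str]:
    """Return at most top_k chunks, ordered by shared-token count with the query."""
    query_tokens = set(re.findall(r"\w+", query.lower()))

    def score(chunk: str) -> int:
        return len(query_tokens & set(re.findall(r"\w+", chunk.lower())))

    # sorted() is stable, so chunks with equal scores keep their original order.
    return sorted(chunks, key=score, reverse=True)[:top_k]
```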

Ancestors

Inherited members

class UniqueXPathRetriever (driver: BaseDriver)

Retriever that removes redundancy when several elements share the same bounding box.
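A minimal sketch of the deduplication idea, assuming each element is described by an XPath and a bounding box of the form (x, y, width, height). unique_by_bounding_box is a hypothetical helper, and keeping the first element seen for each box is an assumption; the library may choose which element to keep differently.

```python
from typing import List, Set, Tuple

# Hypothetical bounding-box shape: (x, y, width, height) in pixels.
BBox = Tuple[int, int, int, int]


def unique_by_bounding_box(elements: List[Tuple[str, BBox]]) -> List[str]:
    """Keep the first XPath seen for each distinct bounding box."""
    seen: Set[BBox] = set()
    kept: List[str] = []
    for xpath, box in elements:
        if box in seen:
            continue  # another element already covers exactly this area
        seen.add(box)
        kept.append(xpath)
    return kept
```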

Ancestors

Inherited members

class XPathedChunkRetriever

Helper class that provides a standard way to create an ABC using inheritance.

Ancestors

Inherited members