Module core.retrievers
Functions
def filter_for_xpathed_nodes(nodes: List)
def get_default_retriever(driver: BaseDriver, embedding: Optional[BaseEmbedding] = None) -> BaseHtmlRetriever
def get_nodes_text(nodes: List[NodeWithScore]) -> List[str]
def get_trivial_retriever(driver: BaseDriver, embedding: Optional[BaseEmbedding] = None) -> BaseHtmlRetriever
def merge_html_chunks(html_chunks: List[str], separator='\n') -> str
Classes
class BM25HtmlRetriever (top_k=10, xpathed_only=True)
Mainly intended for benchmarks. Do not use it otherwise: its performance is not on par with the other retrievers.
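For reference, a minimal sketch of Okapi BM25 ranking over HTML chunks, the scoring family this retriever is named after. It is not this class's implementation: the tokenizer, the helper name bm25_top_k and the conventional defaults k1=1.5 and b=0.75 are assumptions, and the xpathed_only filter is not modeled.

```python
import math
import re
from collections import Counter
from typing import List


def _tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; tag and attribute names become tokens too.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_top_k(query: str, chunks: List[str], top_k: int = 10,
               k1: float = 1.5, b: float = 0.75) -> List[str]:
    """Return the top_k chunks ranked by Okapi BM25 score against the query."""
    docs = [_tokenize(c) for c in chunks]
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n or 1.0  # guard against all-empty chunks
    # Document frequency: number of chunks containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in _tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    # Stable sort: chunks with equal scores keep their original order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [chunks[i] for i in ranked[:top_k]]
```

Purely lexical scoring like this ignores synonyms and page semantics, which is one plausible reason such a retriever would lag behind embedding-based ones.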
Ancestors
- BaseHtmlRetriever
- abc.ABC
Inherited members
class BaseHtmlRetriever
Abstract base class shared by all HTML retrievers. Subclasses must implement retrieve.
Ancestors
- abc.ABC
Subclasses
- BM25HtmlRetriever
- CleanHTMLRetriever
- FromXPathNodesExpansionRetriever
- InteractiveXPathRetriever
- OpsmSplitRetriever
- RetrieversPipeline
- SemanticRetriever
- SyntaxicRetriever
- UniqueXPathRetriever
- XPathedChunkRetriever
Methods
def retrieve(self, query: QueryBundle, html_nodes: List[str], viewport_only=True) -> List[str]
Abstract method that every subclass must implement.
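A minimal sketch of this contract: a subclass only has to implement retrieve. QueryBundle is replaced by a stand-in dataclass with a single query_str field so the example runs without dependencies, and KeywordRetriever is a hypothetical retriever, not one of the subclasses listed above.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class QueryBundle:
    # Stand-in for the real QueryBundle; only query_str is modeled here.
    query_str: str


class BaseHtmlRetriever(ABC):
    # Mirrors the documented abstract interface.
    @abstractmethod
    def retrieve(self, query: QueryBundle, html_nodes: List[str],
                 viewport_only: bool = True) -> List[str]:
        ...


class KeywordRetriever(BaseHtmlRetriever):
    """Hypothetical retriever: keeps the nodes that contain any query word."""

    def retrieve(self, query: QueryBundle, html_nodes: List[str],
                 viewport_only: bool = True) -> List[str]:
        # viewport_only is accepted for interface compatibility but ignored here.
        words = query.query_str.lower().split()
        return [node for node in html_nodes if any(w in node.lower() for w in words)]
```

Because retrieve is abstract, instantiating BaseHtmlRetriever directly raises TypeError; only concrete subclasses can be created.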
class CleanHTMLRetriever (drop_base_64: bool = True, drop_svg: bool = True)
Ancestors
- BaseHtmlRetriever
- abc.ABC
Inherited members
class FromXPathNodesExpansionRetriever (chunk_size: int = 750)
Retriever that expands the HTML context around interactive elements (up to chunk_size characters), then semantically contracts it to the top_k best results (number of chunks). Expansion is symmetrical: each step adds the previous and the next sibling. When no sibling is left, the context is extended to the parent node. When the chunks of two interactive elements intersect, they are merged.
Ancestors
- BaseHtmlRetriever
- abc.ABC
Methods
def get_expanded_chunks(self, html_chunks: List[str]) -> List[str]
def get_included_xpaths(self, element) -> List[str]
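The expansion strategy this retriever describes can be sketched with the standard library's xml.etree.ElementTree. This is an illustration under assumptions, not the class's code: expand_around is a hypothetical helper, the input must be well-formed markup, expansion stops at the last window that fits in chunk_size characters, and neither the merging of intersecting chunks nor the semantic contraction to top_k is shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def _serialize(nodes: List[ET.Element]) -> str:
    # Note: ET.tostring also emits each element's tail text.
    return "".join(ET.tostring(n, encoding="unicode") for n in nodes)


def expand_around(root: ET.Element, target: ET.Element, chunk_size: int = 750) -> str:
    """Grow a window around target until the next step would exceed chunk_size characters."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of: Dict[ET.Element, ET.Element] = {
        child: parent for parent in root.iter() for child in parent
    }
    best = _serialize([target])
    anchor = target
    parent = parent_of.get(anchor)
    while parent is not None:
        siblings = list(parent)
        lo = hi = siblings.index(anchor)
        # Symmetric expansion: add the previous and the next sibling at each step.
        while lo > 0 or hi < len(siblings) - 1:
            lo, hi = max(lo - 1, 0), min(hi + 1, len(siblings) - 1)
            candidate = _serialize(siblings[lo:hi + 1])
            if len(candidate) > chunk_size:
                return best
            best = candidate
        # No sibling left on either side: extend the context to the parent node.
        anchor = parent
        candidate = _serialize([anchor])
        if len(candidate) > chunk_size:
            return best
        best = candidate
        parent = parent_of.get(anchor)
    return best
```

For example, with root = ET.fromstring("<div><p>a</p><p>b</p><button>go</button><p>c</p><p>d</p></div>"), expanding around the button with chunk_size=40 yields <p>b</p><button>go</button><p>c</p>, because adding the next pair of siblings would exceed the limit. Real-world pages are rarely well-formed XML, so a practical version would need an HTML-tolerant parser.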
Inherited members
class InteractiveXPathRetriever (driver: BaseDriver)
Ancestors
- BaseHtmlRetriever
- abc.ABC
Methods
def get_html_with_xpath(self,
html_content,
filter_by_possible_interactions: Optional[PossibleInteractionsByXpath],
xpath_prefix='')
Inherited members
class OpsmSplitRetriever (driver: BaseDriver,
top_k: int = 5,
group_by: int = 10,
rank_fields: List[str] = ['element', 'placeholder', 'text', 'name'])
Ancestors
- BaseHtmlRetriever
- abc.ABC
Inherited members
class RetrieversPipeline (*retrievers: BaseHtmlRetriever)
Executor for a pipeline of retrievers.
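A minimal sketch of a retrievers pipeline, assuming each retriever receives the nodes returned by the previous one; this chaining order is an assumption read from the class name and its *retrievers constructor, not from the source. Retrievers are simplified to plain callables taking a query string and a list of HTML nodes, and Pipeline is a hypothetical name.

```python
from typing import Callable, List, Tuple

# Hypothetical simplification: a retriever maps (query, html_nodes) to html_nodes.
Retriever = Callable[[str, List[str]], List[str]]


class Pipeline:
    """Run retrievers in order, feeding each one's output to the next."""

    def __init__(self, *retrievers: Retriever) -> None:
        self.retrievers: Tuple[Retriever, ...] = retrievers

    def retrieve(self, query: str, html_nodes: List[str]) -> List[str]:
        for retriever in self.retrievers:
            html_nodes = retriever(query, html_nodes)
        return html_nodes
```

With this design, cheap filters (for instance dropping SVG nodes) are best placed first so that costlier retrievers, such as embedding-based ones, see fewer nodes.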
Ancestors
- BaseHtmlRetriever
- abc.ABC
Class variables
var retrievers : Tuple[BaseHtmlRetriever]
Inherited members
class SemanticRetriever (embedding: Optional[BaseEmbedding], top_k: int = 10, xpathed_only=True)
Semantic retriever returning up to top_k results (number of chunks).
Ancestors
- BaseHtmlRetriever
- abc.ABC
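A minimal sketch of semantic top-k retrieval by cosine similarity, the general technique a semantic retriever relies on. The embed argument is a hypothetical stand-in for a BaseEmbedding (any function mapping text to a vector), semantic_top_k is an illustrative name, and the xpathed_only filter is not modeled.

```python
import math
from typing import Callable, List, Sequence

Embed = Callable[[str], Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # zero vectors get similarity 0


def semantic_top_k(query: str, chunks: List[str], embed: Embed, top_k: int = 10) -> List[str]:
    """Rank chunks by cosine similarity between their embedding and the query's."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), i) for i, c in enumerate(chunks)]
    # Stable sort: ties keep the original chunk order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunks[i] for _, i in scored[:top_k]]
```

For a quick check, a toy embedding such as lambda s: [s.count("buy"), s.count("help")] ranks "<button>buy now</button>" above "<a>help page</a>" for the query "buy".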
Inherited members
class SyntaxicRetriever (top_k: int = 5, xpathed_only=True)
Syntactic retriever returning up to top_k results (number of chunks).
Ancestors
- BaseHtmlRetriever
- abc.ABC
Inherited members
class UniqueXPathRetriever (driver: BaseDriver)
Retriever that removes redundancy when several elements have the same bounding box.
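A minimal sketch of bounding-box deduplication, assuming the boxes have already been measured (the class takes a BaseDriver, presumably for that purpose) and that the first element in document order is the one kept. Both points are assumptions, as are the helper name unique_by_bbox and the (x, y, width, height) box layout.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (x, y, width, height): hypothetical layout


def unique_by_bbox(elements: List[Tuple[str, BBox]]) -> List[str]:
    """Keep the first xpath seen for each bounding box, preserving order."""
    seen: Dict[BBox, str] = {}
    for xpath, bbox in elements:
        # setdefault keeps the earliest xpath for a box and ignores later duplicates.
        seen.setdefault(bbox, xpath)
    # Dicts preserve insertion order, so output follows first occurrence.
    return list(seen.values())
```

A typical case is a link wrapping a span of the same size: both occupy one box, so only one xpath is worth passing to later retrievers.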
Ancestors
- BaseHtmlRetriever
- abc.ABC
Inherited members
class XPathedChunkRetriever
Ancestors
- BaseHtmlRetriever
- abc.ABC
Inherited members