[Store] Implement Recursive Character Text Transformer #1250

@aszenz

Description

Right now the store has a transformer interface with a few basic text splitters.

In practice, the most common and widely used method is to split the text recursively on a set of separators, usually paragraphs, then lines, then words, and finally characters.

So for a chunk size of 100, it first tries to split by paragraphs; if a paragraph is still too long, it splits that paragraph by lines, and so on.

Implemented in LangChain here:

https://github.com/langchain-ai/langchain/blob/9ef2feb6747f5a69d186bd623b569ad722829a5e/libs/langchain/langchain/text_splitter.py#L826

This is in turn a foundation for document-specific splitting: for Markdown you may want to split by headers, and for PHP code files by functions.
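
To make the recursive strategy concrete, here is a minimal Python sketch of the idea. It is not the LangChain code linked above and not a proposed Store API: the function name `recursive_split`, the `DEFAULT_SEPARATORS` tuple, and the choice to measure chunk size in characters are all assumptions made for illustration. It also drops the separator at chunk boundaries and has no overlap support, both of which a real implementation would likely want to make configurable.

```python
from typing import List, Sequence

# Hypothetical default separators: paragraphs, lines, words, then characters.
DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = DEFAULT_SEPARATORS,
) -> List[str]:
    """Split text into chunks of at most chunk_size characters (sketch)."""
    if len(text) <= chunk_size:
        return [text] if text else []

    # No usable separator left: fall back to fixed-size character cuts.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        # This separator does not occur, so try the next, finer one.
        return recursive_split(text, chunk_size, finer)

    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue
        if len(piece) > chunk_size:
            # The piece alone is too long: flush what we have and
            # split the piece again with the finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        # Greedily merge small pieces back together while they fit.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "aaaa bbbb\ncccc dddd\n\neeee"
    print(recursive_split(sample, chunk_size=10))
```

With a chunk size of 10, the first paragraph of the sample is too long, so it is split by lines, while the short second paragraph is kept whole. A document-specific splitter could plausibly reuse the same routine with a different separator list (for example, one that tries Markdown header markers before paragraphs), though the exact separators would be a design decision for the component.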

Metadata

    Labels

    Feature: New feature
    RFC: Request For Comments (proposals about features that you want to be discussed)
    Store: Issues & PRs about the AI Store component
