Open
Labels
Feature (new feature), RFC (Request For Comments: proposals about features that you want to be discussed), Store (issues & PRs about the AI Store component)
Description
Right now the Store component has a transformer interface with a few basic text splitters.
In practice, the most widely used method is to split the text recursively on an ordered set of separators, usually paragraphs, then lines, then words, and finally characters.
For a chunk size of 100, it first tries to split by paragraphs; if a paragraph is still too long, it splits that paragraph by lines, and so on.
Implemented in LangChain here:
This in turn is a foundation for document-specific splitting: for Markdown you may want to split by headers, and for PHP code files by functions.
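
To make the proposal concrete, here is a minimal sketch of the recursive strategy described above. It is written in Python for brevity and is neither the LangChain implementation nor a proposed API for the Store component. The function name `recursive_split`, the default separator list, and the choice to measure chunk size in characters are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed separator order: paragraphs, lines, words, then single characters.
DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def recursive_split(
    text: str,
    chunk_size: int = 100,
    separators: Sequence[str] = DEFAULT_SEPARATORS,
) -> List[str]:
    """Split text into chunks of at most chunk_size characters (hypothetical helper)."""
    if len(text) <= chunk_size:
        return [text] if text else []

    # Last resort: no separator left (or the empty one), cut fixed-width slices.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    if sep not in text:
        # This separator does not occur, so try the next, finer one.
        return recursive_split(text, chunk_size, rest)

    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue
        # Greedily merge neighbouring pieces while they still fit in one chunk.
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "First paragraph, short.\n\nSecond paragraph.\nIt has two lines."
    for chunk in recursive_split(sample, chunk_size=30):
        print(repr(chunk))
```

A document-specific splitter could reuse the same routine with a different separator list, for example one that starts with a Markdown header marker. This sketch drops the separator at each split point, so a real implementation would probably need to keep it attached to the chunk.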