The TextSlidingWindowChunking Class

The TextSlidingWindowChunking class represents a strategy for chunking text into overlapping windows of characters, also known as a “sliding window”.

Although it is a text-based method, it can also be applied to media inputs by generating a transcript, which is then used as the content instead of the media itself.

from gemini_batcher.strategies import TextSlidingWindowChunking

strategy = TextSlidingWindowChunking(chunk_char_size, window_char_size)

Class Attributes
chunk_char_size (int): The number of characters per chunk.
window_char_size (int, optional): The number of characters that overlap between consecutive chunks. The default value is 0, meaning no overlap.
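
To illustrate how the two attributes interact, here is a minimal sketch of sliding-window chunking in plain Python. It is not the library's implementation: the function name sliding_window_chunks is hypothetical, and the sketch assumes each new chunk starts chunk_char_size - window_char_size characters after the previous one.

```python
def sliding_window_chunks(
    text: str, chunk_char_size: int, window_char_size: int = 0
) -> list[str]:
    """Split text into chunks of up to chunk_char_size characters, where
    consecutive chunks share window_char_size characters.

    Hypothetical helper for illustration only; not part of gemini_batcher.
    """
    if window_char_size < 0:
        raise ValueError("window_char_size must not be negative")
    # Assumption: each chunk starts this many characters after the previous one.
    step = chunk_char_size - window_char_size
    if step <= 0:
        raise ValueError("chunk_char_size must be greater than window_char_size")

    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_char_size])
        # Stop once a chunk reaches the end of the text, so no trailing
        # chunk consists purely of already-covered characters.
        if start + chunk_char_size >= len(text):
            break
    return chunks


print(sliding_window_chunks("abcdefghij", 4, 2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

With window_char_size left at 0, the same call yields adjacent, non-overlapping chunks, and the final chunk may be shorter than chunk_char_size.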

Note: This class is a dataclass, so it is initialised with exactly the parameters listed under Class Attributes.

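There are also some restrictions on the class attributes:
- chunk_char_size must be greater than window_char_size (chunk_char_size <= window_char_size is invalid).
- window_char_size must not be negative (window_char_size < 0 is invalid).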
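
As a rough sketch of how restrictions like these can be enforced on a dataclass, the example below validates the two attributes in __post_init__. The class name SlidingWindowParams is hypothetical, and raising ValueError is an assumption; the library's own class may check or report invalid values differently.

```python
from dataclasses import dataclass


@dataclass
class SlidingWindowParams:
    """Hypothetical stand-in for illustration; not the gemini_batcher class."""

    chunk_char_size: int
    window_char_size: int = 0  # default of 0 means no overlap

    def __post_init__(self) -> None:
        # Reject a negative overlap.
        if self.window_char_size < 0:
            raise ValueError("window_char_size must not be negative")
        # Reject an overlap that is as large as, or larger than, the chunk.
        if self.chunk_char_size <= self.window_char_size:
            raise ValueError(
                "chunk_char_size must be greater than window_char_size"
            )


params = SlidingWindowParams(chunk_char_size=1000, window_char_size=200)
print(params)
# SlidingWindowParams(chunk_char_size=1000, window_char_size=200)
```

Checking the values at construction time means an invalid combination is reported before any text is chunked.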