HTML-aware SEO Content Chunking: Free Online Tool

A free online tool for HTML-aware chunking of web page text content.

To start, click the "HTML" button, paste the HTML code of your page and press "Chunk".

The tool breaks down the web page content into chunks using HTML tags as context when making decisions about chunk boundaries.
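As a rough illustration of tag-aware boundaries (not the tool's actual code), the Python sketch below treats each block-level element as an indivisible unit and only starts a new chunk between units. The `BLOCK_TAGS` set, the `max_chars` limit and the function names are assumptions made for this example.

```python
from html.parser import HTMLParser

# Assumed set of tags that mark block boundaries (hypothetical, for illustration).
BLOCK_TAGS = {"p", "li", "h1", "h2", "h3", "h4", "h5", "h6", "blockquote", "pre", "td"}


class BlockExtractor(HTMLParser):
    """Collects the text of each block-level element as one unit."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def _flush(self):
        # Join buffered text and collapse runs of whitespace.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Inline tags such as <b> or <a> do not flush, so their text stays in the block.
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def chunk_blocks(html, max_chars=200):
    """Pack whole blocks into chunks; a block is never split in the middle."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    chunks, current = [], ""
    for block in parser.blocks:
        if current and len(current) + 1 + len(block) > max_chars:
            chunks.append(current)
            current = block
        else:
            current = f"{current} {block}".strip()
    if current:
        chunks.append(current)
    return chunks
```

A single block longer than `max_chars` is kept whole in this sketch. The sketch also does not filter out script or style content, which the real tool presumably never sees because it extracts the main content first.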

This tool prepends the page title and the hierarchy of H1-H6 headers to each text chunk. This gives the chunk's body some context, so the chunks produce more meaningful embedding vectors.
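To make the idea of header context concrete, here is a minimal Python sketch, assuming each paragraph becomes one chunk and the context is joined with newlines. The class and function names are hypothetical, and the tool's real chunk format may differ.

```python
from html.parser import HTMLParser

HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}
CAPTURED = set(HEADINGS) | {"title", "p"}


class HeadingContextChunker(HTMLParser):
    """Emits one chunk per <p>, prefixed with the title and the active headings."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.path = {}         # heading level -> heading text currently in effect
        self.chunks = []
        self._capture = None   # "title", "p", a heading level (int), or None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._capture = "title"
        elif tag in HEADINGS:
            self._capture = HEADINGS[tag]
        elif tag == "p":
            self._capture = "p"
        else:
            return  # inline tags keep the current capture and buffer
        self._buf = []

    def handle_data(self, data):
        if self._capture is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._capture is None or tag not in CAPTURED:
            return
        text = " ".join("".join(self._buf).split())
        if self._capture == "title":
            self.title = text
        elif self._capture == "p":
            if text:
                context = [self.title] + [self.path[k] for k in sorted(self.path)]
                self.chunks.append("\n".join([c for c in context if c] + [text]))
        else:
            level = self._capture
            # A new heading replaces its own level and discards deeper ones.
            self.path = {k: v for k, v in self.path.items() if k < level}
            self.path[level] = text
        self._capture = None
        self._buf = []


def chunk_with_context(html):
    parser = HeadingContextChunker()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

With this approach, a paragraph under an H2 "Setup" inside an H1 "Intro" on a page titled "Guide" is embedded together with all three labels, not as an isolated sentence.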

The tool relies on Mozilla's Readability JS library to find the main content of the page. Readability often strips the H1 tag or replaces it with H2, even on pages that follow best practices for HTML structure. This is a behavior of the Readability library, not my design choice.

While this is likely not exactly how search engines break pages into chunks, the tool gives you a better picture of how your content looks once it is chunked, and of how its structure and headers help or hurt your relevance.

To get the HTML code of any web page, you can use my Copy HTML bookmarklet.