Automate Content Pruning

This content pruning tool automates some of the tedious work of grouping URLs with similar content.

How it works:

  1. The tool crawls each URL in your list and extracts the body content, ignoring anything inside header, nav, and footer tags.
  2. It then caches that content in temporary storage.
  3. Using the machine-learning library scikit-learn, the tool computes a similarity percentage between pages and uses it to group the URLs.
  4. The final export is a list of URLs in which pages with similar content share the same group number.
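
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: it takes already-fetched HTML instead of crawling, keeps the "cache" in an in-memory dict, and uses only the Python standard library. In particular it does not call scikit-learn; a plain word-count cosine similarity stands in for whatever similarity measure the tool computes. The function names, the 0.8 threshold, and the sample pages are all hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Tags whose content is ignored, as described in step 1.
SKIPPED_TAGS = {"header", "nav", "footer"}


class BodyTextExtractor(HTMLParser):
    """Collects page text, skipping anything inside header, nav, or footer."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def extract_body_text(html):
    """Step 1: return the page text with whitespace collapsed."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def word_counts(text):
    """Turn text into a bag of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_urls(pages, threshold=0.8):
    """Steps 2-4: map each URL to a group number.

    `pages` maps URL -> raw HTML. The threshold is an assumed value.
    A URL joins the first group whose representative page is at least
    `threshold` similar; otherwise it starts a new group.
    """
    # Step 2: in-memory stand-in for the temporary cache.
    cache = {url: word_counts(extract_body_text(html)) for url, html in pages.items()}

    groups = {}
    representatives = []  # (group number, word counts of the group's first page)
    for url, counts in cache.items():
        for number, representative in representatives:
            if cosine_similarity(counts, representative) >= threshold:
                groups[url] = number
                break
        else:
            number = len(representatives) + 1
            representatives.append((number, counts))
            groups[url] = number
    return groups


# Hypothetical sample input: two near-duplicate pages and one unrelated page.
sample_pages = {
    "https://example.com/a": "<body><nav>Home About</nav><main>Red running shoes for trail runners</main><footer>Copyright</footer></body>",
    "https://example.com/b": "<body><header>Shop</header><main>Red running shoes for trail runners on sale</main></body>",
    "https://example.com/c": "<body><main>How to bake sourdough bread at home</main></body>",
}

for url, group_number in group_urls(sample_pages).items():
    print(group_number, url)
```

With these sample pages, the first two URLs land in group 1 and the third in group 2. Comparing each page only against the first page of each group keeps the sketch short; a real implementation might compare all pairs or use a clustering routine instead.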