Automate Content Pruning
This content pruning tool automates some of the tedious work of grouping URLs with similar content.
How it works:
- The tool crawls each URL in your list and extracts its body content, ignoring anything inside header, nav, and footer tags.
- It then caches that content in temporary storage.
- Using scikit-learn (sk-learn), a machine learning library, the tool computes a similarity percentage between pages and uses it to classify and group the URLs.
- The final export is a list of URLs in which similar pages share the same group number.
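
The steps above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the names `BodyTextExtractor`, `extract_body_text`, and `group_by_similarity`, the example.com URLs, the 0.5 threshold, and the choice of TF-IDF with cosine similarity are all assumptions, since the description only says that scikit-learn and a similarity percentage are used. To stay runnable offline, the sample HTML is supplied inline instead of being crawled, and a plain dict stands in for the temporary storage.

```python
from html.parser import HTMLParser

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tags whose content is ignored, as described in the list above.
SKIPPED_TAGS = {"header", "nav", "footer"}


class BodyTextExtractor(HTMLParser):
    """Collects page text, skipping anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_body_text(html):
    """Return the visible text of a page without header/nav/footer content."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def group_by_similarity(texts, threshold=0.5):
    """Assign a group number to each text (hypothetical greedy strategy).

    A text joins the first group whose first member it resembles with
    cosine similarity >= threshold; otherwise it starts a new group.
    """
    similarity = cosine_similarity(TfidfVectorizer().fit_transform(texts))
    groups = [0] * len(texts)
    representatives = []  # index of the first text in each group
    for i in range(len(texts)):
        for group_id, rep in enumerate(representatives, start=1):
            if similarity[i, rep] >= threshold:
                groups[i] = group_id
                break
        else:
            representatives.append(i)
            groups[i] = len(representatives)
    return groups


# Hypothetical sample pages; a real run would fetch this HTML by crawling.
pages = {
    "https://example.com/red-shoes": "<html><body><nav>Home Shop</nav>"
    "<p>Buy red running shoes for men online today</p>"
    "<footer>Contact us</footer></body></html>",
    "https://example.com/red-shoes-2": "<html><body><header>Shop</header>"
    "<p>Buy red running shoes for men online now</p></body></html>",
    "https://example.com/about": "<html><body>"
    "<p>Our company history and team values</p></body></html>",
}

# The dict below plays the role of the temporary content cache.
cache = {url: extract_body_text(html) for url, html in pages.items()}
for url, group in zip(cache, group_by_similarity(list(cache.values()))):
    print(group, url)
```

With this sample data the two shoe pages end up in the same group and the about page gets its own. The threshold controls how aggressive the grouping is: a higher value only groups near-duplicates, while a lower value also merges loosely related pages.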