HTML to text
html2text is a Python package that converts a page of
HTML into clean, easy-to-read plain ASCII text.
The resulting text is also valid Markdown (a text-to-HTML format).
Installation and Setup
pip install html2text
Document Transformer
See a usage example.
from langchain_community.document_transformers import Html2TextTransformer