Extract only the content

Hi,

I am looking for a smart way to extract only the real, informative content from a range of different webpages. My idea is that certain HTML tags tend to hold more content than others. Is filtering by tag a good approach for preprocessing, or do you have any other ideas?
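
To make the idea concrete, here is a rough sketch of the tag-based filtering I have in mind, using only Python's standard `html.parser` module. The two tag sets and the names `ContentExtractor` and `extract_content` are assumptions for illustration, not a tested list or an existing library API.

```python
from html.parser import HTMLParser

# Assumption: illustrative guesses at "content-heavy" and "boilerplate" tags.
CONTENT_TAGS = {"p", "article", "h1", "h2", "h3", "li", "blockquote"}
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}


class ContentExtractor(HTMLParser):
    """Collects text found inside content tags and outside skipped tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0      # how many skipped tags are currently open
        self.content_depth = 0   # how many content tags are currently open
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in CONTENT_TAGS:
            self.content_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in CONTENT_TAGS and self.content_depth > 0:
            self.content_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when inside a content tag and not inside a skipped one.
        if text and self.skip_depth == 0 and self.content_depth > 0:
            self.chunks.append(text)


def extract_content(html):
    """Return the kept text chunks joined by single spaces."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><body><nav><p>Home</p></nav>"
        "<p>Real text.</p><script>var x = 1;</script>"
        "<div>Ad</div></body></html>"
    )
    print(extract_content(page))
```

This relies on depth counters, so it assumes reasonably well-formed markup; pages that leave `<p>` or `<li>` unclosed would likely need a more forgiving parser.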

Thank you for your help.