Only extract content

Posted over 5 years ago by tobi123

Hi,

I am looking for a smart way to extract only the informative main content from a range of different web pages. My idea is that certain HTML tags tend to contain more content than others. Is filtering by tag a good approach for preprocessing, or do you have any other ideas?
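
To make the idea concrete, here is a minimal sketch of tag-based filtering using Python's standard `html.parser` module. The tag sets (`CONTENT_TAGS`, `SKIP_TAGS`), the class `ContentExtractor`, and the function `extract_content` are hypothetical names chosen for illustration, and which tags count as "content" is an assumption that would probably need tuning per site.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Assumed tag sets for illustration only; adjust them for the pages at hand.
CONTENT_TAGS = {"p", "article", "h1", "h2", "h3", "li", "blockquote"}
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Keeps text that sits inside a content tag and outside any skipped tag."""

    def __init__(self):
        super().__init__()
        self.stack = []            # currently open tags
        self.skip_depth = 0        # number of open SKIP_TAGS
        self.content_depth = 0     # number of open CONTENT_TAGS
        self.kept_text = []
        self.chars_per_tag = defaultdict(int)  # text length per innermost tag

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        if tag in CONTENT_TAGS:
            self.content_depth += 1

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag, ignore it
        # Pop until the matching tag, which also closes unclosed children.
        while True:
            popped = self.stack.pop()
            if popped in SKIP_TAGS:
                self.skip_depth -= 1
            if popped in CONTENT_TAGS:
                self.content_depth -= 1
            if popped == tag:
                break

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack or self.skip_depth > 0:
            return
        self.chars_per_tag[self.stack[-1]] += len(text)
        if self.content_depth > 0:
            self.kept_text.append(text)


def extract_content(html):
    """Returns (kept text, characters of visible text per innermost tag)."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.kept_text), dict(parser.chars_per_tag)
```

The per-tag character counts give a quick way to check on a few sample pages whether some tags really do carry most of the text before committing to a fixed tag list. Joining fragments with a space is a simplification and can leave stray spaces around inline elements.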

Thank you for your help.
