Only extract content

Posted about 7 years ago by tobi123

Hi,

I am looking for a smart way to extract only the informative content from a range of different webpages. My idea is that certain HTML tags tend to hold more of the real content than others. Is filtering by tag a good approach for preprocessing, or do you have any other ideas?
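
To make the idea concrete, here is a minimal sketch of the tag-based filtering I have in mind, using only Python's standard `html.parser` module. The names `TagTextCounter` and `extract_content` and the contents of `SKIP_TAGS` are my own assumptions for illustration, not an established method, and the skip list would probably need tuning per site.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Assumption: text inside these tags is usually not informative content.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Void elements never get a closing tag, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagTextCounter(HTMLParser):
    """Counts text characters per enclosing tag and keeps non-skipped text."""

    def __init__(self):
        super().__init__()
        self.stack = []                      # currently open tags
        self.chars_per_tag = defaultdict(int)
        self.kept_text = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        # Attribute the text to its innermost enclosing tag.
        self.chars_per_tag[self.stack[-1]] += len(text)
        # Keep the text only if no ancestor is in the skip list.
        if not SKIP_TAGS.intersection(self.stack):
            self.kept_text.append(text)


def extract_content(html):
    """Return (kept text, character count per tag) for an HTML string."""
    parser = TagTextCounter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.kept_text), dict(parser.chars_per_tag)
```

The per-tag character counts are there to check whether the assumption holds on real pages: if most of the text sits in `p` or `article` on one site and in `div` on another, a fixed tag list alone will not be enough.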

Thank you for your help.
