Hi,
I am looking for a smart way to extract only the main informative content from a range of different webpages. My idea is that certain HTML tags tend to contain more content than others. Is filtering by tag a good approach during preprocessing, or do you have any other ideas?
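A minimal sketch of the tag-based idea, using only Python's standard-library `html.parser`. It drops everything inside tags that usually hold navigation or markup (`SKIP_TAGS`), collects the direct text of block-level tags (`BLOCK_TAGS`), and keeps only blocks with enough text. The tag lists and the 80-character threshold are illustrative assumptions, not tuned values.

```python
from html.parser import HTMLParser

# Assumed non-content tags: their whole subtree is ignored.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside", "form"}
# Assumed candidate content blocks: each collects its own direct text.
BLOCK_TAGS = {"p", "article", "section", "div", "li", "td", "pre", "blockquote", "h1", "h2", "h3"}
# Void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "area", "base", "col",
             "embed", "source", "track", "wbr"}


class BlockCollector(HTMLParser):
    """Collect the text of block-level elements, ignoring skipped subtrees."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open tags, innermost last
        self.skip_depth = 0   # > 0 while inside a SKIP_TAGS subtree
        self.buffers = []     # (tag, text parts) for open block tags
        self.blocks = []      # finished (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append(tag)
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        if tag in BLOCK_TAGS:
            self.buffers.append((tag, []))

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag, ignore it
        # Pop until the matching tag, which also closes unclosed children.
        while self.stack:
            open_tag = self.stack.pop()
            if open_tag in SKIP_TAGS:
                self.skip_depth -= 1
            if open_tag in BLOCK_TAGS:
                block_tag, parts = self.buffers.pop()
                text = " ".join(" ".join(parts).split())  # normalize whitespace
                if text:
                    self.blocks.append((block_tag, text))
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Text goes to the innermost open block, unless we are in a skipped subtree.
        if self.skip_depth == 0 and self.buffers:
            self.buffers[-1][1].append(data)


def extract_main_text(html: str, min_chars: int = 80) -> list[str]:
    """Return block texts long enough to look like real content (threshold is a guess)."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    return [text for _, text in parser.blocks if len(text) >= min_chars]
```

Because text is attributed only to the innermost open block, a wrapper `<div>` does not duplicate the paragraphs inside it. A purely tag-based filter will still misclassify some pages, so the thresholds would likely need adjusting per site.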
Thank you for your help.