In this post we look at the approach taken to improve the extraction of articles from web pages, covering what went well and what did not.