Showing posts from 2020

Web scraping using Python package Goose

Web scraping is a powerful technique for collecting large amounts of data from the internet. Companies with quality data thrive in today's world when it comes to machine learning. Let's take a scenario. You set out to build the world's best restaurant review classification system. You collect all the reviews from several restaurants and use a fancy deep learning algorithm to do the classification. It turns out your classification algorithm is not doing well out in public. What went wrong? Well, machine learning is all about capturing patterns in the data and generalizing them well enough that the model also performs well on unseen data. Given the situation you are in, you have these options: try a GPU, incorporate the latest ML techniques, build an ensemble of many models, revisit feature engineering... or get more data. As trivial as it might sound, fetching more data often enables an ML algorithm to capture more of the patterns within the data and perform better on unseen data. I am going to talk about not so f