
Showing posts from 2017

Text classification using a CNN written in TensorFlow.

Problem statement:
Build a model that automatically classifies an article as Finance, Law, Fashion, or Lifestyle. Use data from leading magazines to train the model.


Solution:
GitHub repo: link

In the past, I used NLTK and Python to solve this kind of problem, but neural networks have often proven more accurate for NLP tasks. After researching text classification libraries and different approaches to the problem, I decided to use a CNN.

I used Denny Britz's code to implement the CNN (convolutional neural network). Here is the link to his blog post.
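To make the architecture concrete, here is a minimal, dependency-free sketch of the forward pass of a text CNN of the kind Denny Britz describes: look up a word embedding for every token, slide filters of several widths over the word sequence, apply ReLU, keep the maximum activation of each filter (max-over-time pooling), and feed the pooled features to a softmax layer over the four labels. This is not his TensorFlow code, which builds these layers as graph ops and learns the weights; all sizes, names (`TextCNN`, `build_vocab`, `encode`) and the whitespace tokenizer are assumptions for illustration, and the weights here are random and untrained.

```python
import math
import random

LABELS = ["Finance", "Law", "Fashion", "Lifestyle"]
EMBED_DIM = 8           # toy embedding size (assumption)
FILTER_WIDTHS = (2, 3)  # toy filter widths, in words (assumption)
NUM_FILTERS = 4         # feature maps per filter width (assumption)
MAX_LEN = 10            # every article is padded/truncated to this many tokens

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def build_vocab(texts):
    # Index 0 is reserved for padding; unknown words also map to 0 in this sketch.
    vocab = {"<pad>": 0}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(text, vocab):
    # Map tokens to ids, then truncate or zero-pad to a fixed length.
    ids = [vocab.get(tok, 0) for tok in text.lower().split()][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))


class TextCNN:
    def __init__(self, vocab_size):
        self.embedding = rand_matrix(vocab_size, EMBED_DIM)
        # For each width w: NUM_FILTERS filters, each a w x EMBED_DIM matrix.
        self.filters = {w: [rand_matrix(w, EMBED_DIM) for _ in range(NUM_FILTERS)]
                        for w in FILTER_WIDTHS}
        self.bias = {w: [0.0] * NUM_FILTERS for w in FILTER_WIDTHS}
        n_features = NUM_FILTERS * len(FILTER_WIDTHS)
        self.out_w = rand_matrix(n_features, len(LABELS))
        self.out_b = [0.0] * len(LABELS)

    def forward(self, ids):
        x = [self.embedding[i] for i in ids]  # MAX_LEN x EMBED_DIM
        features = []
        for w in FILTER_WIDTHS:
            for filt, b in zip(self.filters[w], self.bias[w]):
                acts = []
                # "Valid" convolution: one activation per window of w consecutive words.
                for start in range(len(x) - w + 1):
                    s = b
                    for k in range(w):
                        s += sum(fw * xv for fw, xv in zip(filt[k], x[start + k]))
                    acts.append(max(0.0, s))  # ReLU
                features.append(max(acts))  # max-over-time pooling
        # Fully connected layer from pooled features to one logit per label.
        logits = [b + sum(features[i] * self.out_w[i][j] for i in range(len(features)))
                  for j, b in enumerate(self.out_b)]
        # Numerically stable softmax.
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


texts = ["Stocks rallied as the central bank held rates",
         "The court ruled on the contract dispute"]
vocab = build_vocab(texts)
model = TextCNN(len(vocab))
probs = model.forward(encode(texts[0], vocab))
print({label: round(p, 3) for label, p in zip(LABELS, probs)})
```

Because the weights are untrained, the printed probabilities are roughly uniform; training (cross-entropy loss, backpropagation, dropout) is what the TensorFlow version adds on top of this forward pass. Using several filter widths lets the model pick up phrases of different lengths, and max pooling makes each feature independent of where in the article the phrase occurs.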

Below, I describe the files, the procedure I followed to collect the data, train and test the model, and the results.

First, I went to the leading newspaper The Guardian and looked for the sections matching the labels, i.e. Finance, Law, Fashion, and Lifestyle. Scraping all the data from a single source helps keep the articles homogeneous.
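Because each article is scraped from a known section, its label comes for free. The following is a minimal sketch, assuming the scraped text is held in a hypothetical `articles_by_label` dictionary (section label to list of article strings), of turning it into one-hot labelled examples and splitting off a held-out set per label so every class is represented in both sets. The function names, the 10% default split and the seed are assumptions for illustration, not taken from the repo.

```python
import random

LABELS = ["Finance", "Law", "Fashion", "Lifestyle"]


def one_hot(label):
    # One position per class, e.g. "Law" -> [0, 1, 0, 0].
    vec = [0] * len(LABELS)
    vec[LABELS.index(label)] = 1
    return vec


def build_dataset(articles_by_label, dev_fraction=0.1, seed=10):
    """Split {label: [text, ...]} into shuffled (text, one_hot) train and dev lists."""
    rng = random.Random(seed)
    train, dev = [], []
    for label, texts in articles_by_label.items():
        texts = list(texts)
        rng.shuffle(texts)
        # Split each label separately so class proportions match in both sets.
        n_dev = int(len(texts) * dev_fraction)
        dev += [(t, one_hot(label)) for t in texts[:n_dev]]
        train += [(t, one_hot(label)) for t in texts[n_dev:]]
    rng.shuffle(train)
    rng.shuffle(dev)
    return train, dev


# Hypothetical placeholder articles standing in for the scraped text.
scraped = {label: [f"{label} article {i}" for i in range(10)] for label in LABELS}
train, dev = build_dataset(scraped, dev_fraction=0.2)
print(len(train), len(dev))  # 32 8
```

Splitting per label rather than over the pooled list avoids the case where a small class ends up missing from the held-out set.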

I have used Goose and BeautifulSoup to scrape the arti…