Problem statement:
You are supposed to build a model that automatically classifies an article as Finance, Law, Fashion, or Lifestyle. Use data from leading magazines to train the model.
Solution:
GitHub repo: link
In the past, I used NLTK and Python to solve this kind of problem, but neural networks have often proven more accurate for NLP tasks. I researched text classification libraries and different approaches to the problem, and decided to use a CNN (convolutional neural network).
I used Denny Britz's code to implement the CNN. Here is the link to his blog post.
Below I describe the files, the procedure I followed to get the data, train the model, and test the model, and the results.
First, I went to the leading newspaper The Guardian and looked for the labels, i.e. Finance, Law, Fashion, and Lifestyle. Scraping all the data from the same source helps keep the articles homogeneous.
I used Goose and BeautifulSoup to scrape the articles. The code is uploaded to GitHub. The folder structure and a description of the data files are as follows:
raw_data/ Contains files related to train and test
├── collect_url_data.py Python script that scrapes articles
├── data Training data folder
│ ├── fashion_7000.txt 7000 training data for class fashion
│ ├── finance_7000.txt 7000 training data for class finance
│ ├── law_7000.txt 7000 training data for class law
│ └── lifestyle_7000.txt 7000 training data for class lifestyle
├── fashion From original scraped data and cleaned one
│ ├── fashion_7000.txt 7000 training data for class fashion
│ ├── fashion_original.txt Original scraped data
│ ├── log Log output of python
│ ├── test_fashion.txt test data for fashion, 1001 samples
│ ├── urls.txt urls which were scraped
│ └── urltext.txt raw text from urls
├── finance From original scraped data and cleaned one
│ ├── finance.txt raw text from urls
│ ├── finance_7000.txt 7000 training data for class finance
│ ├── finance_urls.txt urls scraped for finance
│ ├── log_finance log output
│ ├── original_finance.txt Original scraped file
│ └── test_finance.txt test sample for finance
├── law Data folder for law
│ ├── law.txt scraped data for law
│ ├── law_7000.txt 7000 training samples for law
│ ├── law_urls.txt urls scraped for law
│ ├── log_law log output
│ ├── original_law.txt original scraped data for law
│ └── test_law.txt test data for law
└── lifestyle Data folder for lifestyle
  ├── lifestyle.txt cleaned data for lifestyle
  ├── lifestyle_7000.txt 7000 training samples for lifestyle
  ├── lifestyle_urls.txt urls collected for scraping
  ├── log_lifestyle log output of the script
  ├── original_lifestyle.txt original scraped data
  └── test_lifestyle.txt test data for lifestyle
Using the Python script, I scraped the above categories. Each folder holds the respective raw data and cleaned data. I removed the unnecessary lines using sed.
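The extract-then-clean step can be sketched roughly as follows. This is not the script from the repo: the post used Goose and BeautifulSoup for scraping and sed for cleaning, while this sketch swaps in Python's standard-library `html.parser` so it runs without extra dependencies. The names `ParagraphExtractor`, `extract_paragraphs`, and `clean_lines`, and the `min_words` threshold, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_p = False     # True while inside a <p> element
        self._chunks = []      # text pieces of the current paragraph
        self.paragraphs = []   # finished, whitespace-normalised paragraphs

    def _flush(self):
        # Join the collected pieces and collapse runs of whitespace.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # also closes a paragraph left unclosed
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._chunks.append(data)


def extract_paragraphs(html):
    """Return the paragraph texts of an HTML page as a list of strings."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()            # keep a trailing paragraph with no </p>
    return parser.paragraphs


def clean_lines(lines, min_words=3):
    """Drop blank lines and lines with fewer than min_words words.

    min_words is an assumed threshold, not a value taken from the post.
    """
    cleaned = []
    for line in lines:
        line = " ".join(line.split())
        if len(line.split()) >= min_words:
            cleaned.append(line)
    return cleaned
```

Calling `clean_lines(extract_paragraphs(page_html))` on a downloaded page would give one cleaned line per paragraph, which could then be appended to a per-class text file.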
Once the data was ready, I went through the basics of neural networks and made the appropriate changes to the TensorFlow code to solve the problem. The changes include pointing the script at the new source files and increasing the array size on lines 16-20 and 52 in this script.
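The modified lines are not shown in the post, so the following is only one plausible reading of "increasing the array size": the original tutorial code handles two classes with two-element one-hot label vectors, and a four-class version would need four-element vectors. The function names and the class order below are assumptions for illustration, and plain Python lists are used instead of NumPy arrays to keep the sketch dependency-free.

```python
# Assumed class order; the post does not say which index each class gets.
CLASSES = ["finance", "law", "fashion", "lifestyle"]


def one_hot(index, size):
    """Return a list of zeros of length `size` with a 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def build_dataset(samples_by_class, classes=CLASSES):
    """Pair every non-empty sample with the one-hot label of its class.

    samples_by_class maps a class name to a list of text samples, for
    example the lines read from finance_7000.txt.
    """
    texts, labels = [], []
    for index, name in enumerate(classes):
        for sample in samples_by_class.get(name, []):
            sample = sample.strip()
            if not sample:
                continue  # skip blank lines so they never receive a label
            texts.append(sample)
            labels.append(one_hot(index, len(classes)))
    return texts, labels
```

With this layout, a finance sample is labelled `[1, 0, 0, 0]` and a lifestyle sample `[0, 0, 0, 1]`, so the output layer of the network needs four units instead of two.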
Once the script was ready and the required Python libraries were installed, I was able to run the code successfully. TensorFlow created a new folder called runs, which holds the final results.
Here is a screenshot of the results.
Picture link
Results:
I was able to get 94% accuracy using the second checkpoint in the runs folder.
Here is the google sheet link
Unfortunately, the test data contained a few empty lines because I had split the articles in the training data, and the CNN predicted a label for those lines anyway. I could have avoided this mistake by filtering them out first.
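A minimal sketch of how the empty-line problem could be avoided: drop blank samples before they reach the model, then compute accuracy on what remains. The helper names `drop_empty` and `accuracy` are hypothetical and are not part of the original evaluation script.

```python
def drop_empty(samples, labels):
    """Remove samples that are empty or whitespace-only, with their labels."""
    kept = [(s, l) for s, l in zip(samples, labels) if s.strip()]
    return [s for s, _ in kept], [l for _, l in kept]


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

Running `drop_empty` on the test samples before prediction keeps the reported accuracy from being influenced by lines that carry no text.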
If you have more questions, feel free to reach out to me at shanker.mani0@gmail.com.
Happy hacking!