Write code that automatically crawls web servers and extracts their content, and learn how to parse data fields out of the raw HTML that is returned. Learn how to extract content from any website without the need for an API. This book introduces various Python concepts and libraries such as BeautifulSoup, exception handling, regular expressions, domain traversal, spidering rules and data storage. Learn to export data to CSV as well as to save output into MySQL databases. Learn the basics of MySQL and best practices for database design. Advanced scraping techniques cover data encoding as well as extracting content from Microsoft Word documents.
Learn more about data cleaning, natural language processing and scraping JavaScript output. Understand APIs and JSON data structures. This book also covers image processing, text recognition, scraping security, cookies, form fields, multiprocess scraping and interprocess communication. Lastly, it covers the legal aspects of scraping and the laws dealing with its abuse.
In short, this book is everything a hardcore coder would need to extract and store large-scale content safely and efficiently.