GitHub Repository
Repo Link
Introduction
Welcome Note & Course Structure
1 - Introduction to Web Scraping
1.1 - About Web Scraping
1.2 - Types of Web Scraping
1.3 - Ethical Considerations of Web Scraping
1.4 - Advantages of Web Scraping
1.5 - Disadvantages of Web Scraping
1.6 - Alternatives to Web Scraping
Setting up Python Environment
2 - Environment Setup
2.1 - About Anaconda
2.2 - Tutorial on Common Anaconda Prompts
2.3 - Creating a Project Environment and Kernel
Primer on Web
3 - Primer on Web
3.1 - Client Server Architecture
3.2 - HTTP Request & Response
3.3 - HTTP Methods
3.4 - HTTP Status Codes
3.5 - Web Technologies
Mastering Requests Library
4 - Web Interaction With Requests Module
4.1 - About Requests
4.2 - Working with GET method
4.3 - Working with POST method
4.4 - Working with PUT and DELETE methods
4.5 - Working with HTTP Headers
4.6 - Working with Response Object
4.7 - Working with Public API
Beautiful Soup
5.1 - About Beautiful Soup
5.2 - Creating a Soup object
5.3 - Exploring the Soup object
5.4 - Soup Object using Requests
5.5 - Mini Project using Beautiful Soup, Requests & Pandas
Selenium
6 - Web Automation using Selenium
6.1 - About Selenium
6.2 - Getting Started with Selenium
6.3.1 - Strategies for Locating Web Elements
6.3.2 - Understanding XPath
6.3.3 - Basic Interaction with Web Elements
6.3.4 - Working with Dropdowns
6.3.5 - Working with Multiselect
6.3.6 - Basic Scrolling
6.3.7 - Infinite Scrolling
6.4.1 - Intro
6.4.2 - Explicit Waits
6.4.3 - Implicit Waits
6.4.4 - Working with IFrames
6.4.5 - Working with Alerts
6.5 - Best Practices & Optimization
Project - Yahoo Finance Stocks
7 - Real-world Projects
7.1.1 - Action Plan
7.1.2 - Prerequisites
7.1.3 - Prerequisites (continued)
7.1.4 - Scraping the Data
7.1.5 - Cleaning the Data
7.1.6 - Restructuring the Code
Project - Real Estate Listings
7.2.1 - Action Plan
7.2.2 - Prerequisites
7.2.3 - Interacting with Target Website - 99acres
7.2.4 - Scraping the Data + Page Navigation Fix
7.2.5 - Cleaning the Data
7.2.6 - Restructuring the Code
Course Outro
Handling Captchas
8 - Agenda
8.1 - Understanding Captchas
8.2 - Preventing Captchas
8.3 - Handling Captchas (Theory)
8.4 - Handling Captchas using input()
8.5 - Handling Captchas using pytesseract, OpenCV
8.6 - Best Practices & Final Takeaways
Scrapy
9 - Scrapy Module
9.1.1 - Introduction to Scrapy
9.1.2 - Installation
9.1.3 - Basic Set-up & Project Structure
9.1.4 - Running a Spider
9.2 - Spiders
9.2.1 - Definition of Spider
9.2.2 - Spider in Scrapy
9.2.3 - Anatomy of Spider
9.2.4 - Types of Spiders
9.3 - Working with Scrapy
9.3.1 - Scrapy Shell
9.3.2 - Scrapy Spider with Python
9.4 - Advanced Features
9.4.1 - Custom Spider Settings
9.4.2 - Data & Item Pipelines
9.4.3 - User-Agent Rotation & Proxy Usage
9.4.4 - Login Pages
9.4.5 - Handling APIs
9.5 - Mini Project
9.6 - Best Practices
Preview - WebScraping for Data Science