Monday, April 16, 2018

Python Web Scraping

Libraries for Python Web Scraping:


Requests

  • used to fetch the raw ingredients (i.e. the raw HTML of a page)
http://docs.python-requests.org/en/master/user/quickstart/
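A minimal sketch of fetching raw HTML with Requests. The helper names `build_request` and `fetch_html` and the `example.com` URL are illustrative assumptions, not part of the original notes. Only the offline part runs at import time; `fetch_html` needs network access when called.

```python
import requests


def build_request(url, params=None):
    """Prepare a GET request without sending it, so the final URL can be inspected."""
    return requests.Request("GET", url, params=params).prepare()


def fetch_html(url, params=None, timeout=10):
    """Send a GET request and return the raw HTML of the response as text."""
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response.text


# Preparing a request needs no network access (example.com is a placeholder).
prepared = build_request("https://example.com/search", params={"q": "python"})
print(prepared.url)
```

Calling `fetch_html("https://example.com/")` would return the page source as a string, which can then be handed to a parser.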

BeautifulSoup  

  • a parsing library that can use different parsers. A parser is simply a program that can extract data from HTML and XML documents.
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
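A short sketch of parsing with BeautifulSoup. The HTML string is made up for illustration and stands in for a downloaded page; `html.parser` is the parser bundled with Python, chosen here only so the example needs no extra install.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
html = """
<html><body>
  <h1>Quotes</h1>
  <ul>
    <li><a href="/a">Alpha</a></li>
    <li><a href="/b">Beta</a></li>
  </ul>
</body></html>
"""

# The second argument picks the parser; "lxml" could be passed instead if it is installed.
soup = BeautifulSoup(html, "html.parser")

# Text of the first <h1>, and (text, href) pairs for every <a> tag.
title = soup.h1.get_text(strip=True)
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

print(title)
print(links)
```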

lxml

  • a high-performance, production-quality HTML and XML parsing library
http://lxml.de/index.html#introduction
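A small sketch of using lxml directly with XPath. The table markup and the `prices` id are invented for the example.

```python
from lxml import html as lxml_html

# Hypothetical page fragment standing in for downloaded HTML.
page = """
<html><body>
  <table id="prices">
    <tr><td>AAA</td><td>10.5</td></tr>
    <tr><td>BBB</td><td>20.0</td></tr>
  </table>
</body></html>
"""

tree = lxml_html.fromstring(page)

# XPath: first and second cell of every row inside the table with id="prices".
names = tree.xpath("//table[@id='prices']//tr/td[1]/text()")
prices = [float(p) for p in tree.xpath("//table[@id='prices']//tr/td[2]/text()")]

print(list(zip(names, prices)))
```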


Selenium

  • a browser-automation tool, used to scrape sites whose data is rendered by JavaScript.
http://selenium-python.readthedocs.io/

Scrapy

  • a framework to use if you need to build a real spider or web crawler
https://doc.scrapy.org/en/latest/topics/architecture.html
