Further development continues at https://github.com/Megaputer/inepta; see the examples in its examples folder.
This repository contains examples of web scrapers for the Internet Source node, demonstrating their use cases and capabilities.
- Install the newest version of Python from https://python.org/downloads. Python 3.7+ is required.
- Download this repository (here we placed it on the D drive, so the full path is D:\python-scraper-examples)
- Open Command Prompt and navigate to the repository root folder
- Create a virtual environment
python -m venv env
- Install scraper dependencies
env\Scripts\pip install -r requirements.txt
- Download the Chromium browser for webapp_scraper:
env\Scripts\python -m playwright install chromium
- Register web scrapers in PolyAnalyst:
  - Navigate to Server settings in the PolyAnalyst 6.5 Administrative Tool
  - Click on Data and API connections and select Web scrapers
  - Open the Web scrapers context menu and click on Add item
  - Enter the scraper name in the Name field. This name will be displayed in the drop-down Scraper menu in the Internet Source node wizard
  - Enter a command in the Command field. For example,
    D:\python-scraper-examples\env\Scripts\python.exe D:\python-scraper-examples\megaputer_blog.py
  - Click Save changes to apply new settings
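
The setup commands and the Command field value above can be sketched as a small Python helper. This is only an illustration, not part of the repository: the function names (venv_tool, setup_commands, scraper_command, run_setup) are hypothetical, and the D:\python-scraper-examples location is the one assumed in this README. Paths that contain spaces may need quoting, which is not handled here.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumption: the repository was placed at the location used in this README.
REPO_ROOT = PureWindowsPath(r"D:\python-scraper-examples")


def venv_tool(repo_root, name):
    """Return the path of an executable in the virtual environment's Scripts folder."""
    return str(PureWindowsPath(repo_root) / "env" / "Scripts" / name)


def setup_commands(repo_root):
    """Return the three setup commands from the list above, in order."""
    return [
        # Create the virtual environment in the "env" folder.
        ["python", "-m", "venv", "env"],
        # Install the scraper dependencies.
        [venv_tool(repo_root, "pip.exe"), "install", "-r", "requirements.txt"],
        # Download the Chromium browser for webapp_scraper.
        [venv_tool(repo_root, "python.exe"), "-m", "playwright", "install", "chromium"],
    ]


def scraper_command(repo_root, script_name):
    """Build the value for the Command field: venv interpreter, then the script path."""
    root = PureWindowsPath(repo_root)
    return f"{venv_tool(root, 'python.exe')} {root / script_name}"


def run_setup(repo_root):
    """Run the setup commands from the repository root (Windows only)."""
    for cmd in setup_commands(repo_root):
        subprocess.run(cmd, cwd=str(repo_root), check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in setup_commands(REPO_ROOT):
        print(" ".join(cmd))
    print(scraper_command(REPO_ROOT, "megaputer_blog.py"))
```

Calling run_setup(REPO_ROOT) on Windows would execute the same steps as the manual instructions; the string returned by scraper_command can be pasted into the Command field for whichever scraper script is being registered.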
- Add an Internet Source node to the workspace
- Choose one of the scrapers registered earlier in the drop-down Scraper menu
- Set parameters if the selected scraper supports them
- Execute the node
This project is licensed under the MIT License - see the LICENSE file for details