PS Helper is a Python package developed by the Professional Services team.
It provides a set of helper libraries and command-line tools to speed up and standardize development workflows.
- Ready-to-use Python utilities for internal projects.
- CLI commands for common tasks (e.g., creating repository templates).
- Easy to install and extend.
You can install PS Helper in two ways:
Clone the repository and install it with pip:
git clone https://github.com/bitmakerla/ps-helper.git
cd ps-helper
pip install -e .
This will install the package in editable mode, so any code changes will be reflected immediately.
Alternatively, install the package directly from GitHub without cloning:
pip install git+https://github.com/bitmakerla/ps-helper.git
Check available commands:
ps-helper --help
Create a new project from the template:
ps-helper create-repo-template MyProject
Generate beautiful HTML reports from Scrapy metrics JSON files:
ps-helper create-report scrapy_stats.json
This will automatically create a report named scrapy_stats-report.html in the same directory as your metrics file.
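If a script needs to locate the generated report, the naming rule described above can be reproduced in a few lines. This is a minimal sketch, assuming the report name is always the metrics file's stem followed by `-report.html`; `report_path_for` is a hypothetical helper and is not part of PS Helper.

```python
from pathlib import Path


def report_path_for(metrics_file: str) -> Path:
    """Return the expected report path for a metrics JSON file.

    Hypothetical helper: mirrors the naming described above
    (<stem>-report.html next to the metrics file).
    """
    metrics = Path(metrics_file)
    return metrics.with_name(f"{metrics.stem}-report.html")


print(report_path_for("output/scrapy_stats.json"))
```

Because the path is derived with `with_name`, the report stays in the same directory as the metrics file, whatever that directory is.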
Use the reusable helper to register transfer bytes in Scrapy stats (including downloader/response_bytes):
from ps_helper.extensions import record_curl_transfer_bytes
record_curl_transfer_bytes(
    stats=self.crawler.stats,
    curl_response=curl_resp,
    add_to_downloader_response_bytes=True,
)
With MetricsExtension, this is also reflected in the final JSON report under resources.
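To make the effect on the stats concrete, the following sketch shows the kind of bookkeeping such a helper could perform. It is not the PS Helper implementation: `FakeStats`, `record_bytes`, and the `curl/response_bytes` key are hypothetical stand-ins. Only the `downloader/response_bytes` key and the `inc_value` method name come from Scrapy's stats collector.

```python
class FakeStats:
    """Tiny stand-in for Scrapy's stats collector (assumption, for illustration)."""

    def __init__(self):
        self._values = {}

    def inc_value(self, key, count=1, start=0):
        # Same idea as Scrapy's inc_value: add to a running counter.
        self._values[key] = self._values.get(key, start) + count

    def get_value(self, key, default=None):
        return self._values.get(key, default)


def record_bytes(stats, body: bytes, add_to_downloader_response_bytes: bool = True) -> int:
    """Hypothetical helper: count the bytes of one curl response body."""
    size = len(body)
    stats.inc_value("curl/response_bytes", size)  # hypothetical stat key
    if add_to_downloader_response_bytes:
        # Also add to the counter Scrapy uses for its own downloads.
        stats.inc_value("downloader/response_bytes", size)
    return size


stats = FakeStats()
record_bytes(stats, b"<html>hello</html>")
record_bytes(stats, b"0123456789", add_to_downloader_response_bytes=False)
print(stats.get_value("curl/response_bytes"), stats.get_value("downloader/response_bytes"))
```

Keeping a separate counter alongside `downloader/response_bytes` is a design choice of this sketch: it lets curl traffic be inspected on its own while still contributing to the overall total when the flag is set.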
For automatic tracking in every curl request, use TrackedCurlSession:
from ps_helper.extensions import TrackedCurlSession
self.curl_session = TrackedCurlSession(stats=self.crawler.stats)
# keep using get/post as usual
curl_resp = self.curl_session.get(url, impersonate="chrome120")
Block unwanted URLs in your Scrapy projects with intelligent filtering.
- Add to your Scrapy project's settings.py:
DOWNLOADER_MIDDLEWARES = {
    'ps_helper.blockers.url_blocker.URLBlockerMiddleware': 585,
}
# Configure words to block
URL_BLOCKER_WORDS = ['admin', 'login', '.css', '.js', 'api/']
URL_BLOCKER_MODE = 'partial'  # or 'strict'
- Run your spider; unwanted URLs will be filtered automatically.
Partial mode blocks any URL that contains the word as a substring:
URL_BLOCKER_MODE = 'partial'
URL_BLOCKER_WORDS = ['auth']
# Results:
# ❌ BLOCKED: site.com/authentication (contains 'auth')
# ❌ BLOCKED: site.com/auth (contains 'auth')
Strict mode blocks only exact word matches in URL components:
URL_BLOCKER_MODE = 'strict'
URL_BLOCKER_WORDS = ['auth']
# Results:
# ✅ ALLOWED: site.com/authentication ('auth' ≠ 'authentication')
# ❌ BLOCKED: site.com/auth ('auth' = 'auth')
# Required
URL_BLOCKER_WORDS = ['admin', 'login', '.pdf', 'tracking']
# Optional (with defaults)
URL_BLOCKER_MODE = 'partial' # 'partial' or 'strict'
URL_BLOCKER_CASE_SENSITIVE = False # Case sensitivity
URL_BLOCKER_LOG_BLOCKED = True # Show blocked URLs in logs
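The difference between the partial and strict modes described above can be sketched in plain Python. This is an illustration under stated assumptions, not the middleware's actual code: `is_blocked` is a hypothetical function, and strict mode is approximated here by splitting the URL on `/`, `?`, `&`, `=`, and `#` and comparing whole components.

```python
import re


def is_blocked(url, words, mode="partial", case_sensitive=False):
    """Hypothetical matcher illustrating 'partial' vs 'strict' blocking."""
    if not case_sensitive:
        url = url.lower()
        words = [w.lower() for w in words]
    if mode == "partial":
        # Substring match anywhere in the URL.
        return any(w in url for w in words)
    if mode == "strict":
        # Assumed meaning of "URL components": pieces between / ? & = #
        components = [c for c in re.split(r"[/?&=#]+", url) if c]
        return any(w in components for w in words)
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("partial", "strict"):
    for url in ("site.com/authentication", "site.com/auth"):
        print(mode, url, is_blocked(url, ["auth"], mode=mode))
```

With the word list `['auth']`, this sketch blocks both URLs in partial mode and only `site.com/auth` in strict mode, matching the examples above. How the real middleware tokenizes entries such as `.css` or `api/` in strict mode is not specified here, so treat the splitting rule as an assumption.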