
@tangledhelix
Member

Ports a feature from PPtools: find page numbers in various formats and display them (roman first, arabic next).

Understands several formats in the id attribute, such as Page_1, page_1, and page1. Also attempts to parse numbers from the text of <span class="pagenum"> tags (p. 1, [Pg 1], etc.).
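For reviewers who want to see the idea concretely, here is a minimal sketch of the extraction step using the standard library's `html.parser.HTMLParser`. The regexes `ID_RE` and `SPAN_RE` and the class `PageNumberParser` are hypothetical, written for illustration only; the actual patterns in pphtml.py may differ, and this uses the stdlib `re` rather than the `regex` package.

```python
import re
from html.parser import HTMLParser

# Hypothetical patterns, for illustration only; the PR's actual regexes may differ.
# id attributes such as "Page_1", "page_1", "page1", or "Page_iv".
ID_RE = re.compile(r"^page_?([0-9]+|[ivxlcdm]+)$", re.IGNORECASE)
# pagenum span text such as "p. 1", "[Pg 1]", "Pg. 12", or "[xiv]".
SPAN_RE = re.compile(r"^\[?\s*(?:p(?:g|age)?\.?\s*)?([0-9]+|[ivxlcdm]+)\s*\]?$", re.IGNORECASE)


class PageNumberParser(HTMLParser):
    """Collect page labels from id attributes and <span class="pagenum"> text."""

    def __init__(self):
        super().__init__()
        self.labels = set()       # a set, so duplicates (e.g. an id and its span text) collapse
        self._in_pagenum = False  # True while inside <span class="pagenum">
        self._span_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        match = ID_RE.match(attrs.get("id") or "")
        if match:
            # Prefer the id when present; it is usually cleaner than the visible text.
            self.labels.add(match.group(1).lower())
        elif tag == "span" and "pagenum" in (attrs.get("class") or "").split():
            self._in_pagenum = True
            self._span_text = []

    def handle_data(self, data):
        if self._in_pagenum:
            self._span_text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_pagenum:
            match = SPAN_RE.match("".join(self._span_text).strip())
            if match:
                self.labels.add(match.group(1).lower())
            self._in_pagenum = False


html = """
<p><span class="pagenum" id="Page_iv">[Pg iv]</span> ...</p>
<p><span class="pagenum">p. 12</span> ...</p>
<p><a id="page13"></a> ...</p>
"""
parser = PageNumberParser()
parser.feed(html)
print(sorted(parser.labels))  # ['12', '13', 'iv']
```

Note that a plain string sort puts "iv" after "12", so the labels still need to be split into roman and arabic and converted to integers before they can be ordered and grouped.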

Example of what this looked like in PPTools:
[Screenshot 2025-12-31 at 12:19:04 AM: page numbers as displayed by PPTools]

Example from this change:

----- document info ------------------------------------------------------------
[info] page numbers ( roman): i–xi
[info] page numbers (arabic): 13–168

Can display multiple ranges if numbers are missing from the sequence:

----- document info ------------------------------------------------------------
[info] page numbers ( roman): i–iii, v–xi
[info] page numbers (arabic): 13–23, 25–45, 47–168
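The range collapsing shown above can be sketched roughly as follows. The PR relies on the third-party `roman` module for numeral conversion; to keep this sketch runnable with only the standard library, `to_roman` and `from_roman` here are small stand-ins, not that module's API. The helper names `collapse`, `format_runs`, and `page_number_report` are hypothetical, and showing a single-page run as one label (e.g. "17" rather than "17–17") is an assumption about the output format.

```python
# Minimal stand-ins for the third-party `roman` package, so this sketch
# runs with only the standard library.
ROMAN_VALUES = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_roman(n):
    """Convert a positive integer to a lowercase roman numeral."""
    out = []
    for value, numeral in ROMAN_VALUES:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def from_roman(s):
    """Return the value of a roman numeral, or None if it is malformed."""
    s = s.lower()
    total, i = 0, 0
    for value, numeral in ROMAN_VALUES:
        while s.startswith(numeral, i):
            total += value
            i += len(numeral)
    # The round trip rejects leftover characters and non-canonical forms like "iiii".
    if i != len(s) or total == 0 or to_roman(total) != s:
        return None
    return total


def collapse(numbers):
    """Group integers into [start, end] runs of consecutive values."""
    runs = []
    for n in sorted(set(numbers)):
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n  # extend the current run
        else:
            runs.append([n, n])  # a gap: start a new run
    return runs


def format_runs(runs, fmt):
    """Render runs as "a–b, c", formatting each endpoint with fmt."""
    return ", ".join(fmt(a) if a == b else f"{fmt(a)}–{fmt(b)}" for a, b in runs)


def page_number_report(labels):
    """Build the roman-first, arabic-next info lines from page labels."""
    roman_nums, arabic_nums = [], []
    for label in labels:
        if label.isdecimal():
            arabic_nums.append(int(label))
        else:
            value = from_roman(label)
            if value is not None:
                roman_nums.append(value)
    lines = []
    if roman_nums:
        lines.append("[info] page numbers ( roman): " + format_runs(collapse(roman_nums), to_roman))
    if arabic_nums:
        lines.append("[info] page numbers (arabic): " + format_runs(collapse(arabic_nums), str))
    return lines


labels = ["i", "ii", "iii", "v", "vi", "vii", "13", "14", "15", "17"]
for line in page_number_report(labels):
    print(line)
# [info] page numbers ( roman): i–iii, v–vii
# [info] page numbers (arabic): 13–15, 17
```

Converting to integers before sorting is what makes the gap detection work: compared as strings, "ix" sorts before "v", which is presumably why a numeral library is needed at all.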

@tangledhelix
Member Author

Note that this adds the roman module to requirements.py; it is now a required dependency.

Ports a feature from PPtools: find page numbers in various formats and
display them (roman first, arabic next). Understands different formats
like Page_1, page_1, page1 (in the id attribute) and attempts to parse out
<span class="pagenum"> formats like p. 1, [Pg 1] and so forth.
pphtml.py (outdated)
import roman
from time import strftime
from html.parser import HTMLParser
import regex as re # for unicode support (pip install regex)
Member


Can we follow the convention of grouping the built-in packages together, then a blank line, then the third-party ones?

import sys
import os
import argparse
import itertools
from time import strftime
from html.parser import HTMLParser

import regex as re  # for unicode support
import roman
from PIL import Image

Ideally each would be alpha-sorted but I'm not going to get wound around the axle about it.

Member Author


done! and sorted :)

@cpeel requested a review from srjfoo on December 31, 2025 19:01
cpeel pushed a commit to DistributedProofreaders/ppwb that referenced this pull request Dec 31, 2025