Advanced PDF Summarizer for specific COXIT documents using latest LLMs and semi-agentic workflows
PDF Summarizer for COXIT

A powerful tool for automated PDF document processing and summarization using large language models. It processes PDF documents, extracts their content, and generates structured summaries.

Note: The tool is specifically designed for COXIT documents and will not be of use for generic PDF documents.

Note: A paid Gemini plan is recommended, as it has higher rate limits.

Features

  • 📄 Automated PDF document processing
  • 🔍 Intelligent section detection and organization
  • 🤖 LLM-powered content summarization
  • 📊 Structured output in CSV format
  • 👀 Real-time document monitoring
  • 🚀 Multi-threaded & asynchronous processing pipeline

How It Works

The tool operates in a pipeline:

  1. Document Monitoring: Watches the target directory for new PDF files
  2. PDF Processing: Extracts and processes text from PDF documents
  3. Step 1: Initial content analysis and section detection
  4. Step 2: Section-based summarization
  5. Output Formatting: Generates structured CSV output
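The five stages above can be sketched as a chain of transformations over a document. This is an illustrative sketch only: the function and class names below are hypothetical and do not reflect the tool's actual internal API, and the extraction and summarization stages are stubbed out where the real tool reads PDFs and calls an LLM.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    path: str
    text: str = ""
    sections: list = field(default_factory=list)
    summaries: list = field(default_factory=list)

def extract_text(doc: Document) -> Document:
    # Stage 2: PDF text extraction (stubbed; the real tool parses the PDF)
    doc.text = f"contents of {doc.path}"
    return doc

def detect_sections(doc: Document) -> Document:
    # Stage 3 (Step 1): initial content analysis and section detection
    doc.sections = [s for s in doc.text.split(". ") if s]
    return doc

def summarize_sections(doc: Document) -> Document:
    # Stage 4 (Step 2): per-section summarization (an LLM call in the real tool)
    doc.summaries = [s[:40] for s in doc.sections]
    return doc

def to_csv_rows(doc: Document) -> list:
    # Stage 5: structured CSV output, one row per section summary
    return [[doc.path, summary] for summary in doc.summaries]

def process(path: str) -> list:
    # Stage 1 (monitoring) would call this for each new PDF it detects
    doc = Document(path)
    for stage in (extract_text, detect_sections, summarize_sections):
        doc = stage(doc)
    return to_csv_rows(doc)
```

In the real pipeline, stage 1 runs continuously as a directory watcher and dispatches each new file to this kind of per-document flow.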

On-Device Installation & Usage

  1. Install uv (a modern replacement for pip, pipx, poetry, and venv)
wget -qO- https://astral.sh/uv/install.sh | sh
  2. Clone the repository
git clone https://github.com/valaises/pdf-summ-coxit.git
  3. Install pdf-summ-coxit
uv sync && pip install -e .
  4. Set ENV variables
export GEMINI_API_KEY=
export OPENAI_API_KEY=
  5. Run the summarizer
python -m src.core.main -d /path/to/your/pdfs

Command Line Arguments

  • -d, --target-dir: Directory to monitor for PDF files (required)
  • --debug: Enable debug logging (optional)
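The two flags above can be mirrored with a minimal argparse sketch. This is an assumption-laden illustration: the tool's real parser lives in src.core.main and may define these options differently or accept more of them.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags
    parser = argparse.ArgumentParser(prog="pdf-summ-coxit")
    parser.add_argument("-d", "--target-dir", required=True,
                        help="Directory to monitor for PDF files")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug logging")
    return parser
```

For example, `build_parser().parse_args(["-d", "/path/to/pdfs"])` yields a namespace with `target_dir="/path/to/pdfs"` and `debug=False`.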

Docker Usage

(assuming docker is installed)

  1. Build an image
cd ~/code/pdf-summ-coxit
docker build -t pdf-summ .
  2. Specify variables in .env
cp .env.example .env
vim .env
  3. Start the container (detached mode)
docker run -d --env-file .env -v /path/to/your/pdfs:/app/target_dir pdf-summ

Docker Compose Usage

(assuming docker & docker compose are installed)

  1. Set ENV variables in .env & .env.compose
cp .env.example .env
cp .env.compose.example .env.compose
vim .env
vim .env.compose
  2. Start the container (detached mode)
docker compose --env-file .env.compose up -d

Getting results

After each document is processed, output.csv, output_parts.csv, and usage.csv are automatically regenerated in an artifacts directory inside the directory specified by the -d, --target-dir argument.

Notes about usage:

  • The number of requests needed to summarize a document is, in most cases, page_count + sections_count
  • The model is generally gemini-2; if it fails to generate valid JSON, the tool falls back to gpt-4o
  • The time needed to summarize all documents is not the sum of the per-document times, as documents are processed asynchronously
  • Cost is calculated using the pricing data in assets/model_list.json
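The cost note above can be illustrated with a small per-token pricing calculation. The prices and the schema below are hypothetical placeholders; the real numbers come from assets/model_list.json, whose actual format may differ.

```python
# Hypothetical per-million-token prices; real values live in assets/model_list.json
PRICING = {
    "gemini-2": {"input_per_1m": 0.10, "output_per_1m": 0.40},
    "gpt-4o":   {"input_per_1m": 2.50, "output_per_1m": 10.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Cost = tokens consumed at the input rate + tokens produced at the output rate
    p = PRICING[model]
    return (input_tokens / 1_000_000 * p["input_per_1m"]
            + output_tokens / 1_000_000 * p["output_per_1m"])
```

With these placeholder rates, a document that consumes 500k input tokens and produces 50k output tokens on gemini-2 would cost 0.5 × 0.10 + 0.05 × 0.40 = 0.07.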

Evaluation, locally

What does the evaluation do?

It compares the expected results from tests/expected.json with the results generated by the summarizer.
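That comparison can be sketched as matching generated entries against the expected ones. This is an illustrative approximation only: the real logic lives in tests/eval.py, and the actual schema of tests/expected.json is not shown here.

```python
def compare(expected: dict, generated: dict) -> dict:
    # Count entries whose generated value matches the expected value exactly,
    # and report which expected keys were never produced at all.
    matched = {k for k in expected if generated.get(k) == expected[k]}
    return {
        "matched": len(matched),
        "total": len(expected),
        "missing": sorted(set(expected) - set(generated)),
    }
```

A score such as `matched / total` can then be tracked as documents stream through the summarizer.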

Attention

Eval only works while a summarization is in progress: if you start eval.py after the summarizer has finished its work, eval.py will show nothing.

  1. Export ENV variables
export GEMINI_API_KEY=
export OPENAI_API_KEY=
  2. Copy PDFs from the dataset into the dataset dir
mkdir ~/code/pdf-summ-coxit/dataset && cp /path/to/your/pdfs/* ~/code/pdf-summ-coxit/dataset
  3. Run eval.py
cd ~/code/pdf-summ-coxit
python tests/eval.py -d dataset
  4. Run the summarizer
python -m src.core.main -d dataset

As the PDFs are being processed, watch the STDOUT of eval.py for results, and output.csv and output_parts.csv in ~/code/pdf-summ-coxit/dataset.

Evaluation, docker / compose

  1. Start the container (see Docker / Docker Compose usage above)
  2. Inside the container, run eval.py
docker exec -it <container_name> bash
python tests/eval.py -d target_dir

Video: how to run eval

Screen.Recording.2025-02-17.at.12.1.mp4
