    Repositories list

    • MegaHan97K

      Public
      [PR 2025] The official GitHub page of "MegaHan97K: A Large-Scale Dataset for Mega-Category Chinese Character Recognition with over 97K Categories"
      Python
      57020 Updated Dec 22, 2025
    • AutoHDR

      Public
      [ACL 2025 main] The official GitHub page of "Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restoration"
      Python
      45120 Updated Dec 22, 2025
    • HisDoc1B

      Public
      11710 Updated Dec 18, 2025
    • WenMind

      Public
      WenMind benchmark.
      Python
      1800 Updated Dec 17, 2025
    • MCS-Bench

      Public
      Python
      1300 Updated Dec 17, 2025
    • ACP-RAG

      Public
      [NAACL 2025] Large-Scale Corpus Construction and Retrieval-Augmented Generation for Ancient Chinese Poetry: New Method and Data Insights (ACP-Corpus; ACP-QA; ACP-RAG)
      Python
      0500 Updated Dec 17, 2025
    • OCR-Reasoning

      Public
      [arXiv: 2505.17163] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning
      Python
      36820 Updated Dec 17, 2025
    • TongGu-VL

      Public
      A multimodal large language model for Classical Chinese studies
      0100 Updated Dec 16, 2025
    • TVSIP

      Public
      [ACM MM 2025] The official GitHub page of "From Pixels to Semantics: A Novel MLLM-Driven Approach for Explainable Tampered Text Detection"
      Python
      0800 Updated Dec 10, 2025
    • [PRCV 25] Towards Real-World Document Specular Highlight Removal: The DocHighlight Dataset and DocSHRNet Method
      0200 Updated Oct 8, 2025
    • A Comprehensive Benchmark for Chinese Long Historical Document Understanding
      Python
      0400 Updated Sep 23, 2025
    • MCCD

      Public
      [ICDAR 2025] The official GitHub page of "MCCD: A Multi-Attribute Chinese Calligraphy Character Dataset Annotated with Script Styles, Dynasties, and Calligraphers"
      Python
      01120 Updated Sep 2, 2025
    • DOLPHIN

      Public
      [IEEE TIFS 2024] Online Writer Retrieval with Chinese Handwritten Phrases: A Synergistic Temporal-Frequency Representation Learning Approach
      Python
      15510 Updated Aug 3, 2025
    • PAVENet

      Public
      [IEEE TPAMI 2025] Privacy-Preserving Biometric Verification With Handwritten Random Digit String
      Python
      06510 Updated Aug 3, 2025
    • SigBench

      Public
      0000 Updated Jun 19, 2025
    • [PR 2026] The official GitHub page of "AutoScaler: Self Scale Alignment for Handwritten Mathematical Expression Recognition"
      Python
      0910 Updated Jun 8, 2025
    • C3bench

      Public
      C3 benchmark
      0310 Updated Mar 30, 2025
    • Algorithms, papers, datasets, and performance comparisons for Document AI. Continuously updated.
      920300 Updated Mar 1, 2025
    • DCOH-120K

      Public
      1500 Updated Feb 20, 2025
    • RFUND

      Public
      [MM'2024] Official release of RFUND introduced in the MM'2024 paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction"
      02000 Updated Dec 4, 2024
    • [EMNLP 2024] TongGu, a classical Chinese language model.
      45360 Updated Sep 28, 2024
    • .github

      Public
      0000 Updated Jun 4, 2024
    • SCUT-EnsExam is a real-world handwritten text erasure dataset for examination paper scenarios, consisting of 545 examination paper images. The dataset is randomly divided into a training set of 430 images and a test set of 115 images.
      01400 Updated Dec 5, 2023
    • Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
      Python
      412500 Updated Nov 13, 2023
    • A CNN model built with PyTorch that reaches 99.7% accuracy
      Python
      2400 Updated May 1, 2021