Also known as: Lumina
Open source API service to parse complex documents
Company is active
Event Year: 2024
Chunkr is vision infrastructure, delivered as an open-source API service, that converts a wide range of document formats (PDFs, PPTs, Word documents, Excel spreadsheets, PNG images, and JPEGs) into data optimized for Large Language Models (LLMs).
Born from the challenges of building lumina.sh, where the team processed approximately 600 million pages of scientific literature, Chunkr addresses the unmet needs of developers seeking a reliable ingestion pipeline. While researchers focused on results, developers recognized the value of our underlying technology, leading to the creation of Chunkr.
Chunkr provides layout analysis, Optical Character Recognition (OCR), bounding box detection, granular Vision Language Model (VLM) controls, and semantic chunking, along with the last-mile engineering needed to build AI applications on parsed documents. Typical use cases include Retrieval-Augmented Generation (RAG) and the automation of document-centric workflows, such as converting invoices and medical reports into structured database entries.
Total Raised: Unknown (Y Combinator backed)
Last Round: Winter 2024
B2B
B2B -> Engineering, Product and Design
Team size: 3
Hiring: No