Getting started
- Pick a task: Extract (pull structured values) or Label (classify).
- Choose a model and enter your API key.
- Describe what you want — the AI writes a tailored prompt for you.
- Review & confirm the prompt.
- Upload one or more PDFs.
- Review results next to the original paper, edit any value, and download as JSON or CSV.
Using a self-hosted model (vLLM, Ollama, LM Studio)
Anything that exposes an OpenAI-compatible API works. On the setup screen, choose Custom (vLLM / OpenAI-compatible), then:
- vLLM — Server URL: http://your-host:8000 · Model: HuggingFace ID (e.g. meta-llama/Llama-3.3-70B-Instruct) · API key: any string.
- Ollama — Server URL: http://localhost:11434 · Model: e.g. llama3.2-vision or llama3.2:3b · API key: any string (Ollama ignores it).
- LM Studio — Server URL: http://localhost:1234 · Model: whatever you loaded · API key: any string.
The /v1 path is appended automatically. If your server runs on another machine, make sure it's reachable from wherever this app is hosted.
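To sanity-check an endpoint outside the app, you can POST a minimal OpenAI-compatible request body to http://your-host:8000/v1/chat/completions (the model name below is just the vLLM example from above; use whatever your server actually serves):
{
  "model": "meta-llama/Llama-3.3-70B-Instruct",
  "messages": [{"role": "user", "content": "Reply with the word ok."}],
  "max_tokens": 5
}
If that returns a normal chat completion, the app should be able to reach the server as well.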
Vision vs text: if your model doesn't support image inputs, switch the Parsing method on the upload screen to Text extraction — the PDF text layer is sent instead of page images.
How evidence highlighting works
Every extracted value should be backed by a quoted snippet from the paper. The AI returns an evidence array on each result with this shape:
"evidence": [
{"snippet": "TABLE 1. Parameter estimates...",
"page": 3,
"source": "Table 1",
"field": "samples[0].factor_loadings"}
]
The field property is a JSON path that mirrors the structure of the extracted output, so each evidence entry indicates exactly which value it supports (e.g. samples[0].factor_loadings._table[0] = the first row of the loadings table).
For every _table the model emits, the prompt requires at least one evidence entry whose snippet is the verbatim table caption ("TABLE 1. ..."). The viewer uses that caption to find and highlight the entire table region — not just the sentence that references it.
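As an illustration (the field names here are hypothetical), a result containing a loadings table might carry evidence like this, one entry quoting the table caption and another quoting a sentence from the text:
"evidence": [
  {"snippet": "TABLE 1. Parameter estimates for the two-factor model",
   "page": 3,
   "source": "Table 1",
   "field": "samples[0].factor_loadings._table"},
  {"snippet": "the first factor explained most of the common variance",
   "page": 4,
   "source": "Results",
   "field": "samples[0].variance_explained"}
]
The caption entry lets the viewer highlight the whole table region; the sentence entry highlights just that quoted text.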
If you wrote a custom prompt that doesn't ask for evidence, the review screen shows a warning with a one-click Adapt prompt button that adds the requirement using your selected model.
Editing & download
Click any value in the results to edit it in place. Changes are tracked as human overrides.
- JSON — full export including the original model response, your edits, prompt, model, and timestamp.
- CSV — flattened table (one row per entry, dot-notation columns). Evidence arrays are stored as JSON in a single cell. Use All (CSV) to combine entries from every paper into one file with a _filename column (see the sketch below).
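As a sketch with hypothetical fields, an entry like
{
  "sample": {"n": 240, "country": "NL"},
  "effect_size": 0.31
}
becomes one CSV row with the columns sample.n, sample.country and effect_size; any evidence array is kept as JSON inside a single evidence cell, and All (CSV) adds the _filename column identifying the source paper.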
Reviewing previously exported results
From the start screen, click Review existing results. Drop in the JSON file you previously downloaded, and optionally drop the original PDFs alongside it — the viewer will show page images with the same yellow highlights as before. No API key needed.
Privacy & data handling
- Your API key is sent only to the provider you selected (OpenAI / Google / DeepSeek / your own server).
- API keys are never stored on the server. They live only in your browser tab.
- Uploaded PDFs are processed in memory and not persisted. Page images for the viewer are sent back to the browser as base64.
Keyboard shortcuts
| ? | Open / close this help drawer |
| Esc | Close help / download menu |
On the results page:
| n / → | Next paper |
| p / ← | Previous paper |
| j / ↓ | Next entry |
| k / ↑ | Previous entry |
| ] | Next evidence page |
| [ | Previous evidence page |
| e | Start editing the first cell |
FAQ
Why are some papers refused / missing data? OpenAI vision models have a per-minute token budget that's shared across requests. The app waits 30s between papers and retries once after a 60s pause, but heavy back-to-back loads can still hit the limit.
What's the page limit? 40 pages per PDF. Pages beyond that are ignored.
Why are no highlights showing? Either the prompt didn't request evidence (a banner appears in that case), or the model's quoted snippets don't appear verbatim in the PDF text layer (common for scanned PDFs).
Can I extract from a scanned PDF? Yes, via vision (page images), but the text-extraction mode (and DeepSeek) requires a real text layer.
Can I display nested data as a table? Yes — wrap tabular data with the explicit _table marker:
"factor_loadings": {
"_table": [
{"item": 1, "F1": 0.83, "F2": 0.12},
{"item": 2, "F1": 0.45, "F2": 0.71}
]
}
The viewer renders any object containing a _table array as a real HTML table — no shape guessing.