PaperLens

AI-powered data extraction and labeling for academic papers

1 Choose your task

What would you like to do?

Choose a task to get started

ℹ How does this work?

This tool uses AI vision models (such as GPT-4o and Gemini) to read academic PDFs and extract or classify structured information, with no manual copy-pasting.

Extract pulls specific values (statistics, factor loadings, effect sizes) into a JSON schema you define. Label classifies papers by content. In both cases you describe your task in plain language, the AI generates a tailored prompt, and you upload one or more PDFs.
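To make the idea of a user-defined schema concrete, here is a minimal Python sketch. The field names (`sample_size`, `effect_size`, `effect_size_metric`) and the `check_against_schema` helper are hypothetical illustrations, not PaperLens's actual schema format or API; the sketch only shows how an extracted record might be checked against the fields you asked for.

```python
import json

# Hypothetical schema: field names and types are illustrative assumptions,
# not the format PaperLens itself uses.
SCHEMA = {
    "sample_size": int,
    "effect_size": float,
    "effect_size_metric": str,
}

def check_against_schema(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif record[field] is not None and not isinstance(record[field], expected_type):
            # None is allowed so a model can report "not found in the paper".
            problems.append(f"{field}: expected {expected_type.__name__}")
    return problems

# An invented example of what an extraction result could look like.
example = json.loads(
    '{"sample_size": 212, "effect_size": 0.41, "effect_size_metric": "Cohen d"}'
)
print(check_against_schema(example, SCHEMA))
```

A check like this is only a sanity filter on structure; verifying that each value is correct still depends on comparing it with the source PDF.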

After extraction, results are shown alongside the source PDF with yellow highlights marking the exact passages the model cited as evidence. You can flip through all referenced pages and zoom in to verify each value against the original.

Every field in the results is editable: click any value to correct it. Edits are tracked as human overrides. When you download the results, the exported JSON includes the original model values, your corrections, the full prompt, the model used, and a timestamp — everything you need for a reproducible audit trail.
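The audit trail described above could be assembled roughly as follows. This is a sketch under stated assumptions: the key names (`model_values`, `human_overrides`, `final_values`, `exported_at`) and the `build_export` function are hypothetical, and the merged `final_values` view is an added convenience that the description above does not mention.

```python
from datetime import datetime, timezone

def build_export(model_values: dict, overrides: dict, prompt: str, model_name: str) -> dict:
    """Bundle original values, human corrections, and run metadata into one record.

    All key names here are assumptions for illustration only.
    """
    return {
        "model_values": dict(model_values),    # what the model originally returned
        "human_overrides": dict(overrides),    # only the fields a person corrected
        "final_values": {**model_values, **overrides},  # overrides win (assumed extra field)
        "prompt": prompt,
        "model": model_name,
        "exported_at": datetime.now(timezone.utc).isoformat(),
    }

# Invented example: the reviewer corrects a transposed effect size.
export = build_export(
    model_values={"n": 212, "d": 0.41},
    overrides={"d": 0.14},
    prompt="Extract the sample size n and Cohen's d.",
    model_name="gpt-4o",
)
print(export["final_values"])
```

Keeping the model's values and the human overrides in separate keys, rather than overwriting in place, is what lets a later reader see exactly which values were changed by hand.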

You need an API key for a supported provider (OpenAI, Google Gemini, DeepSeek, or your own server). Your key is sent only to that provider and is never stored.

You'll need: an API key (OpenAI / Gemini / DeepSeek / your own server) · one or more PDFs (max 50 MB each) · ~30–120 s per paper

2 Configure your AI model

Choose a provider, select a model, and enter your API key

Your key is sent only to the selected provider and is never stored.

3 Describe your task

How would you like to provide a prompt?

You can describe your task and let the AI write the prompt, or paste your own

4 Review prompt

Review your generated prompt

Does this prompt accurately capture what you need?

5 Upload your papers

Upload one or more PDFs — they will be processed in parallel

Drop PDFs here

Up to 20 papers per batch · max 50 MB per file
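The batch limits and parallel processing mentioned above can be sketched in Python as follows. Everything here is an assumption for illustration: `validate_batch`, `process_paper`, and `process_batch` are hypothetical names, 50 MB is taken to mean 50 × 2^20 bytes, and the worker count of 4 is arbitrary. The real tool may enforce its limits and schedule work differently.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_FILES = 20                 # "up to 20 papers per batch"
MAX_BYTES = 50 * 1024 * 1024   # "max 50 MB per file" (assumes 1 MB = 2**20 bytes)

def validate_batch(files: list[tuple[str, int]]) -> list[str]:
    """Check (filename, size_in_bytes) pairs against the batch limits."""
    errors = []
    if len(files) > MAX_FILES:
        errors.append(f"too many files: {len(files)} > {MAX_FILES}")
    for name, size in files:
        if not name.lower().endswith(".pdf"):
            errors.append(f"{name}: not a PDF")
        elif size > MAX_BYTES:
            errors.append(f"{name}: exceeds 50 MB")
    return errors

def process_paper(name: str) -> dict:
    # Placeholder for the real work (sending pages to the model, parsing output).
    return {"file": name, "status": "done"}

def process_batch(files: list[tuple[str, int]]) -> list[dict]:
    """Validate the batch, then process the papers concurrently."""
    errors = validate_batch(files)
    if errors:
        raise ValueError("; ".join(errors))
    with ThreadPoolExecutor(max_workers=4) as pool:  # worker count is an arbitrary choice
        # pool.map returns results in the same order as the input names.
        return list(pool.map(process_paper, [name for name, _ in files]))

print(process_batch([("study_a.pdf", 1_200_000), ("study_b.pdf", 3_400_000)]))
```

Threads suit this kind of workload because each paper mostly waits on a network call to the model provider rather than on local computation.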