Academic Research Map (academic-research-mapper)

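This skill maps the research landscape of any technical or academic topic. It searches academic databases such as arXiv and Semantic Scholar, systematically collects and analyzes the relevant literature, and identifies research trends, key papers, leading researchers, and institutional collaborations. It then builds a knowledge map of the topic that shows how research directions branch and evolve. It is intended for graduate students, researchers, and newcomers who need a quick overview of a field. Automating literature retrieval and analysis shortens the literature-survey cycle and gives researchers a broad literature base when writing papers, proposing projects, or choosing a research direction.

The skill includes detailed operating instructions and best practices so that users can get started quickly and then go deeper. Its functional modules and application scenarios are laid out systematically so that it can be applied in real projects. It emphasizes practicality, covers everything from basic configuration to advanced features to serve users at different levels, and is kept up to date with current techniques and practice. Its modular design makes it easy to extend and customize, and it collects common design patterns, best practices, a clear learning path, and reference material, so that users spend less time on trial and error and more on their core work.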


Research Landscape Mapper — Understand a Field Before You Build or Write

You have access to the TinyFish CLI (tinyfish), a tool that runs browser automations from the terminal using natural language goals. This skill uses it to search arXiv, Semantic Scholar, and Google Scholar in parallel, then synthesizes results into a structured landscape report with identified gaps.

Pre-flight Check (REQUIRED)

Before making any TinyFish call, always run BOTH checks:

1. CLI installed?

bash/zsh:

which tinyfish && tinyfish --version || echo "TINYFISH_CLI_NOT_INSTALLED"

PowerShell:

Get-Command tinyfish; tinyfish --version

If not installed, stop and tell the user:

Install the TinyFish CLI: npm install -g @tiny-fish/cli

2. Authenticated?

tinyfish auth status

If not authenticated, stop and tell the user:

You need a TinyFish API key. Get one at: https://agent.tinyfish.ai/api-keys

Then authenticate:

Option 1 — CLI login (interactive):

tinyfish auth login

Option 2 — bash/zsh (Mac/Linux, current session):

export TINYFISH_API_KEY="your-api-key-here"

Option 3 — bash/zsh (persist across sessions, add to ~/.bashrc or ~/.zshrc):

echo 'export TINYFISH_API_KEY="your-api-key-here"' >> ~/.zshrc
source ~/.zshrc

Option 4 — PowerShell (current session only):

$env:TINYFISH_API_KEY="your-api-key-here"

Option 5 — Claude Code settings: Add to ~/.claude/settings.local.json:

{"env":{"TINYFISH_API_KEY":"your-api-key-here"}}

Do NOT proceed until both checks pass.


What This Skill Does

Given a research topic (e.g. “retrieval-augmented generation” or “protein structure prediction”), this skill:

  1. Searches arXiv for preprints sorted by most recent, capturing what is being worked on right now
  2. Searches Semantic Scholar for papers ranked by relevance with citation counts, identifying what the field considers important
  3. Searches Google Scholar for broad coverage, including published venues not yet on arXiv

It then deduplicates across all three sources by title similarity, clusters papers into subtopics, and synthesizes findings into a structured landscape report: what is well-studied, what is emerging, and where the gaps are.


Core Command

tinyfish agent run --url <url> "<goal>"

Flags

| Flag | Purpose |
| --- | --- |
| --url <url> | Target website URL for the agent to navigate |
| --sync | Wait for the full result before returning (required when you need the output before the next step) |
| --async | Submit and return a run ID immediately; use when firing parallel agents |
| --pretty | Human-readable formatted output for debugging |

Keyword Strategy

The quality of results depends entirely on your search terms. Before running anything, derive 2–3 keyword variants from the topic. Each source has different vocabulary norms — academic terms work best on Semantic Scholar, shorter compressed terms work best on arXiv.

| Topic | Primary keywords | Variant A | Variant B |
| --- | --- | --- | --- |
| Retrieval-augmented generation | retrieval augmented generation | RAG language model | dense retrieval QA |
| Protein structure prediction | protein structure prediction | AlphaFold protein folding | ab initio structure biology |
| Neural architecture search | neural architecture search | NAS automated machine learning | hyperparameter optimization deep learning |
| Federated learning privacy | federated learning | federated learning differential privacy | distributed training privacy |

Use the primary keywords for the first parallel pass. If any source returns fewer than 5 results, run a second pass with the variant keywords on that source only.


Step-by-Step Workflow

Step 1 — Derive keywords and build URLs

Before running any agents, construct all three search URLs. Do this in your head or in a scratch note; do not make TinyFish calls yet.

arXiv URL pattern:

https://arxiv.org/search/?query=<keywords>&searchtype=all&order=-announced_date_first

Semantic Scholar URL pattern:

https://www.semanticscholar.org/search?q=<keywords>&sort=Relevance

Google Scholar URL pattern:

https://scholar.google.com/scholar?q=<keywords>&as_sdt=0%2C5&hl=en

Replace <keywords> with URL-encoded primary keywords (spaces become +).
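The URL construction above can be sketched in a few lines of Python. This is only an illustration: the helper name `build_search_urls` and the dictionary keys are hypothetical, while the three URL patterns are copied from the text. `quote_plus` from the standard library turns spaces into `+` and percent-encodes other reserved characters.

```python
from urllib.parse import quote_plus


def build_search_urls(keywords: str) -> dict[str, str]:
    """Build the three search URLs from the patterns given in Step 1."""
    q = quote_plus(keywords)  # spaces become '+', other reserved characters are percent-encoded
    return {
        "arxiv": f"https://arxiv.org/search/?query={q}&searchtype=all&order=-announced_date_first",
        "semantic_scholar": f"https://www.semanticscholar.org/search?q={q}&sort=Relevance",
        "google_scholar": f"https://scholar.google.com/scholar?q={q}&as_sdt=0%2C5&hl=en",
    }


urls = build_search_urls("retrieval augmented generation")
for source, url in urls.items():
    print(source, url)
```

Building all three URLs from one encoded string keeps the keyword spelling identical across sources, which makes later deduplication easier to reason about.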


Step 2 — Search all three sources in parallel

Fire all three agents simultaneously. Do NOT wait for one to finish before starting the next.

arXiv — sorted by most recent:

tinyfish agent run --sync \
  --url "https://arxiv.org/search/?query=retrieval+augmented+generation&searchtype=all&order=-announced_date_first" \
  "Extract the top 15 search results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"abstract_snippet\": str (first 150 chars of abstract), \"arxiv_id\": str, \"url\": str}]. If a result has no year visible, use the submission date year."

Semantic Scholar — sorted by relevance with citation counts:

tinyfish agent run --sync \
  --url "https://www.semanticscholar.org/search?q=retrieval+augmented+generation&sort=Relevance" \
  "Extract the top 15 search results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"abstract_snippet\": str (first 150 chars), \"url\": str}]. Scroll down to load more results if fewer than 10 are visible."

Google Scholar — broad coverage:

tinyfish agent run --sync \
  --url "https://scholar.google.com/scholar?q=retrieval+augmented+generation&as_sdt=0%2C5&hl=en" \
  "Extract the top 15 search results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"snippet\": str, \"url\": str}]. Citation count appears after 'Cited by' — extract that number."

Parallel Execution

All three source searches are fully independent. Always fire them simultaneously.

Good — parallel calls (fire and wait):

tinyfish agent run --sync \
  --url "https://arxiv.org/search/?query=retrieval+augmented+generation&searchtype=all&order=-announced_date_first" \
  "Extract the top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"abstract_snippet\": str, \"arxiv_id\": str, \"url\": str}]" > /tmp/arxiv_results.json &

tinyfish agent run --sync \
  --url "https://www.semanticscholar.org/search?q=retrieval+augmented+generation&sort=Relevance" \
  "Extract the top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"abstract_snippet\": str, \"url\": str}]" > /tmp/s2_results.json &

tinyfish agent run --sync \
  --url "https://scholar.google.com/scholar?q=retrieval+augmented+generation&as_sdt=0%2C5&hl=en" \
  "Extract the top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"snippet\": str, \"url\": str}]" > /tmp/scholar_results.json &

wait
echo "All three sources complete."

Bad — sequential calls:

# Do NOT do this: it triples the wait time for no benefit
tinyfish agent run --url "https://arxiv.org/..." "search arxiv, then also search semantic scholar, then also search google scholar"

Each source is always its own separate call. Never combine them into one goal.
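If the calls are driven from a script rather than a shell, the same one-call-per-source pattern might look like the sketch below. It assumes only the `tinyfish agent run --sync --url <url> "<goal>"` form shown in this document; the helper names `build_command` and `run_parallel` and the injectable `runner` parameter are hypothetical. Passing the command as an argument list avoids the nested-quote escaping that the shell examples need.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(url: str, goal: str) -> list[str]:
    """One command per source, mirroring the CLI form used in this document."""
    return ["tinyfish", "agent", "run", "--sync", "--url", url, goal]


def run_parallel(jobs: dict[str, tuple[str, str]], runner=None) -> dict[str, str]:
    """Start every job at once and collect raw output keyed by source name.

    `runner` takes a command list and returns its output; it is injectable so
    the function can be exercised without the CLI installed.
    """
    if runner is None:
        def runner(cmd):
            return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = {
            name: pool.submit(runner, build_command(url, goal))
            for name, (url, goal) in jobs.items()
        }
        # .result() blocks until that job finishes, like `wait` in the shell version.
        return {name: future.result() for name, future in futures.items()}
```

A call such as `run_parallel({"arxiv": (arxiv_url, goal), "s2": (s2_url, goal)})` would then return the raw output of each run under its source name.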


Step 3 — Handle sparse results (if needed)

After the parallel run completes, check each result set. If any source returned fewer than 5 papers, run a second pass on that source with variant keywords:

# Example: arXiv returned only 3 results for primary keywords
tinyfish agent run --sync \
  --url "https://arxiv.org/search/?query=RAG+language+model&searchtype=all&order=-announced_date_first" \
  "Extract the top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"abstract_snippet\": str, \"arxiv_id\": str, \"url\": str}]"

Do not run second passes if the primary pass was already rich; this wastes steps.
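The sparse-result rule can be written as a small check. This is a minimal sketch: the function name `plan_second_pass` and the shape of `results_by_source` (source name mapped to a list of extracted papers) are assumptions, and only the threshold of 5 comes from the text.

```python
MIN_RESULTS = 5  # threshold stated in Step 3


def plan_second_pass(results_by_source, variants, min_results=MIN_RESULTS):
    """Return {source: [variant keywords]} for every source that came back sparse."""
    return {
        source: list(variants)
        for source, papers in results_by_source.items()
        if len(papers) < min_results
    }


# Placeholder result sets: only the counts matter here.
results_by_source = {
    "arxiv": [{"title": "placeholder"}] * 3,
    "semantic_scholar": [{"title": "placeholder"}] * 15,
    "google_scholar": [{"title": "placeholder"}] * 12,
}
plan = plan_second_pass(results_by_source, ["RAG language model", "dense retrieval QA"])
print(plan)  # only arXiv is below the threshold
```

An empty plan means the primary pass was rich enough and no further calls are needed.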


Step 4 — Synthesize into a Landscape Report

Once all three sources have returned results, synthesize findings into this structure. Use only data that TinyFish actually returned — do not hallucinate paper titles, citation counts, or author names.

## Research Landscape: <topic>

### Volume & Coverage
- arXiv: <N> papers found, most recent: <year>
- Semantic Scholar: <N> papers found, highest citations: <N> (paper title)
- Google Scholar: <N> papers found
- Unique papers after deduplication: <N>

### Key Papers (sorted by citation count)
1. <Title> — <Authors>, <Year>, <Venue if known> — <citation_count> citations
   <one-sentence summary from abstract snippet>
2. ...
(list top 8–10 unique papers)

### Active Subtopics
Cluster the papers by what they are actually about. Label each cluster with a short name.
- **<Subtopic A>**: <N> papers — <1-sentence description of what this cluster covers>
- **<Subtopic B>**: <N> papers — ...
- **<Subtopic C>**: <N> papers — ...

### Key Authors & Groups
- <Author name> — <N> papers in results, affiliated with <institution if visible>
- ...
(list authors appearing 2+ times across the results)

### Recency Signal
- Papers from last 12 months: <N>
- Papers from last 3 years: <N>
- Oldest paper in results: <year>
- Trend: <accelerating / stable / declining> (infer from year distribution)

### Gaps & Open Directions
Based on what the papers cover and what they do not:
- **Gap 1**: <specific thing that is missing or underexplored>
- **Gap 2**: ...
- **Gap 3**: ...

### Landscape Verdict
<2–3 sentences: is this field crowded or open, mature or nascent, dominated by a few groups or distributed, and what is the single most underexplored angle?>
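Several fields in the report template are simple tallies over the deduplicated papers. The sketch below shows one way to compute them, assuming each paper is a dict with the field names used in the extraction goals (`authors`, `year`, `citation_count`). The helper names are hypothetical. The 12-month count is left out because the extracted schema carries only a year, and `parse_count` assumes plain digits, optionally with commas or a "Cited by" prefix.

```python
from collections import Counter


def parse_count(value):
    """Turn '1,234' or 'Cited by 56' into an int; missing values count as 0."""
    digits = "".join(ch for ch in str(value or "") if ch.isdigit())
    return int(digits) if digits else 0


def summarize(papers, current_year):
    """Compute the tallies used in the Key Papers, Key Authors and Recency sections."""
    years = [int(p["year"]) for p in papers if str(p.get("year", "")).isdigit()]
    author_counts = Counter(a for p in papers for a in p.get("authors", []))
    by_citations = sorted(
        papers, key=lambda p: parse_count(p.get("citation_count")), reverse=True
    )
    return {
        "key_papers": by_citations[:10],
        "key_authors": [(a, n) for a, n in author_counts.most_common() if n >= 2],
        "last_3_years": sum(1 for y in years if y > current_year - 3),
        "oldest_year": min(years) if years else None,
    }


# Placeholder records, not real papers.
papers = [
    {"title": "Paper one", "authors": ["Author A", "Author B"], "year": "2021", "citation_count": "1,234"},
    {"title": "Paper two", "authors": ["Author A"], "year": "2024", "citation_count": "56"},
    {"title": "Paper three", "authors": ["Author C"], "year": "2023"},
]
stats = summarize(papers, current_year=2024)
print([p["title"] for p in stats["key_papers"]], stats["key_authors"])
```

Because every number is derived from the returned records, nothing in the report has to be estimated from memory.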

Deduplication Rules

Papers appear across multiple sources. Before synthesizing, deduplicate using these rules in order:

  1. Exact title match (case-insensitive) → keep one, prefer the Semantic Scholar entry (has citation count)
  2. Title similarity > 85% (same words, different punctuation) → treat as the same paper
  3. Same arXiv ID → always the same paper regardless of title variation
  4. If unsure, keep both and note the possible duplicate in the report
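Rules 1 to 3 can be sketched as follows. Several choices here are assumptions rather than part of the skill: the similarity metric (`difflib.SequenceMatcher` on normalized titles), the `source` tag on each record, the regular expression for pulling an arXiv ID out of a URL, and the decision to merge fields from duplicates instead of discarding them. Rule 4 is not automated; borderline pairs still need a human look.

```python
import re
from difflib import SequenceMatcher

# Assumed pattern for new-style arXiv IDs inside abs/pdf URLs.
ARXIV_ID_RE = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})")


def normalize_title(title):
    """Lowercase and replace punctuation with spaces so only the words are compared."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def arxiv_id_of(paper):
    """Read the arXiv ID from the record, or from its URL if it points at arXiv."""
    raw = paper.get("arxiv_id") or ""
    if raw:
        return re.sub(r"v\d+$", "", raw.replace("arXiv:", "").strip())
    match = ARXIV_ID_RE.search(paper.get("url") or "")
    return match.group(1) if match else None


def same_paper(a, b, threshold=0.85):
    id_a, id_b = arxiv_id_of(a), arxiv_id_of(b)
    if id_a and id_b and id_a == id_b:  # rule 3: same arXiv ID
        return True
    title_a, title_b = normalize_title(a["title"]), normalize_title(b["title"])
    if title_a == title_b:  # rule 1: exact match, ignoring case
        return True
    return SequenceMatcher(None, title_a, title_b).ratio() > threshold  # rule 2


def merge(kept, new):
    """Prefer the Semantic Scholar record, but fill its gaps from the other one."""
    new_wins = new.get("source") == "semantic_scholar" and kept.get("source") != "semantic_scholar"
    preferred, other = (new, kept) if new_wins else (kept, new)
    return {**other, **preferred}


def deduplicate(papers):
    unique = []
    for paper in papers:
        for i, kept in enumerate(unique):
            if same_paper(paper, kept):
                unique[i] = merge(kept, paper)
                break
        else:  # no duplicate found among the papers kept so far
            unique.append(paper)
    return unique


# Placeholder records, not real papers.
papers = [
    {"source": "arxiv", "title": "Sparse Routing for Example Models", "arxiv_id": "0000.00001"},
    {"source": "semantic_scholar", "title": "sparse routing for example models", "citation_count": "12"},
    {"source": "google_scholar", "title": "Sparse Routing, for Example-Models", "url": "https://arxiv.org/abs/0000.00001"},
    {"source": "google_scholar", "title": "A Different Placeholder Paper"},
]
unique = deduplicate(papers)
print(len(unique), unique[0]["source"])
```

Merging keeps the citation count from Semantic Scholar alongside the arXiv ID, so a later record that carries only the ID can still be matched.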

Subtopic Clustering Guide

Group papers by reading their abstract snippets, not just their titles. Common cluster patterns:

| If papers discuss… | Cluster label |
| --- | --- |
| Benchmarks, evaluation datasets, metrics | “Evaluation & benchmarks” |
| New model architectures or training methods | “Model architecture” |
| Application to a specific domain (medical, legal, code) | “Domain adaptation: ” |
| Efficiency, speed, compression, cost | “Efficiency & scaling” |
| Safety, alignment, robustness, hallucination | “Safety & reliability” |
| Surveys, meta-analyses, overviews | “Surveys & overviews” |

A paper can belong to at most two clusters. Name the clusters based on what you actually see in the results; replace these defaults when the topic warrants different ones.
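As a rough first pass before reading the snippets, the default clusters can be approximated by keyword matching. This sketch is a crude stand-in for actually reading the abstracts: the keyword lists are assumptions derived from the table, the function name is hypothetical, and the domain-specific cluster is left out because its label depends on the domain. The cap of two clusters per paper comes from the text.

```python
# Assumed keyword stems per default cluster; adjust to the topic at hand.
CLUSTER_KEYWORDS = {
    "Evaluation & benchmarks": ["benchmark", "evaluation", "dataset", "metric"],
    "Model architecture": ["architecture", "training method"],
    "Efficiency & scaling": ["efficien", "speed", "compress", "cost"],
    "Safety & reliability": ["safety", "alignment", "robust", "hallucination"],
    "Surveys & overviews": ["survey", "meta-analys", "overview"],
}


def assign_clusters(paper, max_clusters=2):
    """Return up to `max_clusters` labels, strongest keyword match first."""
    snippet = paper.get("abstract_snippet") or paper.get("snippet") or ""
    text = f"{paper.get('title', '')} {snippet}".lower()
    scores = {
        label: sum(text.count(keyword) for keyword in keywords)
        for label, keywords in CLUSTER_KEYWORDS.items()
    }
    matched = [label for label, score in scores.items() if score > 0]
    # sorted() is stable, so ties keep the order of CLUSTER_KEYWORDS.
    return sorted(matched, key=lambda label: -scores[label])[:max_clusters]


# Placeholder record, not a real paper.
example = {
    "title": "A Survey of Evaluation Benchmarks",
    "abstract_snippet": "This survey covers datasets, metrics, and robustness.",
}
print(assign_clusters(example))
```

The example matches three clusters, and the cap drops the weakest one. Papers that match nothing come back with an empty list and are the ones most worth reading by hand.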


Managing Runs

# List recent runs (useful if a run takes longer than expected)
tinyfish agent run list

# Get the full output of a specific run by ID
tinyfish agent run get <run_id>

# Cancel a run that is taking too long
tinyfish agent run cancel <run_id>

Output Format

The CLI streams data: {...} SSE lines by default. The final usable result is the event where type == "COMPLETE" and status == "COMPLETED"; the extracted data is in the resultJson field. Read the raw output directly; no script-side parsing is required.

When saving to files with > redirection as shown in the parallel example, the full SSE stream is saved. Extract the JSON by looking for the last line containing "COMPLETED" and parsing the resultJson value from it.
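Reading the output directly is enough for the workflow, but if the saved files are post-processed by a script, the extraction described above could look like this sketch. The `data:` prefix and the `type`, `status` and `resultJson` fields come from the text. The function name, the handling of `resultJson` as either a JSON value or a JSON-encoded string, and the non-final event in the sample are assumptions.

```python
import json


def extract_result(stream_text):
    """Return resultJson from the last COMPLETE/COMPLETED event, or None if absent."""
    for line in reversed(stream_text.splitlines()):
        line = line.strip()
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict):
            continue
        if event.get("type") == "COMPLETE" and event.get("status") == "COMPLETED":
            result = event.get("resultJson")
            if isinstance(result, str):
                # Assumption: the field may arrive as a JSON-encoded string.
                try:
                    return json.loads(result)
                except json.JSONDecodeError:
                    return result
            return result
    return None


# Hypothetical stream: the first event's type and status names are made up.
sample_stream = "\n".join([
    "data: " + json.dumps({"type": "PROGRESS", "status": "RUNNING"}),
    "data: " + json.dumps({
        "type": "COMPLETE",
        "status": "COMPLETED",
        "resultJson": [{"title": "Placeholder paper"}],
    }),
])
print(extract_result(sample_stream))
```

For a saved file, the same function can be applied to its contents, for example `extract_result(open("/tmp/arxiv_results.json").read())`.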


Example: Full Run for “Mixture of Experts”

# Step 1: fire all three in parallel
tinyfish agent run --sync \
  --url "https://arxiv.org/search/?query=mixture+of+experts+transformer&searchtype=all&order=-announced_date_first" \
  "Extract top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"abstract_snippet\": str, \"arxiv_id\": str, \"url\": str}]" \
  > /tmp/moe_arxiv.json &

tinyfish agent run --sync \
  --url "https://www.semanticscholar.org/search?q=mixture+of+experts+transformer&sort=Relevance" \
  "Extract top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"abstract_snippet\": str, \"url\": str}]" \
  > /tmp/moe_s2.json &

tinyfish agent run --sync \
  --url "https://scholar.google.com/scholar?q=mixture+of+experts+LLM&as_sdt=0%2C5&hl=en" \
  "Extract top 15 results as JSON: [{\"title\": str, \"authors\": [str], \"year\": str, \"citation_count\": str, \"venue\": str, \"snippet\": str, \"url\": str}]" \
  > /tmp/moe_scholar.json &

wait

# Step 2: synthesize
# Read /tmp/moe_arxiv.json, /tmp/moe_s2.json, /tmp/moe_scholar.json
# Deduplicate → cluster → produce landscape report