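Tailored Research

The GPT Researcher package lets you tailor research to your needs: you can research specific sources (URLs) or local documents, and you can even specify the agent prompt instruction that the research follows.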

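Research on Specific Sources 📚

You can specify the sources GPT Researcher should research by passing a list of URLs through the source_urls argument. GPT Researcher then conducts its research on those sources.

If you want GPT Researcher to research beyond the URLs you provide, that is, on other websites it considers suitable for the query or sub-queries, set the complement_source_urls argument to True. With the default, False, it only searches the sites you provide through source_urls.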

from gpt_researcher import GPTResearcher
import asyncio

async def get_report(query: str, report_type: str, sources: list) -> str:
    researcher = GPTResearcher(query=query, report_type=report_type, source_urls=sources, complement_source_urls=False)
    await researcher.conduct_research()
    report = await researcher.write_report()
    return report

if __name__ == "__main__":
    query = "What are the biggest trends in AI lately?"
    # Report type forwarded to GPTResearcher by get_report()
    report_type = "research_report"
    sources = [
        "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "https://www.ibm.com/think/insights/artificial-intelligence-trends",
        "https://www.forbes.com/advisor/business/ai-statistics"
    ]
    report = asyncio.run(get_report(query=query, report_type=report_type, sources=sources))
    print(report)
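
To let the researcher go beyond the listed URLs, the same call can be made with complement_source_urls set to True. The following is a minimal sketch under stated assumptions: build_source_kwargs is a hypothetical helper (not part of GPT Researcher) that drops malformed URLs before they reach the constructor, and "research_report" is assumed as the report type.

```python
from urllib.parse import urlparse


def build_source_kwargs(query: str, sources: list, complement: bool = False) -> dict:
    """Hypothetical helper: collect GPTResearcher keyword arguments for source-restricted research."""
    # Keep only absolute http(s) URLs; anything else is silently dropped.
    valid = [
        url for url in sources
        if urlparse(url).scheme in ("http", "https") and urlparse(url).netloc
    ]
    if not valid:
        raise ValueError("at least one http(s) URL is required")
    return {
        "query": query,
        "report_type": "research_report",  # assumed report type
        "source_urls": valid,
        "complement_source_urls": complement,
    }


async def get_report(kwargs: dict) -> str:
    # Imported lazily so the helper above can be used without the package installed.
    from gpt_researcher import GPTResearcher

    researcher = GPTResearcher(**kwargs)
    await researcher.conduct_research()
    return await researcher.write_report()
```

A report could then be produced with asyncio.run(get_report(build_source_kwargs(query, sources, complement=True))).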

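Specify Agent Prompt 📝

You can specify the agent prompt instruction that the research follows. This lets you steer the research in a specific direction and tailor the layout of the report. Pass the prompt as the query argument to the GPTResearcher class and set report_type to "custom_report".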

from gpt_researcher import GPTResearcher
import asyncio

async def get_report(prompt: str, report_type: str) -> str:
    researcher = GPTResearcher(query=prompt, report_type=report_type)
    await researcher.conduct_research()
    report = await researcher.write_report()
    return report
    
if __name__ == "__main__":
    report_type = "custom_report"
    prompt = "Research the latest advancements in AI and provide a detailed report in APA format including sources."

    report = asyncio.run(get_report(prompt=prompt, report_type=report_type))
    print(report)
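
Because the prompt controls both the direction of the research and the layout of the report, it can help to assemble it from parts. The sketch below is only an illustration: build_custom_prompt is a hypothetical helper, and whether the generated report follows the requested sections exactly depends on the underlying model.

```python
def build_custom_prompt(topic: str, citation_style: str, sections: list) -> str:
    """Hypothetical helper: assemble a custom_report prompt from its parts."""
    lines = [
        f"Research {topic} and provide a detailed report "
        f"in {citation_style} format including sources."
    ]
    if sections:
        # Spell out the desired layout as a numbered list.
        lines.append("Structure the report with the following sections, in order:")
        lines.extend(f"{i}. {name}" for i, name in enumerate(sections, start=1))
    return "\n".join(lines)
```

The returned string is passed as the query argument together with report_type="custom_report", exactly as in the example above.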

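Research on Local Documents 📄

You can instruct GPT Researcher to research local documents by providing the path to those documents. Currently supported file formats are: PDF, plain text, CSV, Excel, Markdown, PowerPoint, and Word documents.

Step 1: Add the environment variable DOC_PATH, pointing to the folder where your documents are located.

For example: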

export DOC_PATH="./my-docs"

Step 2: When you create an instance of the GPTResearcher class, set the report_source argument to "local".

GPT Researcher will then conduct research on the provided documents.

from gpt_researcher import GPTResearcher
import asyncio

async def get_report(query: str, report_source: str) -> str:
    researcher = GPTResearcher(query=query, report_source=report_source)
    await researcher.conduct_research()
    report = await researcher.write_report()
    return report
    
if __name__ == "__main__":
    query = "What can you tell me about myself based on my documents?"
    report_source = "local" # "local" or "web"

    report = asyncio.run(get_report(query=query, report_source=report_source))
    print(report)
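
A quick check of the folder before starting a run can catch a wrong DOC_PATH early. The sketch below is an assumption-laden illustration: the extension set is a guess at how the formats listed above map to file suffixes, searching subfolders is also an assumption, and both helper functions are hypothetical. It further assumes the library reads DOC_PATH when the researcher is created, so the variable is set before that point.

```python
import os
from pathlib import Path

# Assumed suffixes for the formats listed above; the library's own list is authoritative.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".txt", ".csv", ".xls", ".xlsx", ".md", ".ppt", ".pptx", ".doc", ".docx",
}


def find_supported_documents(doc_path: str) -> list:
    """Return files under doc_path with a supported suffix, sorted by path."""
    root = Path(doc_path)
    if not root.is_dir():
        raise FileNotFoundError(f"DOC_PATH does not point to a folder: {doc_path}")
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def configure_doc_path(doc_path: str) -> list:
    """Set DOC_PATH for this process, failing early if the folder holds nothing usable."""
    documents = find_supported_documents(doc_path)
    if not documents:
        raise ValueError(f"no supported documents found in {doc_path}")
    os.environ["DOC_PATH"] = doc_path
    return documents
```

Calling configure_doc_path("./my-docs") before creating the GPTResearcher instance has the same effect as the export command in step 1, limited to the current Python process.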

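Hybrid Research 🔄

You can combine the methods above to conduct hybrid research. For example, you can instruct GPT Researcher to research both web sources and local documents: provide the sources and set the report_source argument to "hybrid".

Note: for this to work, you need a suitable retriever configured for the web sources and a document path (DOC_PATH) set for the local documents. To learn more about retrievers, see the retrievers documentation.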
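
A hybrid run could be sketched as follows. This is an illustration under assumptions, not the library's reference usage: build_hybrid_kwargs is a hypothetical helper, "research_report" is assumed as the report type, and only the local-document prerequisite (DOC_PATH) is checked. Retriever configuration is not verified here.

```python
import os


def build_hybrid_kwargs(query: str, report_type: str = "research_report", env=None) -> dict:
    """Hypothetical helper: keyword arguments for a hybrid (web + local documents) run."""
    env = os.environ if env is None else env
    doc_path = env.get("DOC_PATH")
    # Hybrid research needs local documents, so DOC_PATH must point to a real folder.
    if not doc_path or not os.path.isdir(doc_path):
        raise RuntimeError("hybrid research needs DOC_PATH to point to an existing folder")
    return {"query": query, "report_type": report_type, "report_source": "hybrid"}


async def get_hybrid_report(query: str) -> str:
    # Imported lazily so the check above can run without the package installed.
    from gpt_researcher import GPTResearcher

    researcher = GPTResearcher(**build_hybrid_kwargs(query))
    await researcher.conduct_research()
    return await researcher.write_report()
```

As in the other examples, the coroutine is run with asyncio.run(get_hybrid_report(query)).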

Research on LangChain Documents 🦜️🔗

You can instruct GPT Researcher to research a list of LangChain Document instances.

For example:

from langchain_core.documents import Document
from typing import List, Dict
from gpt_researcher import GPTResearcher
from langchain_postgres.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings
from sqlalchemy import create_engine
import asyncio



CONNECTION_STRING = 'postgresql://someuser:somepass@localhost:5432/somedatabase'

def get_retriever(collection_name: str, search_kwargs: Dict[str, object]):
    # Connect to the existing PGVector collection and expose it as a retriever
    engine = create_engine(CONNECTION_STRING)
    embeddings = OpenAIEmbeddings()

    index = PGVector.from_existing_index(
        use_jsonb=True,
        embedding=embeddings,
        collection_name=collection_name,
        connection=engine,
    )

    return index.as_retriever(search_kwargs=search_kwargs)


async def get_report(query: str, report_type: str, report_source: str, documents: List[Document]) -> str:
    researcher = GPTResearcher(query=query, report_type=report_type, report_source=report_source, documents=documents)
    await researcher.conduct_research()
    report = await researcher.write_report()
    return report

if __name__ == "__main__":
    query = "What can you tell me about blue cheese based on my documents?"
    report_type = "research_report"
    report_source = "langchain_documents"

    # using a LangChain retriever to get all the documents regarding cheese
    # https://api.python.langchain.com/en/latest/retrievers/langchain_core.retrievers.BaseRetriever.html#langchain_core.retrievers.BaseRetriever.invoke
    langchain_retriever = get_retriever("cheese_collection", { "k": 3 })
    documents = langchain_retriever.invoke("All the documents about cheese")
    report = asyncio.run(get_report(query=query, report_type=report_type, report_source=report_source, documents=documents))
    print(report)
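
A vector store is not required: any list of Document instances can be passed. As a minimal sketch, the hypothetical helpers below build such a list from in-memory text. The "source" metadata key is an illustrative choice, not something GPT Researcher is known to require.

```python
def to_records(texts_by_source: dict) -> list:
    """Turn {source_name: text} into records shaped like Document's constructor arguments."""
    return [
        {"page_content": text.strip(), "metadata": {"source": name}}
        for name, text in texts_by_source.items()
        if text.strip()  # skip entries that are empty or whitespace only
    ]


def to_documents(records: list) -> list:
    # Imported lazily so to_records() can be used without langchain_core installed.
    from langchain_core.documents import Document

    return [Document(**record) for record in records]
```

The result of to_documents(to_records({...})) can be passed as the documents argument of get_report in the example above, with report_source set to "langchain_documents".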
