Vector Stores

The GPT Researcher package allows you to integrate with your existing LangChain vector stores that have already been populated with data. For the full list of supported LangChain vector stores, refer to the LangChain documentation.

You can create a set of embeddings and LangChain documents and store them in any supported vector store of your choosing. GPT-Researcher works with any LangChain vector store that implements the asimilarity_search method.

If you want to use existing knowledge from your vector store, make sure to set report_source to "langchain_vectorstore". Any other setting will add extra information from scraped data and could pollute your vector database (for more information, see Adding Scraped Data to Your Vector Store below).
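For illustration, here is a minimal sketch of the only contract GPT-Researcher relies on. The class name and the trivial ranking below are hypothetical; any real LangChain vector store already provides a proper asimilarity_search, and the exact keyword arguments passed to it may vary.

from langchain_core.documents import Document

class MinimalVectorStore:
    """Hypothetical stand-in illustrating the interface GPT-Researcher calls."""

    def __init__(self, docs: list[Document]):
        self.docs = docs

    async def asimilarity_search(self, query: str, k: int = 4, **kwargs) -> list[Document]:
        # A real store ranks documents by embedding similarity;
        # this stub simply returns the first k to show the signature.
        return self.docs[:k]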

Faiss

from gpt_researcher import GPTResearcher

from langchain.text_splitter import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

# excerpt taken from https://paulgraham.com/wealth.html
essay = """
May 2004

(This essay was originally published in Hackers & Painters.)

If you wanted to get rich, how would you do it? I think your best bet would be to start or join a startup.
That's been a reliable way to get rich for hundreds of years. The word "startup" dates from the 1960s,
but what happens in one is very similar to the venture-backed trading voyages of the Middle Ages.

Startups usually involve technology, so much so that the phrase "high-tech startup" is almost redundant.
A startup is a small company that takes on a hard technical problem.

Lots of people get rich knowing nothing more than that. You don't have to know physics to be a good pitcher.
But I think it could give you an edge to understand the underlying principles. Why do startups have to be small?
Will a startup inevitably stop being a startup as it grows larger?
And why do they so often work on developing new technology? Why are there so many startups selling new drugs or computer software,
and none selling corn oil or laundry detergent?


The Proposition

Economically, you can think of a startup as a way to compress your whole working life into a few years.
Instead of working at a low intensity for forty years, you work as hard as you possibly can for four.
This pays especially well in technology, where you earn a premium for working fast.

Here is a brief sketch of the economic proposition. If you're a good hacker in your mid twenties,
you can get a job paying about $80,000 per year. So on average such a hacker must be able to do at
least $80,000 worth of work per year for the company just to break even. You could probably work twice
as many hours as a corporate employee, and if you focus you can probably get three times as much done in an hour.[1]
You should get another multiple of two, at least, by eliminating the drag of the pointy-haired middle manager who
would be your boss in a big company. Then there is one more multiple: how much smarter are you than your job
description expects you to be? Suppose another multiple of three. Combine all these multipliers,
and I'm claiming you could be 36 times more productive than you're expected to be in a random corporate job.[2]
If a fairly good hacker is worth $80,000 a year at a big company, then a smart hacker working very hard without
any corporate bullshit to slow him down should be able to do work worth about $3 million a year.
...
...
...
"""

documents = [Document(page_content=essay)]
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=30, separator="\n")
docs = text_splitter.split_documents(documents=documents)

# Embed the split chunks and index them in FAISS
vector_store = FAISS.from_documents(docs, OpenAIEmbeddings())

query = """
Summarize the essay into 3 or 4 succinct sections.
Make sure to include key points regarding wealth creation.

Include some recommendations for entrepreneurs in the conclusion.
"""


# Create an instance of GPTResearcher
researcher = GPTResearcher(
    query=query,
    report_type="research_report",
    report_source="langchain_vectorstore",
    vector_store=vector_store,
)

# Conduct research and write the report
await researcher.conduct_research()
report = await researcher.write_report()
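
Note that the await calls above assume you are already inside an async context, such as a Jupyter notebook. In a plain Python script, here is a sketch of the same flow wrapped with asyncio (the main function name is arbitrary):

import asyncio

async def main():
    researcher = GPTResearcher(
        query=query,
        report_type="research_report",
        report_source="langchain_vectorstore",
        vector_store=vector_store,
    )
    await researcher.conduct_research()
    report = await researcher.write_report()
    print(report)

asyncio.run(main())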

PGVector

from gpt_researcher import GPTResearcher
from langchain_postgres.vectorstores import PGVector
from langchain_openai import OpenAIEmbeddings

CONNECTION_STRING = 'postgresql://someuser:somepass@localhost:5432/somedatabase'


# assuming the vector store exists and contains the relevant documents
# also assuming embeddings have been or will be generated
vector_store = PGVector.from_existing_index(
    use_jsonb=True,
    embedding=OpenAIEmbeddings(),
    collection_name='some collection name',
    connection=CONNECTION_STRING,
    async_mode=True,
)

query = """
Create a short report about apples.
Include a section about which apples are considered best
during each season.
"""

# Create an instance of GPTResearcher
researcher = GPTResearcher(
    query=query,
    report_type="research_report",
    report_source="langchain_vectorstore",
    vector_store=vector_store,
)

# Conduct research and write the report
await researcher.conduct_research()
report = await researcher.write_report()
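
The example above assumes the collection is already populated. If it is not, one way to seed it is the standard LangChain from_documents classmethod; the document contents below are hypothetical placeholders:

from langchain_core.documents import Document

docs = [
    Document(page_content="Honeycrisp apples peak in early fall."),  # hypothetical content
    Document(page_content="Fuji apples keep well through winter."),  # hypothetical content
]

vector_store = PGVector.from_documents(
    documents=docs,
    embedding=OpenAIEmbeddings(),
    collection_name='some collection name',
    connection=CONNECTION_STRING,
    use_jsonb=True,
)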

Adding Scraped Data to Your Vector Store

In some cases, you may want to store the scraped data and documents in your own vector store for future use. GPT-Researcher supports this seamlessly as well: simply pass in your vector store, and make sure report_source is set to any value other than langchain_vectorstore.

from gpt_researcher import GPTResearcher

from langchain_community.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

vector_store = InMemoryVectorStore(embedding=OpenAIEmbeddings())

query = "The best LLM"

# Create an instance of GPTResearcher
researcher = GPTResearcher(
    query=query,
    report_type="research_report",
    report_source="web",
    vector_store=vector_store,
)

# Conduct research; the context will be chunked and stored in the vector_store
await researcher.conduct_research()

# Query the 5 most relevant context chunks in our vector store
related_contexts = await vector_store.asimilarity_search("GPT-4", k=5)
print(related_contexts)
print(len(related_contexts))  # Should be 5
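
Once the store has been populated this way, you can feed it back into a vector-store-only run, using the report_source setting described earlier (the follow-up query here is hypothetical):

# Reuse the populated store for a follow-up run that reads only from it
follow_up = GPTResearcher(
    query="Compare GPT-4 with other leading LLMs",
    report_type="research_report",
    report_source="langchain_vectorstore",
    vector_store=vector_store,
)
await follow_up.conduct_research()
follow_up_report = await follow_up.write_report()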