How to build a table schema with dynamic embedding model #5930
I am reading the Hybrid search docs, which say to create the table like this:

    from lancedb.embeddings import get_registry
    from lancedb.pydantic import LanceModel, Vector

    embeddings = get_registry().get("openai").create()

    class Documents(LanceModel):
        vector: Vector(embeddings.ndims()) = embeddings.VectorField()
        text: str = embeddings.SourceField()

    table = db.create_table("documents", schema=Documents)

What if I want to put this inside a function, based on the embedding model the user chooses? For example:

    class LaunceDB:
        def __init__(self, path: str, embedding_model: str = "all-MiniLM-L6-v2"):
            self.db = lancedb.connect(path)
            embeddings = (
                get_registry().get("sentence-transformers").create(name=embedding_model)
            )
            ArticleScheme = # generate this
            self.table = self.db.create_table(
                "articles", schema=ArticleScheme, exist_ok=True
            )
            self.table.create_fts_index("text")  # for full-text search
            self.table.create_index()  # ANN for vector search

I want the schema to look like this:

    class ArticleScheme(LanceModel):
        vector: Vector(embeddings.ndims()) = embeddings.VectorField()
        text: str = embeddings.SourceField()
        db_id: int = None  # I do not know if None is correct here, what should it be?

Update:

    class PandaDB:
        def __init__(self, path: str, embedding_model: str = "all-MiniLM-L6-v2"):
            self.db = lancedb.connect(path)
            embeddings = (
                get_registry().get("sentence-transformers").create(name=embedding_model)
            )

            class ItemSchema(LanceModel):
                vector: Vector(embeddings.ndims()) = embeddings.VectorField()
                text: str = embeddings.SourceField()
                reference_id: int = Field(...)

            self.table = self.db.create_table("articles", schema=ItemSchema, exist_ok=True)
            self.table.create_fts_index("text")  # for full-text search
            self.table.create_index()  # ANN for vector search

But now the line vector: Vector(embeddings.ndims()) = embeddings.VectorField() is giving me a Pylance error.
Replies: 1 comment
The error occurs because embeddings.ndims() is a function call inside a type annotation. Python itself evaluates the annotation at runtime without complaint, but static type checkers such as Pylance do not accept call expressions in type hints. To work around this, compute the embedding dimension outside the class definition, for example embedding_dim = embeddings.ndims(), and use Vector(embedding_dim) in the class. The vector field is still sized dynamically from the embedding model. Note that Vector(embedding_dim) is itself a call, so Pylance may still flag the annotation; if it does, the warning can be silenced on that line with a # type: ignore comment, since the code is valid at runtime.
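
The idea of computing the dimension first and building the schema per model can be sketched with the standard library alone. This is only an illustration of the pattern, not LanceDB code: the Vector function below is a hypothetical stand-in for lancedb.pydantic.Vector, make_item_schema is a made-up helper name, and the plain class stands in for a LanceModel subclass.

```python
def Vector(dim: int) -> type:
    """Hypothetical stand-in for lancedb.pydantic.Vector: a list type tagged with its length."""
    return type(f"Vector{dim}", (list,), {"dim": dim})


def make_item_schema(ndims: int) -> type:
    """Return a fresh schema class whose vector field has `ndims` dimensions."""
    # Compute the vector type once, outside the class body. With LanceDB this
    # value would come from embeddings.ndims().
    vector_type = Vector(ndims)

    class ItemSchema:  # with LanceDB this would subclass LanceModel
        # A variable is still not a valid static type, so the checker is
        # silenced on this line; the annotation is fine at runtime.
        vector: vector_type  # type: ignore
        text: str
        reference_id: int

    return ItemSchema


# Each call yields an independent class sized for one embedding model.
small = make_item_schema(384)   # e.g. the vector size of all-MiniLM-L6-v2
large = make_item_schema(1536)
print(small.__annotations__["vector"].dim, large.__annotations__["vector"].dim)
```

Because the class is created inside the function, every embedding model gets its own schema class, which is the same shape as defining ItemSchema inside __init__ in the question.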