Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

timeline analysis SQL query is dead slow #470

Open
fynnos opened this issue Nov 20, 2024 · 0 comments
Open

timeline analysis SQL query is dead slow #470

fynnos opened this issue Nov 20, 2024 · 0 comments
Assignees
Labels
backend This issue is related to the backend

Comments

@fynnos
Copy link
Collaborator

fynnos commented Nov 20, 2024

the following query (generated by SQLAlchemy from python code) is dead slow for projects with large amounts of documents (and spans), like 10k+ documents. For 50k+ it never finishes and just burns Postgres CPU.

SELECT array_remove(array_agg(distinct(sourcedocument.id)), NULL) AS sdoc_ids, EXTRACT(year FROM anon_2.date) AS anon_1 
 FROM sourcedocument JOIN (SELECT sourcedocument.id AS id, sourcedocumentmetadata_1.date_value AS date, array_remove(array_agg(distinct(code.id)), NULL) AS "TA_CODE_ID_LIST" 
 FROM sourcedocument JOIN sourcedocumentmetadata AS sourcedocumentmetadata_1 ON sourcedocument.id = sourcedocumentmetadata_1.source_document_id AND sourcedocumentmetadata_1.project_metadata_id = 2234 AND sourcedocumentmetadata_1.date_value IS NOT NULL LEFT OUTER JOIN annotationdocument ON sourcedocument.id = annotationdocument.source_document_id, spanannotation LEFT OUTER JOIN code ON code.id = spanannotation.code_id 
 WHERE sourcedocument.project_id = 87 GROUP BY sourcedocument.id, sourcedocumentmetadata_1.date_value) AS anon_2 ON sourcedocument.id = anon_2.id 
 WHERE anon_2."TA_CODE_ID_LIST" @> ARRAY[9621]::INTEGER[] GROUP BY EXTRACT(year FROM anon_2.date) ORDER BY sdoc_ids DESC
@fynnos fynnos added the backend This issue is related to the backend label Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend This issue is related to the backend
Projects
None yet
Development

No branches or pull requests

2 participants