timeline analysis SQL query is dead slow #470

fynnos · 2024-11-20T15:35:34Z

the following query (generated by SQLAlchemy from python code) is dead slow for projects with large amounts of documents (and spans), like 10k+ documents. For 50k+ it never finishes and just burns Postgres CPU.

SELECT array_remove(array_agg(distinct(sourcedocument.id)), NULL) AS sdoc_ids, EXTRACT(year FROM anon_2.date) AS anon_1 
 FROM sourcedocument JOIN (SELECT sourcedocument.id AS id, sourcedocumentmetadata_1.date_value AS date, array_remove(array_agg(distinct(code.id)), NULL) AS "TA_CODE_ID_LIST" 
 FROM sourcedocument JOIN sourcedocumentmetadata AS sourcedocumentmetadata_1 ON sourcedocument.id = sourcedocumentmetadata_1.source_document_id AND sourcedocumentmetadata_1.project_metadata_id = 2234 AND sourcedocumentmetadata_1.date_value IS NOT NULL LEFT OUTER JOIN annotationdocument ON sourcedocument.id = annotationdocument.source_document_id, spanannotation LEFT OUTER JOIN code ON code.id = spanannotation.code_id 
 WHERE sourcedocument.project_id = 87 GROUP BY sourcedocument.id, sourcedocumentmetadata_1.date_value) AS anon_2 ON sourcedocument.id = anon_2.id 
 WHERE anon_2."TA_CODE_ID_LIST" @> ARRAY[9621]::INTEGER[] GROUP BY EXTRACT(year FROM anon_2.date) ORDER BY sdoc_ids DESC

fynnos added the backend This issue is related to the backend label Nov 20, 2024

fynnos assigned bigabig Nov 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

timeline analysis SQL query is dead slow #470

timeline analysis SQL query is dead slow #470

fynnos commented Nov 20, 2024

timeline analysis SQL query is dead slow #470

timeline analysis SQL query is dead slow #470

Comments

fynnos commented Nov 20, 2024