DatabaseClient stream() and stream_to_file() #109

Open
stephen29xie opened this issue Jan 16, 2021 · 0 comments

Current behaviour:

from omniduct.duct import Duct
duct = Duct.for_protocol(protocol='sqlalchemy')(...)

query = 'SELECT * FROM ...'

# 1: batched stream to memory
duct.stream(query, format='csv', batch=2)

# 2: batched stream to file
duct.stream_to_file(query, '.../data.csv', batch=2)

# 3: unbatched stream to file
duct.stream_to_file(query, '.../data.csv')

1: When batched, stream() to memory repeatedly writes the column names at the start of each batch.

2: Consequently, when stream() is wrapped by stream_to_file(), the column names are written to the file once per batch.

E.g.:

State,City
California,San Francisco
Oregon,Portland
State,City
Texas,Houston
California,Los Angeles

3: When batch=None, stream() (and thus stream_to_file()) does not write column names at all, so the output file contains no header row.

E.g.:

California,San Francisco
Oregon,Portland
Texas,Houston
California,Los Angeles

In my opinion, the desired behaviour should be:

  • When streaming to csv file, the column names should be written once, as a header.
  • When streaming to memory, the generator should return only row data (no column names), like a cursor would.
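To make the proposal concrete, here is a minimal sketch of the behaviour I have in mind. It is not the omniduct implementation; stream and stream_to_file here are hypothetical stand-ins that take a plain row iterator and an explicit column list:

```python
import csv

def stream(rows, batch=None):
    """Yield only row data (no column names), like a cursor would.

    With batch=None, yield rows one at a time; otherwise yield lists
    of up to `batch` rows.
    """
    if batch is None:
        for row in rows:
            yield row
        return
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == batch:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def stream_to_file(rows, columns, path, batch=None):
    """Write the column names once as a CSV header, then the row data."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header written exactly once
        for item in stream(rows, batch=batch):
            if batch is None:
                writer.writerow(item)   # item is a single row
            else:
                writer.writerows(item)  # item is a batch of rows
```

With this split, the header logic lives entirely in stream_to_file, so the in-memory generator behaves the same whether or not batching is used.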

What do you think about this? I can open a PR to get this done.

Thanks.
