DatabaseClient stream() and stream_to_file() #109

Open
stephen29xie opened this issue Jan 16, 2021 · 0 comments

Current behaviour:

from omniduct.duct import Duct
duct = Duct.for_protocol(protocol='sqlalchemy')(...)

query = 'SELECT * FROM ...'

# 1: batched stream to memory
duct.stream(query, format='csv', batch=2)

# 2: batched stream to file
duct.stream_to_file(query, '.../data.csv', batch=2)

# 3: unbatched stream to file
duct.stream_to_file(query, '.../data.csv')

1: When batched, stream() to memory repeatedly writes the column names at the start of each batch.

2: Consequently, when stream() is wrapped by stream_to_file(), the column names are written to the file once per batch.

E.g.:

State,City
California,San Francisco
Oregon,Portland
State,City
Texas,Houston
California,Los Angeles

3: When batch=None, stream() (and thus stream_to_file()) does not write column names at all, so the output file contains no header row.

E.g.:

California,San Francisco
Oregon,Portland
Texas,Houston
California,Los Angeles

In my opinion, the desired behaviour should be:

  • When streaming to csv file, the column names should be written once, as a header.
  • When streaming to memory, the generator should return only row data (no column names), like a cursor would.
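To make the proposal concrete, here is a minimal sketch of the behaviour I have in mind. It is not the omniduct implementation; stream and stream_to_file here are hypothetical stand-ins that take a plain row iterator and an explicit column list:

```python
import csv

def stream(rows, batch=None):
    """Yield only row data (no column names), like a cursor would.

    With batch=None, yield rows one at a time; otherwise yield lists
    of up to `batch` rows.
    """
    if batch is None:
        for row in rows:
            yield row
        return
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == batch:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def stream_to_file(rows, columns, path, batch=None):
    """Write the column names once as a CSV header, then the row data."""
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header written exactly once
        for item in stream(rows, batch=batch):
            if batch is None:
                writer.writerow(item)   # item is a single row
            else:
                writer.writerows(item)  # item is a batch of rows
```

With this split, the header logic lives entirely in stream_to_file, so the in-memory generator behaves the same whether or not batching is used.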

What do you think about this? I can open a PR to get this done.

Thanks.
