Skip to content

Commit

Permalink
improved definition of Distinct; as per #98 (comment)
Browse files Browse the repository at this point in the history
  • Loading branch information
hartig committed Jun 16, 2023
1 parent 67ac56e commit ff80086
Showing 1 changed file with 16 additions and 4 deletions.
20 changes: 16 additions & 4 deletions spec/index.html
Original file line number Diff line number Diff line change
Expand Up @@ -9368,10 +9368,22 @@ <h3>SPARQL Algebra</h3>
</div>
<div class="defn">
<p><b>Definition: <span id="defn_algDistinct">Distinct</span></b></p>
<p>Let Ψ be a sequence of elements which may be either solution mappings or lists of RDF terms. We define:</p>
<p>Distinct(Ψ) = [ e | e in Ψ ]</p>
<p>card[Distinct(Ψ)](e) = 1</p>
<p>The order of Distinct(Ψ) must preserve any ordering given by OrderBy (if any).</p>
<p>Let <var>Ψ</var> be a sequence of elements which may be either solution mappings or lists of RDF terms.</p>

This comment has been minimized.

Copy link
@afs

afs Jun 17, 2023

Contributor

Where does the "either ... or lists of RDF terms" come from? It's a sequence of solution mappings (rows).

This comment has been minimized.

Copy link
@kasei

kasei Jun 17, 2023

Contributor

As part of @hartig's work on fixing the mixing of "multisite" and "sequence" in the definitions for aggregation (#98, #99).

<p>Distinct(<var>Ψ</var>) is a sequence of elements that has the following properties.</p>
<ol>
<li>Every element in <var>Ψ</var> is contained in Distinct(<var>Ψ</var>).</li>

This comment has been minimized.

Copy link
@kasei

kasei Jun 16, 2023

Contributor

This wording seems confusing to me. If there are duplicated elements in Ψ, I would not think of the distinct sequence as "containing every element in Ψ" (since some of them will be omitted because they are duplicates).

This comment has been minimized.

Copy link
@TallTed

TallTed Jun 16, 2023

Member

s/Every element/Every unique element/ ?

This comment has been minimized.

Copy link
@hartig

hartig Jun 16, 2023

Author Contributor

@kasei Would it be less confusing for you with the change proposed by @TallTed ?

This comment has been minimized.

Copy link
@TallTed

TallTed Jun 19, 2023

Member

I'd have made that suggestion as a suggestion if this were part of a PR, but as it was committed directly, I'd have to create a PR just for that one word insertion, which may or may not be acceptable to @kasei. This is another point in favor of my previous requests that all changes of any significance or potential controversy be made via PR, not direct commit.

This comment has been minimized.

Copy link
@kasei

kasei Jun 19, 2023

Contributor

@TallTed isn't this commit part of #98? It wasn't committed directly to main, as far as I can tell.

This comment has been minimized.

Copy link
@hartig

hartig Jun 19, 2023

Author Contributor

Hold on. This commit was not committed directly (to main) but it was part of the corresponding PR #98 . I am totally with you that all changes should be made via PRs.

<li>Every element in Distinct(<var>Ψ</var>) is contained in <var>Ψ</var>.</li>

This comment has been minimized.

Copy link
@afs

afs Jun 17, 2023

Contributor

"Every element in Distinct occurs once" - the cardinality condition.

<li>Distinct(<var>Ψ</var>) is free of duplicates. That is, the element at the |i|-th position in Distinct(<var>Ψ</var>) is different from the element at the |j|-th position in Distinct(<var>Ψ</var>) for every two natural numbers |i| and |j| such that |i| &ne; |j|.

This comment has been minimized.

Copy link
@afs

afs Jun 17, 2023

Contributor

s/≠/is not the same term as/

"Not equal" can be read as a value condtion - SPARQL !=.

</li>
<li>For every two elements <var>e<sub>1</sub></var> and <var>e<sub>2</sub></var> in Distinct(<var>Ψ</var>), the relative order of their first occurrences in <var>Ψ</var> is preserved in Distinct(<var>Ψ</var>). That is, if <var>i<sub>1</sub></var>&nbsp;&lt;&nbsp;<var>i<sub>2</sub></var>, then <var>j<sub>1</sub></var>&nbsp;&lt;&nbsp;<var>j<sub>2</sub></var>, where

This comment has been minimized.

Copy link
@afs

afs Jun 17, 2023

Contributor

s/every/any/ because the text names the chosen as e1 and e2.

<ul>
<li><var>i<sub>1</sub></var> is the smallest natural number such that <var>e<sub>1</sub></var> is at the <var>i<sub>1</sub></var>-th position in <var>Ψ</var>,</li>
<li><var>i<sub>2</sub></var> is the smallest natural number such that <var>e<sub>2</sub></var> is at the <var>i<sub>2</sub></var>-th position in <var>Ψ</var>,</li>
<li><var>j<sub>1</sub></var> is the position of <var>e<sub>1</sub></var> in Distinct(<var>Ψ</var>), and</li>
<li><var>j<sub>2</sub></var> is the position of <var>e<sub>2</sub></var> in Distinct(<var>Ψ</var>).</li>
</ul>
</li>
</ol>
</div>
<div class="defn">
<p><b>Definition: <span id="defn_algReduced">Reduced</span></b></p>
Expand Down

3 comments on commit ff80086

@afs
Copy link
Contributor

@afs afs commented on ff80086 Jun 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All modifiers work on a sequence, which may be ordered or not. project makes things complicated - it preserves the order in the sequence but the output sequence no longer necessarily has the terms used to produce the order. Carrying the order around by keeping a sequence deals with this. There is an errata for this: https://www.w3.org/2013/sparql-errata#errata-query-20 (#24) because, after projection, which element to keep in DISTINCT is not longer clear.

The order of Distinct(Ψ) must preserve any ordering given by OrderBy.

I think it is worth keeping the text "The order of Distinct(Ψ) must preserve any ordering given by OrderBy." because it says, declaratively, what is intent is. Similar text is also in the other modifiers; a sequence is used to preserve the order for all modifiers.

When there is no ordering of the input solutions sequence, then there is no requirement to keep the order. Accumulate in a hashset and then output the set would be a valid implementation.

So point 4 is only required when there is an order.

The other consistency point is that the definition does not state what the cardinality function is now. card is in other modifiers and calculated based on the card of the input.

@kasei
Copy link
Contributor

@kasei kasei commented on ff80086 Jun 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When there is no ordering of the input solutions sequence, then there is no requirement to keep the order. Accumulate in a hashset and then output the set would be a valid implementation.

I think that's true today, but I think it would complicate using the existing mechanisms to define things like ordered GROUP_CONCAT. Since there is no inherent means to tell if a sequence has been ordered (e.g. "given by OrderBy"), language that talks specifically about OrderBy would limit the ability of extensions to provide sorting by some other means and still rely upon sequence operations to preserve that order.

The other consistency point is that the definition does not state what the cardinality function is now. card is in other modifiers and calculated based on the card of the input.

The use of card on multistep makes sense to me, but I'm a bit baffled by its need for sequences. Surely the structure of a sequence implies the cardinalities? Am I right in thinking that the only use for all the definitions of card over sequences is to support the definition of ToMultiSet?

@afs
Copy link
Contributor

@afs afs commented on ff80086 Jun 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

distinct is a "solution modifier" (e.g. https://www.w3.org/TR/sparql11-query/#convertSolMod) - it's in the text in several places and is language that is reasonable to have been used outside the spec.

We can define a separate operations for distinct in aggregates.

Combing the two seems to make it more complicated to explain and there are differences of use e.g. in the presence of project.

Please sign in to comment.