-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9368,10 +9368,22 @@ <h3>SPARQL Algebra</h3> | |
</div> | ||
<div class="defn"> | ||
<p><b>Definition: <span id="defn_algDistinct">Distinct</span></b></p> | ||
<p>Let Ψ be a sequence of elements which may be either solution mappings or lists of RDF terms. We define:</p> | ||
<p>Distinct(Ψ) = [ e | e in Ψ ]</p> | ||
<p>card[Distinct(Ψ)](e) = 1</p> | ||
<p>The order of Distinct(Ψ) must preserve any ordering given by OrderBy (if any).</p> | ||
<p>Let <var>Ψ</var> be a sequence of elements which may be either solution mappings or lists of RDF terms.</p> | ||
This comment has been minimized.
Sorry, something went wrong.
This comment has been minimized.
Sorry, something went wrong. |
||
<p>Distinct(<var>Ψ</var>) is a sequence of elements that has the following properties.</p> | ||
<ol> | ||
<li>Every element in <var>Ψ</var> is contained in Distinct(<var>Ψ</var>).</li> | ||
This comment has been minimized.
Sorry, something went wrong.
kasei
Contributor
|
||
<li>Every element in Distinct(<var>Ψ</var>) is contained in <var>Ψ</var>.</li> | ||
This comment has been minimized.
Sorry, something went wrong. |
||
<li>Distinct(<var>Ψ</var>) is free of duplicates. That is, the element at the |i|-th position in Distinct(<var>Ψ</var>) is different from the element at the |j|-th position in Distinct(<var>Ψ</var>) for every two natural numbers |i| and |j| such that |i| ≠ |j|. | ||
This comment has been minimized.
Sorry, something went wrong.
afs
Contributor
|
||
</li> | ||
<li>For every two elements <var>e<sub>1</sub></var> and <var>e<sub>2</sub></var> in Distinct(<var>Ψ</var>), the relative order of their first occurrences in <var>Ψ</var> is preserved in Distinct(<var>Ψ</var>). That is, if <var>i<sub>1</sub></var> < <var>i<sub>2</sub></var>, then <var>j<sub>1</sub></var> < <var>j<sub>2</sub></var>, where | ||
This comment has been minimized.
Sorry, something went wrong. |
||
<ul> | ||
<li><var>i<sub>1</sub></var> is the smallest natural number such that <var>e<sub>1</sub></var> is at the <var>i<sub>1</sub></var>-th position in <var>Ψ</var>,</li> | ||
<li><var>i<sub>2</sub></var> is the smallest natural number such that <var>e<sub>2</sub></var> is at the <var>i<sub>2</sub></var>-th position in <var>Ψ</var>,</li> | ||
<li><var>j<sub>1</sub></var> is the position of <var>e<sub>1</sub></var> in Distinct(<var>Ψ</var>), and</li> | ||
<li><var>j<sub>2</sub></var> is the position of <var>e<sub>2</sub></var> in Distinct(<var>Ψ</var>).</li> | ||
</ul> | ||
</li> | ||
</ol> | ||
</div> | ||
<div class="defn"> | ||
<p><b>Definition: <span id="defn_algReduced">Reduced</span></b></p> | ||
|
3 comments
on commit ff80086
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All modifiers work on a sequence, which may be ordered or not. project
makes things complicated - it preserves the order in the sequence but the output sequence no longer necessarily has the terms used to produce the order. Carrying the order around by keeping a sequence deals with this. There is an errata for this: https://www.w3.org/2013/sparql-errata#errata-query-20 (#24) because, after projection, which element to keep in DISTINCT is not longer clear.
The order of Distinct(Ψ) must preserve any ordering given by OrderBy.
I think it is worth keeping the text "The order of Distinct(Ψ) must preserve any ordering given by OrderBy." because it says, declaratively, what is intent is. Similar text is also in the other modifiers; a sequence is used to preserve the order for all modifiers.
When there is no ordering of the input solutions sequence, then there is no requirement to keep the order. Accumulate in a hashset and then output the set would be a valid implementation.
So point 4 is only required when there is an order.
The other consistency point is that the definition does not state what the cardinality function is now. card
is in other modifiers and calculated based on the card
of the input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When there is no ordering of the input solutions sequence, then there is no requirement to keep the order. Accumulate in a hashset and then output the set would be a valid implementation.
I think that's true today, but I think it would complicate using the existing mechanisms to define things like ordered GROUP_CONCAT
. Since there is no inherent means to tell if a sequence has been ordered (e.g. "given by OrderBy
"), language that talks specifically about OrderBy
would limit the ability of extensions to provide sorting by some other means and still rely upon sequence operations to preserve that order.
The other consistency point is that the definition does not state what the cardinality function is now. card is in other modifiers and calculated based on the card of the input.
The use of card
on multistep makes sense to me, but I'm a bit baffled by its need for sequences. Surely the structure of a sequence implies the cardinalities? Am I right in thinking that the only use for all the definitions of card
over sequences is to support the definition of ToMultiSet
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
distinct
is a "solution modifier" (e.g. https://www.w3.org/TR/sparql11-query/#convertSolMod) - it's in the text in several places and is language that is reasonable to have been used outside the spec.
We can define a separate operations for distinct in aggregates.
Combing the two seems to make it more complicated to explain and there are differences of use e.g. in the presence of project
.
Where does the "either ... or lists of RDF terms" come from? It's a sequence of solution mappings (rows).