Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge geo.location.lon and geo.location.lat #1638

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

gregkalapos
Copy link
Member

Changes

#1116 added geo fields. We started implementing support for this, and we noticed that for backends that have a dedicated geo point data type, having 2 distinct attributes (geo.location.lon and geo.location.lat) is very inconvenient. These two fields clearly belong together.

Backends that have a specialized geo point type could still merge those with some tricks, but that does not feel right. On the other hand, if a backend does not have specific support for such a field, still can store it as double[].

Therefore, this PR proposes to merge those 2 attributes into a double[].

cc @felixbarny @carsonip

Alternatives

  • Keep as is
  • Introduce a specific type for this. That one feels an overkill to me. It'd also be hard to implement for some backends.

Merge requirement checklist

@gregkalapos
Copy link
Member Author

Violation: Attribute 'geo.location' name is used as a namespace in the following attributes '["geo.location.lat", "geo.location.lon"]'.
  - Category         : naming_collision
  - Type             : semconv_attribute
  - SemConv group    : 
  - SemConv attribute: geo.location
  - Provenance: /home/weaver/source

This seems like a blocker for such PRs :( Is this maybe just an unwanted side-effect of this policy?

geo.location.lat and geo.location.lon are now deprecated, so we'd not use 'geo.location' as a namespace in those. These 3 would not exist at the same time.

@gregkalapos gregkalapos marked this pull request as ready for review November 29, 2024 19:13
@gregkalapos gregkalapos requested review from a team as code owners November 29, 2024 19:13
model/geo/registry.yaml Outdated Show resolved Hide resolved
@felixbarny
Copy link

Introduce a specific type for this. That one feels an overkill to me. It'd also be hard to implement for some backends.

An option that doesn't require a new type in OTLP (which would introduce compatibility issues) is that we add an optional type hint for attributes in KeyValue. That enables backends with support for geo points to handle these attributes in a special way. It adds semantics to the attribute, for example, for attributes with the type hint geo_point the expectation is that attributes are an array of double values where the first element represents lon, and the second lat. The benefit of the type hit would be that backends don't need explicit and up-to-date knowledge of which attributes are defined as geo points in semantic conventions and that custom attributes can also be mapped to geo points.

Example:

{
  "attributes": [
    {
      "key": "geo.location",
      "typeHint": "geo_point",
      "value": {
        "arrayValue": {
          "values": [
            {
              "doubleValue": -71.34
            },
            {
              "doubleValue": 41.12
            }
          ]
        }
      }
    }
  ]
}

Type hints could be useful for other types as well, for example, IP addresses that are currently represented in string-typed attributes.

Type hints make attributes more self-describing, while still preserving compatibility with backends that don't have native support for these more specialized types.

@gregkalapos
Copy link
Member Author

Violation: Attribute 'geo.location' name is used as a namespace in the following attributes '["geo.location.lat", "geo.location.lon"]'.
  - Category         : naming_collision
  - Type             : semconv_attribute
  - SemConv group    : 
  - SemConv attribute: geo.location
  - Provenance: /home/weaver/source

This seems like a blocker for such PRs :( Is this maybe just an unwanted side-effect of this policy?

geo.location.lat and geo.location.lon are now deprecated, so we'd not use 'geo.location' as a namespace in those. These 3 would not exist at the same time.

This was discussed in today's SIG - there was agreement that something like this is acceptable. The idea to completely drop the policy came up as well, that may not happen, but there was no push back to allow this situation when the collision only happens between new and deprecated fields.

@SylvainJuge
Copy link
Contributor

@gregkalapos this policy have been relaxed for deprecated attributes with #1642, which has already been merged, so updating this PR from main should allow to unblock it.

@jsuereth
Copy link
Contributor

jsuereth commented Dec 3, 2024

This PR is effectively the reason/rationale I want to see a new "type" concept in Semantic conventions (see discussion around embed here: #1264).

Specifically, there can be different Geo location coordinate reference systems. If we limit to lat/lon AND erase the names, I think this is just a foot-gun. particular the discussion around whether it's [lon,lat] or [lat,lon].

I think we should hold off on this PR and think about how we, generically, want to handle the issue of "grouped" attributes. While we could do something quick for lat/lon, we're not in a hurry here, and I think we could preserve geo.location.{lat|lon} while also providing a "uniform consumption" experience as motivated this PR.

cc @open-telemetry/specs-semconv-maintainers @trisch-me

@lmolkova
Copy link
Contributor

lmolkova commented Dec 4, 2024

There are similar problems with other types/namespaces (e.g. user/identity, address+port) and we can't solve them all with arrays.
Arrays would be hard to use in custom queries, dashboards, etc.

I'm really curious about

having 2 distinct attributes (geo.location.lon and geo.location.lat) is very inconvenient.

and would like to understand more - what was inconvenient? In which scenarios? Inconvenient for the end users or backend implementation? How is it different from client.address + client.port that are presumably consumed together all the time.

@gregkalapos
Copy link
Member Author

The difference between user/identity, address+port compared to geo is that backends may have a dedicated type for geo.

what was inconvenient? In which scenarios?

This is the specific PR in the exporter: open-telemetry/opentelemetry-collector-contrib#36594 So we could not just take the 2 fields and store them as is, but needed to check if both are present (and e.g. not only one) and then turned those two into a format that is accepted by the backend as the dedicated geo type. That's basically all scenarios for geo fields.

Inconvenient for the end users or backend implementation?

Prior to that PR it was inconvenient for users because geo fields were not stored with the native geopoint type. One feature is e.g. that users can visualize these on a map - things like that were not possible. With the PR we fixed that, but with that, we start treating these 2 fields differently than other fields.

How is it different from client.address + client.port that are presumably consumed together all the time.

Ultimately, in my view, the difference is that we have a dedicated type for geo, but not for the other examples. Of course not all backends will have a dedicated geo type, but in this example we do have, and I'd not be surprised if there would be more such backends.

Now, having said all that, I think the solution in PR open-telemetry/opentelemetry-collector-contrib#36594 is not that bad - it's far from unacceptable and it does the job. So we kind of went with that and accepted this inconvenience already. And I also share the other concerns regarding double[].

Yet, I still think this situation is different compared to the other examples here and the connection between these 2 fields is much stronger, given the fact that backends may store them together is a dedicated type.

@lmolkova
Copy link
Contributor

thanks for the context @gregkalapos, should we close this one? We can continue the discussion in #1669 which should address this concern.

Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

Successfully merging this pull request may close these issues.

6 participants