Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] A working prototype for a suggestions helper #4557

Open
wants to merge 24 commits into
base: dev
Choose a base branch
from

Conversation

ericvergnaud
Copy link
Contributor

This is an (experimental at this point) implementation of a generic suggestions helper.
Given a parsed input and a caret position, it returns a leaf context and expected tokens.
The intent is to make this available in Java and Typescript to start with.
Other implementations may follow if there is demand.

@seb314
Copy link

seb314 commented Mar 19, 2024

Thanks for the making this happen!

I played around with the test cases a bit. Here is a test case that does not currently work as I'd expect:

# contents of java-ambiguous-prefix-parse.yml
# hypothetical complete class A)
# input: "public class Foo { public static    boolean myAttribute = 42; }"
# hypothetical complete class B)
# input: "public class Foo { public static    class Inner {} }"
# actual partial input (suppose the user has not typed out the full class yet):
input: "public class Foo { public static    "
caret:
  line: 1
  column: 34
expected:
  - boolean # expected with context chain: PrimitiveTypeContext --parent--> TypeContext --parent--> FieldDeclarationContext ...
  - class   # expected with context chain: ClassDeclarationContext --parent--> MemberDeclarationContext ...

# list incomplete, but at least both of the above should be suggested


# currently, I get the following instead of the expected tokens and contexts:
# tokens: @, private, static, protected, public, final, abstract, strictfp
# context chain: ClassOrInterfaceModifierContext --parent--> ModifierContext --parent--> ...

Edit: In a previous version of this comment, I had the expected cases mixed up.

@ericvergnaud
Copy link
Contributor Author

ericvergnaud commented Mar 20, 2024

I made some changes, can you retry ?

@seb314
Copy link

seb314 commented Mar 21, 2024

The suggested tokens for the test case look good now!

Regarding the single parser context for all suggested tokens, I'm a bit puzzled why that works. Is this really always the case, or is the java grammar maybe deliberately written to avoid ambiguities where different parser contexts could occur?

So far, with the java grammar, I did not come up with a test case that demonstrates a clear need for different contexts. I'll have to try either with a toy grammar or with our internal grammar and see what happens there.

@ericvergnaud
Copy link
Contributor Author

Regarding the single parser context for all suggested tokens, I'm a bit puzzled why that works. Is this really always the case, or is the java grammar maybe deliberately written to avoid ambiguities where different parser contexts could occur?

TBH the expected tokens relate to a state, not to a context i.e. given a state, the set of expected tokens is constant and not context dependent.
That said, you might be right that as you walk up contexts, some tokens may no longer make sense. However I'm not sure how useful it could be to cater for that ? Can you provide an example where it matters ?

I'll be replicating the java code to JS/TS so you can play with it in your exact situation.

@ericvergnaud
Copy link
Contributor Author

@seb314 I've replicated the Java code to JS/TS. Example usage is in runtime/JavaScript/spec/suggestions/SuggestionHelperSpec.js

I haven't tried TypeScript. Feel free to yell if it's not working.

seb314 pushed a commit to seb314/antlr4 that referenced this pull request Mar 22, 2024
@seb314
Copy link

seb314 commented Mar 22, 2024

I tried to come up with an example where more than one parser context is consistent with the same input & cursor position:
seb314@118e126

Thanks for the JS version! I'll try that with our real grammar and see if I can find real-world examples.

EDIT: updated link to commit with fixed typo in commit message

@ericvergnaud
Copy link
Contributor Author

The TS build is broken, but has been for a couple of weeks... I'm looking into it, I don't think it relates to this change

@seb314
Copy link

seb314 commented Mar 22, 2024

The TS build is broken, but has been for a couple of weeks... I'm looking into it, I don't think it relates to this change

As long as the js is runnable I'll probably be able to just suppress the tsc warnings for initial testing

@ericvergnaud
Copy link
Contributor Author

It's fixed now

Signed-off-by: Eric Vergnaud <[email protected]>
@ericvergnaud
Copy link
Contributor Author

I mean now 🤣 (previous commit undid all the changes ...)

@seb314
Copy link

seb314 commented Mar 28, 2024

I've now tried the js version with our regular test cases a bit.

The suggested tokens look mostly good already (modulo a few unwanted suggestions that look like they are plausibly due to the different handling of semantic predicates. Those could probably be filtered out individually, if necessary).

What I'm not sure about at the moment:
Currently, getExpectedTokensAt provides the context that corresponds to the given input, without a suggested token appended. (Or inserted/replaced, if the given position is not at the end of the input, i guess).
Is there maybe an efficient way to find out the context or rule that corresponds to the input with the suggested token, for each suggested token? One option would be to actually create all those hypothetical inputs, and call the parser on each of them, but that is maybe a lot more inefficient than necessary?

@ericvergnaud
Copy link
Contributor Author

I've now tried the js version with our regular test cases a bit.

The suggested tokens look mostly good already (modulo a few unwanted suggestions that look like they are plausibly due to the different handling of semantic predicates. Those could probably be filtered out individually, if necessary).

What I'm not sure about at the moment: Currently, getExpectedTokensAt provides the context that corresponds to the given input, without a suggested token appended. (Or inserted/replaced, if the given position is not at the end of the input, i guess). Is there maybe an efficient way to find out the context or rule that corresponds to the input with the suggested token, for each suggested token? One option would be to actually create all those hypothetical inputs, and call the parser on each of them, but that is maybe a lot more inefficient than necessary?

so you'd want to associate each suggested token with the resulting context if that token were appended ?
I don't think that's possible at all because no such context will exist if the cursor is at the end of the input.
And if it's before the end, a context will only exist for the token currently inserted at or after the cursor (and the ones after that).
The contexts are instances of the actual parsing of the input, not of a potential one.

What might be possible is to provide the rule that the parser would enter. That's something the parser knows of without instantiating anything.
But I'd suggest first trying your proposed approach i.e. parse hypothetical inputs, and see how that performs (notably it might be done asynchronously to reduce initial latency).

@seb314
Copy link

seb314 commented Apr 7, 2024

@ericvergnaud thanks for the clarification! I'll try the approach with separate parsing and then report back (might take a couple of days until I have time to test this)

(Knowing the rule instead of the context should be sufficient to classify the suggestions. I'm not sure yet if it would also be sufficient for finding out the corresponding range of the input that gets replaced if the user accepts the corresponding suggestion, but maybe there is a way)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants