[WIP] Kyuubi Connect #6642

Draft: wants to merge 9 commits into master


pan3793 (Member) commented on Aug 23, 2024

This is a quick and dirty PR that implements the Spark Connect (gRPC-based) protocol, which allows you to use pyspark or spark-shell in remote mode (sc://<host>:<port>) to connect to Kyuubi and run Spark queries.

DISCLAIMER: this patch is not ready for review, and I know it breaks existing functionality; that will be fixed later.

For explorers who want to try this early version:

Source Code under development

git clone https://github.com/pan3793/kyuubi.git -b kyuubi-next kyuubi-next

Requirements

  • JAVA_HOME points to Java 17
  • SPARK_HOME points to /path/of/spark-4.0.0-preview2-bin-hadoop3

Run

Run within IDEA

  • Run build/mvn clean install -DskipTests to build the project and produce the Spark engine jar
  • Run kyuubi-server/src/main/scala/org/apache/kyuubi/server/KyuubiServer.scala using IDEA

You can set SPARK_HOME and KYUUBI_CONF_DIR in the Run/Debug Configurations dialog.

[Screenshot: IDEA Run/Debug Configurations dialog]

Run within Terminal

build/dist
cd dist
bin/kyuubi run --conf kyuubi.frontend.grpc.bind.port=10999

The gRPC service listens on port 10999 by default.
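Before starting a client, it can help to confirm that something is actually listening on the gRPC port. The following is a minimal sketch using only Python's standard `socket` module. It checks plain TCP reachability only: it does not perform a gRPC handshake or verify that the listener is Kyuubi. The helper name `is_port_open` is hypothetical.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Only TCP reachability is checked; no gRPC handshake is attempted.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Assumes the server runs locally with kyuubi.frontend.grpc.bind.port=10999.
    print(is_port_open("localhost", 10999))
```

If this returns False, the server is either not running yet or bound to a different host or port than the one you are probing.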

Connect to Kyuubi Connect

Spark Connect Scala client (Requires: Java 17, Spark 4.0.0-preview2)

cd /path/of/spark-4.0.0-preview2-bin-hadoop3
bin/spark-shell --remote sc://H27212-MAC-01.local:10999 --user_id chengpan --user_name chengpan

Or use the PySpark Connect client (Requires: Python >= 3.9)

pip install pyspark-connect==4.0.0.dev1
pyspark --remote sc://H27212-MAC-01.local:10999 --user_id chengpan --user_name chengpan
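Spark Connect clients also accept connection parameters embedded in the remote URL, using the documented form `sc://host:port/;key1=value1;key2=value2` (for example `user_id`). The sketch below builds such a string. Two assumptions apply: that Kyuubi Connect treats a `user_id` in the URL the same way as the `--user_id` flag used above (both reach the server through the Spark Connect user context), and that rejecting `;` and `=` inside keys and values is sufficient, since those characters act as separators. The helper name `spark_connect_url` is hypothetical.

```python
def spark_connect_url(host: str, port: int, **params: str) -> str:
    """Build a Spark Connect connection string: sc://host:port/;k1=v1;k2=v2"""
    url = f"sc://{host}:{port}"
    if not params:
        return url
    for key, value in params.items():
        # ';' separates parameters and '=' separates key from value,
        # so neither may appear inside a key or a value.
        if any(sep in key or sep in value for sep in (";", "=")):
            raise ValueError(f"invalid character in parameter {key!r}")
    # Parameters follow an empty path ("/"), separated by ';'.
    return url + "/;" + ";".join(f"{k}={v}" for k, v in params.items())


if __name__ == "__main__":
    # Host, port, and user mirror the commands above.
    print(spark_connect_url("H27212-MAC-01.local", 10999, user_id="chengpan"))
    # sc://H27212-MAC-01.local:10999/;user_id=chengpan
```

In PySpark, such a string can be passed to `SparkSession.builder.remote(...)` (available since Spark 3.4) instead of passing the user as a separate command-line flag.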

Run examples

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 4.0.0-preview2
      /_/
Type in expressions to have them evaluated.
Spark session available as 'spark'.
scala> spark.sql("select 1").show()
+---+
|  1|
+---+
|  1|
+---+



// FIXME this is dummy implementation to discard any uploaded artifacts
override def addArtifacts(respObserver: StreamObserver[proto.AddArtifactsResponse])
pan3793 (Member Author) commented:

This is the only unimplemented method, so uploading artifacts does not work yet.

It must be fixed before moving on to the next step.

* @param spark A [[SparkSession]] instance
* that this backend service holds to run [[org.apache.kyuubi.operation.Operation]]s.
*/
class SparkGrpcBackendService(name: String, spark: SparkSession)
tigrulya-exe (Contributor) commented on Sep 2, 2024:

Do I understand correctly that this class is not used at all, and that we just start the Spark Connect gRPC server in SparkGrpcFrontendService?

nqvuong1998 commented on Sep 15, 2024:

Hi @yaooqinn @pan3793 @tigrulya-exe, will Kyuubi Connect support AuthN (LDAP) and AuthZ (Apache Ranger)?

github-actions bot added the kind:infra and module:extensions labels on Sep 17, 2024
HaoYang670 (Contributor) commented:

Hi @pan3793, what protocol is used between the Kyuubi server and KyuubiGrpcEngine? Is it the Spark Connect protocol or something else?
Also, does this feature rely on the Spark Connect server? Can a user connect to Spark's port 15002 directly?

HaoYang670 (Contributor) commented:

Does it support running on k8s? How is the Spark application created? As far as I know, the Spark Connect server only supports client deploy mode, so it must be different from how we create the Kyuubi Spark SQL engine, right?

zhaohehuhu (Contributor) commented:

Awesome. I'd like to take a shot at this feature.

Labels: kind:build, kind:documentation, kind:infra, module:common, module:extensions, module:server, module:spark
5 participants