VM download: https://pan.baidu.com/s/1KH1pWB01E4NCzqFuFdHJsQ?pwd=3aq7 (extraction code: 3aq7)
Installation docs: see the tutorial on using the recommended environment
If a Spark job is interrupted partway by restarting Jupyter Notebook, re-running the Spark job fails with errors like:
22/05/10 17:48:15 WARN hive.metastore: Failed to connect to the MetaStore Server...
22/05/10 17:48:16 WARN hive.metastore: Failed to connect to the MetaStore Server...
22/05/10 17:48:17 WARN metadata.Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Fix: on the master node, delete the .lck lock files (stale locks left by the embedded Derby database) under metastore_db in the Spark root directory, then restart Spark and the Hive metastore.
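The cleanup step can be sketched as below. The real Spark root path varies per install, so this demo simulates it in a throwaway temporary directory; db.lck and dbex.lck are the lock-file names Derby normally leaves behind:

```shell
# Demo in a temp dir; on the real master node you would cd into the actual
# Spark root directory instead of this mktemp stand-in.
SPARK_HOME=$(mktemp -d)
mkdir -p "$SPARK_HOME/metastore_db"
# Simulate the stale Derby lock files left by the interrupted session
touch "$SPARK_HOME/metastore_db/db.lck" "$SPARK_HOME/metastore_db/dbex.lck"

# The actual fix: remove the leftover .lck files so Derby can reopen the db
rm -f "$SPARK_HOME"/metastore_db/*.lck
```

After the lock files are gone, restart Spark and the Hive metastore as described above.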
Check the running JVM processes with jps:
the main process it lists is the hbase-shell process; RunJar is the Hive process; SparkSubmit is the Spark application process. If the Spark application still has a live parent process (bash or Jupyter Notebook), use ps -ef to find the parent's PID, and kill the parent before killing the child.
You can call spark.sparkContext.addPyFile("path/to/jar") to add a jar temporarily, but jars have to be added when the Spark environment is created, so you need to restart Jupyter and build a fresh Spark application instance.
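The parent-then-child kill procedure can be sketched as follows. Since a real SparkSubmit process only exists on the cluster, the runnable part uses a plain sleep process as a stand-in (the PIDs in the comments are made-up examples):

```shell
# On the cluster you would do (PIDs are examples):
#   jps                    # find the SparkSubmit pid, say 12345
#   ps -ef | grep 12345    # column 3 (PPID) is the parent: bash or jupyter
#   kill <PPID>            # kill the parent first...
#   kill 12345             # ...then the Spark application itself
#
# Runnable stand-in: look up a child's parent PID, then terminate the child.
sleep 300 &                                   # stand-in for SparkSubmit
child=$!
ppid=$(ps -o ppid= -p "$child" | tr -d ' ')   # same value as ps -ef's PPID column
kill "$child"
wait "$child" 2>/dev/null || true
```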
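Concretely, after restarting Jupyter the jar has to be supplied at the moment the new Spark application is created. One common way, shown here as a command fragment with placeholder paths, is to pass it on the launch command line:

```shell
# Supply jars up front so the new application picks them up at creation time.
# (jar path and script name are placeholders)
pyspark --jars /path/to/your.jar
# or, for a batch job:
spark-submit --jars /path/to/your.jar your_script.py
```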