
PySpark: NameError for SparkContext 'sc' in jupyter

I am new to pyspark and want to use it from an IPython notebook on my Ubuntu 12.04 machine. Below is my configuration for pyspark and the IPython notebook.

sparkuser@host:~$ echo $JAVA_HOME 
/usr/lib/jvm/java-8-oracle 

# Path for Spark 
sparkuser@host:~$ ls /home/sparkuser/spark/ 
bin CHANGES.txt data examples LICENSE NOTICE R   RELEASE scala-2.11.6.deb 
build conf   ec2 lib  licenses python README.md sbin  spark-1.5.2-bin-hadoop2.6.tgz 

I installed Anaconda2 4.0.0; here is the Anaconda path:

sparkuser@host:~$ ls anaconda2/ 
bin conda-meta envs etc Examples imports include lib LICENSE.txt mkspecs pkgs plugins share ssl tests 

I created a PySpark profile for IPython:

ipython profile create pyspark 

sparkuser@host:~$ cat .bashrc 

export SPARK_HOME="$HOME/spark" 
export PYSPARK_SUBMIT_ARGS="--master local[2]" 
# added by Anaconda2 4.0.0 installer 
export PATH="/home/sparkuser/anaconda2/bin:$PATH" 

Then I created a startup file at ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py:

sparkuser@host:~$ cat .ipython/profile_pyspark/startup/00-pyspark-setup.py 
import os 
import sys 

spark_home = os.environ.get('SPARK_HOME', None) 
sys.path.insert(0, spark_home + "/python") 
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.2.1-src.zip')) 

filename = os.path.join(spark_home, 'python/pyspark/shell.py') 
exec(compile(open(filename, "rb").read(), filename, 'exec')) 

spark_release_file = spark_home + "/RELEASE" 

if os.path.exists(spark_release_file) and "Spark 1.5.2" in open(spark_release_file).read(): 
    pyspark_submit_args = os.environ.get("PYSPARK_SUBMIT_ARGS", "") 
    if "pyspark-shell" not in pyspark_submit_args: 
        pyspark_submit_args += " pyspark-shell" 
        os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args 

Launching the pyspark shell from the terminal works as expected:

sparkuser@host:~$ ~/spark/bin/pyspark 
Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Dec 6 2015, 18:08:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2 
Type "help", "copyright", "credits" or "license" for more information. 
Anaconda is brought to you by Continuum Analytics. 
Please check out: http://continuum.io/thanks and https://anaconda.org 
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 
16/04/22 21:06:55 INFO SparkContext: Running Spark version 1.5.2 
16/04/22 21:07:27 INFO BlockManagerMaster: Registered BlockManager 
Welcome to 
      ____              __ 
     / __/__  ___ _____/ /__ 
    _\ \/ _ \/ _ `/ __/ '_/ 
   /__ / .__/\_,_/_/ /_/\_\   version 1.5.2 
      /_/ 

Using Python version 2.7.11 (default, Dec 6 2015 18:08:32) 
SparkContext available as sc, HiveContext available as sqlContext. 
>>> sc 
<pyspark.context.SparkContext object at 0x7facb75b50d0> 
>>> 

When I run the following command, it opens Jupyter in the browser:

sparkuser@host:~$ ipython notebook --profile=pyspark 
[TerminalIPythonApp] WARNING | Subcommand `ipython notebook` is deprecated and will be removed in future versions. 
[TerminalIPythonApp] WARNING | You likely want to use `jupyter notebook`... continue in 5 sec. Press Ctrl-C to quit now. 
[W 21:32:08.070 NotebookApp] Unrecognized alias: '--profile=pyspark', it will probably have no effect. 
[I 21:32:08.111 NotebookApp] Serving notebooks from local directory: /home/sparkuser 
[I 21:32:08.111 NotebookApp] 0 active kernels 
[I 21:32:08.111 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/ 
[I 21:32:08.111 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation). 
Created new window in existing browser session. 

In the browser, if I type the following command, it throws a NameError:

In [ ]: print sc 
--------------------------------------------------------------------------- 
NameError         Traceback (most recent call last) 
<ipython-input-2-ee8101b8fe58> in <module>() 
----> 1 print sc 
NameError: name 'sc' is not defined 

When I run the above command in the pyspark terminal it produces the expected output, but when I run the same command in jupyter it throws the error above.

These are my configuration settings for pyspark and IPython. How do I configure pyspark to work with jupyter?

Answers

Answer (score 4):

Here is a workaround I would suggest you try, without relying on pyspark to load the context for you:

!pip install findspark 

Then, once the findspark Python package is installed, just initialize the SparkContext:

import findspark 
import os 

# locate the Spark installation via SPARK_HOME and add pyspark to sys.path 
findspark.init() 

import pyspark 

sc = pyspark.SparkContext() 
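If SPARK_HOME is not exported in the environment Jupyter was started from, findspark also accepts the Spark directory explicitly. A minimal sketch, assuming the asker's /home/sparkuser/spark layout; the path and app name here are only illustrative:

import findspark 

# Point findspark at the Spark installation directly instead of relying on SPARK_HOME; 
# /home/sparkuser/spark is the asker's path and is only an example. 
findspark.init("/home/sparkuser/spark") 

import pyspark 
sc = pyspark.SparkContext(appName="notebook-test") 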

Reference: https://pypi.python.org/pypi/findspark

Answer (score 0):

Hi, you need a pyspark kernel. Try this in the terminal:

mkdir -p ~/.ipython/kernels/pyspark 

nano ~/.ipython/kernels/pyspark/kernel.json 

and copy in the following text (replace /usr/bin/python with your own Python path):

{ 
  "display_name": "pySpark (Spark 1.6.1)", 
  "language": "python", 
  "argv": [ 
    "/usr/bin/python", 
    "-m", "IPython.kernel", 
    "--profile=pyspark", 
    "-f", "{connection_file}" 
  ] 
} 

then save and exit (Ctrl+X, then Y).

You should now have a "pyspark" kernel available in your jupyter kernels.
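To check whether Jupyter can actually see the new kernel, one option (not part of the original answer) is to list the kernel specs it has discovered from Python:

from jupyter_client.kernelspec import KernelSpecManager 

# Print every kernel spec Jupyter currently knows about; "pyspark" should 
# appear here if the kernel.json above was picked up. 
print(KernelSpecManager().find_kernel_specs()) 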

Now either sc already exists in your notebook (try calling sc in a cell), or else try running these lines:

import pyspark 

# build a local SparkContext with 2 cores and 2g of executor memory 
conf = (pyspark.SparkConf() 
        .setAppName('test') 
        .set("spark.executor.memory", "2g") 
        .setMaster("local[2]")) 
sc = pyspark.SparkContext(conf=conf) 

You should now have your sc up and running.
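As a quick sanity check that the context works, you could run a trivial job in a notebook cell (a hypothetical example, not from the original answer):

# sum the numbers 0..99 on the local Spark context created above 
total = sc.parallelize(range(100)).sum() 
print(total)  # expected: 4950 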

Answer (score 0):

A simple suggestion that avoids complicating the pyspark installation:

With versions above 2.2, you can install the pyspark package with a simple pip install. If you also want jupyter, install it with pip as well:

pip install pyspark 
pip install jupyter 

Also, if you want to use another version or a specific Spark distribution, the earlier 3-minute method is here: https://blog.sicara.com/get-started-pyspark-jupyter-guide-tutorial-ae2fe84f594f
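With a pip-installed pyspark (2.2+), a minimal sketch of starting Spark directly in a notebook cell; the master setting and app name below are illustrative, not part of the original answer:

from pyspark.sql import SparkSession 

# start (or reuse) a local Spark session; "local[2]" and the app name are examples 
spark = (SparkSession.builder 
         .master("local[2]") 
         .appName("notebook-test") 
         .getOrCreate()) 

sc = spark.sparkContext  # the familiar SparkContext handle 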
