Module to read and prepare data

I have a Jupyter notebook, and I created a module that reads and prepares data. In Jupyter I added the module with:

sc.addPyFile('wasb:///HdiNotebooks/PySpark/project/read_test_data.py')

and the module loads fine.

However, in my .py file I open the data with:

data_file= open('wasb:///example/data/fruits.txt', 'rU') 
to prepare it and do different calculations.

However, I get the following error:

[Errno 2] No such file or directory: 'wasb:///example/data/fruits.txt' 
Traceback (most recent call last): 
FileNotFoundError: [Errno 2] No such file or directory: 'wasb:///example/data/fruits.txt'

If I try to create a DataFrame from the same data in Jupyter by running

df=sqlContext.read.csv('wasb:///example/data/fruits.txt',header='true', inferSchema='true')

I don't get any error. What am I doing wrong?


2017-08-10 Learner

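Python's built-in open does not support the wasb:// protocol of the Azure Blob Storage-based HDInsight DFS, so it treats the URL as a local path and fails with "No such file or directory".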
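Since Spark can resolve the path (the sqlContext.read.csv call in the question succeeds), one workaround is to let Spark read the file and have the module work on the lines instead of opening the path itself. A minimal sketch, assuming the file is small enough to collect on the driver and is comma-separated with a header row; `prepare_rows` and `load_rows` are hypothetical names, not part of the original module:

```python
import csv


def prepare_rows(lines):
    """Parse an iterable of CSV text lines into a list of dicts keyed by the header row."""
    return list(csv.DictReader(lines))


def load_rows(sc, path):
    """Read `path` through Spark (which understands wasb://) and parse it on the driver."""
    # collect() pulls every line to the driver, so this suits small files only.
    return prepare_rows(sc.textFile(path).collect())
```

In the notebook this would be called as `load_rows(sc, 'wasb:///example/data/fruits.txt')`, with `sc` being the SparkContext the notebook already provides.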

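If you want to read the file directly on HDInsight without PySpark, the only way is to use the Azure Storage SDK for Python, with the account_name and account_key of the Azure Blob Storage account used by HDInsight, as the documentation describes. Please refer to the official tutorial for Azure Storage in Python.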
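The SDK route could look roughly like the sketch below. It assumes the azure-storage-blob package (version 12 or later) is installed; older releases of the SDK expose a different client class. `split_wasb_url` and `read_blob_text` are hypothetical helper names, and the account name, account key, and container name must come from the storage account attached to the cluster, because a `wasb:///` URL does not carry them.

```python
from urllib.parse import urlparse


def split_wasb_url(url):
    """Return (container, account, blob_name); container and account are None for wasb:/// URLs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError("not a wasb URL: %r" % url)
    container, account = None, None
    if parsed.netloc:
        # Full form: wasb://<container>@<account>.blob.core.windows.net/<blob>
        container, _, host = parsed.netloc.partition("@")
        account = host.split(".")[0] or None
    return container, account, parsed.path.lstrip("/")


def read_blob_text(url, account_name, account_key, container_name, encoding="utf-8"):
    """Download one blob as text via the Azure Storage SDK instead of open()."""
    # Imported here so the URL helper still works where the SDK is not installed.
    from azure.storage.blob import BlobServiceClient

    _, _, blob_name = split_wasb_url(url)
    service = BlobServiceClient(
        account_url="https://%s.blob.core.windows.net" % account_name,
        credential=account_key,
    )
    blob = service.get_blob_client(container=container_name, blob=blob_name)
    return blob.download_blob().readall().decode(encoding)
```

The returned text can then be split into lines and processed the same way the module would have processed the result of open().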

Hope it helps.


2017-08-14 08:09:35
