2016-09-17 549 views
1

I am trying to load a local file with sc.textFile, as follows:

File = sc.textFile('file:///D:/Python/files/tit.csv') 
File.count() 

Full traceback:

IllegalArgumentException     Traceback (most recent call last) 
<ipython-input-72-a84ae28a29dc> in <module>() 
----> 1 File.count() 

/databricks/spark/python/pyspark/rdd.pyc in count(self) 
    1002   3 
    1003   """ 
-> 1004   return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() 
    1005 
    1006  def stats(self): 

/databricks/spark/python/pyspark/rdd.pyc in sum(self) 
    993   6.0 
    994   """ 
--> 995   return self.mapPartitions(lambda x: [sum(x)]).fold(0, operator.add) 
    996 
    997  def count(self): 

/databricks/spark/python/pyspark/rdd.pyc in fold(self, zeroValue, op) 
    867   # zeroValue provided to each partition is unique from the one provided 
    868   # to the final reduce call 
--> 869   vals = self.mapPartitions(func).collect() 
    870   return reduce(op, vals, zeroValue) 
    871 

/databricks/spark/python/pyspark/rdd.pyc in collect(self) 
    769   """ 
    770   with SCCallSiteSync(self.context) as css: 
--> 771    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) 
    772   return list(_load_from_socket(port, self._jrdd_deserializer)) 
    773 

/databricks/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py in __call__(self, *args) 
    811   answer = self.gateway_client.send_command(command) 
    812   return_value = get_return_value(
--> 813    answer, self.gateway_client, self.target_id, self.name) 
    814 
    815   for temp_arg in temp_args: 

/databricks/spark/python/pyspark/sql/utils.pyc in deco(*a, **kw) 
    51     raise AnalysisException(s.split(': ', 1)[1], stackTrace) 
    52    if s.startswith('java.lang.IllegalArgumentException: '): 
---> 53     raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace) 
    54    raise 
    55  return deco 

IllegalArgumentException: u'java.net.URISyntaxException: Expected scheme-specific part at index 2: D:' 

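What is wrong? I did it the usual way, as in, for example, "load a local file to spark using sc.textFile()" and "How to load local file in sc.textFile, instead of HDFS". Those examples are in Scala, but it should be the same for Python, if I am not mistaken: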

val File = 'D:\\\Python\\files\\tit.csv' 


SyntaxError: invalid syntax 
    File "<ipython-input-132-2a3878e0290d>", line 1 
    val File = 'D:\\\Python\\files\\tit.csv' 
     ^
SyntaxError: invalid syntax 
+0

Did you try textFile('D:/...'), or using backslashes, since you are on Windows? –

+0

Although, seeing '/databricks/spark/' makes me think you are not using a Windows machine at all, but some Databricks platform –

+0

I am on Windows. I tried sc.textFile('file://D:/Python/files/tit.csv'), sc.textFile('file:/D:/Python/files/tit.csv') and sc.textFile('D:/Python/files/tit.csv') – Edward

Answers

1

Update: there seems to be an issue with ":" in Hadoop...

filenames with ':' colon throws java.lang.IllegalArgumentException 

https://issues.apache.org/jira/browse/HDFS-13

Path should handle all characters 

https://issues.apache.org/jira/browse/HADOOP-3257

In this Q&A, someone managed to overcome it with Spark 2.0:

Spark 2.0: Relative path in absolute URI (spark-warehouse)
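
As a rough analogy for why the colon matters, the sketch below uses Python's own URL parser to show how a generic URI parser may read a Windows drive letter as a URI scheme. This is only an illustration: Hadoop relies on java.net.URI, not on urllib, and the exact rules differ. The message "Expected scheme-specific part at index 2: D:" is at least consistent with "D" being read as a scheme with nothing after the colon.

```python
from urllib.parse import urlsplit

# Candidate strings similar to the ones tried in the question.
candidates = [
    "file:///D:/Python/files/tit.csv",
    "D:/Python/files/tit.csv",
    "D:",
]

# Parse each candidate and keep only the scheme and the path.
parsed = {c: (urlsplit(c).scheme, urlsplit(c).path) for c in candidates}

for candidate, (scheme, path) in parsed.items():
    print(candidate, "-> scheme:", repr(scheme), "path:", repr(path))
```

With the bare drive letter, the parser reports the scheme "d" and an empty path, which mirrors the situation the Java exception complains about.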


There are a few issues in the question:

1) Python access to a local file on Windows

File = sc.textFile('file:///D:/Python/files/tit.csv') 
File.count() 

Can you please try:

import os 
inputfile = sc.textFile(os.path.normpath("file://D:/Python/files/tit.csv")) 
inputfile.count() 

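os.path.normpath(path)

Normalize a pathname by collapsing redundant separators and up-level references so that A//B, A/B/, A/./B and A/foo/../B all become A/B. This string manipulation may change the meaning of a path that contains symbolic links. On Windows, it converts forward slashes to backward slashes. To normalize case, use normcase().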

https://docs.python.org/2/library/os.path.html#os.path.normpath

The output, tested in Python:

>>> os.path.normpath("file://D:/Python/files/tit.csv") 
'file:\\D:\\Python\\files\\tit.csv' 
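
The normpath behaviour described above can be reproduced on any operating system with the ntpath and posixpath modules, which implement the Windows and POSIX flavours of os.path. The sketch below assumes Python 3. It also shows, purely for contrast, one way to build a file URI from a Windows path with pathlib. Whether Spark accepts that URI depends on where Spark actually runs, so treat that part as an assumption rather than a fix.

```python
import ntpath
import posixpath
from pathlib import PureWindowsPath

uri = "file:///D:/Python/files/tit.csv"

# normpath treats the URI as a plain path and collapses the repeated slashes.
posix_result = posixpath.normpath(uri)

# The Windows flavour also resolves ".." and switches to backslashes.
windows_result = ntpath.normpath("A/foo/../B")

# pathlib can build a file URI from a drive-letter path without touching the disk.
built_uri = PureWindowsPath("D:/Python/files/tit.csv").as_uri()

print(posix_result)    # file:/D:/Python/files/tit.csv
print(windows_result)  # A\B
print(built_uri)       # file:///D:/Python/files/tit.csv
```

Because normpath is meant for filesystem paths, passing it a URI changes the "file:///" prefix, so it is safer to normalize the path first and add the scheme afterwards.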

2) Scala code:

val File = 'D:\\\Python\\files\\tit.csv' 
SyntaxError: invalid syntax 

This code will not run in Python, because it is Scala code.
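
To illustrate the point, the sketch below checks whether a line of source compiles as Python. The helper name is_valid_python is hypothetical and exists only for this example. The Scala-style line fails because of the val keyword, while the plain assignment compiles.

```python
def is_valid_python(source):
    """Return True if `source` compiles as Python, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Raw strings keep the backslashes exactly as they would appear in the source line.
scala_style = r"val File = 'D:\\Python\\files\\tit.csv'"
python_style = r"File = 'D:\\Python\\files\\tit.csv'"

print(is_valid_python(scala_style))   # False
print(is_valid_python(python_style))  # True
```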

+0

Yes, I see it is Scala – Edward

+0

I have read about the error :( and could not find how to fix it – Edward

0

I did

import os 
os.path.normpath("file:///D:/Python/files/tit.csv") 
Out[131]: 'file:/D:/Python/files/tit.csv' 

then

inputfile = sc.textFile(os.path.normpath("file:/D:/Python/files/tit.csv")) 
inputfile.count() 
IllegalArgumentException: u'java.net.URISyntaxException: Expected scheme-specific part at index 2: D:' 

and if I do it like this

inputfile = sc.textFile(os.path.normpath("file:\\D:\\Python\\files\\tit.csv")) 
inputfile.count() 
IllegalArgumentException: u'java.net.URISyntaxException: Relative path in absolute URI: file:%5CD:%5CPython%5Cfiles%5Ctit.csv' 

and I also tried this

os.path.normcase("file:///D:/Python/files/tit.csv") 
Out[136]: 'file:///D:/Python/files/tit.csv' 
inputfile = sc.textFile(os.path.normpath("file:///D:/Python/files/tit.csv")) 
inputfile.count() 
IllegalArgumentException: u'java.net.URISyntaxException: Expected scheme-specific part at index 2: D:' 
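
The %5C sequences in the "Relative path in absolute URI" message are percent-encoded backslashes. The sketch below reproduces the same encoding with Python's urllib. It is only an illustration of the encoding, not of what the JVM does internally, and it suggests (as an assumption) that the backslashes were treated as ordinary characters rather than as path separators.

```python
from urllib.parse import quote

# The backslash-separated string from the second attempt.
backslash_path = "file:\\D:\\Python\\files\\tit.csv"

# Keep ':' and '/' unescaped, so only the backslashes are percent-encoded.
encoded = quote(backslash_path, safe=":/")

print(encoded)  # file:%5CD:%5CPython%5Cfiles%5Ctit.csv
```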
+1

It might be a global issue - http://stackoverflow.com/questions/38669206/spark-2-0-relative-path-in-absolute-uri-spark-warehouse – Yaron