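pandas DataFrame to Spark DataFrame: "Can not merge type" error
I have CSV data and used read_csv to create a pandas DataFrame, forcing all columns to be read as strings. Then, when I try to create a Spark DataFrame from the pandas DataFrame, I get the error message below.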
import pandas as pd
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *
z = pd.read_csv("mydata.csv", dtype=str)  # read every column as a string
z.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 74044003 entries, 0 to 74044002
Data columns (total 12 columns):
primaryid object
event_dt object
age object
age_cod object
age_grp object
sex object
occr_country object
drug_seq object
drugname object
route object
outc_cod object
pt object
q = sqlContext.createDataFrame(z)
File "<stdin>", line 1, in <module>
File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/context.py", line 425, in createDataFrame
rdd, schema = self._createFromLocal(data, schema)
File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/context.py", line 341, in _createFromLocal
struct = self._inferSchemaFromList(data)
File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/context.py", line 241, in _inferSchemaFromList
schema = reduce(_merge_type, map(_infer_schema, data))
File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/types.py", line 862, in _merge_type
for f in a.fields]
File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/types.py", line 856, in _merge_type
raise TypeError("Can not merge type %s and %s" % (type(a), type(b)))
TypeError: Can not merge type <class 'pyspark.sql.types.DoubleType'> and <class 'pyspark.sql.types.StringType'>
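A likely cause of the DoubleType/StringType clash is that dtype=str does not turn missing cells into strings: pandas leaves them as NaN, which is a float. The following is a minimal sketch of that behaviour on a hypothetical two-row sample (the CSV text is invented for illustration and is not the real data):

```python
import io

import pandas as pd

# Hypothetical sample: the second row has an empty "age" cell.
csv_text = "primaryid,age\n100,34\n101,\n"
sample = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Parsed values become strings, but the missing cell stays NaN (a float).
value_types = [type(v).__name__ for v in sample["age"]]
print(value_types)

# Count the missing cells per column to see which columns are affected.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

If schema inference sees a float in one row and a string in another row of the same column, it has nothing to merge them into, which would match the error above.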
Here is an example. I download public data and create a pandas DataFrame from it, but Spark fails to create a Spark DataFrame from the pandas DataFrame.
import io
import zipfile

import pandas as pd
import requests
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

url = "http://www.nber.org/fda/faers/2016/demo2016q1.csv.zip"
r = requests.get(url)
# io.BytesIO works on both Python 2 and 3 (the StringIO module is Python 2 only)
archive = zipfile.ZipFile(io.BytesIO(r.content))
archive.extractall()
z = pd.read_csv("demo2016q1.csv")  # creates pandas DataFrame
Data_Frame = sqlContext.createDataFrame(z)
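a) Why do you read the data locally only to parallelize it? This is an anti-pattern. b) Every column marked as "object" indicates some heterogeneous data, which Spark DataFrames do not support. – zero323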
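You are right, reading the data locally is not the correct approach, but since the other options failed I hoped a DataFrame from pandas would be easy for Spark to handle. As you said, the columns are heterogeneous. Is there a workaround I could try? –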
Can you provide an [mcve]? Some toy sample that illustrates what is going on there... – zero323
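One possible workaround for the heterogeneous columns discussed above, sketched on a toy sample: replace NaN with None and pass an explicit all-string schema so that Spark does not have to infer types. The helper names nan_to_none and to_spark_strings and the toy data are hypothetical, and the Spark step has not been verified against the Spark version shown in the traceback.

```python
import pandas as pd


def nan_to_none(pdf):
    """Return a copy with object columns where NaN is replaced by None."""
    return pdf.astype(object).where(pdf.notna(), None)


def to_spark_strings(sql_context, pdf):
    """Create a Spark DataFrame with an explicit schema of nullable strings.

    Assumes sql_context is an existing SQLContext (hypothetical helper).
    """
    from pyspark.sql.types import StringType, StructField, StructType

    schema = StructType(
        [StructField(str(col), StringType(), True) for col in pdf.columns]
    )
    return sql_context.createDataFrame(nan_to_none(pdf), schema)


# Toy sample with one missing value, standing in for the real data.
toy = pd.DataFrame({"primaryid": ["100", "101"], "age": ["34", float("nan")]})
cleaned = nan_to_none(toy)
print(cleaned)
```

In a pyspark shell, to_spark_strings(sqlContext, z) would be the corresponding call. This only addresses missing values; if a column read without dtype=str mixes numbers and text, the values would also need to be cast to strings first.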