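I want to join two DataFrames, matching the portfolio name column in df1 to portId in df2. In the resulting DataFrame I don't want the key column duplicated. Spark fails to join the DataFrames when I use a left outer join.
Here is my code so far: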
val df = spark.read.json("C:\\json\\portmast")
val pgetsec = spark.read.json("C:\\json\\pgetsec")
val portfolio_master = df.select("PortfolioCode","Legal Entity Name","Asofdate")
val pgetsecs= pgetsec.select("TransId", "SecId","portId","GaapCurBkBal","ParBal","SetlDt","SetlPric","OrgBkBal","TradeDt","StatCurBkBal","NaicRtg","SecurityTypeCode","CamraSecType","FundType","CountryIso")
val pg = portfolio_master.join(pgetsec,Seq("PortfolioCode","portId"),"left_outer")
The error I get is:
Exception in thread "main" org.apache.spark.sql.AnalysisException: using columns ['PortfolioCode,'portId] can not be resolved given input columns:
The final JSON should look like this:
|-- Portfolio Code: string (nullable = true)
|-- Legal Entity Name: string (nullable = true)
|-- Asofdate: string (nullable = true)
((SI, S&P 500 Index,9/30/2016),[0.0,Equity,Common Stock])
((SI, S&P 500 Index,9/30/2016),[0.0,Equity,Common Stock])
((SI, S&P 500 Index,9/30/2016),[0.0,Equity,Common Stock])
[SI1, S&P 500 Index,9/30/2016,CompactBuffer([0.0,Equity,Common Stock], [0.0,Equity,Common Stock], [0.0,Equity,Common Stock])]
root
|-- Portfolio Code: string (nullable = true)
|-- Legal Entity Name: string (nullable = true)
|-- Asofdate: string (nullable = true)
|-- Security: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- BondPrice: double (nullable = true)
| | |-- CoreSectorLevel1Code: string (nullable = true)
| | |-- CoreSectorLevel2Code: string (nullable = true)
+--------------+-------------------+---------+--------------------+
|Portfolio Code| Legal Entity Name| Asofdate| Security|
+--------------+-------------------+---------+--------------------+
| SI | S&P 500 Index |9/30/2016|[[0.0,Equity,Comm...|
+--------------+-------------------+---------+--------------------+
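The target shape above (one row per portfolio, with a `Security` array of structs) is what `groupBy` followed by `collect_list(struct(...))` tends to produce. The following is only a sketch under assumptions: it presumes Spark 2.x or later, an already-joined flat DataFrame, and the key column spelled `PortfolioCode` as in the `select` call. The object name `NestSecuritiesSketch`, the helper `nestSecurities`, and the sample rows are hypothetical; the three struct fields are taken from the schema printed above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

object NestSecuritiesSketch {
  // Hypothetical helper: collapse one row per security into one row per
  // portfolio, gathering the security fields into an array of structs.
  def nestSecurities(joined: DataFrame): DataFrame =
    joined
      .groupBy("PortfolioCode", "Legal Entity Name", "Asofdate")
      .agg(
        collect_list(
          struct(
            col("BondPrice"),
            col("CoreSectorLevel1Code"),
            col("CoreSectorLevel2Code")
          )
        ).as("Security")
      )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nest-securities-sketch")
      .master("local[*]") // assumption: local run for illustration only
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the joined data.
    val joined = Seq(
      ("SI", "S&P 500 Index", "9/30/2016", 0.0, "Equity", "Common Stock"),
      ("SI", "S&P 500 Index", "9/30/2016", 0.0, "Equity", "Common Stock"),
      ("SI", "S&P 500 Index", "9/30/2016", 0.0, "Equity", "Common Stock")
    ).toDF("PortfolioCode", "Legal Entity Name", "Asofdate",
      "BondPrice", "CoreSectorLevel1Code", "CoreSectorLevel2Code")

    val nested = nestSecurities(joined)
    nested.printSchema()
    nested.show()

    spark.stop()
  }
}
```

Writing the nested result out with `nested.write.json(path)` should then give one JSON object per portfolio, with `Security` as a JSON array.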
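Any help is appreciated.
Doesn't the column you are trying to join on exist in the second DataFrame? – eliasah
Both are JSON files. In the first file the key is called portfolioCode, and in the second JSON file it is called portId. The data behind the keys is the same. I want to do something like this: select p.portfoliocode, ..., ps.secid, ps.Transid, ... from portfolio_master p join pgetsec ps on p.portfoliocode = ps.portid – user2315840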
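The `AnalysisException` most likely comes from `Seq("PortfolioCode","portId")`: the `usingColumns` form of `join` expects every listed name to exist in both DataFrames, and here each name exists on one side only. When the key is named differently on each side, an explicit join condition can be used instead, and the redundant key dropped afterwards. A minimal sketch, assuming Spark 2.x or later; the object name `JoinSketch`, the helper `joinOnPortfolio`, and the sample rows are hypothetical stand-ins for the two JSON inputs.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinSketch {
  // Hypothetical helper: left outer join on differently named keys,
  // then drop the right-hand key so it is not duplicated in the result.
  def joinOnPortfolio(master: DataFrame, securities: DataFrame): DataFrame =
    master
      .join(
        securities,
        master("PortfolioCode") === securities("portId"),
        "left_outer"
      )
      .drop("portId")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-sketch")
      .master("local[*]") // assumption: local run for illustration only
      .getOrCreate()
    import spark.implicits._

    // Made-up rows; the real data would come from spark.read.json(...).
    val portfolioMaster = Seq(
      ("SI", "S&P 500 Index", "9/30/2016"),
      ("XX", "Portfolio Without Securities", "9/30/2016")
    ).toDF("PortfolioCode", "Legal Entity Name", "Asofdate")

    val pgetsec = Seq(
      ("T1", "SEC1", "SI"),
      ("T2", "SEC2", "SI")
    ).toDF("TransId", "SecId", "portId")

    val pg = joinOnPortfolio(portfolioMaster, pgetsec)
    pg.show(false)

    spark.stop()
  }
}
```

Because this is a left outer join, a portfolio with no matching `portId` still appears once, with nulls in the security columns. An alternative that avoids the explicit condition is to rename the key first with `pgetsec.withColumnRenamed("portId", "PortfolioCode")` and then join with `Seq("PortfolioCode")`, which keeps a single key column in the output.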