2015-02-07 239 views

Convert JSON data into a specific table format using Pig

I have a JSON file in the following format:

"Properties2":[{"K":"A","T":"String","V":"M "}, {"K":"B","T":"String","V":"N"}, {"K":"D","T":"String","V":"O"}] 
"Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}] 

I want to extract the data from the JSON above into a table format using Pig:

Expected format:

A  B  C  D
M  N     O
W  X  Y  Z

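Note: in the first record, column C should be null or empty, because the first record has no value for C.

I tried JsonLoader and the elephant-bird jar but did not get the expected output. Please suggest an appropriate way to get it.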

Answers


Can you try this custom UDF?

Sample input 1:
input.json

{"Properties2":[{"K":"A","T":"String","V":"M "}, {"K":"B","T":"String","V":"N"}, {"K":"D","T":"String","V":"O"}]} 
{"Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]} 

PigScript:

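REGISTER jsonparse.jar;
-- Load each JSON record as a bag of (K, T, V) tuples
A = LOAD 'input.json' USING JsonLoader('Properties2:{(K:chararray,T:chararray,V:chararray)}');
-- BagToString flattens the bag with "_" (its default delimiter); the UDF returns the A-D values joined by "_"
B = FOREACH A GENERATE FLATTEN(STRSPLIT(mypackage.JSONPARSE(BagToString(Properties2)),'_',4));
STORE B INTO 'output' USING PigStorage();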
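
To see what the UDF receives, it helps to picture the string that BagToString builds from each bag. The following is a minimal Java sketch that mimics that flattening outside Pig, assuming the default "_" delimiter and fields emitted in K, T, V order. The class and method names (BagFlattenSketch, flatten) are hypothetical and are not part of Pig.

```java
import java.util.ArrayList;
import java.util.List;

class BagFlattenSketch {
    // Joins every field of every tuple with "_", in order (assumed BagToString behaviour).
    static String flatten(List<String[]> bag) {
        List<String> fields = new ArrayList<>();
        for (String[] tuple : bag) {
            for (String field : tuple) {
                fields.add(field);
            }
        }
        return String.join("_", fields);
    }

    public static void main(String[] args) {
        // First record of sample input 1
        List<String[]> bag = new ArrayList<>();
        bag.add(new String[]{"A", "String", "M "});
        bag.add(new String[]{"B", "String", "N"});
        bag.add(new String[]{"D", "String", "O"});
        System.out.println(flatten(bag)); // prints A_String_M _B_String_N_D_String_O
    }
}
```

Every third field starting at position 0 is a key and every third field starting at position 2 is its value, which is the layout the UDF relies on. A value that itself contains "_" would break this layout.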

Output:

M  N    O 
W  X  Y  Z 

Sample input 2:

{"Properties2":[{"K":"A","T":"String","V":"W"},{"K":"B","T":"String","V":"X"},{"K":"C","T":"String","V":"Y"},{"K":"D","T":"String","V":"Z"}]} 
{"Properties2":[{"K":"A","T":"String","V":"M"},{"K":"B","T":"String","V":"N"},{"K":"D","T":"String","V":"O"}]} 
{"Properties2":[{"K":"A","T":"String","V":"J"}]} 
{"Properties2":[{"K":"B","T":"String","V":"X"}]} 
{"Properties2":[{"K":"C","T":"String","V":"Y"}]} 
{"Properties2":[{"K":"D","T":"String","V":"Z"}]} 

Output 2:

W  X  Y  Z 
M  N    O 
J 
     X 
       Y 
         Z 
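
Rows with a single value still come out with four columns because the script passes a limit of 4 to STRSPLIT. The sketch below illustrates the effect with Java's String.split, on the assumption that STRSPLIT follows the same limit semantics; the class and method names are hypothetical.

```java
class SplitLimitSketch {
    // With a positive limit, trailing empty strings are kept.
    static String[] columns(String row) {
        return row.split("_", 4);
    }

    // Without a limit, trailing empty strings are dropped.
    static String[] columnsNoLimit(String row) {
        return row.split("_");
    }

    public static void main(String[] args) {
        // "J___" is the UDF result for a record that only has key A
        System.out.println(columns("J___").length);        // prints 4
        System.out.println(columnsNoLimit("J___").length); // prints 1
        // "___Z" is the UDF result for a record that only has key D
        System.out.println(columns("___Z").length);        // prints 4
    }
}
```

Without the limit, a record such as {"K":"A",...,"V":"J"} would collapse to a single column and the rows would no longer line up.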

UDF code: the Java file below is compiled and packaged as jsonparse.jar (this is only provisional Java code; you can optimize or modify it to suit your needs).

JSONPARSE.java

package mypackage;

import java.io.IOException;
import java.util.LinkedHashMap;

import org.apache.commons.lang.StringUtils;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class JSONPARSE extends EvalFunc<String> {
    @Override
    public String exec(Tuple arg0) throws IOException {
        try {
            // Get the input: the bag flattened by BagToString, e.g. "A_String_M_B_String_N"
            String input = (String) arg0.get(0);

            // Split the input on the "_" delimiter into K, T, V fields
            String[] parts = input.split("_");

            // Init the map with the keys (A, B, C, D) and empty strings as values;
            // LinkedHashMap keeps the keys in insertion order, which is the column order
            LinkedHashMap<String, String> mymap = new LinkedHashMap<String, String>();
            mymap.put("A", "");
            mymap.put("B", "");
            mymap.put("C", "");
            mymap.put("D", "");

            // i points at the key (K) and j at the value (V) of each triple.
            // Bounding the loop on j skips an incomplete trailing triple
            // instead of failing with an out-of-bounds access.
            for (int i = 0, j = 2; j < parts.length; i = i + 3, j = j + 3) {
                // Find each key from the input and update the respective value
                if (mymap.containsKey(parts[i])) {
                    mymap.put(parts[i], parts[j]);
                }
            }

            // Final output: append each value with "_" as the delimiter
            String output = "";
            for (String value : mymap.values()) {
                output = output + value + "_";
            }

            // Remove the extra trailing delimiter "_" from the output
            return StringUtils.removeEnd(output, "_");
        } catch (Exception e) {
            throw new IOException("Caught exception while processing the input row ", e);
        }
    }
}
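
The key-to-column logic can be tried without Pig or commons-lang on the classpath. The sketch below is a standalone approximation of the UDF's core, not the UDF itself: the class and method names (PivotSketch, pivot) are hypothetical, and it uses String.join in place of StringUtils.removeEnd.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PivotSketch {
    // Takes a "_"-joined run of K, T, V fields and returns the values
    // for keys A, B, C, D in that order, joined by "_".
    static String pivot(String flattened) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (String key : new String[]{"A", "B", "C", "D"}) {
            columns.put(key, ""); // missing keys stay empty
        }
        String[] parts = flattened.split("_");
        // parts[i] is the key and parts[i + 2] is its value
        for (int i = 0; i + 2 < parts.length; i += 3) {
            if (columns.containsKey(parts[i])) {
                columns.put(parts[i], parts[i + 2]);
            }
        }
        return String.join("_", columns.values());
    }

    public static void main(String[] args) {
        System.out.println(pivot("A_String_M_B_String_N_D_String_O")); // prints M_N__O
        System.out.println(pivot("C_String_Y"));                       // prints __Y_
    }
}
```

The empty slot between the two underscores in M_N__O is what becomes the empty C column after STRSPLIT.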

How to compile and build the jar file:

$ ls 
    JSONPARSE.java input.json 
$ javac JSONPARSE.java 
$ mkdir mypackage 
$ mv JSONPARSE.class mypackage/ 
$ jar -cvf jsonparse.jar mypackage/ 
$ ls 
    JSONPARSE.java input.json jsonparse.jar mypackage 

1. Download two jar files (apache-commons-lang.jar, piggybank.jar) from the links below:
    http://www.java2s.com/Code/Jar/a/Downloadapachecommonslangjar.htm 
    http://www.java2s.com/Code/Jar/p/Downloadpiggybankjar.htm 

2. Add the two jar files to your classpath:
    >> export CLASSPATH=/tmp/piggybank.jar:/tmp/apache-commons-lang.jar 

3. Create a directory named mypackage:
    >> mkdir mypackage 

4. Compile your JSONPARSE.java file (make sure the two jars are on the classpath, otherwise compilation will fail):
    >> javac JSONPARSE.java 

5. Move the class file to the mypackage folder:
    >> mv JSONPARSE.class mypackage/ 

6. Create a jar file named jsonparse.jar:
    >> jar -cvf jsonparse.jar mypackage/ 

7. The jsonparse.jar file will be created; include it in your Pig script with the REGISTER command.

Example from the command line