PySpark RDD aggregate. schema = StructType([ StructField("_id", Stri...