Scala conditional replacement: how to replace values in a DataFrame using Scala


For example, I want to replace all values equal to 0.2 in a column with 0. How can I do that in Scala? Thanks.

Edit:

|2012|Tesla|S   |No comment          | |
|1997|Ford |E350|Go get one now th...| |

This is my DataFrame. I'm trying to change Tesla in the make column to S.

Solution

Note:

As mentioned by Olivier Girardot, this answer is not optimized, and the withColumn solution is the one to use (see Azeroth2b's answer).

I cannot delete this answer because it has been accepted.

Here is my take on this one:

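import org.apache.spark.sql.{Row, SQLContext}

// sc is the existing SparkContext (for example, the one provided by spark-shell)
val rdd = sc.parallelize(
  List((2012, "Tesla", "S"), (1997, "Ford", "E350"), (2015, "Chevy", "Volt"))
)

val sqlContext = new SQLContext(sc)

// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._

val dataframe = rdd.toDF()

// collect first so the rows are printed on the driver
dataframe.collect().foreach(println)

// Spark 1.x allows dataframe.map directly; going through .rdd also works on Spark 2.x and later
dataframe.rdd.map(row => {
  val row1 = row.getAs[String](1)
  val make = if (row1.toLowerCase == "tesla") "S" else row1
  Row(row(0), make, row(2))
}).collect().foreach(println)

//[2012,S,S]
//[1997,Ford,E350]
//[2015,Chevy,Volt]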

You can map over the rows of the DataFrame directly.

Basically, you check column 1 of each row for the string "tesla" (the comparison is case-insensitive).

If it matches, use the value S for make; otherwise use the current value of column 1.

Then build a new Row from the row's data using zero-based indexes (Row(row(0), make, row(2)) in my example).

There is probably a better way to do it. I am not yet that familiar with the Spark ecosystem.
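The note at the top of this answer points to withColumn as the preferred approach. A minimal sketch of what that could look like follows, assuming Spark 2.x or later with a local SparkSession. The column names (year, make, model, score), the score column itself, and the object name ReplaceValues are made up for illustration and do not come from the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lower, when}

object ReplaceValues {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.x or later, running locally.
    val spark = SparkSession.builder()
      .appName("replace-values")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: the column names and the "score" column are illustrative only.
    val df = Seq(
      (2012, "Tesla", "S", 0.2),
      (1997, "Ford", "E350", 0.5),
      (2015, "Chevy", "Volt", 0.2)
    ).toDF("year", "make", "model", "score")

    val replaced = df
      // Replace "Tesla" (case-insensitive) in make with "S"; keep every other value.
      .withColumn("make", when(lower(col("make")) === "tesla", "S").otherwise(col("make")))
      // Replace 0.2 in score with 0.0; keep every other value.
      .withColumn("score", when(col("score") === 0.2, 0.0).otherwise(col("score")))

    replaced.show()
    spark.stop()
  }
}
```

Because when/otherwise is a column expression, the replacement stays inside the DataFrame API instead of rebuilding each Row by hand. For a plain equality replacement, df.na.replace("make", Map("Tesla" -> "S")) is another option worth checking against the Spark version in use.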
