Saving contents of df.show() as a string in spark-scala app

Problem description

I need to save the output of df.show() as a string so that I can email it directly.

For example, take the following snippet from the official Spark docs:

val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+

I need to capture the table above, exactly as it is printed to the console, as a string. I looked at log4j for logging, but couldn't find any information on logging only this output.

Can someone help me with it?

Recommended answer

scala.Console has a withOut method for this kind of thing:

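import java.io.ByteArrayOutputStream

// Everything printed via scala.Console inside the block goes into outCapture
val outCapture = new ByteArrayOutputStream
Console.withOut(outCapture) {
  df.show()
}
// Decode the captured bytes with the platform default charset
val result = new String(outCapture.toByteArray)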
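
If you need this in more than one place, the same idea can be wrapped in a small helper. The sketch below is only an illustration: `CaptureDemo` and `captureOutput` are names I made up (they are not part of Spark or the Scala library), and it assumes the code you wrap prints through `scala.Console` on the calling thread, which is how `df.show()` behaves as far as I know.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}
import java.nio.charset.StandardCharsets

object CaptureDemo {
  // Hypothetical helper: runs `body` and returns whatever it printed
  // through scala.Console (println, print, ...) as a String.
  def captureOutput(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    // Explicit UTF-8 on both sides so encoding and decoding agree
    val stream = new PrintStream(buffer, true, StandardCharsets.UTF_8.name())
    Console.withOut(stream) { body }
    stream.flush()
    new String(buffer.toByteArray, StandardCharsets.UTF_8)
  }

  def main(args: Array[String]): Unit = {
    // With a DataFrame in scope this would be: captureOutput(df.show())
    val captured = captureOutput(println("hello"))
    print(captured)
  }
}
```

Because `Console.withOut` only redirects output for the duration of the block, the original output stream is restored automatically, even if the body throws.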

Other recommended answer

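A workaround is to redirect standard output to a variable:

val baos = new java.io.ByteArrayOutputStream()
val ps = new java.io.PrintStream(baos)

// Remember the current output stream so it can be restored afterwards
val oldPs = Console.out
Console.setOut(ps)
df.show()
ps.flush() // flush in case anything is still buffered
val content = baos.toString()
Console.setOut(oldPs)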

Note that this gives me one deprecation warning: Console.setOut is deprecated, and Console.withOut is the suggested replacement.

You can also re-implement the method Dataset.showString, which generates the table text. It uses take in the background. Maybe it's also a good moment to create a PR to make showString public? :)
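
To give an idea of what such a re-implementation could look like, here is a rough sketch that only reproduces the basic table layout shown in the question. `ShowStringSketch` and `render` are hypothetical names, not Spark APIs, and the sketch assumes every row has as many cells as the header. The real showString also handles truncation and other options, which are left out here.

```scala
object ShowStringSketch {
  // Hypothetical stand-in for Dataset.showString: formats a header and
  // already-stringified rows as a right-aligned ASCII table.
  def render(header: Seq[String], rows: Seq[Seq[String]]): String = {
    val all = header +: rows
    // Each column is as wide as its longest cell (header included)
    val widths = header.indices.map(i => all.map(_(i).length).max)
    val sep = widths.map(w => "-" * w).mkString("+", "+", "+")
    def line(cells: Seq[String]): String =
      cells.zip(widths)
        .map { case (cell, w) => " " * (w - cell.length) + cell }
        .mkString("|", "|", "|")
    val lines = Seq(sep, line(header), sep) ++ rows.map(line) ++ Seq(sep)
    lines.mkString("\n") + "\n"
  }

  def main(args: Array[String]): Unit = {
    // With a DataFrame, the inputs could presumably be built from
    // df.columns and df.take(n), mapping each Row to strings (assumption).
    val rows = Seq(Seq("null", "Michael"), Seq("30", "Andy"), Seq("19", "Justin"))
    print(render(Seq("age", "name"), rows))
  }
}
```

This avoids touching the console at all, at the cost of maintaining your own formatting code.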
