When is it beneficial to flyweight Strings in Java?

Article URL: https://www.itbaoku.cn/post/627693.html

Question

I understand the basic idea of Java's String interning, but I'm trying to figure out in which situations it happens automatically and in which I would need to do my own flyweighting.

Somewhat related:

Together they tell me that String s = "foo" is good and String s = new String("foo") is bad, but there's no mention of any other situations.

In particular, if I parse a file (say, a CSV) that has a lot of repeated values, will Java's string interning cover me, or do I need to do something myself? I've gotten conflicting advice about whether or not String interning applies here in my other question.


The full answer came in several fragments, so I'll sum it up here:

By default, Java only interns strings that are known at compile time. String.intern() can be called on a string at runtime, but it doesn't perform very well, so it's only appropriate for a smaller number of Strings that you're sure will be repeated a lot. For larger sets of Strings, it's Guava to the rescue (see ColinD's answer).
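
To make the compile-time versus runtime distinction concrete, here is a minimal sketch. The class and method names are made up for illustration; only String.intern() and the == comparison come from the discussion above. It shows that a constant expression such as "fo" + "o" shares the literal's instance, while a string assembled at runtime does not until it is interned explicitly.

```java
class CompileTimeInterning {
    // Returns true when both references point to the very same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Concatenates method parameters, so the compiler cannot fold the result
    // into a constant; a new String is created at run time.
    static String buildAtRuntime(String prefix, String suffix) {
        return prefix + suffix;
    }

    public static void main(String[] args) {
        String literal = "foo";
        String folded = "fo" + "o";                 // constant expression, folded by the compiler
        String runtime = buildAtRuntime("fo", "o"); // built at run time, not interned

        System.out.println(sameObject(literal, folded));           // true
        System.out.println(sameObject(literal, runtime));          // false
        System.out.println(sameObject(literal, runtime.intern())); // true
    }
}
```

Values read from a file fall into the "built at run time" category, which is why they are not shared automatically.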

Recommended answer

Don't use String.intern() in your code, at least not if you might get 20 or more different strings. In my experience, using String.intern() slows down the whole application when you have a few million strings.

To avoid duplicated String objects, just use a HashMap.

private final Map<String, String> pool = new HashMap<String, String>();

// Returns the pooled instance equal to s, adding s to the pool if it is new.
private String interned(String s) {
  String interned = pool.get(s);
  if (interned != null) {
    return interned;
  }
  pool.put(s, s);
  return s;
}

private void readFile(CsvFile csvFile) {
  for (List<String> row : csvFile) {
    for (int i = 0; i < row.size(); i++) {
      row.set(i, interned(row.get(i)));
      // further process the row
    }
  }
  pool.clear(); // allow the garbage collector to clean up
}

With that code you can avoid duplicate strings within one CSV file. If you need to avoid them on a larger scale (for example, across several files), keep the pool alive longer and call pool.clear() in another place, once you are done.
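
For readers who want to try the idea without a CSV library, the following is a self-contained sketch of the same pooling approach. It is not the answer's exact code: the class name, the naive split on commas (no quoting or escaping) and the identity-counting helper are all assumptions made for the demonstration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class CsvStringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the first instance seen for each distinct value.
    String canonical(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    // Splits each line on commas (assumption: no quoted fields) and replaces
    // every cell with its canonical instance.
    List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] cells = line.split(",");
            for (int i = 0; i < cells.length; i++) {
                cells[i] = canonical(cells[i]);
            }
            rows.add(cells);
        }
        pool.clear(); // the pool itself no longer keeps the strings alive
        return rows;
    }

    // Counts distinct String objects by identity (==), not by equals().
    static int distinctInstances(List<String[]> rows) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String[] row : rows) {
            for (String cell : row) {
                seen.put(cell, Boolean.TRUE);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("red,large,red", "blue,large,red");
        List<String[]> rows = new CsvStringPool().parse(lines);
        System.out.println(distinctInstances(rows)); // 3 objects for 6 cells
    }
}
```

Counting by identity makes the saving visible: six cells end up backed by three String objects.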

Another recommended answer

One option Guava gives you here is to use an Interner rather than String.intern(). Unlike String.intern(), which on older JVMs (Java 6 and earlier) stores strings in the permanent generation, a Guava Interner keeps them on the ordinary heap. Additionally, you have the option of interning the Strings with weak references, so that once you're done using those Strings the Interner won't prevent them from being garbage-collected. If, however, the Interner itself is discarded when you're done with the strings, you can just use strong references via Interners.newStrongInterner() instead, for possibly better performance.

Interner<String> interner = Interners.newWeakInterner();
String a = interner.intern(getStringFromCsv());
String b = interner.intern(getStringFromCsv());
// if a.equals(b), a == b will be true
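
If Guava is not available, the weak-reference idea can be approximated with the standard library alone. The sketch below is not Guava's implementation; it is a simplified illustration built on WeakHashMap, and the class name is hypothetical.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class SimpleWeakInterner {
    // Keys are held weakly by WeakHashMap. The values are wrapped in
    // WeakReference as well; a strong value would keep its own key reachable
    // and the entry could never be collected.
    private final Map<String, WeakReference<String>> pool = new WeakHashMap<>();

    // Returns the canonical instance equal to s, registering s if none exists.
    synchronized String intern(String s) {
        WeakReference<String> ref = pool.get(s);
        String canonical = (ref != null) ? ref.get() : null;
        if (canonical == null) {
            pool.put(s, new WeakReference<>(s));
            canonical = s;
        }
        return canonical;
    }

    public static void main(String[] args) {
        SimpleWeakInterner interner = new SimpleWeakInterner();
        String a = interner.intern(new String("value"));
        String b = interner.intern(new String("value"));
        System.out.println(a == b); // true
    }
}
```

Once no other code references a pooled string, the garbage collector is free to reclaim it and the map drops the stale entry, which is the behaviour the weak interner described above is meant to provide.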

Another recommended answer

In most cases a String is created from a byte or char array (unless it's a string literal in the code), so you can test what happens in those cases:

    String s = "test";
    String s1 = new String(s.getBytes());
    String s2 = String.valueOf(s.toCharArray());
    String s3 = new String(s.toCharArray());

    System.out.println(s == s1);
    System.out.println(s == s2);
    System.out.println(s == s3);

This prints false for all three comparisons. But you can explicitly intern a string if you think you'll have a lot of repeating values. If you add the following to the example above, before the comparisons, it prints true for all three:

    s1 = s1.intern();
    s2 = s2.intern();
    s3 = s3.intern();

See the description of String#intern in the API documentation.

Edit
So would using intern() on each value that's read in be a reasonable way to achieve flyweighting?
Yes, assuming no references are held to the old string. If the old string isn't referenced anywhere anymore, it will be garbage-collected.