Removing non-ascii characters from a string


Problem description

Hi there,

I believe all strings in .NET are Unicode by default. I am looking for a
way to remove all non-ASCII characters from a string (or optionally
replace them).

There is an article on Code Project which kind of looks like it does
what I want, but I can't help thinking it makes it more complex than it
needs to be.

I have looked at the MSDN pages to do with Encodings, but I am not very
familiar with this topic.

If I can get a list of ASCII characters then it should be easy to write
a method that checks each char against the list and performs the replace
or remove operation if necessary. Yet I can't find anything exactly
like this with trusty old Google. Is there something I am missing?

If it helps, the reason I need this is that I am writing a front end
for the LAME command-line MP3 encoder, which doesn't like being passed, or
asked to output to, file paths containing Unicode characters.

--
Eps

Recommended answer

"Eps" <ms**********@epscylonb.com> wrote in message
news:er*************@TK2MSFTNGP05.phx.gbl...

Perhaps I'm missing something, but this code:

// Characters outside ASCII (here '£') are encoded as '?' (0x3F).
byte[] asciiChars = Encoding.ASCII.GetBytes("AB £ CD");
string result = Encoding.ASCII.GetString(asciiChars);
Console.WriteLine(result);

creates the string:

AB ? CD

--
Anthony Jones - MVP ASP/ASP.NET
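
The round trip above replaces each non-ASCII character with '?'. If the goal is to drop such characters, or to substitute something else, one possible variation is to ask for an ASCII encoding with a custom encoder fallback. The sketch below assumes the "us-ascii" encoding name is available on the target runtime; the class and method names (`AsciiEncodingDemo`, `ToAscii`) are made up for illustration and are not part of the thread.

```csharp
using System;
using System.Text;

static class AsciiEncodingDemo
{
    // Round-trips the input through an ASCII encoding whose encoder fallback
    // substitutes `replacement` for every character ASCII cannot represent.
    // Passing an empty string removes those characters instead.
    // The replacement itself must consist of ASCII characters.
    public static string ToAscii(string input, string replacement)
    {
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(replacement),
            new DecoderReplacementFallback("?"));

        byte[] bytes = ascii.GetBytes(input);
        return ascii.GetString(bytes);
    }
}
```

With this sketch, `AsciiEncodingDemo.ToAscii("AB \u00A3 CD", "")` drops the pound sign and leaves the two surrounding spaces, while passing `"_"` as the replacement gives `AB _ CD`.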

Anthony Jones wrote:
> Perhaps I'm missing something, but this code:
>
> byte[] asciiChars = Encoding.ASCII.GetBytes("AB £ CD");
> string result = Encoding.ASCII.GetString(asciiChars);
> Console.WriteLine(result);
>
> creates the string:
>
> AB ? CD

I have seen this code before. Can anyone explain why the
Encoding.ASCII.GetString() method does not accept a string as a parameter?

--
Eps

On Aug 29, 1:12 pm, Eps <msnewsgro...@epscylonb.com> wrote:
> > byte[] asciiChars = Encoding.ASCII.GetBytes("AB £ CD");
> > string result = Encoding.ASCII.GetString(asciiChars);
> > Console.WriteLine(result);
> > creates the string:
> > AB ? CD
>
> I have seen this code before. Can anyone explain why the
> Encoding.ASCII.GetString() method does not accept a string as a parameter?

Because Encoding classes encode and decode CLR strings (which are
_always_ Unicode) to/from byte arrays in a specified encoding, typically
for serialization or interop purposes. There's no such thing as a
non-Unicode System.String (well, you could treat a string as a plain array
of char, but any .NET function will still treat the string as UTF-16).
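
To make the string/byte-array distinction a little more concrete, here is a small sketch that measures the same text in two encodings and then decodes the ASCII bytes back into a string. The names `EncodingRoundTrip`, `Sample`, `AsciiByteCount`, `Utf8ByteCount` and `AsciiRoundTrip` are hypothetical helpers, not anything from the framework.

```csharp
using System.Text;

static class EncodingRoundTrip
{
    // "AB £ CD", written with an escape so the source file's own
    // encoding does not affect the example.
    public const string Sample = "AB \u00A3 CD";

    // Number of bytes the text occupies once encoded.
    public static int AsciiByteCount(string s) => Encoding.ASCII.GetByteCount(s);
    public static int Utf8ByteCount(string s) => Encoding.UTF8.GetByteCount(s);

    // Encode to ASCII bytes, then decode those bytes into a new string.
    // The result is still an ordinary UTF-16 System.String.
    public static string AsciiRoundTrip(string s) =>
        Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
}
```

For the seven-character sample, the ASCII form takes 7 bytes (the pound sign becomes a single '?' byte) and the UTF-8 form takes 8 bytes (the pound sign needs two). Only the byte arrays differ by encoding; the strings on either side of the conversion are always UTF-16.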

What you ask is still possible, because ASCII is a pure subset of
Unicode. With LINQ, you could use this one-liner:

string ascii = new string(s.Where(c => (int)c >= 0 && (int)c <= 127).ToArray());

Note however that "ascii" would still be a Unicode string - it just
wouldn't contain any non-ASCII characters.
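
Since the original question also asked about optionally replacing the characters, the one-liner could be wrapped into a pair of helpers along these lines. This is only a sketch: `AsciiFilter`, `IsAscii`, `RemoveNonAscii` and `ReplaceNonAscii` are illustrative names, and the code works on individual UTF-16 chars, so a character outside the Basic Multilingual Plane (stored as a surrogate pair) would yield two replacement characters.

```csharp
using System.Linq;
using System.Text;

static class AsciiFilter
{
    // ASCII covers code points 0 through 127 (U+0000 to U+007F).
    public static bool IsAscii(char c) => c <= '\u007F';

    // Returns a copy of the string with every non-ASCII char dropped.
    public static string RemoveNonAscii(string s) =>
        new string(s.Where(c => IsAscii(c)).ToArray());

    // Returns a copy of the string with every non-ASCII char
    // swapped for the given replacement character.
    public static string ReplaceNonAscii(string s, char replacement)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            sb.Append(IsAscii(c) ? c : replacement);
        }
        return sb.ToString();
    }
}
```

For a file path handed to a command-line tool, `AsciiFilter.ReplaceNonAscii(path, '_')` may be the safer of the two, because removal can turn two different names into the same one more easily.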