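Hello, I have created a simple console application in VB.NET to convert files from any encoding to UTF-8, but I can't figure out how encodings actually work. I know the source file is in Unicode format, but when I convert it to the new format I get garbage. Any suggestions? I'm not sure whether my code is correct for converting a CSV file from any encoding to UTF-8.
Here is my code.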
Imports System.IO
Imports System.Text

Module Module1
    Sub Main()
        Console.Write("Please give the filepath (example: c:/testfile.csv):")
        Dim filepath As String = Console.ReadLine()
        Dim sEncoding As String = DetermineFileType(filepath)
        Dim strContents As String
        Dim strEncodedContents As String
        Dim objReader As StreamReader
        Dim ErrInfo As String
        Dim bString As Byte()
        Try
            'Read the file
            objReader = New StreamReader(filepath)
            'Read until the end
            strContents = objReader.ReadToEnd()
            'Close the file
            objReader.Close()
            'Write the contents to the console
            Console.WriteLine(strContents)
            Console.WriteLine("")
            bString = EncodeString(strContents, "UTF-8")
            strEncodedContents = System.Text.Encoding.UTF8.GetString(bString)
            Dim objWriter As New System.IO.StreamWriter(filepath.Replace(".csv", "_encoded.csv"))
            objWriter.WriteLine(strEncodedContents)
            objWriter.Close()
            Console.WriteLine("Encoding Finished")
        Catch Ex As Exception
            ErrInfo = Ex.Message
            Console.WriteLine(ErrInfo)
        End Try
        Console.ReadKey()
    End Sub

    Public Function DetermineFileType(ByVal aFileName As String) As String
        Dim sEncoding As String = String.Empty
        Dim oSR As New StreamReader(aFileName, True)
        'Read the file so that CurrentEncoding reflects the detected encoding
        oSR.ReadToEnd()
        sEncoding = oSR.CurrentEncoding.EncodingName
        Return sEncoding
    End Function

    Function EncodeString(ByRef SourceData As String, ByRef CharSet As String) As Byte()
        'Get the bytes of the source data (as UTF-16)
        Dim bSourceData As Byte() = System.Text.Encoding.Unicode.GetBytes(SourceData)
        'Get the destination encoding
        Dim OutEncoding As System.Text.Encoding = System.Text.Encoding.GetEncoding(CharSet)
        'Encode the data to the destination code page/charset
        Return System.Text.Encoding.Convert(OutEncoding, System.Text.Encoding.UTF8, bSourceData)
    End Function
End Module
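For comparison, here is a minimal sketch of how such a conversion is often written. It rests on one observation: once StreamReader has read the file, the result is already a decoded .NET String, so the only remaining step is to write that string out as UTF-8. In the code above, EncodeString appears to pass UTF-16 bytes to Encoding.Convert while naming UTF-8 as the source encoding, which would explain the garbage. The module name ConvertSketch and the helper ConvertToUtf8 are hypothetical names for illustration. The sketch assumes the source file starts with a byte order mark; a file without one is read as UTF-8 here.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module ConvertSketch
    ' Hypothetical helper: decode sourcePath and write it to targetPath as UTF-8.
    Sub ConvertToUtf8(ByVal sourcePath As String, ByVal targetPath As String)
        Dim contents As String
        ' Third argument True: detect the encoding from a byte order mark.
        ' Assumption: without a BOM the file is treated as UTF-8.
        Using reader As New StreamReader(sourcePath, Encoding.UTF8, True)
            contents = reader.ReadToEnd()
        End Using
        ' The string is already decoded; write it out as UTF-8 without a BOM.
        File.WriteAllText(targetPath, contents, New UTF8Encoding(False))
    End Sub

    Sub Main()
        ' Build a small UTF-16 test file (File.WriteAllText emits a BOM for it).
        Dim sourcePath As String = Path.GetTempFileName()
        Dim targetPath As String = sourcePath & "_encoded.csv"
        Dim sample As String = "Zo" & ChrW(&HEB) & ";K" & ChrW(&HF6) & "ln"
        File.WriteAllText(sourcePath, sample, Encoding.Unicode)

        ConvertToUtf8(sourcePath, targetPath)

        ' Round trip: reading the result as UTF-8 should give back the sample.
        Console.WriteLine(File.ReadAllText(targetPath, Encoding.UTF8) = sample)
    End Sub
End Module
```

No byte-level conversion with Encoding.Convert is needed in this approach, because the reader does the decoding and the writer does the encoding.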
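Unicode is a _specification_, not an encoding. What encoding does your source file use? UTF-8? UTF-16? UCS2? ... – fge 2011-12-20 13:59:20
UTF-8 is Unicode too :-) I take it, then, that the input file is UTF-16? – 2011-12-20 14:02:03
I'm confused now :S Unicode is a specification. UTF-8 is an encoding, but UTF-8 is also Unicode: that's where I have everything mixed up now – themis 2011-12-20 14:09:33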
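The distinction in these comments can be illustrated with a small sketch: the same Unicode character becomes different bytes depending on the encoding chosen. The module name EncodingDemo is hypothetical. Note that .NET's Encoding.Unicode means UTF-16 little-endian, which may be part of the confusion.

```vbnet
Imports System
Imports System.Text

Module EncodingDemo
    Sub Main()
        ' U+00E9 (e with acute accent), built with ChrW so the source file's
        ' own encoding does not matter.
        Dim sample As String = ChrW(&HE9)

        ' One Unicode code point, two different byte sequences.
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(sample)
        Dim utf16Bytes As Byte() = Encoding.Unicode.GetBytes(sample)

        Console.WriteLine(BitConverter.ToString(utf8Bytes))   ' C3-A9
        Console.WriteLine(BitConverter.ToString(utf16Bytes))  ' E9-00

        ' Decoding bytes with the wrong encoding does not give back the text.
        Console.WriteLine(Encoding.UTF8.GetString(utf16Bytes) = sample)  ' False
    End Sub
End Module
```

Both byte sequences are valid encodings of the same Unicode character, so a file is only readable when it is decoded with the encoding that was used to write it.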