tranvacicolvolk

VB6 Convert Unicode File To ASCII By Reading Lines From UTF-8 Files



I looked into the answer from Máťa, whose name hints at encoding qualifications and experience. The VBA docs say CreateTextFile(filename, [overwrite [, unicode]]) creates a file "as a Unicode or ASCII file. The value is True if the file is created as a Unicode file; False if it's created as an ASCII file. If omitted, an ASCII file is assumed." It's fine that a file stores Unicode characters, but in which encoding? Unicode text has to be encoded in some way before it can be written to a file.


I didn't want to change all my code just to support a few UTF-8 strings, so I let my code do its thing, and after the file was saved (in ANSI, which is the Excel default) I converted the file to UTF-8 using this code:







First of all, there is an article from Chilkat (another component vendor) about how to use the font's charset (assuming it is a Unicode font) to set different font types (you have to edit the .frm manually, since charset isn't exposed in the GUI). Then all you have to do is convert from ANSI to UTF-8 and back to support different languages (that is what Chilkat's control does).


The .NET runtime uses Unicode as the encoding for all strings. The StreamReader and StreamWriter classes in System.IO take an Encoding as a parameter. So, to convert from one encoding to another, we just need to specify the original encoding and read the file contents into a string followed by writing out the string in the desired encoding.
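The same read-then-write idea can be sketched outside .NET. The Python below is only an illustration, not the StreamReader/StreamWriter code the paragraph refers to; the function name transcode_file, the demo file names, and the choice of cp1252 as the "ANSI" source encoding are all assumptions made for the example.

```python
import os
import tempfile


def transcode_file(src_path, dst_path, src_encoding, dst_encoding):
    """Read src_path as src_encoding and write the same text as dst_encoding."""
    # newline="" disables newline translation so line endings are preserved.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)


# Small demo with hypothetical file names in a temporary directory.
demo_dir = tempfile.mkdtemp()
ansi_path = os.path.join(demo_dir, "ansi.txt")
utf8_path = os.path.join(demo_dir, "utf8.txt")

with open(ansi_path, "wb") as f:
    f.write("caf\u00e9\r\n".encode("cp1252"))  # assumed ANSI code page

transcode_file(ansi_path, utf8_path, "cp1252", "utf-8")
```

The whole file is held in memory as one string, which is fine for small files; a large file would more likely be copied in chunks or line by line.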


Tells the charset converter the charset of the input data for a conversion. Possible values are: us-ascii, unicode (also known as UTF16LE or simply UTF16), unicodefffe (also known as UTF16BE), ebcdic, iso-8859-1, iso-8859-2, iso-8859-3, iso-8859-4, iso-8859-5, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-9, iso-8859-13, iso-8859-15, windows-874, windows-1250, windows-1251, windows-1252, windows-1253, windows-1254, windows-1255, windows-1256, windows-1257, windows-1258, utf-7, utf-8, utf-32, utf-32be, shift_jis, gb2312, ks_c_5601-1987, big5, iso-2022-jp, iso-2022-kr, euc-jp, euc-kr, macintosh, x-mac-japanese, x-mac-chinesetrad, x-mac-korean, x-mac-arabic, x-mac-hebrew, x-mac-greek, x-mac-cyrillic, x-mac-chinesesimp, x-mac-romanian, x-mac-ukrainian, x-mac-thai, x-mac-ce, x-mac-icelandic, x-mac-turkish, x-mac-croatian, asmo-708, dos-720, dos-862, ibm01140, ibm01141, ibm01142, ibm01143, ibm01144, ibm01145, ibm01146, ibm01147, ibm01148, ibm01149, ibm037, ibm437, ibm500, ibm737, ibm775, ibm850, ibm852, ibm855, ibm857, ibm00858, ibm860, ibm861, ibm863, ibm864, ibm865, cp866, ibm869, ibm870, cp875, koi8-r, koi8-u.


Tells the charset converter the target charset for a conversion. Possible values are: us-ascii, unicode (also known as UTF16LE or simply UTF16), unicodefffe (also known as UTF16BE), ebcdic, iso-8859-1, iso-8859-2, iso-8859-3, iso-8859-4, iso-8859-5, iso-8859-6, iso-8859-7, iso-8859-8, iso-8859-9, iso-8859-13, iso-8859-15, windows-874, windows-1250, windows-1251, windows-1252, windows-1253, windows-1254, windows-1255, windows-1256, windows-1257, windows-1258, utf-7, utf-8, utf-32, utf-32be, shift_jis, gb2312, ks_c_5601-1987, big5, iso-2022-jp, iso-2022-kr, euc-jp, euc-kr, macintosh, x-mac-japanese, x-mac-chinesetrad, x-mac-korean, x-mac-arabic, x-mac-hebrew, x-mac-greek, x-mac-cyrillic, x-mac-chinesesimp, x-mac-romanian, x-mac-ukrainian, x-mac-thai, x-mac-ce, x-mac-icelandic, x-mac-turkish, x-mac-croatian, asmo-708, dos-720, dos-862, ibm01140, ibm01141, ibm01142, ibm01143, ibm01144, ibm01145, ibm01146, ibm01147, ibm01148, ibm01149, ibm037, ibm437, ibm500, ibm737, ibm775, ibm850, ibm852, ibm855, ibm857, ibm00858, ibm860, ibm861, ibm863, ibm864, ibm865, cp866, ibm869, ibm870, cp875, koi8-r, koi8-u.


I am confused by the redundancy in the Unicode settings available. You can set the code page in the general file connection settings to 65001 (UTF-8), as you demonstrated. Then what exactly does the "Unicode" checkbox on the same page do? And why is it necessary to go to each field in a file source and change the data type to DT_WSTR? Shouldn't they already be Unicode strings?


Alternatively, you may be able to convert a file from UTF-8 to UTF-16 and then treat it as UCS-2 in SQL Server. Even use BCP or BULK INSERT. Notepad++ has a setting to alter the encoding when you save a file, but I've seen Notepad++ do funny things when editing a large file. Java comes with a converter "native2ascii" which also did some funny things, but the data seemed to come out OK in SQL Server. I used a recipe like this in a CMD file. This is actually converting to and then from UTF-32.


At the time (2011) I wrote, "Conversion by this method prior to import is not completely satisfactory, because the output file is a little larger than expected from the input size, without explanation, except that something like blank lines seems to appear in the converted output. That appears to be filtered out in import, but it may be not the only unintended alteration."


If you use a constructor without an Encoding argument, the resultant StreamWriter object will not store strings to the file in a Unicode format with 2 bytes per character. Nor will it convert your strings to ASCII. Instead, the StreamWriter object will store strings in a format known as UTF-8, which is something I'll go over shortly.


When you specify Encoding.UTF8, the StreamWriter class converts the Unicode text strings to UTF-8. In addition, it writes the three bytes 0xEF, 0xBB, and 0xBF to the beginning of the file or stream. These bytes are the Unicode BOM converted to UTF-8.
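The three bytes can be checked independently of .NET. This short Python sketch (an illustration only, not StreamWriter itself) encodes the BOM character U+FEFF as UTF-8 and compares a plain UTF-8 encoding with Python's "utf-8-sig" codec, which prepends the same signature.

```python
import codecs

text = "h\u00e9llo"

plain = text.encode("utf-8")         # no signature
with_bom = text.encode("utf-8-sig")  # prepends the UTF-8 encoded BOM

# U+FEFF is the byte order mark; in UTF-8 it becomes EF BB BF.
bom_as_utf8 = "\ufeff".encode("utf-8")

print(bom_as_utf8.hex())  # efbbbf
print(with_bom == codecs.BOM_UTF8 + plain)  # True
```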


When you specify an encoding of Encoding.ASCII, the resultant file or stream contains only ASCII characters, that is, characters in the range 0x00 through 0x7F. Any Unicode character not in this range is converted to a question mark (ASCII code 0x3F). This is the only encoding in which data is actually lost.
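The question-mark substitution is easy to reproduce. In the Python sketch below, errors="replace" is used to mimic the behaviour described for Encoding.ASCII; it is an analogue chosen for illustration, not the .NET implementation.

```python
text = "na\u00efve caf\u00e9"  # contains two non-ASCII letters

# errors="replace" substitutes "?" (0x3F) for anything outside 0x00-0x7F.
ascii_bytes = text.encode("ascii", errors="replace")

print(ascii_bytes)  # b'na?ve caf?'

# The original characters cannot be recovered from the ASCII bytes.
print(ascii_bytes.decode("ascii") == text)  # False
```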


This function opens a file in national character set mode for input or output, with the maximum line size specified. Even though the contents of an NVARCHAR2 buffer may be AL16UTF16 or UTF8 (depending on the national character set of the database), the contents of the file are always read and written in UTF8. See "Support for the Unicode Standard in Oracle Database" for more information. UTL_FILE converts between UTF8 and AL16UTF16 as necessary.


The above functions and procedures process text files encoded in the UTF8 character set, that is, in the Unicode CESU-8 encoding. See "Universal Character Sets" for more information about CESU-8. The functions and procedures convert between UTF8 and the national character set of the database, which can be UTF8 or AL16UTF16, as needed.
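The difference between CESU-8 and standard UTF-8 only shows up for characters outside the Basic Multilingual Plane. The Python sketch below is a hand-rolled illustration (cesu8_encode is a hypothetical helper, not an Oracle or Python API): it encodes each half of the UTF-16 surrogate pair as its own three-byte sequence, giving six bytes where UTF-8 uses four.

```python
def cesu8_encode(text):
    """Encode text as CESU-8 (illustrative helper, not a library function)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            # Split the code point into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            # Encode each surrogate separately as a 3-byte sequence.
            for half in (high, low):
                out += chr(half).encode("utf-8", "surrogatepass")
        else:
            # Characters in the BMP are encoded exactly as in UTF-8.
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


emoji = "\U0001F600"
print(emoji.encode("utf-8").hex())  # f09f9880
print(cesu8_encode(emoji).hex())    # eda0bdedb880
```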


For example, using FileWriter class is not safe because it converts the document to the default character encoding. The output file can suffer from a parsing error or data loss if the document contains characters that are not available in the default character encoding.
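The same hazard exists in any language that falls back to a platform default. The Python sketch below is an analogue, not Java's FileWriter: it assumes, purely for illustration, that the default encoding is ISO-8859-1, shows that a string containing the euro sign cannot be encoded with it, and then writes the text safely with an explicit UTF-8 encoding.

```python
import os
import tempfile

text = "price: 10 \u20ac"  # contains the euro sign, U+20AC
assumed_default = "iso-8859-1"  # hypothetical platform default, for illustration

# Strict encoding with the limited charset fails instead of writing bad data.
try:
    text.encode(assumed_default)
    fits_default = True
except UnicodeEncodeError:
    fits_default = False

# Naming the encoding explicitly avoids depending on the platform default.
path = os.path.join(tempfile.mkdtemp(), "doc.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()
```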


All right, I did some testing and I can't get SSIS to export to UTF. All the settings are correct, I converted to Unicode and to UTF-8, and the damn thing just refuses to export to anything other than ASCII. I even created a UTF-8 file up front, but SSIS just converts it back to ASCII.


creates a text file named foo.txt. CreateTextFile has two optional arguments: overwrite and unicode. By default, CreateTextFile creates ASCII files but doesn't overwrite existing files. By setting the overwrite argument to True, you're telling CreateTextFile to overwrite an existing file. By setting the unicode argument to True, you're telling CreateTextFile to create a Unicode file.
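To make the ASCII versus "Unicode" distinction concrete, the Python sketch below compares the bytes of a two-character string in both forms. It assumes that "Unicode" here means UTF-16 little-endian preceded by a byte order mark, which is the usual meaning of the term in Windows APIs; it does not call CreateTextFile itself.

```python
import codecs

text = "Hi"

# ASCII file: one byte per character.
ascii_file_bytes = text.encode("ascii")

# Assumed layout of a "Unicode" file: BOM FF FE, then two bytes per character.
unicode_file_bytes = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(ascii_file_bytes.hex())    # 4869
print(unicode_file_bytes.hex())  # fffe48006900
```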


Dim unicodeBytes As Byte() = unicode.GetBytes(unicodeString)
' Perform the conversion from one encoding to the other.
Dim asciiBytes As Byte() = Encoding.Convert(unicode, ascii, unicodeBytes)
' Convert the new byte array into a char array and then into a string.


Re: unicode to ascii in vb6. There is one other scenario where the file can retain the BOM. If the old file is not really deleted and it was Unicode before, when you open it and use Put it may in fact retain the BOM.
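One way to check whether a file has kept a leftover BOM is to look at its first bytes. The Python sketch below is an illustration rather than VB6 code; strip_bom and the BOMS table are names invented for the example.

```python
import codecs

# UTF-32 LE must be tested before UTF-16 LE because both start with FF FE.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def strip_bom(data):
    """Return (data without a leading BOM, encoding name or None)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return data[len(bom):], name
    return data, None


payload, encoding = strip_bom(b"\xff\xfeH\x00i\x00")
print(encoding)                  # utf-16-le
print(payload.decode(encoding))  # Hi
```

A BOM is only a hint: a file without one returns None here, and its encoding has to be known some other way.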

