Charset Detector
 
Summary
 
ChsDet (Charset Detector) is, as the name says, a stand-alone executable module for automatic charset / encoding detection of a given text or file.
ChsDet can be useful for internationalisation support in multilingual applications such as web-script editors or Unicode editors.
The given input buffer is analysed to guess the encoding used. The result, a charset name or code page id, can be used as a control parameter for a charset conversion procedure.
Charset Detector can be compiled (and hopefully used :) for MS Windows (as a DLL, dynamic link library) using Delphi or Free Pascal, or for Linux using Delphi/Kylix.
Based on Mozilla's i18n component.
State
Version 0.2 released.
Requirements
Charset Detector doesn't need any external components.
Output
As a result you will get the guessed charset as an MS Windows code page id and a GNU charset canonical name.
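The canonical name can serve directly as the control parameter for a conversion step. The following is only a minimal sketch in Python, assuming the detection result is already available as a string; it does not call ChsDet itself, and the function name `convert_to_utf8` is hypothetical.

```python
import codecs


def convert_to_utf8(data: bytes, charset_name: str) -> bytes:
    """Re-encode `data` from the detected charset to UTF-8.

    `charset_name` is assumed to be a canonical charset name such as
    "windows-1251" or "KOI8-R", as reported by a detector.
    """
    # codecs.lookup raises LookupError if Python has no codec for this name.
    codec = codecs.lookup(charset_name)
    text = data.decode(codec.name)
    return text.encode("utf-8")


if __name__ == "__main__":
    raw = "Привет".encode("windows-1251")
    print(convert_to_utf8(raw, "windows-1251").decode("utf-8"))
```

Not every name the detector can report necessarily has a matching Python codec, so a caller should be prepared to handle `LookupError`.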
Licence
Charset Detector is an open source project distributed under the Lesser GPL.
Supported charsets

Code page  Name                      Note
-1                                   Pseudo code page. Charset Detector was unable to guess the encoding.
0          ASCII                     Pseudo code page. No conversion needed.
855        IBM855
866        IBM866
932        Shift_JIS
950        Big5
1200       UTF-16LE
1201       UTF-16BE
1251       windows-1251
1252       windows-1252
1253       windows-1253
1255       windows-1255
10007      x-mac-cyrillic
12000      X-ISO-10646-UCS-4-2143
12000      UTF-32LE                  MS Windows has no code page for this charset. Try to use UCS-4.
12001      X-ISO-10646-UCS-4-3412
12001      UTF-32BE                  MS Windows has no code page for this charset. Try to use UCS-4.
20866      KOI8-R
28595      ISO-8859-5
28597      ISO-8859-7
28598      ISO-8859-8
50222      ISO-2022-JP
50225      ISO-2022-KR
50227      ISO-2022-CN
51932      EUC-JP
51936      x-euc-tw
51949      EUC-KR
52936      HZ-GB-2312
54936      GB18030
65001      UTF-8
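A caller that works with the code page id has to treat the two pseudo code pages separately from real ones. The sketch below shows one way to do that in Python. It is an illustration only: the mapping covers a small subset of the table, and the names `PYTHON_CODECS`, `CP_UNKNOWN`, `CP_ASCII` and `decode_detected` are hypothetical, not part of Charset Detector.

```python
from typing import Optional

# Pseudo code pages from the table above.
CP_UNKNOWN = -1  # detector was unable to guess the encoding
CP_ASCII = 0     # plain ASCII, no conversion needed

# Assumed mapping from a few code page ids to Python codec names.
# Only a subset of the table is listed here.
PYTHON_CODECS = {
    866: "cp866",
    932: "shift_jis",
    1251: "cp1251",
    1252: "cp1252",
    20866: "koi8_r",
    28595: "iso8859_5",
    65001: "utf-8",
}


def decode_detected(data: bytes, code_page: int) -> Optional[str]:
    """Decode `data` according to a detected code page id.

    Returns None when the encoding could not be guessed.
    """
    if code_page == CP_UNKNOWN:
        return None
    if code_page == CP_ASCII:
        return data.decode("ascii")
    codec = PYTHON_CODECS.get(code_page)
    if codec is None:
        raise ValueError(f"no codec mapped for code page {code_page}")
    return data.decode(codec)
```

Returning `None` for -1 leaves the fallback policy (ask the user, assume a default, and so on) to the caller instead of silently picking an encoding.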
Nick Yakowlew © 2006