Search Results

Search found 5325 results on 213 pages for 'huffman encoding'.

Page 12/213 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >

How to bulk-rename files with invalid encoding or bulk-replace invalid encoded characters?

- by qdoe

I have a debian server and I'm hosting music for an internet radio station. I have trouble with file names and paths because a lot of files got an invalid encoding, for example: ./music/BÃ¤ndname - Some Title - additional Info/B?ndname - 07 - This Title Is Cörtain, The EncÃ?ding Not.mp3 Ideally, I would like to remove everything that is not letters A-Z/a-z or numbers 0-9 or dash -/underscore _... The result should look like something like that: ./music/Bndname-SomeTitle-additionalInfo/Bndname-07-ThisTitleIsCrtain,TheEnc?dingNot.mp3 How to achieve this for a batch of a lot of files and directories? I've seen this similar question: bulk rename (or correctly display) files with special characters But this only fixes the encoding, I would prefer a more strict approach as described above.

Read the article
How to write Cyrillic text in C++ console?

- by VextoR

For example, if I write: cout << "??????!" << endl; //it's hello in Russian in console it would be something like "-?????!" ok, I know that we can use: setlocale(LC_ALL, "Russian"); but after that not working command line arguments in russian (if I start my program through BAT file): StartProgram.bat chcp 1251 MyProgram.exe -user=???? -password=?????? so, after setlocale program can't read russian arguments properly. This happens because BAT file in CP1251, but console is in CP866 So, there is a question: How to write in C++ console russian text and same time russian command line arguments have to be read properly thanks

Read the article
Can I safely interpret ISO-8859-1 strings as Win1252 when parsing e-mail?

- by sharptooth

I need to support decoding ISO-8859-1 in my e-mail client. Specifically sometimes messages contain attaches that have filenames starting with "?iso-8859-1?". Wikipedia says ISO-8859-1 is pretty much the same as Windows 1252. I already have good tested code for decoding Win1252. Can I just use it directly and expect no problems?

Read the article
How to decode HTML encoded text in MS Access

- by Dejan

Hi all, I have a table field in MS Access 2003 which contains HTML encoded strings like this: Ανταγωνισμός παγκοσμίου επιπέδου στην κατάρτι&#963 How can I decode this into "normal string", using MS Access? Thanks in advance.

Read the article
Convert PHP entities like – or &scaron; to their applicable characters

- by Martin Sikora

Hello, is there a way how to convert HTML entities to their applicable characters. Something similar to html_entity_decode()? I'm trying to make ordinary text without HTML entities from TinyMCE output.

Read the article
C# how to get current encoding type used by C# to write/read configuration for config file?

- by 5YrsLaterDBA

I am doing connection string encryption. we use our own encryption key with AES algorithm to do this. during the process, we need to convert string to byte array and then convert byte array back to string. I found the encoding play an important role on those conversions. So I need to know the encoding C# is using to get above conversion right. Any idea how to get current encoding programmably? thanks,

Read the article
Python: Which encoding is used for processing sys.argv?

- by EOL

What encoding are the elements of sys.argv in, in Python? are they encoded with the sys.getdefaultencoding() encoding? sys.getdefaultencoding(): Return the name of the current default string encoding used by the Unicode implementation. PS: As pointed out in some of the answers, sys.stdin.encoding would indeed be a better guess. I would love to see a definitive answer to this question, though, with pointers to solid sources! PPS: As Wim pointed out, Python 3 solves this issue by putting str objects in sys.argv (if I understand correctly). The question remains open for Python 2.x, though. Under Unix, the LC_CTYPE environment variable seems to be the correct thing to check, no? What should be done with Windows (so that sys.argv elements are correctly interpreted whatever the console)?

Read the article
How can I set Transfer-Encoding to chunked, explicitly or explicitly, in an ASP.NET response?

- by Cheeso

Can I simply set the Transfer-Encoding header? Will calling Response.Flush() at some point cause this to occur implicitly? EDIT No, I Cannot call Response.Headers.Add("Transfer-Encoding","anything"); That throws. any other input? Related: Enable Chunked Transfer Encoding in ASP.NET

Read the article
Why does Silverlight provides webcam and microphone support without any encoding API?

- by Shurup

In the list of new features in Silverlight 4 you will find following: Webcam and microphone to allow sharing of video and audio for instance for chat or customer service applications. Silverlight captures an audio stream as raw pcm. So how would you realize for example audio/video chat or client/server audio recording application without any encoding on the client side, where there is no APIs in Silverlight available? Much less in a Silverlight you cannot use an unmanaged dll. You can use a com automation (a new feature of the Silverlight 4, I think only for Windows) but only if it was already installed on the client side (do you know any encoding COM servers that are installed with the windows). Otherwise, how would you deploy a custom COM server within you Silverlight application? The only way I found is either to deploy a command-line encoding and use it with COM AutomationFactory.CreateObject("WScript.Shell") or to implement an encoding to use it in your own AudioSink.

Read the article
Encoding gives "'ascii' codec can't encode character … ordinal not in range(128)"

- by user140314

I am working through the Django RSS reader project here. The RSS feed will read something like "OKLAHOMA CITY (AP) — James Harden let". The RSS feed's encoding reads encoding="UTF-8" so I believe I am passing utf-8 to markdown in the code snippet below. The em dash is where it chokes. I get the Django error of "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)" which is an UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The code line that is not working is: content = content.encode(parsed_feed.encoding, "xmlcharrefreplace") I am using markdown 2.0, django 1.1, and python 2.4. What is the magic sequence of encoding and decoding that I need to do to make this work? Thanks.

Read the article
Read a file with a specific encoding in C?

- by Daziplqa

Hi, I've a file that was written on windows with encoding WINDOWS-1256, and I need to write a C program that reads bytes from this file and write them back to a new file with UTF-8 encoding. How to read a file with a specific encoding in C ??

Read the article
Why create a Huffman tree per character instead of a Node?

- by Omega

For a school assignment we're supposed to make a Java implementation of a compressor/decompresser using Huffman's algorithm. I've been reading a bit about it, specially this C++ tutorial: http://www.cprogramming.com/tutorial/computersciencetheory/huffman.html In my program, I've been thinking about having Nodes that have the following properties: Total Frequency Character (if a leaf) Right child (if any) Left child (if any) Parent (if any) So when building the Huffman tree, it is just a matter of linking a node to others, etc. However, I'm a bit confused with the following quote (emphasis mine): First, every letter starts off as part of its own tree and the trees are ordered by the frequency of the letters in the original string. Then the two least-frequently used letters are combined into a single tree, and the frequency of that tree is set to be the combined frequency of the two trees that it links together. My question: why should I create a tree per letter, instead of just a node per letter and then do the linking later? I have not begun coding, I'm just studying the algorithm first, so I guess I'm missing an important detail. What is it?

Read the article
Is it possible to change the default encoding in notepad?

- by akurtser

On windows 7 (x64), the default option, for saving text files in notepad is ANSI. One can select other encoding from the combo box, however, I'd like this option to be the default.

Read the article
Anatomy of a .NET Assembly - Custom attribute encoding

- by Simon Cooper

In my previous post, I covered how field, method, and other types of signatures are encoded in a .NET assembly. Custom attribute signatures differ quite a bit from these, which consequently affects attribute specifications in C#. Custom attribute specifications In C#, you can apply a custom attribute to a type or type member, specifying a constructor as well as the values of fields or properties on the attribute type: public class ExampleAttribute : Attribute { public ExampleAttribute(int ctorArg1, string ctorArg2) { ... } public Type ExampleType { get; set; } } [Example(5, "6", ExampleType = typeof(string))] public class C { ... } How does this specification actually get encoded and stored in an assembly? Specification blob values Custom attribute specification signatures use the same building blocks as other types of signatures; the ELEMENT_TYPE structure. However, they significantly differ from other types of signatures, in that the actual parameter values need to be stored along with type information. There are two types of specification arguments in a signature blob; fixed args and named args. Fixed args are the arguments to the attribute type constructor, named arguments are specified after the constructor arguments to provide a value to a field or property on the constructed attribute type (PropertyName = propValue) Values in an attribute blob are limited to one of the basic types (one of the number types, character, or boolean), a reference to a type, an enum (which, in .NET, has to use one of the integer types as a base representation), or arrays of any of those. Enums and the basic types are easy to store in a blob - you simply store the binary representation. Strings are stored starting with a compressed integer indicating the length of the string, followed by the UTF8 characters. Array values start with an integer indicating the number of elements in the array, then the item values concatentated together. Rather than using a coded token, Type values are stored using a string representing the type name and fully qualified assembly name (for example, MyNs.MyType, MyAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=0123456789abcdef). If the type is in the current assembly or mscorlib then just the type name can be used. This is probably done to prevent direct references between assemblies solely because of attribute specification arguments; assemblies can be loaded in the reflection-only context and attribute arguments still processed, without loading the entire assembly. Fixed and named arguments Each entry in the CustomAttribute metadata table contains a reference to the object the attribute is applied to, the attribute constructor, and the specification blob. The number and type of arguments to the constructor (the fixed args) can be worked out by the method signature referenced by the attribute constructor, and so the fixed args can simply be concatenated together in the blob without any extra type information. Named args are different. These specify the value to assign to a field or property once the attribute type has been constructed. In the CLR, fields and properties can be overloaded just on their type; different fields and properties can have the same name. Therefore, to uniquely identify a field or property you need: Whether it's a field or property (indicated using byte values 0x53 and 0x54, respectively) The field or property type The field or property name After the fixed arg values is a 2-byte number specifying the number of named args in the blob. Each named argument has the above information concatenated together, mostly using the basic ELEMENT_TYPE values, in the same way as a method or field signature. A Type argument is represented using the byte 0x50, and an enum argument is represented using the byte 0x55 followed by a string specifying the name and assembly of the enum type. The named argument property information is followed by the argument value, using the same encoding as fixed args. Boxed objects This would be all very well, were it not for object and object[]. Arguments and properties of type object allow a value of any allowed argument type to be specified. As a result, more information needs to be specified in the blob to interpret the argument bytes as the correct type. So, the argument value is simple prepended with the type of the value by specifying the ELEMENT_TYPE or name of the enum the value represents. For named arguments, a field or property of type object is represented using the byte 0x51, with the actual type specified in the argument value. Some examples... All property signatures start with the 2-byte value 0x0001. Similar to my previous post in the series, names in capitals correspond to a particular byte value in the ELEMENT_TYPE structure. For strings, I'll simply give the string value, rather than the length and UTF8 encoding in the actual blob. I'll be using the following enum and attribute types to demonstrate specification encodings: class AttrAttribute : Attribute { public AttrAttribute() {} public AttrAttribute(Type[] tArray) {} public AttrAttribute(object o) {} public AttrAttribute(MyEnum e) {} public AttrAttribute(ushort x, int y) {} public AttrAttribute(string str, Type type1, Type type2) {} public int Prop1 { get; set; } public object Prop2 { get; set; } public object[] ObjectArray; } enum MyEnum : int { Val1 = 1, Val2 = 2 } Now, some examples: Here, the the specification binds to the (ushort, int) attribute constructor, with fixed args only. The specification blob starts off with a prolog, followed by the two constructor arguments, then the number of named arguments (zero): [Attr(42, 84)] 0x0001 0x002a 0x00000054 0x0000 An example of string and type encoding: [Attr("MyString", typeof(Array), typeof(System.Windows.Forms.Form))] 0x0001 "MyString" "System.Array" "System.Windows.Forms.Form, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" 0x0000 As you can see, the full assembly specification of a type is only needed if the type isn't in the current assembly or mscorlib. Note, however, that the C# compiler currently chooses to fully-qualify mscorlib types anyway. An object argument (this binds to the object attribute constructor), and two named arguments (a null string is represented by 0xff and the empty string by 0x00) [Attr((ushort)40, Prop1 = 12, Prop2 = "")] 0x0001 U2 0x0028 0x0002 0x54 I4 "Prop1" 0x0000000c 0x54 0x51 "Prop2" STRING 0x00 Right, more complicated now. A type array as a fixed argument: [Attr(new[] { typeof(string), typeof(object) })] 0x0001 0x00000002 // the number of elements "System.String" "System.Object" 0x0000 An enum value, which is simply represented using the underlying value. The CLR works out that it's an enum using information in the attribute constructor signature: [Attr(MyEnum.Val1)] 0x0001 0x00000001 0x0000 And finally, a null array, and an object array as a named argument: [Attr((Type[])null, ObjectArray = new object[] { (byte)2, typeof(decimal), null, MyEnum.Val2 })] 0x0001 0xffffffff 0x0001 0x53 SZARRAY 0x51 "ObjectArray" 0x00000004 U1 0x02 0x50 "System.Decimal" STRING 0xff 0x55 "MyEnum" 0x00000002 As you'll notice, a null object is encoded as a null string value, and a null array is represented using a length of -1 (0xffffffff). How does this affect C#? So, we can now explain why the limits on attribute arguments are so strict in C#. Attribute specification blobs are limited to basic numbers, enums, types, and arrays. As you can see, this is because the raw CLR encoding can only accommodate those types. Special byte patterns have to be used to indicate object, string, Type, or enum values in named arguments; you can't specify an arbitary object type, as there isn't a generalised way of encoding the resulting value in the specification blob. In particular, decimal values can't be encoded, as it isn't a 'built-in' CLR type that has a native representation (you'll notice that decimal constants in C# programs are compiled as several integer arguments to DecimalConstantAttribute). Jagged arrays also aren't natively supported, although you can get around it by using an array as a value to an object argument: [Attr(new object[] { new object[] { new Type[] { typeof(string) } }, 42 })] Finally... Phew! That was a bit longer than I thought it would be. Custom attribute encodings are complicated! Hopefully this series has been an informative look at what exactly goes on inside a .NET assembly. In the next blog posts, I'll be carrying on with the 'Inside Red Gate' series.

Read the article
Why does Komodo edit not open files on doubleclick due to some path encoding error?

- by gakera

When I double click a file in finder to open in Komodo edit, it displays an error message saying it can't find the file and the path is displayed, but with wrong encoding, such that special characters such as "ó" and "ð" are displayed as some weird boxes. Screenshot: http://i51.tinypic.com/2qkleg4.jpg I'm running OSX Snow Leopard 10.6.4 and Komodo Edit version 5: "Komodo Edit, version 5.2.4, build 4343, platform macosx-x86. Built on Tue Dec 8 18:18:35 2009." When I drag the file into Komodo edit it opens just fine displaying the path correctly with "ó"s and all :P This doesn't happen if I right click the file in finder and select to open it with TextEdit or Word or (God forbid) MacVim - only when I select to open it in Komodo edit. This is annoying as all hell.

Read the article
Re-encoding a video file and increasing the size -- does this improve the quality?

- by Josh

If I have a video file at 320x240 resolution which I want to re-encode (because I don't like the encoding it's in now) and I also want to play it at double size (640x480), will I get higher quality if I scale it up to 640x480 when I convert it to a new format, verses keeping it at 320x240 in the new format and playing it at double size? This probably depends on the program used to convert, and if so, please let me know any program which might increase the quality. Here's my thinking. If I play a 320x240 file at double size, the system has to scale up each frame in real time, whereas if I scale up while recompressing the system may be able to use a more intensive algorythm like Bicubic interpolation . However I am not sure if this is true or not.

Read the article
How to cross-reference many character encodings with ASCII OR UTFx?

- by Garet Claborn

I'm working with a binary structure, the goal of which is to index the significance of specific bits for any character encoding so that we may trigger events while doing specific checks against the profile. Each character encoding scheme has an associated system record. This record's leading value will be a C++ unsigned long long binary value and signifies the length, in bits, of encoded characters. Following the length are three values, each is a bit field of that length. offset_mask - defines the occurrence of non-printable characters within the min,max of print_mask range_mask - defines the occurrence of the most popular 50% of printable characters print_mask - defines the occurrence value of printable characters The structure of profiles has changed from the op of this question. Most likely I will try to factorize or compress these values in the long-term instead of starting out with ranges after reading more. I have to write some of the core functionality for these main reasons. It has to fit into a particular event architecture we are using, Better understanding of character encoding. I'm about to need it. Integrating into non-linear design is excluding many libraries without special hooks. I'm unsure if there is a standard, cross-encoding mechanism for communicating such data already. I'm just starting to look into how chardet might do profiling as suggested by @amon. The Unicode BOM would be easily enough (for my current project) if all encodings were Unicode. Of course ideally, one would like to support all encodings, but I'm not asking about implementation - only the general case. How can these profiles be efficiently populated, to produce a set of bitmasks which we can use to match strings with common characters in multiple languages? If you have any editing suggestions please feel free, I am a lightweight when it comes to localization, which is why I'm trying to reach out to the more experienced. Any caveats you may be able to help with will be appreciated.

Read the article
Unable to debug an encodded javascript?

- by miles away

I’m having some problems debugging an encoded javacscript. This script I’m referring to given in this link over here. The encoding here is simple and it works by shifting the unicodes values to whatever Codekey was use during encoding. The code that does the decoding is given here in plain English below:- <script language="javascript"> function dF(s){ var s1=unescape(s.substr(0,s.length-1)); var t=''; for(i=0;i<s1.length;i++)t+=String.fromCharCode(s1.charCodeAt(i)-s.substr(s.length-1,1)); document.write(unescape(t)); } </script> I’m interested in knowing or understanding the values (e.g s1,t). Like for example when the value of i=0 what values would the following attributes / method would hold s1.charCodeAt(i) and s.substr(s.length-1,1) The reason I’m doing this is to understand as to how a CodeKey function really works. I don’t see anything in the code above which tells it to decode on the basis of codekey value. The only thing I can point in the encoding text is the last character which is set to 1 , 2 ,3 or 4 depending upon the codekey selected during encoding process. One can verify using the link I have given above. However, to debug, I’m using firebug addon with the script running as localhost on my wamp server. I’m able to put a breakpoint on the js using firebug but I’m unable to retrieve any of the user defined parameters or functions I mentioned above. I want to know under this context what would be best way to debug this encoded js.

Read the article
GPU On-the-fly encoding of video through Logitech HD Webcam C510

- by Ashfame

Originally asked here but I edited that to move this question as a different one on suggestion by a fellow member. I read that I should have a Core2Duo 2.2Ghz for 720p but I have a 2.0Ghz one, would it be possible for me to first record it and then encode it after recording if my processor really start giving issue when doing on-the-fly encoding? I also have a ATI HD 4850 512MB card, if it can help in encoding on-the-fly or is there a chance that my graphics card alone can handle it and those specs were just for a system without a graphics card? I believe so. Also, I got no worries in dealing with console, if I have to do some of the things above in terminal. Other possible significant details: I have a dual screen setup 29" (1360X768) & 22" (1680X1050) which might be using some good power from GPU and I have 2GB DDR2 800Mhz RAM.

Read the article
What encoding does InstallShield expect non-latin-alphabet string table entries to use?

- by DNS

I work on an app that gets distributed via a single installer containing multiple localizations. The build process includes a script that updates the .ism string table with translations for each supported language. This works fine for languages like French and German. But when testing the installer in, i.e. Japanese, the text shows up as a series of squares. It's unlikely to be a font problem, since the InstallShield-supplied strings show up fine; only the string table entries are mangled. So the problem seems to be that the strings are in the wrong encoding. The .ism is in XML format, with UTF-8 declared as its encoding, so I assumed the strings needed to be UTF-8 encoded as well. Do they actually need to use the encoding of the target platform? Is there any concern, then, about targets having different encodings, i.e. Chinese systems using one GB-encoding versus another? What is the right thing to do here?

Read the article
How to test an application for correct encoding (e.g. UTF-8)

- by Olaf

Encoding issues are among the one topic that have bitten me most often during development. Every platform insists on its own encoding, most likely some non-UTF-8 defaults are in the game. (I'm usually working on Linux, defaulting to UTF-8, my colleagues mostly work on german Windows, defaulting to ISO-8859-1 or some similar windows codepage) I believe, that UTF-8 is a suitable standard for developing an i18nable application. However, in my experience encoding bugs are usually discovered late (even though I'm located in Germany and we have some special characters that along with ISO-8859-1 provide some detectable differences). I believe that those developers with a completely non-ASCII character set (or those that know a language that uses such a character set) are getting a head start in providing test data. But there must be a way to ease this for the rest of us as well. What [technique|tool|incentive] are people here using? How do you get your co-developers to care for these issues? How do you test for compliance? Are those tests conducted manually or automatically? Adding one possible answer upfront: I've recently discovered fliptitle.com (they are providing an easy way to get weird characters written "u?op ?pisdn" *) and I'm planning on using them to provide easily verifiable UTF-8 character strings (as most of the characters used there are at some weird binary encoding position) but there surely must be more systematic tests, patterns or techniques for ensuring UTF-8 compatibility/usage. Note: Even though there's an accepted answer, I'd like to know of more techniques and patterns if there are some. Please add more answers if you have more ideas. And it has not been easy choosing only one answer for acceptance. I've chosen the regexp answer for the least expected angle to tackle the problem although there would be reasons to choose other answers as well. Too bad only one answer can be accepted. Thank you for your input. *) that's "upside down" written "upside down" for those that cannot see those characters due to font problems

Read the article
IWebBrowser: How to specify the encoding when loading html from a stream?

- by Ian Boyd

Using the concepts from the sample code provided by Microsoft for loading HTML content into an IWebBrowser from an IStream using the web browser's IPersistStreamInit interface: HRESULT LoadWebBrowserFromStream(IWebBrowser* pWebBrowser, IStream* pStream) { [snip] } How can one specify the encoding of the html inside the IStream? The IStream will contain a series of bytes, but the problem is what do those bytes represent? They could, for example, contain bytes where: each byte represents a character from the current Windows code-page (e.g. 1252) each byte could represent a character from the ISO-8859-1 character set the bytes could represent UTF-8 encoded characters every 2 bytes could represent a character, using UTF-16 encoding In my particular case, i am providing the IWebBrowser an IStream that contains a series of double-bytes characters (UTF-16), but the browser (incorrectly) believes that UTF-8 encoding is in effect. This results in garbled characters.

Read the article
How can I specify the character encoding to be used by OLEDB when querying a DBF?

- by Manga Lee

Is it possible to specify which character encoding should be used by OLEDB when querying a DBF file? A possible work-around would be to encode the query string before the OLEDB call to the DBF file's character encoding and then encode all the results when they are returned. This will work but it would be nice if OLEDB or possibly ADO.NET could do this for me. UPDATE The suggestion by Viktor Jevdokimov does not seem to work automatically. But it made me investigate manual conversion of the strings. It is possible to use the TextInfo property of CultureInfo to find out the OemCodePage and the WindowsCodePage and use those to get the corresponding Encoding instances to perform manual conversion. But I can not get ADO.NET use these encondings to perform the conversion for me.

Read the article
Set a script to automatically detect character encoding in a plain-text-file in Python?

- by Haidon

I've set up a script that basically does a large-scale find-and-replace on a plain text document. At the moment it works fine with ASCII, UTF-8, and UTF-16 (and possibly others, but I've only tested these three) encoded documents so long as the encoding is specified inside the script (the example code below specifies UTF-16). Is there a way to make the script automatically detect which of these character encodings is being used in the input file and automatically set the character encoding of the output file the same as the encoding used on the input file? findreplace = [ ('term1', 'term2'), ] inF = open(infile,'rb') s=unicode(inF.read(),'utf-16') inF.close() for couple in findreplace: outtext=s.replace(couple[0],couple[1]) s=outtext outF = open(outFile,'wb') outF.write(outtext.encode('utf-16')) outF.close() Thanks!

Read the article
Why does Python print unicode characters when the default encoding is ASCII?

- by mike

From the Python 2.6 shell: >>> import sys >>> print sys.getdefaultencoding() ascii >>> print u'\xe9' é >>> I expected to have either some gibberish or an Error after the print statement, since the "é" character isn't part of ASCII and I haven't specified an encoding. I guess I don't understand what ASCII being the default encoding means.

Read the article

< Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >