Search Results

Search found 4647 results on 186 pages for 'localizable strings'.

Page 55/186 | < Previous Page | 51 52 53 54 55 56 57 58 59 60 61 62  | Next Page >

  • Diffing file contents

    - by PHeiberg
    I have two plain text files, each file is containing a list of strings sorted alphabetically with one string per line. I want to diff the files and have an output of all strings that exist only in file2. Preferabbly I want the operation to be possible without any 3rd party tools, or with a minimum of installations of tools that is "normal" to find in a windows command line environment, such as GNU Diffutils, Powershell, etc. The output should be in text form (file or as command line output). Example: File 1 contents: A C D File 2 contents: A B C E Result wanted: B E

    Read the article

  • Why might a System.String object not cache its hash code?

    - by Dan Tao
    A glance at the source code for string.GetHashCode using Reflector reveals the following (for mscorlib.dll version 4.0): public override unsafe int GetHashCode() { fixed (char* str = ((char*) this)) { char* chPtr = str; int num = 0x15051505; int num2 = num; int* numPtr = (int*) chPtr; for (int i = this.Length; i > 0; i -= 4) { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0]; if (i <= 2) { break; } num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1]; numPtr += 2; } return (num + (num2 * 0x5d588b65)); } } Now, I realize that the implementation of GetHashCode is not specified and is implementation-dependent, so the question "is GetHashCode implemented in the form of X or Y?" is not really answerable. I'm just curious about a few things: If Reflector has disassembled the DLL correctly and this is the implementation of GetHashCode (in my environment), am I correct in interpreting this code to indicate that a string object, based on this particular implementation, would not cache its hash code? Assuming the answer is yes, why would this be? It seems to me that the memory cost would be minimal (one more 32-bit integer, a drop in the pond compared to the size of the string itself) whereas the savings would be significant, especially in cases where, e.g., strings are used as keys in a hashtable-based collection like a Dictionary<string, [...]>. And since the string class is immutable, it isn't like the value returned by GetHashCode will ever even change. What could I be missing? UPDATE: In response to Andras Zoltan's closing remark: There's also the point made in Tim's answer(+1 there). If he's right, and I think he is, then there's no guarantee that a string is actually immutable after construction, therefore to cache the result would be wrong. Whoa, whoa there! This is an interesting point to make (and yes it's very true), but I really doubt that this was taken into consideration in the implementation of GetHashCode. The statement "therefore to cache the result would be wrong" implies to me that the framework's attitude regarding strings is "Well, they're supposed to be immutable, but really if developers want to get sneaky they're mutable so we'll treat them as such." This is definitely not how the framework views strings. It fully relies on their immutability in so many ways (interning of string literals, assignment of all zero-length strings to string.Empty, etc.) that, basically, if you mutate a string, you're writing code whose behavior is entirely undefined and unpredictable. I guess my point is that for the author(s) of this implementation to worry, "What if this string instance is modified between calls, even though the class as it is publicly exposed is immutable?" would be like for someone planning a casual outdoor BBQ to think to him-/herself, "What if someone brings an atomic bomb to the party?" Look, if someone brings an atom bomb, party's over.

    Read the article

  • How to install/change locale on Debian?

    - by Hongli Lai
    I've written a web application for which the user interface is in Dutch. I use the system's date and time routines to format date strings in the application. However, the date strings that the system formats are in English but I want them in Dutch, so I need to set the system's locale. How do I do that on Debian? I tried setting LC_ALL=nl_NL but it doesn't seem to have any effect: $ date Sat Aug 15 14:31:31 UTC 2009 $ LC_ALL=nl_NL date Sat Aug 15 14:31:36 UTC 2009 I remember that setting LC_ALL on my Ubuntu desktop system works fine. Do I need to install extra packages to make this work, or am I doing it entirely wrong?

    Read the article

  • Replace special text with sed?

    - by user143822
    I'm using CMD on Windows Xp to replace special text with Sed. I'm using this command for replace special characters like $ or * : sed -i "s/\*/123/g;" 1.txt But how command must i use to replace this strings with ciao! in my text files? Is possible? \\ \\\ "" sed.exe -i "s/{\*)(//123/ sed -i "s/\\/123/g;" 1.txt the previous command does not work because i have \, " and other special strings that sed use to make regex.

    Read the article

  • How can I import a CSV file into an SQLite table that doesn't allow null values?

    - by Philip
    I've been using the SQLite Manager extension for Firefox to edit my Chrome web data file in order to restore my keyword searches, and I think I have everything in place except that when I import a CSV file into a table, (a) it won't import into the actual table because the table doesn't allow null values (b) if I import it into a new table that does allow null values, then all the empty strings end up as null and I have to manually edit the row, type and delete a character, and then it's set to empty string and it's fine So: is there either a way to import the CSV so that empty cells are automatically turned into empty strings instead of null, OR is there a way once a table is imported that has null values to convert it into one that doesn't allow null values, where each formerly null value is the empty string? Thanks!

    Read the article

  • Regular expression in mySQL [migrated]

    - by Rayne
    I have a mysql table that has 2 columns - Column 1 contains a string value, and Column 2 contains the number of times that string value occurred. I'm trying to find the string abc.X.def, where the beginning of the string is "abc.", followed by one or more characters, then the string ".def". There could be more characters following ".def". How can I find such strings, then add the occurrence of such strings and display the results? For example, if I have abc.111.def23 1 abc.111.def 2 abc.22.def444 1 abc.111.def 1 Then I will get abc.111.def23 1 abc.111.def 3 abc.22.def444 1 Thank you.

    Read the article

  • How to reliably map vSphere disks <-> Linux devices

    - by brianmcgee
    Task at hand After a virtual disk has been added to a Linux VM on vSphere 5, we need to identify the disks in order to automate the LVM storage provision. The virtual disks may reside on different datastores (e.g. sas or flash) and although they may be of the same size, their speed may vary. So I need a method to map the vSphere disks to Linux devices. Ideas Through the vSphere API, I am able to get the device info: Data Object Type: VirtualDiskFlatVer2BackingInfo Parent Managed Object ID: vm-230 Property Path: config.hardware.device[2000].backing Properties Name Type Value ChangeId string Unset contentId string "d58ec8c12486ea55c6f6d913642e1801" datastore ManagedObjectReference:Datastore datastore-216 (W5-CFAS012-Hybrid-CL20-004) deltaDiskFormat string "redoLogFormat" deltaGrainSize int Unset digestEnabled boolean false diskMode string "persistent" dynamicProperty DynamicProperty[] Unset dynamicType string Unset eagerlyScrub boolean Unset fileName string "[W5-CFAS012-Hybrid-CL20-004] l****9-000001.vmdk" parent VirtualDiskFlatVer2BackingInfo parent split boolean false thinProvisioned boolean false uuid string "6000C295-ab45-704e-9497-b25d2ba8dc00" writeThrough boolean false And on Linux I may read the uuid strings: [root@lx***** ~]# lsscsi -t [1:0:0:0] cd/dvd ata: /dev/sr0 [2:0:0:0] disk sas:0x5000c295ab45704e /dev/sda [3:0:0:0] disk sas:0x5000c2932dfa693f /dev/sdb [3:0:1:0] disk sas:0x5000c29dcd64314a /dev/sdc As you can see, the uuid string of disk /dev/sda looks somehow familiar to the string that is visible in the VMware API. Only the first hex digit is different (5 vs. 6) and it is only present to the third hyphen. So this looks promising... Alternative idea Select disks by controller. But is it reliable that the ascending SCSI Id also matches the next vSphere virtual disk? What happens if I add another DVD-ROM drive / USB Thumb drive? This will probably introduce new SCSI devices in between. Thats the cause why I think I will discard this idea. Questions Does someone know an easier method to map vSphere disks and Linux devices? Can someone explain the differences in the uuid strings? (I think this has something to do with SAS adressing initiator and target... WWN like...) May I reliably map devices by using those uuid strings? How about SCSI virtual disks? There is no uuid visible then... This task seems to be so obvious. Why doesn't Vmware think about this and simply add a way to query the disk mapping via Vmware Tools?

    Read the article

  • VBA: Parse preceding numbers from string

    - by buttonsrtoys
    I need to parse into two substrings a string that always starts with numeric text followed by alphnumeric text. The strings can vary a bit, but not too much. Below are examples of incoming format and the strings I need: "00 10 50 Information to Bidders" ==> "00 10 50", "Information to Bidders" "001050 Information to Bidders" ==> "001050", "Information to Bidders" "00 10 50 - Information to Bidders" ==> "00 10 50", "Information to Bidders" "001050 -- Information to Bidders" ==> "001050", "Information to Bidders" I was hoping it would only be a half dozen lines of VBA, but my code is turning into a loop where I'm testing every character in the string to see where the changeover from numeric-only to non-numeric, then parsing the string based on the changeover location. Not a big deal, but messier than I was hoping for. Are there VBA functions that would eliminate the need to iterate through each string character?

    Read the article

  • Random BSOD ntoskrnl.exe (Windows 7)

    - by nordbjerg
    I get BSOD at random times and have been for a while now on my Windows 7 machine. It is really new, and I already tried wiping the graphic card drivers and installing them again (making sure that they are of course up to date). I get a variety of bug check strings on a few different drivers. The single file that is in all of my BSODs is ntoskrnl.exe Bug Check Strings - SYSTEM_SERVICE_EXCEPTION - KMODE_EXCEPTION_NOT_HANDLED - DRIVER_IRQL_NOT_LESS_OR_EQUAL - SYSTEM_THREAD_EXCEPTION_NOT_HANDLED - PAGE_FAULT_IN_NONPAGED_AREA I would rather not resort to getting a completely new PC, as I have already thrown a lot of money on my current one. Here is a .zip file with my dumps.

    Read the article

  • Clang warning flags for Objective-C development

    - by Macmade
    As a C & Objective-C programmer, I'm a bit paranoid with the compiler warning flags. I usually try to find a complete list of warning flags for the compiler I use, and turn most of them on, unless I have a really good reason not to turn it on. I personally think this may actually improve coding skills, as well as potential code portability, prevent some issues, as it forces you to be aware of every little detail, potential implementation and architecture issues, and so on... It's also in my opinion a good every day learning tool, even if you're an experienced programmer. For the subjective part of this question, I'm interested in hearing other developers (mainly C, Objective-C and C++) about this topic. Do you actually care about stuff like pedantic warnings, etc? And if yes or no, why? Now about Objective-C, I recently completely switched to the LLVM toolchain (with Clang), instead of GCC. On my production code, I usually set this warning flags (explicitly, even if some of them may be covered by -Wall): -Wall -Wbad-function-cast -Wcast-align -Wconversion -Wdeclaration-after-statement -Wdeprecated-implementations -Wextra -Wfloat-equal -Wformat=2 -Wformat-nonliteral -Wfour-char-constants -Wimplicit-atomic-properties -Wmissing-braces -Wmissing-declarations -Wmissing-field-initializers -Wmissing-format-attribute -Wmissing-noreturn -Wmissing-prototypes -Wnested-externs -Wnewline-eof -Wold-style-definition -Woverlength-strings -Wparentheses -Wpointer-arith -Wredundant-decls -Wreturn-type -Wsequence-point -Wshadow -Wshorten-64-to-32 -Wsign-compare -Wsign-conversion -Wstrict-prototypes -Wstrict-selector-match -Wswitch -Wswitch-default -Wswitch-enum -Wundeclared-selector -Wuninitialized -Wunknown-pragmas -Wunreachable-code -Wunused-function -Wunused-label -Wunused-parameter -Wunused-value -Wunused-variable -Wwrite-strings I'm interested in hearing what other developers have to say about this. For instance, do you think I missed a particular flag for Clang (Objective-C), and why? Or do you think a particular flag is not useful (or not wanted at all), and why?

    Read the article

  • Why not to use StackTrace to find what method called you

    - by Alex.Davies
    Our obfuscator, SmartAssembly, does some pretty crazy reflection. It's an obfuscator, it's sort of its job to do things in the most awkward way possible. But sometimes, you can go too far. One such time is this little gem from the strings encoding feature: StackTrace stackTrace = new StackTrace(); StackFrame frame = stackTrace.GetFrame(1); Type ownerType = frame.GetMethod().DeclaringType; It's designed to find the type where the calling method is defined. A user found that strings encoding occasionally broke on x64 systems. Very strange. After some debugging (thank god for Reflector Pro, it would be impossible to debug processed assemblies without it) I found that the ownerType I got back was wrong. The reason is that the x64 JIT does tail call optimisation. This saves space on the stack, and speeds things up, by throwing away a method's stack frame if the last thing that it calls is the only thing returned. When this happens, the call to StackTrace faithfully tells you that the calling method is the one that called the one we really wanted. So using StackTrace isn't safe for anything other than debugging, and it will make your code fail in unpredictable ways. Don't use it!

    Read the article

  • Cannot mount USB 3 harddrive

    - by Thijs
    I cannot mount an USB 3 harddrive. When looking at dmesg I got the following error: [ 96.463269] usb 3-2.1: >new low-speed USB device number 10 using xhci_hcd [ 96.485777] usb 3-2.1: >New USB device found, idVendor=046d, idProduct=c025 [ 96.485787] usb 3-2.1: >New USB device strings: Mfr=1, Product=2, SerialNumber=0 [ 96.485792] usb 3-2.1: >Product: USB-PS/2 Optical Mouse [ 96.485797] usb 3-2.1: >Manufacturer: B16_b_02 [ 96.486118] usb 3-2.1: >ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes [ 96.490149] input: B16_b_02 USB-PS/2 Optical Mouse as /devices/pci0000:00/0000:00:14.0/usb3/3-2/3-2.1/3-2.1:1.0/input/input12 [ 96.490500] hid-generic 0003:046D:C025.0003: >input,hidraw0: USB HID v1.10 Mouse [B16_b_02 USB-PS/2 Optical Mouse] on usb-0000:00:14.0-2.1/input0 [ 114.088984] usb 3-2.3: >new high-speed USB device number 11 using xhci_hcd [ 114.105041] usb 3-2.3: >New USB device found, idVendor=13fd, idProduct=1618 [ 114.105051] usb 3-2.3: >New USB device strings: Mfr=0, Product=0, SerialNumber=0 [ 257.531777] usb 3-2.3: >USB disconnect, device number 11 [ 258.513912] usb 3-2.4: >new high-speed USB device number 12 using xhci_hcd [ 258.514046] usb 3-2.4: >Device not responding to set address. [ 258.717649] usb 3-2.4: >Device not responding to set address. [ 258.921203] usb 3-2.4: >device not accepting address 12, error -71 [ 258.937388] hub 3-2:1.0: >unable to enumerate USB device on port 4 I have tried to mount the drive on ubuntu-12.04 and it mounts just fine.

    Read the article

  • Language parsing to find important words

    - by Matt Huggins
    I'm looking for some input and theory on how to approach a lexical topic. Let's say I have a collection of strings, which may just be one sentence or potentially multiple sentences. I'd like to parse these strings to and rip out the most important words, perhaps with a score that denotes how likely the word is to be important. Let's look at a few examples of what I mean. Example #1: "I really want a Keurig, but I can't afford one!" This is a very basic example, just one sentence. As a human, I can easily see that "Keurig" is the most important word here. Also, "afford" is relatively important, though it's clearly not the primary point of the sentence. The word "I" appears twice, but it is not important at all since it doesn't really tell us any information. I might expect to see a hash of word/scores something like this: "Keurig" => 0.9 "afford" => 0.4 "want" => 0.2 "really" => 0.1 etc... Example #2: "Just had one of the best swimming practices of my life. Hopefully I can maintain my times come the competition. If only I had remembered to take of my non-waterproof watch." This example has multiple sentences, so there will be more important words throughout. Without repeating the point exercise from example #1, I would probably expect to see two or three really important words come out of this: "swimming" (or "swimming practice"), "competition", & "watch" (or "waterproof watch" or "non-waterproof watch" depending on how the hyphen is handled). Given a couple examples like this, how would you go about doing something similar? Are there any existing (open source) libraries or algorithms in programming that already do this?

    Read the article

  • Algorithmic problem - quickly finding all #'s where value %x is some given value

    - by Steve B.
    Problem I'm trying to solve, apologies in advance for the length: Given a large number of stored records, each with a unique (String) field S. I'd like to be able to find through an indexed query all records where Hash(S) % N == K for any arbitrary N, K (e.g. given a million strings, find all strings where HashCode(s) % 17 = 5. Is there some way of memoizing this so that we can quickly answer any question of this form without doing the % on every value? The motivation for this is a system of N distributed nodes, where each record has to be assigned to at least one node. The nodes are numbered 0 - (K-1) , and each node has to load up all of the records that match it's number: If we have 3 nodes Node 0 loads all records where Hash % 3 ==0 Node 1 loads all records where Hash % 3 ==1 Node 2 loads all records where Hash % 3 ==2 adding a 4th node, obviously all the assignments have to be recomputed - Node 0 loads all records where Hash % 4 ==0 ... etc I'd like to easily find these records through an indexed query without having to compute the mod individually. The best I've been able to come up with so far: If we take the prime factors of N (p1 * p2 * ... ) if N % M == I then p % M == I % p for all of N's prime factors e.g. 10 nodes : N % 10 == 6 then N % 2 = 0 == 6 %2 N % 5 = 1 == 6 %5 so storing an array of the "%" of N for the first "reasonable" number of primes for my data set should be helpful. For example in the above example we store the hash and the primes HASH PRIMES (array of %2, %3, %5, %7, ... ]) 16 [0 1 1 2 .. ] so looking for N%10 == 6 is equivalent to looking for all values where array[1]==1 and array[2] == 1. However, this breaks at the first prime larger than the highest number I'm storing in the factor table. Is there a better way?

    Read the article

  • Is this high coupling?

    - by Bono
    Question I'm currently working a on an assignment for school. The assignment is to create a puzzle/calculator program in which you learn how to work with different datastructures (such as Stacks). We have generate infix math strings suchs as "1 + 2 * 3 - 4" and then turn them in to postfix math strings such as "1 2 + 3 * 4 -". In my book the author creates a special class for converting the infix notation to postfix. I was planning on using this but whilst I was about to implement it I was wondering if the following is what you would call "high coupling". I have read something about this (nothing that is taught in the book or anything) and was wondering about the aspect (since I still have to grasp it). Problem I have created a PuzzleGenerator class which generates the infix notation of the puzzle (or math string, whatever you want to call it) when it's instantiated. I was going to make a method getAnswer() in which I would instantiate the InToPost class (the class from the book) to convert the infix to postfox notation and then calculate the answer. But whilst doing this I thought: "Is using the InToPost class inside this method a form a high coupling, and would it be better to place this in a different method?" (such as a "convertPostfixToInfix" method, inside the PuzzleGenerator class) Thanks in advance.

    Read the article

  • Are specific types still necessary?

    - by MKO
    One thing that occurred to me the other day, are specific types still necessary or a legacy that is holding us back. What I mean is: do we really need short, int, long, bigint etc etc. I understand the reasoning, variables/objects are kept in memory, memory needs to be allocated and therefore we need to know how big a variable can be. But really, shouldn't a modern programming language be able to handle "adaptive types", ie, if something is only ever allocated in the shortint range it uses fewer bytes, and if something is suddenly allocated a very big number the memory is allocated accordinly for that particular instance. Float, real and double's are a bit trickier since the type depends on what precision you need. Strings should however be able to take upp less memory in many instances (in .Net) where mostly ascii is used buth strings always take up double the memory because of unicode encoding. One argument for specific types might be that it's part of the specification, ie for example a variable should not be able to be bigger than a certain value so we set it to shortint. But why not have type constraints instead? It would be much more flexible and powerful to be able to set permissible ranges and values on variables (and properties). I realize the immense problem in revamping the type architecture since it's so tightly integrated with underlying hardware and things like serialization might become tricky indeed. But from a programming perspective it should be great no?

    Read the article

  • High-level strategy for distinguishing a regular string from invalid JSON (ie. JSON-like string detection)

    - by Jonline
    Disclaimer On Absence of Code: I have no code to post because I haven't started writing; was looking for more theoretical guidance as I doubt I'll have trouble coding it but am pretty befuddled on what approach(es) would yield best results. I'm not seeking any code, either, though; just direction. Dilemma I'm toying with adding a "magic method"-style feature to a UI I'm building for a client, and it would require intelligently detecting whether or not a string was meant to be JSON as against a simple string. I had considered these general ideas: Look for a sort of arbitrarily-determined acceptable ratio of the frequency of JSON-like syntax (ie. regex to find strings separated by colons; look for colons between curly-braces, etc.) to the number of quote-encapsulated strings + nulls, bools and ints/floats. But the smaller the data set, the more fickle this would get look for key identifiers like opening and closing curly braces... not sure if there even are more easy identifiers, and this doesn't appeal anyway because it's so prescriptive about the kinds of mistakes it could find try incrementally parsing chunks, as those between curly braces, and seeing what proportion of these fractional statements turn out to be valid JSON; this seems like it would suffer less than (1) from smaller datasets, but would probably be much more processing-intensive, and very susceptible to a missing or inverted brace Just curious if the computational folks or algorithm pros out there had any approaches in mind that my semantics-oriented brain might have missed. PS: It occurs to me that natural language processing, about which I am totally ignorant, might be a cool approach; but, if NLP is a good strategy here, it sort of doesn't matter because I have zero experience with it and don't have time to learn & then implement/ this feature isn't worth it to the client.

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • Use ASP.NET 4 Browser Definitions with ASP.NET 3.5

    - by Stephen Walther
    We updated the browser definitions files included with ASP.NET 4 to include information on recent browsers and devices such as Google Chrome and the iPhone. You can use these browser definition files with earlier versions of ASP.NET such as ASP.NET 3.5. The updated browser definition files, and instructions for installing them, can be found here: http://aspnet.codeplex.com/releases/view/41420 The changes in the browser definition files can cause backwards compatibility issues when you upgrade an ASP.NET 3.5 web application to ASP.NET 4. If you encounter compatibility issues, you can install the old browser definition files in your ASP.NET 4 application. The old browser definition files are included in the download file referenced above. What’s New in the ASP.NET 4 Browser Definition Files The complete set of browsers supported by the new ASP.NET 4 browser definition files is represented by the following figure:     If you look carefully at the figure, you’ll notice that we added browser definitions for several types of recent browsers such as Internet Explorer 8, Firefox 3.5, Google Chrome, Opera 10, and Safari 4. Furthermore, notice that we now include browser definitions for several of the most popular mobile devices: BlackBerry, IPhone, IPod, and Windows Mobile (IEMobile). The mobile devices appear in the figure with a purple background color. To improve performance, we removed a whole lot of outdated browser definitions for old cell phones and mobile devices. We also cleaned up the information contained in the browser files. Here are some of the browser features that you can detect: Are you a mobile device? <%=Request.Browser.IsMobileDevice %> Are you an IPhone? <%=Request.Browser.MobileDeviceModel == "IPhone" %> What version of JavaScript do you support? <%=Request.Browser["javascriptversion"] %> What layout engine do you use? <%=Request.Browser["layoutEngine"] %>   Here’s what you would get if you displayed the value of these properties using Internet Explorer 8: Here’s what you get when you use Google Chrome: Testing Browser Settings When working with browser definition files, it is useful to have some way to test the capability information returned when you request a page with different browsers. You can use the following method to return the HttpBrowserCapabilities the corresponds to a particular user agent string and set of browser headers: public HttpBrowserCapabilities GetBrowserCapabilities(string userAgent, NameValueCollection headers) { HttpBrowserCapabilities browserCaps = new HttpBrowserCapabilities(); Hashtable hashtable = new Hashtable(180, StringComparer.OrdinalIgnoreCase); hashtable[string.Empty] = userAgent; // The actual method uses client target browserCaps.Capabilities = hashtable; var capsFactory = new System.Web.Configuration.BrowserCapabilitiesFactory(); capsFactory.ConfigureBrowserCapabilities(headers, browserCaps); capsFactory.ConfigureCustomCapabilities(headers, browserCaps); return browserCaps; } At the end of this blog entry, there is a link to download a simple Visual Studio 2008 project – named Browser Definition Test -- that uses this method to display capability information for arbitrary user agent strings. For example, if you enter the user agent string for an iPhone then you get the results in the following figure: The Browser Definition Test application enables you to submit a user-agent string and display a table of browser capabilities information. The browser definition files contain sample user-agent strings for each browser definition. I got the iPhone user-agent string from the comments in the iphone.browser file. Enumerating Browser Definitions Someone asked in the comments whether or not there is a way to enumerate all of the browser definitions. You can do this if you ware willing to use a little reflection and read a private property. The browser definition files in the config\browsers folder get parsed into a class named BrowserCapabilitesFactory. After you run the aspnet_regbrowsers tool, you can see the source for this class in the config\browser folder by opening a file named BrowserCapsFactory.cs. The BrowserCapabilitiesFactoryBase class has a protected property named BrowserElements that represents a Hashtable of all of the browser definitions. Here's how you can read this protected property and display the ID for all of the browser definitions: var propInfo = typeof(BrowserCapabilitiesFactory).GetProperty("BrowserElements", BindingFlags.NonPublic | BindingFlags.Instance); Hashtable browserDefinitions = (Hashtable)propInfo.GetValue(new BrowserCapabilitiesFactory(), null); foreach (var key in browserDefinitions.Keys) { Response.Write("" + key); } If you run this code using Visual Studio 2008 then you get the following results: You get a huge number of outdated browsers and devices. In all, 449 browser definitions are listed. If you run this code using Visual Studio 2010 then you get the following results: In the case of Visual Studio 2010, all the old browsers and devices have been removed and you get only 19 browser definitions. Conclusion The updated browser definition files included in ASP.NET 4 provide more accurate information for recent browsers and devices. If you would like to test the new browser definitions with different user-agent strings then I recommend that you download the Browser Definition Test project: Browser Definition Test Project

    Read the article

  • Anatomy of a .NET Assembly - CLR metadata 2

    - by Simon Cooper
    Before we look any further at the CLR metadata, we need a quick diversion to understand how the metadata is actually stored. Encoding table information As an example, we'll have a look at a row in the TypeDef table. According to the spec, each TypeDef consists of the following: Flags specifying various properties of the class, including visibility. The name of the type. The namespace of the type. What type this type extends. The field list of this type. The method list of this type. How is all this data actually represented? Offset & RID encoding Most assemblies don't need to use a 4 byte value to specify heap offsets and RIDs everywhere, however we can't hard-code every offset and RID to be 2 bytes long as there could conceivably be more than 65535 items in a heap or more than 65535 fields or types defined in an assembly. So heap offsets and RIDs are only represented in the full 4 bytes if it is required; in the header information at the top of the #~ stream are 3 bits indicating if the #Strings, #GUID, or #Blob heaps use 2 or 4 bytes (the #US stream is not accessed from metadata), and the rowcount of each table. If the rowcount for a particular table is greater than 65535 then all RIDs referencing that table throughout the metadata use 4 bytes, else only 2 bytes are used. Coded tokens Not every field in a table row references a single predefined table. For example, in the TypeDef extends field, a type can extend another TypeDef (a type in the same assembly), a TypeRef (a type in a different assembly), or a TypeSpec (an instantiation of a generic type). A token would have to be used to let us specify the table along with the RID. Tokens are always 4 bytes long; again, this is rather wasteful of space. Cutting the RID down to 2 bytes would make each token 3 bytes long, which isn't really an optimum size for computers to read from memory or disk. However, every use of a token in the metadata tables can only point to a limited subset of the metadata tables. For the extends field, we only need to be able to specify one of 3 tables, which we can do using 2 bits: 0x0: TypeDef 0x1: TypeRef 0x2: TypeSpec We could therefore compress the 4-byte token that would otherwise be needed into a coded token of type TypeDefOrRef. For each type of coded token, the least significant bits encode the table the token points to, and the rest of the bits encode the RID within that table. We can work out whether each type of coded token needs 2 or 4 bytes to represent it by working out whether the maximum RID of every table that the coded token type can point to will fit in the space available. The space available for the RID depends on the type of coded token; a TypeOrMethodDef coded token only needs 1 bit to specify the table, leaving 15 bits available for the RID before a 4-byte representation is needed, whereas a HasCustomAttribute coded token can point to one of 18 different tables, and so needs 5 bits to specify the table, only leaving 11 bits for the RID before 4 bytes are needed to represent that coded token type. For example, a 2-byte TypeDefOrRef coded token with the value 0x0321 has the following bit pattern: 0 3 2 1 0000 0011 0010 0001 The first two bits specify the table - TypeRef; the other bits specify the RID. Because we've used the first two bits, we've got to shift everything along two bits: 000000 1100 1000 This gives us a RID of 0xc8. If any one of the TypeDef, TypeRef or TypeSpec tables had more than 16383 rows (2^14 - 1), then 4 bytes would need to be used to represent all TypeDefOrRef coded tokens throughout the metadata tables. Lists The third representation we need to consider is 1-to-many references; each TypeDef refers to a list of FieldDef and MethodDef belonging to that type. If we were to specify every FieldDef and MethodDef individually then each TypeDef would be very large and a variable size, which isn't ideal. There is a way of specifying a list of references without explicitly specifying every item; if we order the MethodDef and FieldDef tables by the owning type, then the field list and method list in a TypeDef only have to be a single RID pointing at the first FieldDef or MethodDef belonging to that type; the end of the list can be inferred by the field list and method list RIDs of the next row in the TypeDef table. Going back to the TypeDef If we have a look back at the definition of a TypeDef, we end up with the following reprensentation for each row: Flags - always 4 bytes Name - a #Strings heap offset. Namespace - a #Strings heap offset. Extends - a TypeDefOrRef coded token. FieldList - a single RID to the FieldDef table. MethodList - a single RID to the MethodDef table. So, depending on the number of entries in the heaps and tables within the assembly, the rows in the TypeDef table can be as small as 14 bytes, or as large as 24 bytes. Now we've had a look at how information is encoded within the metadata tables, in the next post we can see how they are arranged on disk.

    Read the article

  • ASP.NET WebControl ITemplate child controls are null

    - by tster
    Here is what I want: I want a control to put on a page, which other developers can place form elements inside of to display the entities that my control is searching. I have the Searching logic all working. The control builds custom search fields and performs searches based on declarative C# classes implementing my SearchSpec interface. Here is what I've been trying: I've tried using ITemplate on a WebControl which implements INamingContainer I've tried implementing a CompositeControl The closest I can get to working is below. OK I have a custom WebControl [ AspNetHostingPermission(SecurityAction.Demand, Level = AspNetHostingPermissionLevel.Minimal), AspNetHostingPermission(SecurityAction.InheritanceDemand, Level = AspNetHostingPermissionLevel.Minimal), DefaultProperty("SearchSpecName"), ParseChildren(true), ToolboxData("<{0}:SearchPage runat=\"server\"> </{0}:SearchPage>") ] public class SearchPage : WebControl, INamingContainer { [Browsable(false), PersistenceMode(PersistenceMode.InnerProperty), DefaultValue(typeof(ITemplate), ""), Description("Form template"), TemplateInstance(TemplateInstance.Single), TemplateContainer(typeof(FormContainer))] public ITemplate FormTemplate { get; set; } public class FormContainer : Control, INamingContainer{ } public Control MyTemplateContainer { get; private set; } [Bindable(true), Category("Behavior"), DefaultValue(""), Description("The class name of the SearchSpec to use."), Localizable(false)] public virtual string SearchSpecName { get; set; } [Bindable(true), Category("Behavior"), DefaultValue(true), Description("True if this is query mode."), Localizable(false)] public virtual bool QueryMode { get; set; } private SearchSpec _spec; private SearchSpec Spec { get { if (_spec == null) { Type type = Assembly.GetExecutingAssembly().GetTypes().Where(t => t.Name == SearchSpecName).First(); _spec = (SearchSpec)Assembly.GetExecutingAssembly().CreateInstance(type.Namespace + "." + type.Name); } return _spec; } } protected override void CreateChildControls() { if (FormTemplate != null) { MyTemplateContainer = new FormTemplateContainer(this); FormTemplate.InstantiateIn(MyTemplateContainer); Controls.Add(MyTemplateContainer); } else { Controls.Add(new LiteralControl("blah")); } } protected override void RenderContents(HtmlTextWriter writer) { // <snip> } protected override HtmlTextWriterTag TagKey { get { return HtmlTextWriterTag.Div; } } } public class FormTemplateContainer : Control, INamingContainer { private SearchPage parent; public FormTemplateContainer(SearchPage parent) { this.parent = parent; } } then the usage: <tster:SearchPage ID="sp1" runat="server" SearchSpecName="TestSearchSpec" QueryMode="False"> <FormTemplate> <br /> Test Name: <asp:TextBox ID="testNameBox" runat="server" Width="432px"></asp:TextBox> <br /> Owner: <asp:TextBox ID="ownerBox" runat="server" Width="427px"></asp:TextBox> <br /> Description: <asp:TextBox ID="descriptionBox" runat="server" Height="123px" Width="432px" TextMode="MultiLine" Wrap="true"></asp:TextBox> </FormTemplate> </tster:SearchPage> The problem is that in the CodeBehind, the page has members descriptionBox, ownerBox and testNameBox. However, they are all null. Furthermore, FindControl("ownerBox") returns null as does this.sp1.FindControl("ownerBox"). I have to do this.sp1.MyTemplateContainer.FindControl("ownerBox") to get the control. How can I make it so that the C# Code Behind will have the controls generated and not null in my Page_Load event so that developers can just do this: testNameBox.Text = "foo"; ownerBox.Text = "bar"; descriptionBox.Text = "baz";

    Read the article

  • Creating a constant Dictionary in C#

    - by David Schmitt
    What is the most efficient way to create a constant (never changes at runtime) mapping of strings to ints? I've tried using a const Dictionary, but that didn't work out. I could implement a immutable wrapper with appropriate semantics, but that still doesn't seem totally right. For those who have asked, I'm implementing IDataErrorInfo in a generated class and am looking for a way to make the columnName lookup into my array of descriptors. I wasn't aware (typo when testing! d'oh!) that switch accepts strings, so that's what I'm gonna use. Thanks!

    Read the article

  • Help with string manipulation function

    - by MusiGenesis
    I have a set of strings that contain within them one or more question marks delimited by a comma, a comma plus one or more spaces, or potentially both. So these strings are all possible: BOB AND ? BOB AND ?,?,?,?,? BOB AND ?, ?, ? ,? BOB AND ?,? , ?,? ?, ? ,? AND BOB I need to replace the question marks with @P#, so that the above samples would become: BOB AND @P1 BOB AND @P1,@P2,@P3,@P4,@P5 BOB AND @P1,@P2,@P3,@P4 BOB AND @P1,@P2,@P3,@P4 @P1,@P2,@P3 AND BOB What's the best way to do this without regex or Linq?

    Read the article

  • Regex to match a whole string only if it lacks a given substring/suffix

    - by Ivan Krechetov
    I've searched for questions like this, but all the cases I found were solved in a problem-specific manner, like using !g in vi to negate the regex matches, or matching other things, without a regex negation. Thus, I'm interested in a “pure” solution to this: Having a set of strings I need to filter them with a regular expression matcher so that it only leaves (matches) the strings lacking a given substring. For example, filtering out "Foo" in: Boo Foo Bar FooBar BooFooBar Baz Would result in: Boo Bar Baz I tried constructing it with negative look aheads/behinds (?!regex)/(?<!regex), but couldn't figure it out. Is that even possible?

    Read the article

< Previous Page | 51 52 53 54 55 56 57 58 59 60 61 62  | Next Page >