Search Results

Search found 55 results on 3 pages for 'pep'.

Page 3/3 | < Previous Page | 1 2 3

How can I disable 'output escaping' in minidom

- by William

I'm trying to build an xml document from scratch using xml.dom.minidom. Everything was going well until I tried to make a text node with a ® (Registered Trademark) symbol in. My objective is for when I finally hit print mydoc.toxml() this particular node will actually contain a ® symbol. First I tried: import xml.dom.minidom as mdom data = '®' which gives the rather obvious error of: File "C:\src\python\HTMLGen\test2.py", line 3 SyntaxError: Non-ASCII character '\xae' in file C:\src\python\HTMLGen\test2.py on line 3, but no encoding declared; see http://www.python.or g/peps/pep-0263.html for details I have of course also tried changing the encoding of my python script to 'utf-8' using the opening line comment method, but this didn't help. So I thought import xml.dom.minidom as mdom data = '®' #Both accepted xml encodings for registered trademark data = '®' text = mdom.Text() text.data = data print data print text.toxml() But because when I print text.toxml(), the ampersands are being escaped, I get this output: ® &reg; My question is, does anybody know of a way that I can force the ampersands not to be escaped in the output, so that I can have my special character reference carry through to the XML document? Basically, for this node, I want print text.toxml() to produce output of ® or ® in a happy and cooperative way! EDIT 1: By the way, if minidom actually doesn't have this capacity, I am perfectly happy using another module that you can recommend which does. EDIT 2: As Hugh suggested, I tried using data = u'®' (while also using data # -*- coding: utf-8 -*- Python source tags). This almost helped in the sense that it actually caused the ® symbol itself to be outputted to my xml. This is actually not the result I am looking for. As you may have guessed by now (and perhaps I should have specified earlier) this xml document happens to be an HTML page, which needs to work in a browser. So having ® in the document ends up causing rubbish in the browser (Â® to be precise!). I also tried: data = unichr(174) text.data = data.encode('ascii','xmlcharrefreplace') print text.toxml() But of course this lead to the same origional problem where all that happens is the ampersand gets escaped by .toxml(). My ideal scenario would be some way of escaping the ampersand so that the XML printing function won't "escape" it on my behalf for the document (in other words, achieving my original goal of having ® or ® appear in the document). Seems like soon I'm going to have to resort to regular expressions! EDIT 2a: Or perhaps not. Seems like getting my html meta information correct <META http-equiv="Content-Type" Content="text/html; charset=UTF-8"> could help, but I'm not sure yet how this fits in with the xml structure...

Read the article
CodePlex Daily Summary for Wednesday, December 05, 2012

CodePlex Daily Summary for Wednesday, December 05, 2012Popular ReleasesYahoo! UI Library: YUI Compressor for .Net: Version 2.2.0.0 - Epee: New : Web Optimization package! Cleaned up the nuget packages BugFix: minifying lots of files will now be faster because of a recent regression in some code. (We were instantiating something far too many times).DtPad - .NET Framework text editor: DtPad 2.9.0.40: http://dtpad.diariotraduttore.com/files/images/flag-eng.png English + A new built-in editor for the management of CSV files, including the edit of cells, deleting and adding new rows, replacement of delimiter character and much more (issue #1137) + The limit of rows allowed before the decommissioning of their side panel has been raised (new default: 1.000) (issue #1155, only partially solved) + Pressing CTRL+TAB now DtPad opens a screen that shows the list of opened tabs (issue #1143) + Note...Apex: Apex 1.5: Currently in development, Apex 1.5 is primarily a tidy-up, bug fix and minor feature release. New Features The 'AsynchronousCommand' now has the property 'DisableDuringExecution'. If set to true, this will disable execution of the command while it is already executing. The default is false. The 'Command' class now has a strongly typed analogue, Command<TParameter> that allows strong typing of its underlying function. Added the CueTextBox. The CueTextBox is a textbox that can optionally d...AvalonDock: AvalonDock 2.0.1746: Welcome to the new release of AvalonDock 2.0 This release contains a lot (lot) of bug fixes and some great improvements: Views Caching: Content of Documents and Anchorables is no more recreated everytime user move it. Autohide pane opens really fast now. Two new themes Expression (Dark and Light) and Metro (both of them still in experimental stage). If you already use AD 2.0 or plan to integrate it in your future projects, I'm interested in your ideas for new features: http://avalondock...AcDown?????: AcDown????? v4.3.2: ??●AcDown??????????、??、??、???????。????，????，?????????????????????????。???????????Acfun、????(Bilibili)、??、??、YouTube、??、???、??????、SF????、????????????。 ●??????AcPlay?????，??????、????????????????。 ● AcDown??????????????????，????????????????????????????。 ● AcDown???????C#??，????.NET Framework 2.0??。?????"Acfun?????"。 ?? v4.3.2?? ?????????????????? ??Acfun??????? ??Bilibili?????? ??Bilibili???????????? ??Bilibili????????? ??????????????? ???? ??Bilibili??????? ????32??64? Windows XP/...ExtJS based ASP.NET 2.0 Controls: FineUI v3.2.2: ??FineUI ?? ExtJS ??? ASP.NET 2.0 ???。 FineUI??? ?? No JavaScript，No CSS，No UpdatePanel，No ViewState，No WebServices ???????。 ?????? IE 7.0、Firefox 3.6、Chrome 3.0、Opera 10.5、Safari 3.0+ ???? Apache License 2.0 (Apache) ???? ??：http://fineui.com/bbs/ ??：http://fineui.com/demo/ ??：http://fineui.com/doc/ ??：http://fineui.codeplex.com/ ???? +2012-12-03 v3.2.2 -?????????????，?????button/button_menu.aspx（????）。 +?Window????Plain??；?ToolbarPosition??Footer??；?????FooterBarAlign??。 -????win...Player Framework by Microsoft: Player Framework for Windows Phone 8: This is a brand new version of the Player Framework for Windows Phone, available exclusively for Windows Phone 8, and now based upon the Player Framework for Windows 8. While this new version is not backward compatible with Windows Phone 7 (get that http://smf.codeplex.com/releases/view/88970), it does offer the same great feature set plus dozens of new features such as advertising, localization support, and improved skinning. Click here for more information about what's new in the Windows P...ASP.NET Youtube Clone: ASP.NET Youtube Clone ver 7.1: ASP.NET Youtube Clone Free Script version 7.1SSH.NET Library: 2012.12.3: New feature(s): + SynchronizeDirectoriesmenu4web: menu4web 1.1 - free javascript menu: menu4web 1.1 has been tested with all major browsers: Firefox, Chrome, IE, Opera and Safari. Minified m4w.js library is less than 9K. Includes 22 menu examples of different styles. Can be freely distributed under The MIT License (MIT).Quest: Quest 5.3 Beta: New features in Quest 5.3 include: Grid-based map (sponsored by Phillip Zolla) Changable POV (sponsored by Phillip Zolla) Game log (sponsored by Phillip Zolla) Customisable object link colour (sponsored by Phillip Zolla) More room description options (by James Gregory) More mathematical functions now available to expressions Desktop Player uses the same UI as WebPlayer - this will make it much easier to implement customisation options New sorting functions: ObjectListSort(list,...Chinook Database: Chinook Database 1.4: Chinook Database 1.4 This is a sample database available in multiple formats: SQL scripts for multiple database vendors, embeded database files, and XML format. The Chinook data model is available here. ChinookDatabase1.4_CompleteVersion.zip is a complete package for all supported databases/data sources. There are also packages for each specific data source. Supported Database ServersDB2 EffiProz MySQL Oracle PostgreSQL SQL Server SQL Server Compact SQLite Issues Resolved293...RiP-Ripper & PG-Ripper: RiP-Ripper 2.9.34: changes FIXED: Thanks Function when "Download each post in it's own folder" is disabled FIXED: "PixHub.eu" linksD3 Loot Tracker: 1.5.6: Updated to work with D3 version 1.0.6.13300????????API for .Net SDK: SDK for .Net ??? Release 5: 2012?11?30??? ?OAuth?????????????????????SDK OAuth oauth = new OAuth("<AppKey>", "<AppSecret>", "<????>"); WebProxy proxy = new WebProxy(); proxy.Address = new Uri("http://proxy.domain.com:3128");//??????????? proxy.Credentials = new NetworkCredential("<??>", "<??>");//???????，??? oauth.Proxy = proxy; //??????，?~ Magelia WebStore Open-source Ecommerce software: Magelia WebStore 2.2: new UI for the Administration console Bugs fixes and improvement version 2.2.215.3nopCommerce. Open source shopping cart (ASP.NET MVC): nopcommerce 2.70: Highlight features & improvements: • Performance optimization. • Search engine optimization. ID-less URLs for products, categories, and manufacturers. • Added ACL support (access control list) on products and categories. • Minify and bundle JavaScript files. • Allow a store owner to decide which billing/shipping address fields are enabled/disabled/required (like it's already done for the registration page). • Moved to MVC 4 (.NET 4.5 is required). • Now Visual Studio 2012 is required to work ...NHook - A debugger API: NHook 1.0: x86 debugger Resolve symbol from MS Public server Resolve RVA from executable's image Add breakpoints Assemble / Disassemble target process assembly More information here, you can also check unit tests that are real sample code.PDF Library: PDFLib v2.0: Release notes This new version include many bug fixes and include support for stream objects and cross-reference object streams. New FeatureExtract images from the PDFDocument.Editor: 2013.5: Whats new for Document.Editor 2013.5: New Read-only File support New Check For Updates support Minor Bug Fix's, improvements and speed upsNew ProjectsBookList: Schoolproject for createing a catalogue about schoolbooks. This loads books from a .txt file which is given by the publisher.BunnyHug: BunnyHug Client is an small footprint, high performance and userfriendly piece of software that synchronizes local folder to your Google Docs for Google Apps Premium usersCarRental001: carConCatJS: ConCatJS is a simple application that is capable of recursively processing directories and concatenating the JavaScript (or CSS) files therein into a single file.DemonBuddy Ultimate Plugin: Demon Buddy Plugin which handle all aspect of program: - Looting - Combat - Vendor run - Xml Profile TagDIYbook: 1234567Dynamics Crm 2011 Solution Manager: Dynamics Crm 2011 Solution Manager is used to automatically export/import the Crm 2011 solutions with some pre-defined settings.Easy Weather: PBKHosts Switcher: This small tray icon utility takes care about your host files, so developer can easily switch between QA, production and local environment.ImageGallery CRM 2011: Image loader for annotations CRM 2011 KinectSDK-Kinventor: Library wchich will work as bridge between Kinect and Inventor APILava JS: Lava is a javascript framework that makes creating web applications much easier. It allows you clearly separate logic from the UI by utilizing MVC concepts.LevelEditor: Level Editor is a tool for creating and editing game levels, written entirely in C#/WPF 4.Luxoft test tasks: Luxoft test tasks for competitorsMap Generator: Map Generator for your Civ/Col/Roguelike games in VB.NET.MAXP: ??Membership Adapter: Another approach: adapting MebershipProvider abstract class instead of inheriting it. Motion Maker for Pmd: Test Project.mp3player-xslt-plugin: This is an Umbraco based module for managing mp3 media. You can upload the mp3 media in the dashboard and then display those media anywhere in your site.Music Store Lab: Music Store Lab migrates the original to MVC4, using code first migrations.Mvc Dependency: Mvc Dependency is the smaller better looking sibling of Client Dependency. There's no standing on your head to support legacy Web Forms, just plain simple conventions that put you in control of your resources.MyOrchard: thanks ！i myXbyqwrhjadsfasfhgf: myXbyqwrhjadsfasfhgfnApp: Racunovodstveni software za hrvatsko tržišteNazTek.Extension.Clr4: CLR 4.0 extensions and utility APINinja Echo: Ninja Echo is a bot for Stack Overflow chat. It plugs in as a Greasemonkey userscript, and listens for and sends messages.p_zpp_grc: forum dyskusyjne ASP.NET - projekt ATHPivotal Tracker API wrapper: .NET 4.0 C# wrapper for PivotalTracker API. PrinceOfPersia.net: PrinceOfPersia is a game porting of the famous 80' classic game "Prince Of Persia" maded by Broderbound and J. Mechner game. Project13271205: dfProntuário Eletrônico do Paciente: O desenvolvimento de um Prontuário Eletrônico em Saúde (PEP) depende de componentes de software. Alguns públicos são compilados aqui.quirli - free media player for replay and rehearsal.: quirli is a free media player for music replay and rehearsal. It's main feature is fast navigation to predefined cue points in the media.Reinventing the Wheel: Interested in reinventing the wheel? Tired of re-implementing every single helpful piece of code again and again? That's why "Wheels" Project is introduced.Roll the Dice by Ma Chung: This project is developed by - Fika Aditya - M. Ainur R System Information - Ma Chung Universitysc2md: starcraft.md news portalShot In The Dark: Shot In The Dark is a multiplayer top-down 2D shooter framework developed in Flash/AS3.0, and uses the Nonoba Multiplayer API.Silverlog: Blog in silverlightSitecore PowerShell Console: PowerShell environment for Sitecore allowing to apply complex modifications, manipulate sites, files and items and perform content analysis & reports.Solid Edge Community: Solid Edge CommunitySpace Shooter: Space shooter just for funtestdyq: testTestMekepasa: prueba de proyectoTicket Information .NET Component: .NET Component, parsing ticket data from websiteWindows 8 Store Maps App Framework: Framework zur Erstellung eines Windows 8 Store App mit dem Bing Maps Control.

Read the article
Securing an ADF Application using OES11g: Part 2

- by user12587121

To validate the integration with OES we need a sample ADF Application that is rich enough to allow us to test securing the various ADF elements. To achieve this we can add some items including bounded task flows to the application developed in this tutorial. A sample JDeveloper 11.1.1.6 project is available here. It depends on the Fusion Order Demo (FOD) database schema which is easily created using the FOD build scripts.In the deployment we have chosen to enable only ADF Authentication as we will delegate Authorization, mostly, to OES.The welcome page of the application with all the links exposed looks as follows: The Welcome, Browse Products, Browse Stock and System Administration links go to pages while the Supplier Registration and Update Stock are bounded task flows. The Login link goes to a basic login page and once logged in a link is presented that goes to a logout page. Only the Browse Products and Browse Stock pages are really connected to the database--the other pages and task flows do not really perform any operations on the database. Required Security Policies We make use of a set of test users and roles as decscribed on the welcome page of the application. In order to exercise the different authorization possibilities we would like to enforce the following sample policies: Anonymous users can see the Login, Welcome and Supplier Registration links. They can also see the Welcome page, the Login page and follow the Supplier Registration task flow. They can see the icon adjacent to the Login link indicating whether they have logged in or not. Authenticated users can see the Browse Product page. Only staff granted the right can see the Browse Product page cost price value returned from the database and then only if the value is below a configurable limit. Suppliers and staff can see the Browse Stock links and pages. Customers cannot. Suppliers can see the Update Stock link but only those with the update permission are allowed to follow the task flow that it launches. We could hide the link but leave it exposed here so we can easily demonstrate the method call activity protecting the task flow. Only staff granted the right can see the System Administration link and the System Administration page it accesses. Implementing the required policies In order to secure the application we will make use of the following techniques: EL Expressions and Java backing beans: JSF has the notion of EL expressions to reference data from backing Java classes. We use these to control the presentation of links on the navigation page which respect the security contraints. So a user will not see links that he is not allowed to click on into. These Java backing beans can call on to OES for an authorization decision. Important Note: naturally we would configure the WLS domain where our ADF application is running as an OES WLS SM, which would allow us to efficiently query OES over the PEP API. However versioning conflicts between OES 11.1.1.5 and ADF 11.1.1.6 mean that this is not possible. Nevertheless, we can make use of the OES RESTful gateway technique from this posting in order to call into OES. You can easily create and manage backing beans in Jdeveloper as follows: Custom ADF Phase Listener: ADF extends the JSF page lifecycle flow and allows one to hook into the flow to intercept page rendering. We use this to put a check prior to rendering any protected pages, again calling on to OES via the backing bean. Phase listeners are configured in the adf-settings.xml file. See the MyPageListener.java class in the project. Here, for example, is the code we use in the listener to check for allowed access to the sysadmin page, navigating back to the welcome page if authorization is not granted: if (page != null && (page.equals("/system.jspx") || page.equals("/system"))){ System.out.println("MyPageListener: Checking Authorization for /system"); if (getValue("#{oesBackingBean.UIAccessSysAdmin}").toString().equals("false") ){ System.out.println("MyPageListener: Forcing navigation away from system" + "to welcome"); NavigationHandler nh = fc.getApplication().getNavigationHandler(); nh.handleNavigation(fc, null, "welcome"); } else { System.out.println("MyPageListener: access allowed"); } } Method call activity: our app makes use of bounded task flows to implement the sequence of pages that update the stock or allow suppliers to self register. ADF takes care of ensuring that a bounded task flow can be entered by only one page. So a way to protect all those pages is to make a call to OES in the first activity and then either exit the task flow or continue depending on the authorization decision. The method call returns a String which contains the name of the transition to effect. This is where we configure the method call activity in JDeveloper: We implement each of the policies using the above techniques as follows: Policies 1 and 2: as these policies concern the coarse grained notions of controlling access to anonymous and authenticated users we can make use of the container’s security constraints which can be defined in the web.xml file. The allPages constraint is added automatically when we configure Authentication for the ADF application. We have added the “anonymousss” constraint to allow access to the the required pages, task flows and icons: <security-constraint> <web-resource-collection> <web-resource-name>anonymousss</web-resource-name> <url-pattern>/faces/welcome</url-pattern> <url-pattern>/afr/*</url-pattern> <url-pattern>/adf/*</url-pattern> <url-pattern>/key.png</url-pattern> <url-pattern>/faces/supplier-reg-btf/*</url-pattern> <url-pattern>/faces/supplier_register_complete</url-pattern> </web-resource-collection> </security-constraint> Policy 3: we can place an EL expression on the element representing the cost price on the products.jspx page: #{oesBackingBean.dataAccessCostPrice}. This EL Expression references a method in a Java backing bean that will call on to OES for an authorization decision. In OES we model the authorization requirement by requiring the view permission on the resource /MyADFApp/data/costprice and granting it only to the staff application role. We recover any obligations to determine the limit. Policy 4: is implemented by putting an EL expression on the Browse Stock link #{oesBackingBean.UIAccessBrowseStock} which checks for the view permission on the /MyADFApp/ui/stock resource. The stock.jspx page is protected by checking for the same permission in a custom phase listener—if the required permission is not satisfied then we force navigation back to the welcome page. Policy 5: the Update Stock link is protected with the same EL expression as the Browse Link: #{oesBackingBean.UIAccessBrowseStock}. However the Update Stock link launches a bounded task flow and to protect it the first activity in the flow is a method call activity which will execute an EL expression #{oesBackingBean.isUIAccessSupplierUpdateTransition} to check for the update permission on the /MyADFApp/ui/stock resource and either transition to the next step in the flow or terminate the flow with an authorization error. Policy 6: the System Administration link is protected with an EL Expression #{oesBackingBean.UIAccessSysAdmin} that checks for view access on the /MyADF/ui/sysadmin resource. The system page is protected in the same way at the stock page—the custom phase listener checks for the same permission that protects the link and if not satisfied we navigate back to the welcome page. Testing the Application To test the application: deploy the OES11g Admin to a WLS domain deploy the OES gateway in a another domain configured to be a WLS SM. You must ensure that the jps-config.xml file therein is configured to allow access to the identity store, otherwise the gateway will not b eable to resolve the principals for the requested users. To do this ensure that the following elements appear in the jps-config.xml file: <serviceProvider type="IDENTITY_STORE" name="idstore.ldap.provider" class="oracle.security.jps.internal.idstore.ldap.LdapIdentityStoreProvider"> <description>LDAP-based IdentityStore Provider</description> </serviceProvider> <serviceInstance name="idstore.ldap" provider="idstore.ldap.provider"> <property name="idstore.config.provider" value="oracle.security.jps.wls.internal.idstore.WlsLdapIdStoreConfigProvider"/> <property name="CONNECTION_POOL_CLASS" value="oracle.security.idm.providers.stdldap.JNDIPool"/></serviceInstance> <serviceInstanceRef ref="idstore.ldap"/> download the sample application and change the URL to the gateway in the MyADFApp OESBackingBean code to point to the OES Gateway and deploy the application to an 11.1.1.6 WLS domain that has been extended with the ADF JRF files. You will need to configure the FOD database connection to point your database which contains the FOD schema. populate the OES Admin and OES Gateway WLS LDAP stores with the sample set of users and groups. If you have configured the WLS domains to point to the same LDAP then it would only have to be done once. To help with this there is a directory called ldap_scripts in the sample project with ldif files for the test users and groups. start the OES Admin console and configure the required OES authorization policies for the MyADFApp application and push them to the WLS SM containing the OES Gateway. Login to the MyADFApp as each of the users described on the login page to test that the security policy is correct. You will see informative logging from the OES Gateway and the ADF application to their respective WLS consoles. Congratulations, you may now login to the OES Admin console and change policies that will control the behaviour of your ADF application--change the limit value in the obligation for the cost price for example, or define Role Mapping policies to determine staff access to the system administration page based on user profile attributes. ADF Development Notes Some notes on ADF development which are probably typical gotchas: May need this on WLS startup in order to allow us to overwrite credentials for the database, the signal here is that there is an error trying to access the data base: -Djps.app.credential.overwrite.allowed=true Best to call Bounded Task flows via a CommandLink (as opposed to a go link) as you cannot seem to start them again from a go link, even having completed the task flow correctly with a return activity. Once a bounded task flow (BTF) is initated it must complete correctly via a return activity—attempting to click on any other link whilst in the context of a BTF has no effect. See here for example: When using the ADF Authentication only security approach it seems to be awkward to allow anonymous access to the welcome and registration pages. We can achieve anonymous access using the web.xml security constraint shown above (where no auth-constraint is specified) however it is not clear what needs to be listed in there….for example the /afr/* and /adf/* are in there by trial and error as sometimes the welcome page will not render if we omit those items. I was not able to use the default allPages constraint with for example the anonymous-role or the everyone WLS group in order to be able to allow anonymous access to pages. The ADF security best practice advises placing all pages under the public_html/WEB-INF folder as then ADF will not allow any direct access to the .jspx pages but will only allow acces via a link of the form /faces/welcome rather than /faces/welcome.jspx. This seems like a very good practice to follow as having multiple entry points to data is a source of confusion in a web application (particulary from a security point of view). In Authentication+Authorization mode only pages with a Page definition file are protected. In order to add an emty one right click on the page and choose Go to Page Definition. This will create an empty page definition and now the page will require explicit permission to be seen. It is advisable to give a unique context root via the weblogic.xml for the application, as otherwise the application will clash with any other application with the same context root and it will not deploy

Read the article
SQLite, python, unicode, and non-utf data

- by Nathan Spears

I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur RÃ³s' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

Read the article
SQLite, python, unicode, and non-utf data

- by Nathan Spears

I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur RÃ³s' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

Read the article

< Previous Page | 1 2 3