Search Results

Search found 5371 results on 215 pages for 'church encoding'.

  • Encoding in Python with lxml - complex solution

    - by Vojtech R.
    Hi, I need to download and parse a webpage with lxml and build UTF-8 XML output. I think the schema in pseudocode is more illustrative:

        from lxml import etree
        webfile = urllib2.urlopen(url)
        root = etree.parse(webfile.read(), parser=etree.HTMLParser(recover=True))
        txt = my_process_text(etree.tostring(root.xpath('/html/body'), encoding='utf8'))
        output = etree.Element("out")
        output.text = txt
        outputfile.write(etree.tostring(output, encoding='utf8'))

    So webfile can be in any encoding (lxml should handle this). The output file has to be in UTF-8. I'm not sure where to use encoding/decoding. Is this schema OK? (I can't find a good tutorial about lxml and encoding, but I can find many problems with it...) I need a robust, approved solution, so I ask you seniors. Many thanks
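
    A minimal sketch of the usual pattern, assuming Python 2 with urllib2 (url, outputfile and my_process_text stand in for the asker's names): give parse() the raw file object so lxml's HTML parser can detect the source encoding itself, work in unicode internally, and encode exactly once when serializing:

        import urllib2
        from lxml import etree

        webfile = urllib2.urlopen(url)
        # parse() wants a file object or filename, not the page content as a string
        root = etree.parse(webfile, etree.HTMLParser(recover=True))
        body = root.xpath('/html/body')[0]              # xpath() returns a list of elements
        txt = my_process_text(etree.tostring(body, encoding=unicode))  # unicode in, unicode out
        output = etree.Element("out")
        output.text = txt
        # Encode once, at output time; lxml adds a matching XML declaration.
        outputfile.write(etree.tostring(output, encoding='UTF-8', xml_declaration=True))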

  • Check the encoding of text in SQLite

    - by JJG
    I'm having a nightmare dealing with non-European text in SQLite. I think the problem is that SQLite isn't encoding the text in UTF-8. So I want to check what the encoding is, and hopefully change it to UTF-8. I encoded a CSV in UTF-8 and simply imported it into SQLite, but the non-Roman text is garbled. I would like to know: 1) how to check the encoding; 2) how to change the encoding if it is not UTF-8. I've been reading about PRAGMA encoding, but I'm not sure how to use it.
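
    For what it's worth, a short sketch of PRAGMA encoding driven from Python's sqlite3 module (mydata.db is a hypothetical file). Two caveats: the encoding can only be set on a brand-new, empty database, so changing it means rebuilding into a fresh file; and if the text was already garbled on import, re-encoding the database won't repair it:

        import sqlite3

        conn = sqlite3.connect('mydata.db')
        # 1) Check the encoding: returns 'UTF-8', 'UTF-16le' or 'UTF-16be'.
        print(conn.execute('PRAGMA encoding').fetchone()[0])

        # 2) "Change" it by rebuilding: the pragma must run before any table exists.
        new = sqlite3.connect('mydata-utf8.db')
        new.execute('PRAGMA encoding = "UTF-8"')
        new.executescript('\n'.join(conn.iterdump()))   # copy schema and rows across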

  • /etc/init.d Character Encoding Issue

    - by Ryan Rosario
    I have a script in /etc/init.d on an EC2 image that, on machine startup, pulls in source code via SVN, builds it, and then runs it using Ant. The source code is Java. Within this code is a call to the Weka library which writes a file to disk. On most Ubuntu AMIs, and my home machines' versions of Ubuntu, there is no issue. The problem is that with certain versions/AMIs of Ubuntu, Unicode characters in the file are replaced with question marks ('?'). If I run the job manually on the trouble instance, Unicode is output to file correctly, but not when run from /etc/init.d. What might be causing this problem and how can I fix it so that Unicode characters appear correctly in files written from /etc/init.d processes?

  • FFmpeg multi-pass encoding

    - by Levan
    Sorry, I am really new to this, and have problems doing some tasks without help. So I have a terminal command:

        ffmpeg \
          -y \
          -i '/media/levan/BEEA60D8EA608E89/Downloads/Videos/Tony Braxton - Un-Break My Heart.VOB' \
          -s 1920x1080 \
          -aspect 16:9 \
          -r 25 \
          -b 15550k \
          -bt 19792k \
          -vcodec libtheora \
          -acodec libvorbis \
          -ac 2 \
          -ar 48000 \
          -ab 320k \
          ddd.ogg

    and I want multi-pass encoding for the output video, but how do I accomplish this? I found that I must write a -pass n option somewhere, but where to write it I do not know. I tested this and wrote -pass 3 at the end, but then the terminal just showed a > symbol.
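
    For reference, multi-pass in ffmpeg normally means two passes: -pass 1 writes a statistics log (its output is thrown away) and -pass 2 re-encodes using that log. (A lone > prompt usually just means the shell thinks the command line is unfinished, e.g. a trailing \ or an unclosed quote.) Below is a hedged sketch of driving both passes from Python, assuming the chosen codec supports two-pass encoding and keeping the question's option names:

        import subprocess

        src = 'input.vob'   # hypothetical stand-in for the full .VOB path
        common = ['ffmpeg', '-y', '-i', src,
                  '-s', '1920x1080', '-aspect', '16:9', '-r', '25',
                  '-b', '15550k', '-vcodec', 'libtheora',
                  '-passlogfile', 'ffmpeg2pass']

        # Pass 1: analysis only; discard audio and the encoded output.
        subprocess.check_call(common + ['-pass', '1', '-an', '-f', 'null', '/dev/null'])
        # Pass 2: the real encode, reusing the pass-1 statistics.
        subprocess.check_call(common + ['-pass', '2', '-acodec', 'libvorbis',
                                        '-ac', '2', '-ar', '48000', '-ab', '320k',
                                        'ddd.ogg'])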

  • Convert DVD to MKV (et al) without transcoding/recompression

    - by Oli
    Like a lot of people, I have a lot of DVDs. But we also have a stupid amount of disk space and a media centre (Boxee), so the DVDs are getting less and less use. It would be nice to convert our DVDs into something more relevant to our needs. I've dabbled with DVD ripping before, but whereas I'd usually transcode down to a smaller picture size with a better video compression algorithm, this takes a silly amount of time. I don't have a couple of hours available for each disk. (Sidebar: is there dedicated, Linux-friendly hardware to improve H.264 encoding performance?) So I was wondering if there's anything that takes the DVD filesystem, de-CSSes it, and then stitches together the VOBs that make up the main part of the film and packages that up in a wrapping format like MKV. A bonus would be if it could grab the subtitles and stick them in too, but that's not a requirement as Boxee can grab the subtitles online if it needs to.

  • Replacing LF, NEL line endings in text file with CR+LF

    - by Tomas Lycken
    I have a text file with a strange character encoding that I'd like to convert to standard UTF-8. I have managed to get part of the way:

        $ file myfile.txt
        myfile.txt: Non-ISO extended-ASCII text, with LF, NEL line endings
        $ iconv -f ascii -t utf-8 myfile.txt > myfile.txt.utf8
        $ file myfile.txt.utf8
        myfile.txt.utf8: UTF-8 Unicode text, with LF, NEL line endings
        ## edit myfile.txt.utf8 using nano, to fix failed character conversions (mostly åäö)
        $ file myfile.txt.utf8
        myfile.txt.utf8: UTF-8 Unicode text, with LF, NEL line endings

    However, I can't figure out how to convert the line endings. How do I replace LF and NEL with CR+LF (or whatever is the standard)? When I'm done, I'd like to see the following:

        $ file myfile.txt
        myfile.txt: UTF-8 Unicode text
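
    In case it's useful: NEL is the Unicode "next line" control U+0085 (bytes 0xC2 0x85 in UTF-8), which iconv won't touch here since it converts between encodings, not between newline conventions. A small Python 3 sketch that rewrites the endings once the file is valid UTF-8:

        # Assumes myfile.txt.utf8 is already valid UTF-8, as in the last step above.
        with open('myfile.txt.utf8', encoding='utf-8', newline='') as f:
            text = f.read()

        text = text.replace('\u0085', '\n')      # NEL -> LF
        # text = text.replace('\n', '\r\n')      # uncomment if you really want CR+LF

        with open('myfile.txt', 'w', encoding='utf-8', newline='') as f:
            f.write(text)                        # newline='' disables re-translation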

  • VS2008 ASP.NET spits out gibberish, possibly a wrong-encoding issue

    - by Edward M Meshuris
    Hello, I have inherited a project; it was originally written in VS2005. I have made a few changes, but all are design changes. Now when I run the project using Visual Studio's web server, the page shows up just fine in IE8; however, in Firefox 3.6.3 I get gibberish (a full page of this): ?I?%&/m?{J?J??t??$?@?????iG#)?*??eVe]f@?? ??{???{??;?N'????\fdl??J??!????~|?"~???????7????t?.???WO???? m???{'w?}?4??????x'}??W?{???G?G?]=?{???j|uo\?w???Pv??? My setup: Windows 7 Professional (all patches), VS2008 SP1 (I think all patches). Thank you for your help! -Edward

  • Is this a good approach to address double-base64-encoding?

    - by Freiheit
    My software understands attachments, like PNGs attached to user records. These attachments are usually sent in from outside sources as a Base64 encoded string. The database stores whatever data it is given, Base64 encoded or not. When I serve up the attachment for download I do this:

        if (Base64.isBase64(data)) {
            data = Base64.decodeBase64(data);
        }

    There is a potential for data that is double encoded. For instance, the sender of a message had Base64 encoded data, then encoded it again when building the message to send to me. I think the following code would address that circumstance:

        while (Base64.isBase64(data)) {
            data = Base64.decodeBase64(data);
        }

    So if data is encoded multiple times, it would be decoded until it's in its 'raw' state and then served up for download. Is this approach an acceptable way to address that problem? Ideally some sort of checking could happen at the edge when I receive attachment data, but that will take more time. This looping seems to be a faster way to do it. The 'Base64' library is Apache Commons: http://commons.apache.org/codec/apidocs/org/apache/commons/codec/binary/Base64.html I trust it to properly identify Base64 encoded data.
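
    One caveat worth flagging: isBase64-style checks answer "could this be Base64?", not "was this Base64?", so raw data that happens to consist entirely of Base64-alphabet characters gets decoded one time too many, and the loop inherits that false-positive risk. A sketch of the same peeling loop in Python, stopping as soon as strict decoding fails:

        import base64, binascii

        def peel_base64(data: bytes) -> bytes:
            """Repeatedly strip Base64 layers until the data stops validating."""
            while data:
                try:
                    decoded = base64.b64decode(data, validate=True)  # raises on non-Base64
                except binascii.Error:
                    break
                data = decoded
            return data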

  • PHP - ___ encoding to UTF-8 - is there an end-all solution?

    - by Kerry
    I've looked across the web, I've looked through SO, through the PHP documentation and more. It seems like a ridiculous problem not to have a standard solution to. If you get an unknown character set, and it has strange characters (like English curly quotes), is there a standard way to convert them to UTF-8? I've seen many messy solutions using a plethora of functions and checking, and none of them are definitely going to work. Has anyone come up with their own function or a solution that always works?
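
    There probably cannot be a function that "always works", because a byte stream doesn't carry its own encoding: detection is inherently statistical. The usual PHP pairing is mb_detect_encoding() followed by iconv() or mb_convert_encoding(), with that same caveat. A sketch of the detect-then-convert idea (shown in Python, using the third-party chardet package as the detector):

        import chardet   # pip install chardet

        def to_utf8(raw: bytes) -> str:
            guess = chardet.detect(raw)               # {'encoding': ..., 'confidence': ...}
            enc = guess['encoding'] or 'utf-8'        # fall back if detection gives up
            return raw.decode(enc, errors='replace')  # never crash; mark unmappable bytes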

  • How do I do raw URL encoding/decoding in JavaScript and Ruby to get the same answers in both?

    - by Mo
    Hi, I am working on a web application where I have to encode and decode a string on the JavaScript side and in the Ruby backend. The only problem is that the escape methods for JavaScript and Ruby have a small difference: in JavaScript the " " (space) is encoded as "%20", but in Ruby the " " (space) is encoded as "+". Is there any way to work around this? Is there another Ruby method to encode a string as raw URL encoding? Thank you
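
    The split is between raw percent-encoding (RFC 3986, space as %20, what JavaScript's encodeURIComponent emits) and HTML form encoding (space as +, what Ruby's CGI.escape emits); on the Ruby side, ERB::Util.url_encode produces the %20 form. Python's standard library happens to expose both styles side by side, which makes the mismatch and the usual normalisation workaround easy to show:

        from urllib.parse import quote, quote_plus, unquote

        s = 'a b'
        print(quote(s))        # a%20b -- raw style, like encodeURIComponent
        print(quote_plus(s))   # a+b   -- form style, like Ruby's CGI.escape

        # Interop workaround: normalise form-style to raw style before decoding.
        form_encoded = 'a+b'
        print(unquote(form_encoded.replace('+', '%20')))   # a b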

  • Japanese character stored in SQL Server DB using ASP page that assumed it as ISO-8859-1 encoding

    - by Vishal Seth
    We have a legacy ASP based product that allowed the UI and data languages of user groups to be configured according to their locations. CodePage and CharSet in the ASP pages collecting data were set accordingly. I've noticed a few instances in the SQL Server DB where users posted Japanese characters in an ASP page that assumes the incoming stream to be ISO-8859-1/Western, and as a result the data in the SQL table has been garbled. While upgrading the client to our new product, I want to back-convert those "garbage" Japanese (in some instances Chinese) characters to their actual form. Can I create some utility ASP page that would go through such data values and "fix" the wrongly-encoded strings and store everything back as UTF-8 strings? In any case, I don't want to affect any French/Spanish/English characters that might be there as well.
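
    The standard repair for this flavour of mojibake is to reverse the exact mistake: re-encode the stored string using the codepage it was wrongly decoded as, then decode those bytes as UTF-8. A sketch in Python (assuming the page effectively used Windows-1252, which is what "ISO-8859-1" almost always means on Windows); strings that fail either step are left untouched, which protects the French/Spanish/English data:

        def fix_mojibake(stored: str) -> str:
            """Undo 'UTF-8 bytes mistakenly decoded as Windows-1252'."""
            try:
                return stored.encode('windows-1252').decode('utf-8')
            except (UnicodeEncodeError, UnicodeDecodeError):
                return stored   # not double-decoded (or not recoverable): keep as-is

        # Round-trip demonstration of the damage and the repair:
        original = '\u6f22\u5b57'                                  # Japanese for 'kanji'
        garbled = original.encode('utf-8').decode('windows-1252')  # what the ASP page stored
        assert fix_mojibake(garbled) == original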

  • How to handle URLs with diacritic characters

    - by user359650
    I am wondering how to handle URLs which correspond to strings containing diacritics (á, ü, ...). I believe what we're mostly seeing are URLs where diacritic characters were converted to their closest ASCII equivalent, for instance Rånades på Skyttis i Ö-vik converted to ranades-pa-skyttis-i-o-vik. However, depending on the language involved, such conversion might be incorrect. For instance in German, ü should be converted to ue and not just u, as seen with the below URL representing the Bayern München string as bayern-muenchen: http://www.bundesliga.de/en/liga/clubs/fc-bayern-muenchen/index.php However, what I've also noticed is that browsers can render non-ASCII characters when they are percent-encoded in the URL, which is the approach Wikipedia has chosen, for instance http://de.wikipedia.org/wiki/FC_Bayern_M%C3%BCnchen, which is rendered with the umlaut intact. Therefore I'm considering the following approach for creating URL slugs: (1) convert strings while replacing non-ASCII characters with their recommended ASCII representation: Bayern München - bayern-muenchen; (2) also convert strings to percent encoding: Bayern München - bayern_m%C3%BCnchen; then create a 301 redirect from version (1) to version (2). Version (1) URLs could be used for marketing purposes (e.g. mywebsite.com/bayern-muenchen) but the URLs that would end up being displayed in the browser bar would be version (2) URLs (e.g. mywebsite.com/bayern-münchen). Can you foresee particular problems with this approach? (Wikipedia is not doing it and I wonder why, apart from the fact that they don't need to market their URLs.)
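
    On the slug-generation side, the ü→ue rule is language-specific, which is why generic transliteration gets it wrong; you need a per-language table before falling back to plain accent-stripping. A sketch of both URL versions in Python (the GERMAN table is a tiny hypothetical stand-in for a real transliteration layer):

        import re
        import unicodedata
        from urllib.parse import quote

        GERMAN = {'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss'}

        def ascii_slug(title: str, table=GERMAN) -> str:
            for ch, repl in table.items():
                title = title.replace(ch, repl)
            # Decompose and drop remaining combining marks (å -> a, é -> e, ...).
            title = unicodedata.normalize('NFKD', title).encode('ascii', 'ignore').decode()
            return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')

        def percent_slug(title: str) -> str:
            return quote(title.lower().replace(' ', '-'))   # ü survives as %C3%BC

        print(ascii_slug('Bayern München'))     # bayern-muenchen      (version 1)
        print(percent_slug('Bayern München'))   # bayern-m%C3%BCnchen  (version 2)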

  • How to resolve a NULL cString crash

    - by hanumanDev
    I'm getting a crash with the following encoding fix I'm trying to implement:

        // encoding fix
        NSString *correctStringTitle = [NSString stringWithCString:[[item objectForKey:@"main_tag"] cStringUsingEncoding:NSISOLatin1StringEncoding] encoding:NSUTF8StringEncoding];
        cell.titleLabel.text = [correctStringTitle capitalizedString];

    My crash log output states:

        *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** +[NSString stringWithCString:encoding:]: NULL cString'

    Thanks for any help

  • Convert .net String object into base64 encoded string

    - by chester89
    I have a question: which Unicode encoding should be used when encoding a .NET string into Base64? I know strings are UTF-16 encoded on Windows, so is my way of encoding the right one?

        public static String ToBase64String(this String source)
        {
            return Convert.ToBase64String(Encoding.Unicode.GetBytes(source));
        }
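
    Either choice works as long as the decoding side uses the same encoding to turn the bytes back into a string; Encoding.Unicode (UTF-16LE) matches .NET's in-memory representation, while Encoding.UTF8 is the more common choice for interchange and gives shorter output for mostly-ASCII text. A quick Python illustration of how much the byte encoding matters to the Base64 text:

        import base64

        s = 'héllo'
        utf8_b64  = base64.b64encode(s.encode('utf-8'))      # b'aMOpbGxv'
        utf16_b64 = base64.b64encode(s.encode('utf-16-le'))  # b'aADpAGwAbABvAA=='

        # Round-tripping only works when both sides agree on the encoding:
        assert base64.b64decode(utf16_b64).decode('utf-16-le') == s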

  • How to detect UTF-8-based encoded strings [closed]

    - by Diego Sendra
    A customer of ours asked us to build him a multi-language VB6 scraper, for which we needed to detect UTF-8 encoded strings so we could decode them later for proper display in the application UI. It's necessary to point out that this need arises from VB6's limited native support for UTF-8 in its controls, contrary to what happens in .NET, where you can tell a control that it should expect UTF-8 encoding. VB6 natively supports only the ISO 8859-1 and/or Windows-1252 encodings, so textboxes, dropdowns, listview controls and others can't be defined to natively expect UTF-8 the way they can in .NET; we would therefore see weird symbols such as é or è, among others, making a whole mess at display time. So, the next function contains the UTF-8 encoded punctuation marks and symbols of languages like Spanish, Italian, German, Portuguese, French and others, based on an excellent UTF-8 table we got from this link - Ref. http://home.telfort.nl/~t876506/utf8tbl.html Basically, the function checks whether any of the listed UTF-8 encoded sequences, separated by | (pipe), is found in the passed string, doing a substring search first. Where one is not found, it makes an alternative ASCII-value-based search to get a match. Say, a string like "Societé" (Society in English) would return FALSE through calling isUTF8("Societé"), while it would return TRUE when calling isUTF8("SocietÈ"), since È is the UTF-8 encoded representation of é. Once you have TRUE or FALSE, you can decode the string through the DecodeUTF8() function for proper display, a function we found somewhere else some time ago and have also included in this post.

        Function isUTF8(ByVal ptstr As String)
            Dim tUTFencoded As String
            Dim tUTFencodedaux
            Dim tUTFencodedASCII As String
            Dim ptstrASCII As String
            Dim iaux, iaux2 As Integer
            Dim ffound As Boolean

            ffound = False
            ptstrASCII = ""
            For iaux = 1 To Len(ptstr)
                ptstrASCII = ptstrASCII & Asc(Mid(ptstr, iaux, 1)) & "|"
            Next

            tUTFencoded = "Ä|Ã…|Ç|É|Ñ|Ö|ÃŒ|á|Ã|â|ä|ã|Ã¥|ç|é|è|ê|ë|í|ì|î|ï|ñ|ó|ò|ô|ö|õ|ú|ù|û|ü|â€|°|¢|£|§|•|¶|ß|®|©|â„¢|´|¨|â‰|Æ|Ø|∞|±|≤|≥|Â¥|µ|∂|∑|âˆ|Ï€|∫|ª|º|Ω|æ|ø|¿|¡|¬|√|Æ’|≈|∆|«|»|…|Â|À|Ã|Õ|Å’|Å“|–|—|“|â€|‘|’|÷|â—Š|ÿ|Ÿ|â„|€|‹|›|ï¬|fl|‡|·|‚|„|‰|Â|Ú|Ã|Ë|È|Ã|ÃŽ|Ã|ÃŒ|Ó|Ô||Ã’|Ú|Û|Ù|ı|ˆ|Ëœ|¯|˘|Ë™|Ëš|¸|Ë|Ë›|ˇ" & _
                "Å|Å¡|¦|²|³|¹|¼|½|¾|Ã|×|Ã|Þ|ð|ý|þ" & _
                "â‰|∞|≤|≥|∂|∑|âˆ|Ï€|∫|Ω|√|≈|∆|â—Š|â„|ï¬|fl||ı|˘|Ë™|Ëš|Ë|Ë›|ˇ"

            tUTFencodedaux = Split(tUTFencoded, "|")
            If UBound(tUTFencodedaux) > 0 Then
                iaux = 0
                Do While Not ffound And Not iaux > UBound(tUTFencodedaux)
                    If InStr(1, ptstr, tUTFencodedaux(iaux), vbTextCompare) > 0 Then
                        ffound = True
                    End If
                    If Not ffound Then
                        'ASCII numeric search
                        tUTFencodedASCII = ""
                        For iaux2 = 1 To Len(tUTFencodedaux(iaux))
                            'gets ASCII numeric sequence
                            tUTFencodedASCII = tUTFencodedASCII & Asc(Mid(tUTFencodedaux(iaux), iaux2, 1)) & "|"
                        Next
                        'tUTFencodedASCII = Left(tUTFencodedASCII, Len(tUTFencodedASCII) - 1)
                        'compares numeric sequences
                        If InStr(1, ptstrASCII, tUTFencodedASCII) > 0 Then
                            ffound = True
                        End If
                    End If
                    iaux = iaux + 1
                Loop
            End If
            isUTF8 = ffound
        End Function

        Function DecodeUTF8(s)
            Dim i
            Dim c
            Dim n

            s = s & " "
            i = 1
            Do While i <= Len(s)
                c = Asc(Mid(s, i, 1))
                If c And &H80 Then
                    n = 1
                    Do While i + n < Len(s)
                        If (Asc(Mid(s, i + n, 1)) And &HC0) <> &H80 Then
                            Exit Do
                        End If
                        n = n + 1
                    Loop
                    If n = 2 And ((c And &HE0) = &HC0) Then
                        c = Asc(Mid(s, i + 1, 1)) + &H40 * (c And &H1)
                    Else
                        c = 191
                    End If
                    s = Left(s, i - 1) + Chr(c) + Mid(s, i + n)
                End If
                i = i + 1
            Loop
            DecodeUTF8 = s
        End Function
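
    For comparison, outside VB6 the standard detection trick is simply to attempt a strict UTF-8 decode: byte sequences that form valid multi-byte UTF-8 almost never occur by accident in Windows-1252 text. A minimal sketch of that idea in Python:

        def looks_like_utf8(raw: bytes) -> bool:
            """True for valid UTF-8 containing at least one multi-byte sequence."""
            try:
                decoded = raw.decode('utf-8')
            except UnicodeDecodeError:
                return False
            # Pure ASCII decodes either way; only a multi-byte sequence is telling.
            return len(decoded) != len(raw)

        print(looks_like_utf8('Societé'.encode('utf-8')))    # True
        print(looks_like_utf8('Societé'.encode('cp1252')))   # False: lone 0xE9 is invalid UTF-8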

  • UITableView's NSString memory leak on iPhone when encoding with NSUTF8StringEncoding

    - by vince
    My UITableView has a serious memory leak problem, but only when the NSString is NOT encoded with NSASCIIStringEncoding.

        - (UITableViewCell *)tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath {
            static NSString *CellIdentifier = @"cell";
            UILabel *textLabel1;
            UITableViewCell *cell = [tableView dequeueReusableCellWithIdentifier:CellIdentifier];
            if (cell == nil) {
                cell = [[[UITableViewCell alloc] initWithStyle:UITableViewCellStyleDefault reuseIdentifier:CellIdentifier] autorelease];
                textLabel1 = [[UILabel alloc] initWithFrame:CGRectMake(105, 6, 192, 22)];
                textLabel1.tag = 1;
                textLabel1.textColor = [UIColor whiteColor];
                textLabel1.backgroundColor = [UIColor blackColor];
                textLabel1.numberOfLines = 1;
                textLabel1.adjustsFontSizeToFitWidth = NO;
                [textLabel1 setFont:[UIFont boldSystemFontOfSize:19]];
                [cell.contentView addSubview:textLabel1];
                [textLabel1 release];
            } else {
                textLabel1 = (UILabel *)[cell.contentView viewWithTag:1];
            }
            NSDictionary *tmpDict = [listOfInfo objectForKey:[NSString stringWithFormat:@"%@", indexPath.row]];
            textLabel1.text = [tmpDict objectForKey:@"name"];
            return cell;
        }

        - (void)readDatabase {
            NSArray *documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
            NSString *documentsDir = [documentPaths objectAtIndex:0];
            databasePath = [documentsDir stringByAppendingPathComponent:[NSString stringWithFormat:@"%@", myDB]];
            sqlite3 *database;
            if (sqlite3_open([databasePath UTF8String], &database) == SQLITE_OK) {
                const char *sqlStatement = [[NSString stringWithFormat:@"select id,name from %@ order by orderid", myTable] UTF8String];
                sqlite3_stmt *compiledStatement;
                if (sqlite3_prepare_v2(database, sqlStatement, -1, &compiledStatement, NULL) == SQLITE_OK) {
                    while (sqlite3_step(compiledStatement) == SQLITE_ROW) {
                        NSString *tmpid = [NSString stringWithUTF8String:(char *)sqlite3_column_text(compiledStatement, 0)];
                        NSString *tmpname = [NSString stringWithCString:(const char *)sqlite3_column_text(compiledStatement, 1) encoding:NSUTF8StringEncoding];
                        [listOfInfo setObject:[[NSMutableDictionary alloc] init] forKey:tmpid];
                        [[listOfInfo objectForKey:tmpid] setObject:[NSString stringWithFormat:@"%@", tmpname] forKey:@"name"];
                    }
                }
                sqlite3_finalize(compiledStatement);
                debugNSLog(@"sqlite closing");
            }
            sqlite3_close(database);
        }

    When I change the line

        NSString *tmpname = [NSString stringWithCString:(const char *)sqlite3_column_text(compiledStatement, 1) encoding:NSUTF8StringEncoding];

    to

        NSString *tmpname = [NSString stringWithCString:(const char *)sqlite3_column_text(compiledStatement, 1) encoding:NSASCIIStringEncoding];

    the memory leak is gone. I tried NSString stringWithUTF8String and it still leaks. I've also tried:

        NSData *dtmpname = [NSData dataWithBytes:sqlite3_column_blob(compiledStatement, 1) length:sqlite3_column_bytes(compiledStatement, 1)];
        NSString *tmpname = [[[NSString alloc] initWithData:dtmpname encoding:NSUTF8StringEncoding] autorelease];

    and the problem remains; the leak occurs when you start scrolling the table view. I've actually tried other encodings and it seems that only NSASCIIStringEncoding works (no memory leak). Any idea how to get rid of this problem?

  • Code for decoding/encoding a modified base64 URL

    - by Kirk Liemohn
    I want to Base64 encode data to put it in a URL and then decode it within my HttpHandler. I have found that Base64 encoding allows for a '/' character which will mess up my UriTemplate matching. Then I found that there is a concept of a "modified Base64 for URL" from Wikipedia:

        A modified Base64 for URL variant exists, where no padding '=' will be used, and the '+' and '/' characters of standard Base64 are respectively replaced by '-' and '_', so that using URL encoders/decoders is no longer necessary and has no impact on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general.

    Using .NET I want to modify my current code from doing basic Base64 encoding and decoding to using the "modified Base64 for URL" method. Has anyone done this? To decode, I know it starts out with something like:

        string base64EncodedText = base64UrlEncodedText.Replace('-', '+').Replace('_', '/');
        // Append '=' char(s) if necessary - how best to do this?
        // My normal base64 decoding now uses encodedText

    But I need to potentially add one or two '=' chars to the end, which looks a little more complex. My encoding logic should be a little simpler:

        // Perform normal base64 encoding
        byte[] encodedBytes = Encoding.UTF8.GetBytes(unencodedText);
        string base64EncodedText = Convert.ToBase64String(encodedBytes);

        // Apply URL variant
        string base64UrlEncodedText = base64EncodedText.Replace("=", String.Empty).Replace('+', '-').Replace('/', '_');

    I have seen the Guid to Base64 for URL StackOverflow entry, but that has a known length and therefore they can hardcode the number of equal signs needed at the end.
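
    For what it's worth, the padding count follows directly from Base64's 4-characters-per-3-bytes structure: a valid unpadded length mod 4 is 0, 2 or 3, so appending (4 - length mod 4) mod 4 '=' characters always restores it. A sketch of both directions in Python, whose standard library ships the URL-safe alphabet:

        import base64

        def encode_url_b64(raw: bytes) -> str:
            # '-'/'_' alphabet; strip the padding for URL use.
            return base64.urlsafe_b64encode(raw).decode('ascii').rstrip('=')

        def decode_url_b64(text: str) -> bytes:
            padded = text + '=' * (-len(text) % 4)   # restore 0, 1 or 2 '=' chars
            return base64.urlsafe_b64decode(padded)

        token = encode_url_b64(b'data+with/specials')
        assert decode_url_b64(token) == b'data+with/specials'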

  • Fast or asynchronous AS3 JPEG encoding

    - by Bart van Heukelom
    I'm currently using the JPGEncoder from the AS3 core lib to encode a bitmap to JPEG:

        var enc:JPGEncoder = new JPGEncoder(90);
        var jpg:ByteArray = enc.encode(bitmap);

    Because the bitmap is rather large (3000 x 2000) the encoding takes a long while (about 20 seconds), causing the application to seemingly freeze while encoding. To solve this, I need either:

    - An asynchronous encoder so I can keep updating the screen (with a progress bar or something) while encoding
    - An alternative encoder which is simply faster

    Is either possible?

  • Perl's use encoding pragma breaking UTF strings

    - by Karel Bílek
    I have a problem with Perl and the encoding pragma. (I use UTF-8 everywhere: in input, output, and the Perl scripts themselves. I don't want to use any other encoding, ever.) However, when I write

        binmode(STDOUT, ':utf8');
        use utf8;
        $r = "\x{ed}";
        print $r;

    I see the string "í" (which is what I want; U+00ED is the Unicode character í). But when I add the encoding pragma like this

        binmode(STDOUT, ':utf8');
        use utf8;
        use encoding 'utf8';
        $r = "\x{ed}";
        print $r;

    all I see is a box character. Why? Moreover, when I add Data::Dumper and let Dumper print the new string like this

        binmode(STDOUT, ':utf8');
        use utf8;
        use encoding 'utf8';
        $r = "\x{ed}";
        use Data::Dumper;
        print Dumper($r);

    I see that Perl changed the string to "\x{fffd}". Why?

  • Python UTF-16LE file to UTF-8 encoding

    - by Qiao
    I have a big file with UTF-16LE (BOM) encoding. Is it possible to convert it to the usual UTF-8 with Python? Something like:

        file_old = open('old.txt', mode='r', encoding='utf_16_le')
        file_new = open('new.txt', mode='w', encoding='utf-8')
        text = file_old.read()
        file_new.write(text.encode('utf-8'))

    http://docs.python.org/release/2.3/lib/node126.html (utf_16_le: UTF-16LE) Not working. Can't understand the "TypeError: must be str, not bytes" error. Python 3.
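
    The TypeError comes from the last line: file_new is opened in text mode with encoding='utf-8', so its write() expects a str and performs the encoding itself, while text.encode('utf-8') hands it bytes. A minimal corrected sketch (Python 3; the 'utf-16' codec also consumes the BOM instead of leaving a stray U+FEFF at the start), reading in chunks so a big file needn't fit in memory:

        with open('old.txt', mode='r', encoding='utf-16') as file_old, \
             open('new.txt', mode='w', encoding='utf-8') as file_new:
            for chunk in iter(lambda: file_old.read(1 << 20), ''):  # ~1M chars at a time
                file_new.write(chunk)   # str in, str out; no manual .encode() needed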
