Search Results

Search found 11840 results on 474 pages for 'assembly context'.

Page 63/474 | < Previous Page | 59 60 61 62 63 64 65 66 67 68 69 70  | Next Page >

  • 80x86 16-bit asm: lea cx, [cx*8+cx] causes error on NASM (compiling .com file)

    - by larz
    Title says it all. The error NASM gives (dispite my working OS) is "invalid effective address". Now i've seen many examples of how to use LEA and i think i gots it right but yet my NASM dislikes it. I tried "lea cx, [cx+9]" and it worked; "lea cx, [bx+cx]" didn't. Now if i extended my registers to 32-bits (i.e. "lea ecx, [ecx*8+ecx]") everything would be well but i am restricted to use 16- and 8-bit registers only. Is here anyone so knoweledgeable who could explain me WHY my assembler doesn't let me use lea the way i supposed it should be used? Thanks.

    Read the article

  • Ret Failure with SDL using FASM on Win32

    - by Jon Purdy
    I'm using SDL with FASM, and have code that's minimally like the following: format ELF extrn _SDL_Init extrn _SDL_SetVideoMode extrn _SDL_Quit extrn _exit SDL_INIT_VIDEO equ 0x00000020 section '.text' public _SDL_main _SDL_main: ccall _SDL_Init, SDL_INIT_VIDEO ccall _SDL_SetVideoMode, 640, 480, 32, 0 ccall _SDL_Quit ccall _exit, 0 ; Success, or ret ; failure. With the following quick-and-dirty makefile: SOURCES = main.asm OBJECTS = main.o TARGET = SDLASM.exe FASM = C:\fasm\fasm.exe release : $(OBJECTS) ld $(OBJECTS) -LC:/SDL/lib/ -lSDLmain -lSDL -LC:/MinGW/lib/ -lmingw32 -lcrtdll -o $(TARGET) --subsystem windows cleanrelease : del $(OBJECTS) %.o : %.asm $(FASM) $< $@ Using exit() (or Windows' ExitProcess()) seems to be the only way to get this program to exit cleanly, even though I feel like I should be able to use retn/retf. When I just ret without calling exit(), the application does not terminate and needs to be killed. Could anyone shed some light on this? It only happens when I make the call to SDL_SetVideoMode().

    Read the article

  • What does the q in a q-grammar stand for?

    - by Aru
    So I've been reading sites and the classic books on compilers, reading about s-grammar and q-grammars I wondered what the s and q stand for, I think the s stands for simple grammar. While the q...well, I have no idea. What does the q in a q-grammar stand for?

    Read the article

  • NDepend: How to not display 'tier' assemblies in dependency graph?

    - by Edward Buatois
    I was able to do this in an earlier version of nDepend by going to tools-options and setting which assemblies would be part of the analysis (and ignore the rest). The latest version of the trial version of nDepend lets me set it, but it seems to ignore the setting and always analyze all assemblies whether I want it to or not. I tried to delete the "tier" assemblies by moving them over to the "application assemblies" list, but when I delete them out of there, they just get added back to the "tier" list, which I can't ignore. I don't want my dependency graph to contain assemblies like "system," "system.xml," and "system.serialization!" I want only MY assemblies in the dependency graph! Or is that a paid-version feature now? Is there a way to do what I'm talking about?

    Read the article

  • See if any application has a DLL from the GAC loaded

    - by rwmnau
    I'm trying to deploy new copies of my DLL to the GAC on remote servers, but I need to identify if any processes currently running have a loaded copy of the DLL I'm replacing - I'd like to restart them, or at least tell the user. For example, Biztalk seems to load the DLLs it needs the first time they're used, and then replacing them keeps the old copy in memory until the Host Instances are restarted - something I could easily do as part of my deployment. Is there a way to tell using .NET which processes have loaded a particular DLL from the GAC? UPDATE: Some further investigation shows that both Process Explorer has this functionality, and another Sysinternals tool, ListDLL, does exactly what I want to be able to do. I'd like to know how they do it, since I'd love to replicate this functionality in my application without having to include and screen-scrape ListDLL (if that's even allowed inside the license).

    Read the article

  • Why is a 16-bit register used with BSR instruction in this code snippet?

    - by sharptooth
    In this hardcore article there's a function find_maskwidth() that basically detects the number of bits required to represent itemCount dictinct values: unsigned int find_maskwidth( unsigned int itemCount ) { unsigned int maskWidth, count = itemCount; __asm { mov eax, count mov ecx, 0 mov maskWidth, ecx dec eax bsr cx, ax jz next inc cx mov maskWidth, ecx next: } return maskWidth; } the question is why do they use ax and cx registers instead of eax and ecx?

    Read the article

  • In which scenario it is useful to use Disassembly on python?

    - by systempuntoout
    The dis module can be effectively used to disassemble Python methods, functions and classes into low-level interpreter instructions. I know that dis information can be used for: 1. Find race condition in programs that use threads 2. Find possible optimizations From your experience, do you know any other scenarios where Disassembly Python feature could be useful?

    Read the article

  • App only spawns one thread

    - by tipu
    I have what I thought was a thread-friendly app, and after doing some output I've concluded that of the 15 threads I am attempting to run, only one does. I have if __name__ == "__main__": fhf = FileHandlerFactory() tweet_manager = TweetManager("C:/Documents and Settings/Administrator/My Documents/My Dropbox/workspace/trie/Tweet Search Engine/data/partitioned_raw_tweets/raw_tweets.txt.001") start = time.time() for i in range(15): Indexer(tweet_manager, fhf).start() Then in my thread-entry point, I do def run(self): print(threading.current_thread()) self.index() That results in this: <Indexer(Thread-3, started 1168)> So of 15 threads that I thought were running, I'm only running one. Any idea as to why? Edit: code

    Read the article

  • Java: how to tell if a line in a text file was supposed to be blank?

    - by defn
    I'm working on a project in which I have to read in a Grammar file (breaking it up into my data structure), with the goal of being able to generate a random "DearJohnLetter". My problem is that when reading in the .txt file, I don't know how find out whether the file was supposed to be a completely blank line or not, which is detrimental to the program. Here is an example of part of the file, How do i tell if the next line was supposed to be a blank line? (btw I'm just using a buffered reader) Thanks! <start> I have to break up with you because <reason> . But let's still <disclaimer> . <reason> <dubious-excuse> <dubious-excuse> , and also because <reason> <dubious-excuse> my <person> doesn't like you I'm in love with <another> I haven't told you this before but <harsh> I didn't have the heart to tell you this when we were going out, but <harsh> you never <romantic-with-me> with me any more you don't <romantic> any more my <someone> said you were bad news

    Read the article

  • Theory of computation - Using the pumping lemma for CFLs

    - by Tony
    I'm reviewing my notes for my course on theory of computation and I'm having trouble understanding how to complete a certain proof. Here is the question: A = {0^n 1^m 0^n | n>=1, m>=1} Prove that A is not regular. It's pretty obvious that the pumping lemma has to be used for this. So, we have |vy| = 1 |vxy| <= p (p being the pumping length, = 1) uv^ixy^iz exists in A for all i = 0 Trying to think of the correct string to choose seems a bit iffy for this. I was thinking 0^p 1^q 0^p, but I don't know if I can obscurely make a q, and since there is no bound on u, this could make things unruly.. So, how would one go about this?

    Read the article

  • Entity Framework - refresh objects from database

    - by Nebo
    I'm having trouble with refreshing objects in my database. I have an two PC's and two applications. On the first PC, there's an application which communicates with my database and adds some data to Measurements table. On my other PC, there's an application which retrives the latest Measurement under a timer, so it should retrive measurements added by the application on my first PC too. The problem is it doesn't. On my application start, it caches all the data from database and never get new data added. I use Refresh() method which works well when I change any of the cached data, but it doesn't refresh newly added data. Here is my method which should update the data: public static Entities myEntities = new Entities(); public static Measurement GetLastMeasurement(int conditionId) { myEntities.Refresh(RefreshMode.StoreWins, myEntities.Measurements); return (from measurement in myEntities.Measurements where measurement.ConditionId == conditionId select measurement).OrderByDescending(cd => cd.Timestamp).First(); } P.S. Applications have different connection strings in app.config (different accounts for the same DB).

    Read the article

  • Recognizing terminals in a CFG production previously not defined as tokens.

    - by kmels
    I'm making a generator of LL(1) parsers, my input is a CoCo/R language specification. I've already got a Scanner generator for that input. Suppose I've got the following specification: COMPILER 1. CHARACTERS digit="0123456789". TOKENS number = digit{digit}. decnumber = digit{digit}"."digit{digit}. PRODUCTIONS Expression = Term{"+"Term|"-"Term}. Term = Factor{"*"Factor|"/"Factor}. Factor = ["-"](Number|"("Expression")"). Number = (number|decnumber). END 1. So, if the parser generated by this grammar receives a word "1+1", it'd be accepted i.e. a parse tree would be found. My question is, the character "+" was never defined in a token, but it appears in the non-terminal "Expression". How should my generated Scanner recognize it? It would not recognize it as a token. Is this a valid input then? Should I add this terminal in TOKENS and then consider an error routine for a Scanner for it to skip it? How does usual language specifications handle this?

    Read the article

  • How Do You Make An Assembler?

    - by mudge
    I'd like to make a simple x86 assembler. I'm wondering if there's any tutorials for making your own assembler. Or if there's a simple assembler that I could study. Also, I wonder what tools are used in looking at and handling the binary/hex of programs.

    Read the article

  • x86_64 assembler: only one call per subroutine?

    - by zneak
    Hello everyone, I decided yesterday to start doing assembler. Most of it is okay (well, as okay as assembler can be), but I'm getting some problems with gas. It seems that I can call functions only once. After that, any subsequent call opcode with the same function name will fail. I must be doing something terribly wrong, though I can't see what. Take this small C function for instance: void path_free(path_t path) { if (path == NULL) return; free(((point_list_t*)path)->points); free(path); } I "translated" it to assembler like that: .globl _path_free _path_free: push rbp mov rbp, rsp cmp rdi, 0 jz byebye push rdi mov rdi, qword ptr [rdi] call _free pop rdi sub rsp, 8 call _free byebye: leave ret This triggers the following error for the second call _free: suffix or operands invalid for ``call''. And if I change it to something else, like free2, everything works (until link time, that is). Assembler code gcc -S gave me looks very similar to what I've done (except it's in AT&T syntax), so I'm kind of lost. I'm doing this on Mac OS X under the x86_64 architecture.

    Read the article

  • (x86) Assembler Optimization

    - by Pindatjuh
    I'm building a compiler/assembler/linker in Java for the x86-32 (IA32) processor targeting Windows. High-level concepts of a "language" (in essential a Java API for creating executables) are translated into opcodes, which then are wrapped and outputted to a file. The translation process has several phases, one is the translation between languages: the highest-level code is translated into the medium-level code which is then translated into the lowest-level code (probably more than 3 levels). My problem is the following; if I have higher-level code (X and Y) translated to lower-level code (x, y, U and V), then an example of such a translation is, in pseudo-code: x + U(f) // generated by X + V(f) + y // generated by Y (An easy example) where V is the opposite of U (compare with a stack push as U and a pop as V). This needs to be 'optimized' into: x + y (essentially removing the "useless" code) My idea was to use regular expressions. For the above case, it'll be a regular expression looking like this: x:(U(x)+V(x)):null, meaning for all x find U(x) followed by V(x) and replace by null. Imagine more complex regular expressions, for more complex optimizations. This should work on all levels. What do you suggest? What would be a good approach to optimize in these situations?

    Read the article

  • Webapp: safetly update a shared List/Map in the AppContext

    - by al nik
    I've Lists and Maps in my WebAppContext. Most of the time these are only read by multiple Threads but sometimes there's the need to update or add some data. I'm wondering what's the best way to do this without incurring in a ConcurrentModificationException. I think that using CopyOnWriteArrayList I can achieve what I want in terms of - I do not have to sync on every read operation- I can safety update the list while other threads are reading it. Is this the best solution? What about Maps?

    Read the article

  • Why does CGPDFPageGetDrawingTransform() crash with SIGABRT when specifying a rotation?

    - by David
    When I call CGPDFPageGetDrawingTransform() with a rotation argument, the application crashes. If I specify no rotation, there is no crash. Here is my drawLayer:inContext: method: - (void)drawLayer:(CALayer*)layer inContext:(CGContextRef)context { CGContextSetRGBFillColor(context, 1.0, 1.0, 1.0, 1.0); CGRect boundingBox = CGContextGetClipBoundingBox(context); CGContextFillRect(context, boundingBox); //convert to UIKit native coodinate system CGContextTranslateCTM(context, 0.0, self.bounds.size.height); CGContextScaleCTM(context, 1.0, -1.0); //Rotate the pdf_page CGAffineTransform pfd_transform = CGPDFPageGetDrawingTransform(self.page, kCGPDFCropBox, self.frame, 58.46f, true); CGContextSaveGState (context); CGContextConcatCTM (context, pfd_transform); CGContextClipToRect (context, self.frame); CGContextDrawPDFPage (context, self.page); CGContextRestoreGState (context); } In the long run, I would like to rotate the pdf dynamically to follow a users heading. Maybe I am going at this all wrong... Thank you for your time.

    Read the article

  • Throwing a C++ exception after an inline-asm jump

    - by SoapBox
    I have some odd self modifying code, but at the root of it is a pretty simple problem: I want to be able to execute a jmp (or a call) and then from that arbitrary point throw an exception and have it caught by the try/catch block that contained the jmp/call. But when I do this (in gcc 4.4.1 x86_64) the exception results in a terminate() as it would if the exception was thrown from outside of a try/catch. I don't really see how this is different than throwing an exception from inside of some far-flung library, yet it obviously is because it just doesn't work. How can I execute a jmp or call but still throw an exception back to the original try/catch? Why doesn't this try/catch continue to handle these exceptions as it would if the function was called normally? The code: #include <iostream> #include <stdexcept> using namespace std; void thrower() { cout << "Inside thrower" << endl; throw runtime_error("some exception"); } int main() { cout << "Top of main" << endl; try { asm volatile ( "jmp *%0" // same thing happens with a call instead of a jmp : : "r"((long)thrower) : ); } catch (exception &e) { cout << "Caught : " << e.what() << endl; } cout << "Bottom of main" << endl << endl; } The expected output: Top of main Inside thrower Caught : some exception Bottom of main The actual output: Top of main Inside thrower terminate called after throwing an instance of 'std::runtime_error' what(): some exception Aborted

    Read the article

  • Intrinsics program (SSE) - g++ - help needed

    - by Sriram
    Hi all, This is the first time I am posting a question on stackoverflow, so please try and overlook any errors I may have made in formatting my question/code. But please do point the same out to me so I may be more careful. I was trying to write some simple intrinsics routines for the addition of two 128-bit (containing 4 float variables) numbers. I found some code on the net and was trying to get it to run on my system. The code is as follows: //this is a sample Intrinsics program to add two vectors. #include <iostream> #include <iomanip> #include <xmmintrin.h> #include <stdio.h> using namespace std; struct vector4 { float x, y, z, w; }; //functions to operate on them. vector4 set_vector(float x, float y, float z, float w = 0) { vector4 temp; temp.x = x; temp.y = y; temp.z = z; temp.w = w; return temp; } void print_vector(const vector4& v) { cout << " This is the contents of vector: " << endl; cout << " > vector.x = " << v.x << endl; cout << " vector.y = " << v.y << endl; cout << " vector.z = " << v.z << endl; cout << " vector.w = " << v.w << endl; } vector4 sse_vector4_add(const vector4&a, const vector4& b) { vector4 result; asm volatile ( "movl $a, %eax" //move operands into registers. "\n\tmovl $b, %ebx" "\n\tmovups (%eax), xmm0" //move register contents into SSE registers. "\n\tmovups (%ebx), xmm1" "\n\taddps xmm0, xmm1" //add the elements. addps operates on single-precision vectors. "\n\t movups xmm0, result" //move result into vector4 type data. ); return result; } int main() { vector4 a, b, result; a = set_vector(1.1, 2.1, 3.2, 4.5); b = set_vector(2.2, 4.2, 5.6); result = sse_vector4_add(a, b); print_vector(a); print_vector(b); print_vector(result); return 0; } The g++ parameters I use are: g++ -Wall -pedantic -g -march=i386 -msse intrinsics_SSE_example.C -o h The errors I get are as follows: intrinsics_SSE_example.C: Assembler messages: intrinsics_SSE_example.C:45: Error: too many memory references for movups intrinsics_SSE_example.C:46: Error: too many memory references for movups intrinsics_SSE_example.C:47: Error: too many memory references for addps intrinsics_SSE_example.C:48: Error: too many memory references for movups I have spent a lot of time on trying to debug these errors, googled them and so on. I am a complete noob to Intrinsics and so may have overlooked some important things. Any help is appreciated, Thanks, Sriram.

    Read the article

  • help me improve my sse yuv to rgb ssse3 code

    - by David McPaul
    Hello, I am looking to optimise some sse code I wrote for converting yuv to rgb (both planar and packed yuv functions). i am using SSSE3 at the moment but if there are useful functions from later sse versions thats ok. I am mainly interested in how I would work out processor stalls and the like. Anyone know of any tools that do static analysis of sse code? ; ; Copyright (C) 2009-2010 David McPaul ; ; All rights reserved. Distributed under the terms of the MIT License. ; ; A rather unoptimised set of ssse3 yuv to rgb converters ; does 8 pixels per loop ; inputer: ; reads 128 bits of yuv 8 bit data and puts ; the y values converted to 16 bit in xmm0 ; the u values converted to 16 bit and duplicated into xmm1 ; the v values converted to 16 bit and duplicated into xmm2 ; conversion: ; does the yuv to rgb conversion using 16 bit integer and the ; results are placed into the following registers as 8 bit clamped values ; r values in xmm3 ; g values in xmm4 ; b values in xmm5 ; outputer: ; writes out the rgba pixels as 8 bit values with 0 for alpha ; xmm6 used for scratch ; xmm7 used for scratch %macro cglobal 1 global _%1 %define %1 _%1 align 16 %1: %endmacro ; conversion code %macro yuv2rgbsse2 0 ; u = u - 128 ; v = v - 128 ; r = y + v + v >> 2 + v >> 3 + v >> 5 ; g = y - (u >> 2 + u >> 4 + u >> 5) - (v >> 1 + v >> 3 + v >> 4 + v >> 5) ; b = y + u + u >> 1 + u >> 2 + u >> 6 ; subtract 16 from y movdqa xmm7, [Const16] ; loads a constant using data cache (slower on first fetch but then cached) psubsw xmm0,xmm7 ; y = y - 16 ; subtract 128 from u and v movdqa xmm7, [Const128] ; loads a constant using data cache (slower on first fetch but then cached) psubsw xmm1,xmm7 ; u = u - 128 psubsw xmm2,xmm7 ; v = v - 128 ; load r,b with y movdqa xmm3,xmm0 ; r = y pshufd xmm5,xmm0, 0xE4 ; b = y ; r = y + v + v >> 2 + v >> 3 + v >> 5 paddsw xmm3, xmm2 ; add v to r movdqa xmm7, xmm1 ; move u to scratch pshufd xmm6, xmm2, 0xE4 ; move v to scratch psraw xmm6,2 ; divide v by 4 paddsw xmm3, xmm6 ; and add to r psraw xmm6,1 ; divide v by 2 paddsw xmm3, xmm6 ; and add to r psraw xmm6,2 ; divide v by 4 paddsw xmm3, xmm6 ; and add to r ; b = y + u + u >> 1 + u >> 2 + u >> 6 paddsw xmm5, xmm1 ; add u to b psraw xmm7,1 ; divide u by 2 paddsw xmm5, xmm7 ; and add to b psraw xmm7,1 ; divide u by 2 paddsw xmm5, xmm7 ; and add to b psraw xmm7,4 ; divide u by 32 paddsw xmm5, xmm7 ; and add to b ; g = y - u >> 2 - u >> 4 - u >> 5 - v >> 1 - v >> 3 - v >> 4 - v >> 5 movdqa xmm7,xmm2 ; move v to scratch pshufd xmm6,xmm1, 0xE4 ; move u to scratch movdqa xmm4,xmm0 ; g = y psraw xmm6,2 ; divide u by 4 psubsw xmm4,xmm6 ; subtract from g psraw xmm6,2 ; divide u by 4 psubsw xmm4,xmm6 ; subtract from g psraw xmm6,1 ; divide u by 2 psubsw xmm4,xmm6 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,2 ; divide v by 4 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g %endmacro ; outputer %macro rgba32sse2output 0 ; clamp values pxor xmm7,xmm7 packuswb xmm3,xmm7 ; clamp to 0,255 and pack R to 8 bit per pixel packuswb xmm4,xmm7 ; clamp to 0,255 and pack G to 8 bit per pixel packuswb xmm5,xmm7 ; clamp to 0,255 and pack B to 8 bit per pixel ; convert to bgra32 packed punpcklbw xmm5,xmm4 ; bgbgbgbgbgbgbgbg movdqa xmm0, xmm5 ; save bg values punpcklbw xmm3,xmm7 ; r0r0r0r0r0r0r0r0 punpcklwd xmm5,xmm3 ; lower half bgr0bgr0bgr0bgr0 punpckhwd xmm0,xmm3 ; upper half bgr0bgr0bgr0bgr0 ; write to output ptr movntdq [edi], xmm5 ; output first 4 pixels bypassing cache movntdq [edi+16], xmm0 ; output second 4 pixels bypassing cache %endmacro SECTION .data align=16 Const16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 Const128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 UMask db 0x01 db 0x80 db 0x01 db 0x80 db 0x05 db 0x80 db 0x05 db 0x80 db 0x09 db 0x80 db 0x09 db 0x80 db 0x0d db 0x80 db 0x0d db 0x80 VMask db 0x03 db 0x80 db 0x03 db 0x80 db 0x07 db 0x80 db 0x07 db 0x80 db 0x0b db 0x80 db 0x0b db 0x80 db 0x0f db 0x80 db 0x0f db 0x80 YMask db 0x00 db 0x80 db 0x02 db 0x80 db 0x04 db 0x80 db 0x06 db 0x80 db 0x08 db 0x80 db 0x0a db 0x80 db 0x0c db 0x80 db 0x0e db 0x80 ; void Convert_YUV422_RGBA32_SSSE3(void *fromPtr, void *toPtr, int width) width equ ebp+16 toPtr equ ebp+12 fromPtr equ ebp+8 ; void Convert_YUV420P_RGBA32_SSSE3(void *fromYPtr, void *fromUPtr, void *fromVPtr, void *toPtr, int width) width1 equ ebp+24 toPtr1 equ ebp+20 fromVPtr equ ebp+16 fromUPtr equ ebp+12 fromYPtr equ ebp+8 SECTION .text align=16 cglobal Convert_YUV422_RGBA32_SSSE3 ; reserve variables push ebp mov ebp, esp push edi push esi push ecx mov esi, [fromPtr] mov edi, [toPtr] mov ecx, [width] ; loop width / 8 times shr ecx,3 test ecx,ecx jng ENDLOOP REPEATLOOP: ; loop over width / 8 ; YUV422 packed inputer movdqa xmm0, [esi] ; should have yuyv yuyv yuyv yuyv pshufd xmm1, xmm0, 0xE4 ; copy to xmm1 movdqa xmm2, xmm0 ; copy to xmm2 ; extract both y giving y0y0 pshufb xmm0, [YMask] ; extract u and duplicate so each u in yuyv becomes u0u0 pshufb xmm1, [UMask] ; extract v and duplicate so each v in yuyv becomes v0v0 pshufb xmm2, [VMask] yuv2rgbsse2 rgba32sse2output ; endloop add edi,32 add esi,16 sub ecx, 1 ; apparently sub is better than dec jnz REPEATLOOP ENDLOOP: ; Cleanup pop ecx pop esi pop edi mov esp, ebp pop ebp ret cglobal Convert_YUV420P_RGBA32_SSSE3 ; reserve variables push ebp mov ebp, esp push edi push esi push ecx push eax push ebx mov esi, [fromYPtr] mov eax, [fromUPtr] mov ebx, [fromVPtr] mov edi, [toPtr1] mov ecx, [width1] ; loop width / 8 times shr ecx,3 test ecx,ecx jng ENDLOOP1 REPEATLOOP1: ; loop over width / 8 ; YUV420 Planar inputer movq xmm0, [esi] ; fetch 8 y values (8 bit) yyyyyyyy00000000 movd xmm1, [eax] ; fetch 4 u values (8 bit) uuuu000000000000 movd xmm2, [ebx] ; fetch 4 v values (8 bit) vvvv000000000000 ; extract y pxor xmm7,xmm7 ; 00000000000000000000000000000000 punpcklbw xmm0,xmm7 ; interleave xmm7 into xmm0 y0y0y0y0y0y0y0y0 ; extract u and duplicate so each becomes 0u0u punpcklbw xmm1,xmm7 ; interleave xmm7 into xmm1 u0u0u0u000000000 punpcklwd xmm1,xmm7 ; interleave again u000u000u000u000 pshuflw xmm1,xmm1, 0xA0 ; copy u values pshufhw xmm1,xmm1, 0xA0 ; to get u0u0 ; extract v punpcklbw xmm2,xmm7 ; interleave xmm7 into xmm1 v0v0v0v000000000 punpcklwd xmm2,xmm7 ; interleave again v000v000v000v000 pshuflw xmm2,xmm2, 0xA0 ; copy v values pshufhw xmm2,xmm2, 0xA0 ; to get v0v0 yuv2rgbsse2 rgba32sse2output ; endloop add edi,32 add esi,8 add eax,4 add ebx,4 sub ecx, 1 ; apparently sub is better than dec jnz REPEATLOOP1 ENDLOOP1: ; Cleanup pop ebx pop eax pop ecx pop esi pop edi mov esp, ebp pop ebp ret SECTION .note.GNU-stack noalloc noexec nowrite progbits

    Read the article

< Previous Page | 59 60 61 62 63 64 65 66 67 68 69 70  | Next Page >