Search Results

Search found 5623 results on 225 pages for 'inline assembly'.

Page 50/225 | < Previous Page | 46 47 48 49 50 51 52 53 54 55 56 57  | Next Page >

  • delete a file in protected mode env(like windows xp)

    - by JGC
    hi I write a program to delete a file from somewhere of my harddisk in 8086 but when i use int 21h (ah=41h) an error happens and carry set to 1.and I cannot delete that. does anyone know what can I do? I think it should be from protected mode which does not allow my program to delete another file.I want the answer and language is not matter.

    Read the article

  • help me improve my sse yuv to rgb ssse3 code

    - by David McPaul
    Hello, I am looking to optimise some sse code I wrote for converting yuv to rgb (both planar and packed yuv functions). i am using SSSE3 at the moment but if there are useful functions from later sse versions thats ok. I am mainly interested in how I would work out processor stalls and the like. Anyone know of any tools that do static analysis of sse code? ; ; Copyright (C) 2009-2010 David McPaul ; ; All rights reserved. Distributed under the terms of the MIT License. ; ; A rather unoptimised set of ssse3 yuv to rgb converters ; does 8 pixels per loop ; inputer: ; reads 128 bits of yuv 8 bit data and puts ; the y values converted to 16 bit in xmm0 ; the u values converted to 16 bit and duplicated into xmm1 ; the v values converted to 16 bit and duplicated into xmm2 ; conversion: ; does the yuv to rgb conversion using 16 bit integer and the ; results are placed into the following registers as 8 bit clamped values ; r values in xmm3 ; g values in xmm4 ; b values in xmm5 ; outputer: ; writes out the rgba pixels as 8 bit values with 0 for alpha ; xmm6 used for scratch ; xmm7 used for scratch %macro cglobal 1 global _%1 %define %1 _%1 align 16 %1: %endmacro ; conversion code %macro yuv2rgbsse2 0 ; u = u - 128 ; v = v - 128 ; r = y + v + v >> 2 + v >> 3 + v >> 5 ; g = y - (u >> 2 + u >> 4 + u >> 5) - (v >> 1 + v >> 3 + v >> 4 + v >> 5) ; b = y + u + u >> 1 + u >> 2 + u >> 6 ; subtract 16 from y movdqa xmm7, [Const16] ; loads a constant using data cache (slower on first fetch but then cached) psubsw xmm0,xmm7 ; y = y - 16 ; subtract 128 from u and v movdqa xmm7, [Const128] ; loads a constant using data cache (slower on first fetch but then cached) psubsw xmm1,xmm7 ; u = u - 128 psubsw xmm2,xmm7 ; v = v - 128 ; load r,b with y movdqa xmm3,xmm0 ; r = y pshufd xmm5,xmm0, 0xE4 ; b = y ; r = y + v + v >> 2 + v >> 3 + v >> 5 paddsw xmm3, xmm2 ; add v to r movdqa xmm7, xmm1 ; move u to scratch pshufd xmm6, xmm2, 0xE4 ; move v to scratch psraw xmm6,2 ; divide v by 4 paddsw xmm3, xmm6 ; and add to r psraw xmm6,1 ; divide v by 2 paddsw xmm3, xmm6 ; and add to r psraw xmm6,2 ; divide v by 4 paddsw xmm3, xmm6 ; and add to r ; b = y + u + u >> 1 + u >> 2 + u >> 6 paddsw xmm5, xmm1 ; add u to b psraw xmm7,1 ; divide u by 2 paddsw xmm5, xmm7 ; and add to b psraw xmm7,1 ; divide u by 2 paddsw xmm5, xmm7 ; and add to b psraw xmm7,4 ; divide u by 32 paddsw xmm5, xmm7 ; and add to b ; g = y - u >> 2 - u >> 4 - u >> 5 - v >> 1 - v >> 3 - v >> 4 - v >> 5 movdqa xmm7,xmm2 ; move v to scratch pshufd xmm6,xmm1, 0xE4 ; move u to scratch movdqa xmm4,xmm0 ; g = y psraw xmm6,2 ; divide u by 4 psubsw xmm4,xmm6 ; subtract from g psraw xmm6,2 ; divide u by 4 psubsw xmm4,xmm6 ; subtract from g psraw xmm6,1 ; divide u by 2 psubsw xmm4,xmm6 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,2 ; divide v by 4 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g %endmacro ; outputer %macro rgba32sse2output 0 ; clamp values pxor xmm7,xmm7 packuswb xmm3,xmm7 ; clamp to 0,255 and pack R to 8 bit per pixel packuswb xmm4,xmm7 ; clamp to 0,255 and pack G to 8 bit per pixel packuswb xmm5,xmm7 ; clamp to 0,255 and pack B to 8 bit per pixel ; convert to bgra32 packed punpcklbw xmm5,xmm4 ; bgbgbgbgbgbgbgbg movdqa xmm0, xmm5 ; save bg values punpcklbw xmm3,xmm7 ; r0r0r0r0r0r0r0r0 punpcklwd xmm5,xmm3 ; lower half bgr0bgr0bgr0bgr0 punpckhwd xmm0,xmm3 ; upper half bgr0bgr0bgr0bgr0 ; write to output ptr movntdq [edi], xmm5 ; output first 4 pixels bypassing cache movntdq [edi+16], xmm0 ; output second 4 pixels bypassing cache %endmacro SECTION .data align=16 Const16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 Const128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 UMask db 0x01 db 0x80 db 0x01 db 0x80 db 0x05 db 0x80 db 0x05 db 0x80 db 0x09 db 0x80 db 0x09 db 0x80 db 0x0d db 0x80 db 0x0d db 0x80 VMask db 0x03 db 0x80 db 0x03 db 0x80 db 0x07 db 0x80 db 0x07 db 0x80 db 0x0b db 0x80 db 0x0b db 0x80 db 0x0f db 0x80 db 0x0f db 0x80 YMask db 0x00 db 0x80 db 0x02 db 0x80 db 0x04 db 0x80 db 0x06 db 0x80 db 0x08 db 0x80 db 0x0a db 0x80 db 0x0c db 0x80 db 0x0e db 0x80 ; void Convert_YUV422_RGBA32_SSSE3(void *fromPtr, void *toPtr, int width) width equ ebp+16 toPtr equ ebp+12 fromPtr equ ebp+8 ; void Convert_YUV420P_RGBA32_SSSE3(void *fromYPtr, void *fromUPtr, void *fromVPtr, void *toPtr, int width) width1 equ ebp+24 toPtr1 equ ebp+20 fromVPtr equ ebp+16 fromUPtr equ ebp+12 fromYPtr equ ebp+8 SECTION .text align=16 cglobal Convert_YUV422_RGBA32_SSSE3 ; reserve variables push ebp mov ebp, esp push edi push esi push ecx mov esi, [fromPtr] mov edi, [toPtr] mov ecx, [width] ; loop width / 8 times shr ecx,3 test ecx,ecx jng ENDLOOP REPEATLOOP: ; loop over width / 8 ; YUV422 packed inputer movdqa xmm0, [esi] ; should have yuyv yuyv yuyv yuyv pshufd xmm1, xmm0, 0xE4 ; copy to xmm1 movdqa xmm2, xmm0 ; copy to xmm2 ; extract both y giving y0y0 pshufb xmm0, [YMask] ; extract u and duplicate so each u in yuyv becomes u0u0 pshufb xmm1, [UMask] ; extract v and duplicate so each v in yuyv becomes v0v0 pshufb xmm2, [VMask] yuv2rgbsse2 rgba32sse2output ; endloop add edi,32 add esi,16 sub ecx, 1 ; apparently sub is better than dec jnz REPEATLOOP ENDLOOP: ; Cleanup pop ecx pop esi pop edi mov esp, ebp pop ebp ret cglobal Convert_YUV420P_RGBA32_SSSE3 ; reserve variables push ebp mov ebp, esp push edi push esi push ecx push eax push ebx mov esi, [fromYPtr] mov eax, [fromUPtr] mov ebx, [fromVPtr] mov edi, [toPtr1] mov ecx, [width1] ; loop width / 8 times shr ecx,3 test ecx,ecx jng ENDLOOP1 REPEATLOOP1: ; loop over width / 8 ; YUV420 Planar inputer movq xmm0, [esi] ; fetch 8 y values (8 bit) yyyyyyyy00000000 movd xmm1, [eax] ; fetch 4 u values (8 bit) uuuu000000000000 movd xmm2, [ebx] ; fetch 4 v values (8 bit) vvvv000000000000 ; extract y pxor xmm7,xmm7 ; 00000000000000000000000000000000 punpcklbw xmm0,xmm7 ; interleave xmm7 into xmm0 y0y0y0y0y0y0y0y0 ; extract u and duplicate so each becomes 0u0u punpcklbw xmm1,xmm7 ; interleave xmm7 into xmm1 u0u0u0u000000000 punpcklwd xmm1,xmm7 ; interleave again u000u000u000u000 pshuflw xmm1,xmm1, 0xA0 ; copy u values pshufhw xmm1,xmm1, 0xA0 ; to get u0u0 ; extract v punpcklbw xmm2,xmm7 ; interleave xmm7 into xmm1 v0v0v0v000000000 punpcklwd xmm2,xmm7 ; interleave again v000v000v000v000 pshuflw xmm2,xmm2, 0xA0 ; copy v values pshufhw xmm2,xmm2, 0xA0 ; to get v0v0 yuv2rgbsse2 rgba32sse2output ; endloop add edi,32 add esi,8 add eax,4 add ebx,4 sub ecx, 1 ; apparently sub is better than dec jnz REPEATLOOP1 ENDLOOP1: ; Cleanup pop ebx pop eax pop ecx pop esi pop edi mov esp, ebp pop ebp ret SECTION .note.GNU-stack noalloc noexec nowrite progbits

    Read the article

  • Why would I need a using statement to Libary B extn methods, if they're used in Library A & it's Li

    - by Greg
    Hi, I have: Main Program Class - uses Library A Library A - has partial classes which mix in methods from Library B Library B - mix in methods & interfaces Why would I need a using statement to LibaryB just to get their extension methods working in the main class? That is given that it's Library B that defines the classes that will be extended. EDIT - Except from code // *** PROGRAM *** using TopologyDAL; using Topology; // *** THIS WAS NEEDED TO GET EXTN METHODS APPEARING *** class Program { static void Main(string[] args) { var context = new Model1Container(); Node myNode; // ** trying to get myNode mixin methods to appear seems to need using line to point to Library B *** } } // ** LIBRARY A namespace TopologyDAL { public partial class Node { // Auto generated from EF } public partial class Node : INode<int> // to add extension methods from Library B { public int Key } } // ** LIBRARY B namespace ToplogyLibrary { public static class NodeExtns { public static void FromNodeMixin<T>(this INode<T> node) { // XXXX } } public interface INode<T> { // Properties T Key { get; } // Methods } }

    Read the article

  • How to specify execution time of x86 and PowerPC instructions?

    - by Goofy
    Hello! I have to approximate execution time of PowerPC and x86 assembler code.I understand that I cannot compute exact it dependson many problems (current processor state - x86 processor dicides internal instructions in microinstructions, memory access time obtainig code from cache of from slower memory etc.). I found some information in Intel Optimizaction reference (APPENDIX C), but it does not provide information about all general purpose instructions. Is there any complete reference about it? What about PowerPC processors? Where can I fund such information?

    Read the article

  • How to benchmark on multi-core processors

    - by Pascal Cuoq
    I am looking for ways to perform micro-benchmarks on multi-core processors. Context: At about the same time desktop processors introduced out-of-order execution that made performance hard to predict, they, perhaps not coincidentally, also introduced special instructions to get very precise timings. Example of these instructions are rdtsc on x86 and rftb on PowerPC. These instructions gave timings that were more precise than could ever be allowed by a system call, allowed programmers to micro-benchmark their hearts out, for better or for worse. On a yet more modern processor with several cores, some of which sleep some of the time, the counters are not synchronized between cores. We are told that rdtsc is no longer safe to use for benchmarking, but I must have been dozing off when we were explained the alternative solutions. Question: Some systems may save and restore the performance counter and provide an API call to read the proper sum. If you know what this call is for any operating system, please let us know in an answer. Some systems may allow to turn off cores, leaving only one running. I know Mac OS X Leopard does when the right Preference Pane is installed from the Developers Tools. Do you think that this make rdtsc safe to use again? More context: Please assume I know what I am doing when trying to do a micro-benchmark. If you are of the opinion that if an optimization's gains cannot be measured by timing the whole application, it's not worth optimizing, I agree with you, but I cannot time the whole application until the alternative data structure is finished, which will take a long time. In fact, if the micro-benchmark were not promising, I could decide to give up on the implementation now; I need figures to provide in a publication whose deadline I have no control over.

    Read the article

  • Error in Print Function in Bubble Sort MIPS?

    - by m00nbeam360
    Sorry that this is such a long block of code, but do you see any obvious syntax errors in this? I feel like the problem is that the code isn't printing correctly since the sort and swap methods were from my textbook. Please help if you can! .data save: .word 1,2,4,2,5,6 size: .word 6 .text swap: sll $t1, $a1, 2 #shift bits by 2 add $t1, $a1, $t1 #set $t1 address to v[k] lw $t0, 0($t1) #load v[k] into t1 lw $t2, 4($t1) #load v[k+1] into t1 sw $t2, 0($t1) #swap addresses sw $t0, 4($t1) #swap addresses jr $ra #return sort: addi $sp, $sp, -20 #make enough room on the stack for five registers sw $ra, 16($sp) #save the return address on the stack sw $s3, 12($sp) #save $s3 on the stack sw $s2, 8($sp) #save Ss2 on the stack sw $s1, 4($sp) #save $s1 on the stack sw $s0, 0($sp) #save $s0 on the stack move $s2, $a0 #copy the parameter $a0 into $s2 (save $a0) move $s3, $a1 #copy the parameter $a1 into $s3 (save $a1) move $s0, $zero #start of for loop, i = 0 for1tst: slt $t0, $s0, $s3 #$t0 = 0 if $s0 S $s3 (i S n) beq $t0, $zero, exit1 #go to exit1 if $s0 S $s3 (i S n) addi $s1, $s0, -1 #j - i - 1 for2tst: slti $t0, $s1, 0 #$t0 = 1 if $s1 < 0 (j < 0) bne $t0, $zero, exit2 #$t0 = 1 if $s1 < 0 (j < 0) sll $t1, $s1, 2 #$t1 = j * 4 (shift by 2 bits) add $t2, $s2, $t1 #$t2 = v + (j*4) lw $t3, 0($t2) #$t3 = v[j] lw $t4, 4($t2) #$t4 = v[j+1] slt $t0, $t4, $t3 #$t0 = 0 if $t4 S $t3 beq $t0, $zero, exit2 #go to exit2 if $t4 S $t3 move $a0, $s2 #1st parameter of swap is v(old $a0) move $a1, $s1 #2nd parameter of swap is j jal swap #swap addi $s1, $s1, -1 j for2tst #jump to test of inner loop j print exit2: addi $s0, $s0, 1 #i = i + 1 j for1tst #jump to test of outer loop exit1: lw $s0, 0($sp) #restore $s0 from stack lw $s1, 4($sp) #resture $s1 from stack lw $s2, 8($sp) #restore $s2 from stack lw $s3, 12($sp) #restore $s3 from stack lw $ra, 16($sp) #restore $ra from stack addi $sp, $sp, 20 #restore stack pointer jr $ra #return to calling routine .data space:.asciiz " " # space to insert between numbers head: .asciiz "The sorted numbers are:\n" .text print:add $t0, $zero, $a0 # starting address of array add $t1, $zero, $a1 # initialize loop counter to array size la $a0, head # load address of print heading li $v0, 4 # specify Print String service syscall # print heading out: lw $a0, 0($t0) # load fibonacci number for syscall li $v0, 1 # specify Print Integer service syscall # print fibonacci number la $a0, space # load address of spacer for syscall li $v0, 4 # specify Print String service syscall # output string addi $t0, $t0, 4 # increment address addi $t1, $t1, -1 # decrement loop counter bgtz $t1, out # repeat if not finished jr $ra # return

    Read the article

  • Why increased pipeline depth does not always mean increased throughput?

    - by worlds-apart89
    This is perhaps more of a discussion question, but I thought stackoverflow could be the right place to ask it. I am studying the concept of instruction pipelining. I have been taught that a pipeline's instruction throughput is increased once the number of pipeline stages is increased, but in some cases, throughput might not change. Under what conditions, does this happen? I am thinking stalling and branching could be the answer to the question, but I wonder if I am missing something crucial.

    Read the article

  • Setting processor to 32-bit mode

    - by dboarman-FissureStudios
    It seems that the following is a common method given in many tutorials on switching a processor from 16-bit to 32-bit: mov eax, cr0 ; set bit 0 in CR0-go to pmode or eax, 1 mov cr0, eax Why wouldn't I simply do the following: or cr0, 1 Is there something I'm missing? Possibly the only thing I can think of is that I cannot perform an operation like this on the cr0 register.

    Read the article

  • Why does it NOT give a segmentation violation?

    - by user198729
    The code below is said to give a segmentation violation: #include <stdio.h> #include <string.h> void function(char *str) { char buffer[16]; strcpy(buffer,str); } int main() { char large_string[256]; int i; for( i = 0; i < 255; i++) large_string[i] = 'A'; function(large_string); return 1; } It's compiled and run like this: gcc -Wall -Wextra hw.cpp && a.exe But there is nothing output. NOTE The above code indeed overwrites the ret address and so on if you really understand what's going underneath.

    Read the article

  • Outputting variable values in x86?

    - by Airjoe
    Hello All- I'm working on a homework assignment in x86 and it isn't working as I expect (surprise surprise!). I'd like to be able to output values of variables in x86 functions to ensure that the values are what I expect them to be. Is there a simple way to do this, or is it very complex? For what it's worth, the x86 functions are being used by a C file and compiled with gcc, so if that makes it simpler that is how I'm going about it. Thanks for the help.

    Read the article

  • How to correctly load 32-bit DLL dependencies when running a program from a batch file

    - by neilwhitaker1
    I have written a tool that references Microsoft.TeamFoundation.VersionControl.Client.dll, which is a 32-bit DLL. When I build my tool on 64-bit Windows, I set Visual Studio to specifically target X86 in order to force it to a 32-bit build. Targetting X86 instead of All-CPU's prevents me from getting a BadImageFormatException, as long as I invoke the tool directly (e.g. by typing "myTool.exe" on the command line). However, if I run a batch file that invokes the tool, I still get the exception. This happens even if the batch file runs in a 32-bit command prompt (%WINDIR%\SysWOW64\cmd.exe). What else can I do to make this work?

    Read the article

  • ARM Simulator on Windows

    - by Betamoo
    I am studying ARM Processors from a textbook... I thought it will be more useful if I could apply what I learn on an ARM simulator... writing code then watching results and different execution stages would be more fun... I have searched for it, but all I could find was either a freeware on linux or a demo on windows Is there a simulator that allow me to see execution steps and different changes for ARM processor (any version!) that runs on windows?? Thanks

    Read the article

  • easy asm program(nasm)

    - by GLeBaTi
    org 0x100 SEGMENT .CODE mov ah,0x9 mov dx, Msg1 int 0x21 ;string input mov ah,0xA mov dx,buff int 0x21 mov ax,0 mov al,[buff+1]; length ;string UPPERCASE mov cl, al mov si, buff cld loop1: lodsb; cmp al, 'a' jnb upper loop loop1 ;output mov ah,0x9 mov dx, buff int 0x21 exit: mov ah, 0x8 int 0x21 int 0x20 upper: sub al,32 jmp loop1 SEGMENT .DATA Msg1 db 'Press string: $' buff db 254,0 this code perform poorly. I think that problem in "jnb upper". This program make small symbols into big symbols.

    Read the article

  • Is there any kind of standard for 8086 multiprocessing?

    - by Earlz
    Back when I made an 8086 emulator I noticed that there was the LOCK prefix intended for synchonization in a multiprocessor environment. Yet the only multitasking I know of for the x86 arch. involves use of the APIC which didn't come around until either the Pentiums or 486s. Was there any kind of standard for 8086 multitasking or was it done by some manufacturer specific extensions to the instruction set and/or special ports? By standard, I mean things like: How do you separate the 2 processors if they both use the same memory? This is impossible without some kind of way to make each processor execute a different piece of code. (or cause an interrupt on only one processor)

    Read the article

  • "C variable type sizes are machine dependent." Is it really true? signed & unsigned numbers ;

    - by claws
    Hello, I've been told that C types are machine dependent. Today I wanted to verify it. void legacyTypes() { /* character types */ char k_char = 'a'; //Signedness --> signed & unsigned signed char k_char_s = 'a'; unsigned char k_char_u = 'a'; /* integer types */ int k_int = 1; /* Same as "signed int" */ //Signedness --> signed & unsigned signed int k_int_s = -2; unsigned int k_int_u = 3; //Size --> short, _____, long, long long short int k_s_int = 4; long int k_l_int = 5; long long int k_ll_int = 6; /* real number types */ float k_float = 7; double k_double = 8; } I compiled it on a 32-Bit machine using minGW C compiler _legacyTypes: pushl %ebp movl %esp, %ebp subl $48, %esp movb $97, -1(%ebp) # char movb $97, -2(%ebp) # signed char movb $97, -3(%ebp) # unsigned char movl $1, -8(%ebp) # int movl $-2, -12(%ebp)# signed int movl $3, -16(%ebp) # unsigned int movw $4, -18(%ebp) # short int movl $5, -24(%ebp) # long int movl $6, -32(%ebp) # long long int movl $0, -28(%ebp) movl $0x40e00000, %eax movl %eax, -36(%ebp) fldl LC2 fstpl -48(%ebp) leave ret I compiled the same code on 64-Bit processor (Intel Core 2 Duo) on GCC (linux) legacyTypes: .LFB2: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 movq %rsp, %rbp .cfi_offset 6, -16 .cfi_def_cfa_register 6 movb $97, -1(%rbp) # char movb $97, -2(%rbp) # signed char movb $97, -3(%rbp) # unsigned char movl $1, -12(%rbp) # int movl $-2, -16(%rbp)# signed int movl $3, -20(%rbp) # unsigned int movw $4, -6(%rbp) # short int movq $5, -32(%rbp) # long int movq $6, -40(%rbp) # long long int movl $0x40e00000, %eax movl %eax, -24(%rbp) movabsq $4620693217682128896, %rax movq %rax, -48(%rbp) leave ret Observations char, signed char, unsigned char, int, unsigned int, signed int, short int, unsigned short int, signed short int all occupy same no. of bytes on both 32-Bit & 64-Bit Processor. The only change is in long int & long long int both of these occupy 32-bit on 32-bit machine & 64-bit on 64-bit machine. And also the pointers, which take 32-bit on 32-bit CPU & 64-bit on 64-bit CPU. Questions: I cannot say, what the books say is wrong. But I'm missing something here. What exactly does "Variable types are machine dependent mean?" As you can see, There is no difference between instructions for unsigned & signed numbers. Then how come the range of numbers that can be addressed using both is different? I was reading http://stackoverflow.com/questions/2511246/how-to-maintain-fixed-size-of-c-variable-types-over-different-machines I didn't get the purpose of the question or their answers. What maintaining fixed size? They all are the same. I didn't understand how those answers are going to ensure the same size.

    Read the article

  • Electronic resources for learning Z/OS assembler?

    - by Jared
    This is a follow up to this question. I'm totally blind so printed books aren't an option. All the recommended books appear to have been published before electronic publishing got started. I've been able to learn the very basics but would like something between here's what a register is, and the IBM reference material. Searching the normal places like Safari Books Online has come up dry.

    Read the article

  • C++ word to bytes

    - by Vit
    Hi, I tried to read CPUID using assembler in C++. I know there is function for it in , but I want the asm way. So, after CPUID is executed, it should fill eax,ebx,ecx registers with ASCII coded string. But my problem is, since I can in asm adress only full, or half eax register, how to break that 32 bits into 4 bytes. I used this: #include <iostream> #include <stdlib.h> int main() { _asm { cpuid /*There I need to mov values from eax,ebx and ecx to some propriate variables*/ } system("PAUSE"); return(0); }

    Read the article

  • How do I patch a Windows API at runtime so that it to returns 0 in x64?

    - by Jorge Vasquez
    In x86, I get the function address using GetProcAddress() and write a simple XOR EAX,EAX; RET; in it. Simple and effective. How do I do the same in x64? bool DisableSetUnhandledExceptionFilter() { const BYTE PatchBytes[5] = { 0x33, 0xC0, 0xC2, 0x04, 0x00 }; // XOR EAX,EAX; RET; // Obtain the address of SetUnhandledExceptionFilter HMODULE hLib = GetModuleHandle( _T("kernel32.dll") ); if( hLib == NULL ) return false; BYTE* pTarget = (BYTE*)GetProcAddress( hLib, "SetUnhandledExceptionFilter" ); if( pTarget == 0 ) return false; // Patch SetUnhandledExceptionFilter if( !WriteMemory( pTarget, PatchBytes, sizeof(PatchBytes) ) ) return false; // Ensures out of cache FlushInstructionCache(GetCurrentProcess(), pTarget, sizeof(PatchBytes)); // Success return true; } static bool WriteMemory( BYTE* pTarget, const BYTE* pSource, DWORD Size ) { // Check parameters if( pTarget == 0 ) return false; if( pSource == 0 ) return false; if( Size == 0 ) return false; if( IsBadReadPtr( pSource, Size ) ) return false; // Modify protection attributes of the target memory page DWORD OldProtect = 0; if( !VirtualProtect( pTarget, Size, PAGE_EXECUTE_READWRITE, &OldProtect ) ) return false; // Write memory memcpy( pTarget, pSource, Size ); // Restore memory protection attributes of the target memory page DWORD Temp = 0; if( !VirtualProtect( pTarget, Size, OldProtect, &Temp ) ) return false; // Success return true; } This example is adapted from code found here: http://www.debuginfo.com/articles/debugfilters.html#overwrite .

    Read the article

  • Software Protection: Shuffeling my application?

    - by Martijn Courteaux
    Hi, I want to continue on my previous question: http://stackoverflow.com/questions/3007168/torrents-can-i-protect-my-software-by-sending-wrong-bytes Developer Art suggested to add a unique key to the application, to identifier the cracker. But JAB said that crackers can search where my unique key is located by checking for binary differences, if the cracker has multiple copies of my software. Then crackers change that key to make them self anonymous. That is true. Now comes the question: If I want to add a unique key, are there tools to shuffle (a kind of obfuscation) the program modules? So, that a binary compare would say that the two files are completely different. So they can't locate the identifier key. I'm pretty sure it is possible (maybe by replacing assembler blocks and make some jumps). I think it would be enough to make 30 to 40 shuffles of my software. Thanks

    Read the article

  • Are programming languages and methods inefficient? (assembler and C knowledge needed)

    - by b-gen-jack-o-neill
    Hi, for a long time, I am thinking and studying output of C language compiler in assembler form, as well as CPU architecture. I know this may be silly to you, but it seems to me that something is very ineffective. Please, don´t be angry if I am wrong, and there is some reason I do not see for all these principles. I will be very glad if you tell me why is it designed this way. I actually truly believe I am wrong, I know the genius minds of people which get PCs together knew a reason to do so. What exactly, do you ask? I´ll tell you right away, I use C as a example: 1: Stack local scope memory allocation: So, typical local memory allocation uses stack. Just copy esp to ebp and than allocate all the memory via ebp. OK, I would understand this if you explicitly need allocate RAM by default stack values, but if I do understand it correctly, modern OS use paging as a translation layer between application and physical RAM, when address you desire is further translated before reaching actual RAM byte. So why don´t just say 0x00000000 is int a,0x00000004 is int b and so? And access them just by mov 0x00000000,#10? Because you wont actually access memory blocks 0x00000000 and 0x00000004 but those your OS set the paging tables to. Actually, since memory allocation by ebp and esp use indirect addressing, "my" way would be even faster. 2: Variable allocation duplicity: When you run application, Loader load its code into RAM. When you create variable, or string, compiler generates code that pushes these values on the top o stack when created in main. So there is actual instruction for do so, and that actual number in memory. So, there are 2 entries of the same value in RAM. One in form of instruction, second in form of actual bytes in the RAM. But why? Why not to just when declaring variable count at which memory block it would be, than when used, just insert this memory location?

    Read the article

  • Where are the function literals in c++?

    - by academicRobot
    First of all, maybe literals is not the right term for this concept, but its the closest I could think of (not literals in the sense of functions as first class citizens). The idea is that when you make a conventional function call, it compiles to something like this: callq <immediate address> But if you make a function call using a function pointer, it compiles to something like this: mov <memory location>,%rax callq *%rax Which is all well and good. However, what if I'm writing a template library that requires a callback of some sort with a specified argument list and the user of the library is expected to know what function they want to call at compile time? Then I would like to write my template to accept a function literal as a template parameter. So, similar to template <int int_literal> struct my_template {...};` I'd like to write template <func_literal_t func_literal> struct my_template {...}; and have calls to func_literal within my_template compile to callq <immediate address>. Is there a facility in C++ for this, or a work around to achieve the same effect? If not, why not (e.g. some cataclysmic side effects)? How about C++0x or another language? Solutions that are not portable are fine. Solutions that include the use of member function pointers would be ideal. I'm not particularly interested in being told "You are a <socially unacceptable term for a person of low IQ>, just use function pointers/functors." This is a curiosity based question, and it seems that it might be useful in some (albeit limited) applications. It seems like this should be possible since function names are just placeholders for a (relative) memory address, so why not allow more liberal use (e.g. aliasing) of this placeholder. p.s. I use function pointers and functions objects all the the time and they are great. But this post got me thinking about the don't pay for what you don't use principle in relation to function calls, and it seems like forcing the use of function pointers or similar facility when the function is known at compile time is a violation of this principle, though a small one.

    Read the article

< Previous Page | 46 47 48 49 50 51 52 53 54 55 56 57  | Next Page >