path: root/src/codegen_sse.h
Age         Author            Files  Lines       Commit message
2015-08-28  Jukka Ojanen      1      -10/+18     * No need to display the size of transform
2015-03-18  Jukka Ojanen      1      -5/+4       * Minimize sin/cos calculations by calculating all factors once and generate lookup tables by mapping
2015-03-18  Jukka Ojanen      1      -841/+1335  * Always run-time generate x64 dynamic code
2014-11-17  Jukka Ojanen      1      -84/+119    * Don't use long NOPs, instead add extra prefix to extend op codes to align branch targets
2014-11-17  Jukka Ojanen      1      -0/+10      * Add comments to SSE constants
2014-11-16  Jukka Ojanen      1      -12/+13     * Define externals only when needed
2014-11-16  Jukka Ojanen      1      -0/+18      * Optionally define SSE constants in header
2014-11-16  Jukka Ojanen      1      -2/+47      * Add some comments to macro assembly
2014-11-14  Jukka Ojanen      1      -18/+13     * Remove unused "neon" labels and mark external function as "extern"
2014-11-14  Jukka Ojanen      1      -0/+9       * Take care of unreferenced parameters
2014-11-11  Jukka Ojanen      1      -24/+24     * benchFFTS is computing the correct answer with these
                                                 |\
2014-11-11  Jukka Ojanen      1      -24/+24     | * Damn AT&T syntax [fix_generate_size4_base_case]
2014-11-11  Jukka Ojanen      1      -2/+507     * | generate_leaf_init, generate_leaf_ee, generate_leaf_eo, generate_leaf_oe and generate_leaf_oo
                                                 | | Multiply offset constants by 4, and remove multiply by 4 from "offset fixing" loops.
2014-11-10  Jukka Ojanen      1      -41/+42     * | Replace movdqa with movaps which is one byte shorter. Don't need RDI register as R9 is saved by caller.
                                                 |/
2014-11-10  Jukka Ojanen      1      -10/+79     * Generate function in "generate_size4_base_case"
2014-11-09  Jukka Ojanen      1      -138/+120   * Removed last bits of magic from "generate_size8_base_case".
                                                 | Replace x64_call_imm with x64_call_code.
2014-11-09  Jukka Ojanen      1      -211/+59    * Replace "magic bytes" with various macros
2014-11-09  Jukka Ojanen      1      -79/+9      * Replace SHUFPS with x64_sse_shufps_reg_reg_imm
2014-11-09  Jukka Ojanen      1      -31/+12     * Replace MULPS with x64_sse_mulps_reg_reg
2014-11-09  Jukka Ojanen      1      -98/+20     * Replace MOVDQA with x64_sse_movdqa_reg_membase/x64_sse_movdqa_membase_reg
2014-11-09  Jukka Ojanen      1      -76/+1      * Replace MOVAPS with x64_sse_movaps_reg_membase
2014-11-09  Jukka Ojanen      1      -30/+12     * Replace SUBPS with x64_sse_subps_reg_reg
2014-11-09  Jukka Ojanen      1      -30/+13     * Replace ADDPS with x64_sse_addps_reg_reg
2014-11-09  Jukka Ojanen      1      -21/+4      * Replace XORPS with x64_sse_xorps_reg_reg
2014-11-09  Jukka Ojanen      1      -112/+12    * Replace XOR2 with x86_clear_reg, MOV_D with x64_mov_membase_reg/x86_mov_reg_membase, MOV_R with x64_mov_reg_reg and x64_alu_reg_imm_size_body with x64_alu_reg_imm_size
2014-11-09  Jukka Ojanen      1      -106/+17    * Replace MOV_I with x86_mov_reg_imm, SHIFT with x86_shift_reg_imm, CALL with x64_call_imm, POP with x64_pop_reg, PUSH with x64_push_reg
2014-11-09  Jukka Ojanen      1      -50/+3      * Replace add/sub immediate value with x64_alu_reg_imm_size_body
2014-11-08  Jukka Ojanen      1      -83/+54     * Replace register names with new definitions
2014-11-06  Jukka Ojanen      1      -315/+744   * Win64 actually "generate_size8_base_case" instead of copying
2014-11-05  Jukka Ojanen      1      -122/+206   * Reorder functions to alphabetical order
2014-11-04  Jukka Ojanen      1      -47/+160    * Generate leaf_ee_init and x_init instead of copying
2014-11-04  Jukka Ojanen      1      -3/+3       * Replace _M_AMD64 with _M_X64 as it is equal and "neutral"
2014-11-04  Jukka Ojanen      1      -0/+242     * Refactor generate_func_code
2014-11-03  Jukka Ojanen      1      -16/+110    * MOVDQA "intrinsic", two operand MOVDQA2, three operand MOVDQA3 helpers
2014-11-01  Jukka Ojanen      1      -5/+29      * XMM6:XMM15 Nonvolatile, must be preserved as needed by callee. http://msdn.microsoft.com/en-us/library/9z1stfyw(v=vs.80).aspx
2014-10-31  Jukka Ojanen      1      -103/+137   * Add CMake as an alternative build system
                                                 | Add support for Windows x64 (requires YASM)
2013-12-05  Robert Massaioli  1      -0/+1       * Adding in Vim modelines to all .c and .h files.
2012-10-20  Anthony Blake     1      -0/+1       * Transforms for N>=32 are now thread safe
2012-10-20  Anthony Blake     1      -4/+0       * NEON backwards transforms work correctly
2012-10-19  Anthony Blake     1      -0/+34      * Added copyright notice
2012-10-18  Anthony Blake     1      -2/+2       * Fixed gcc compile issue
2012-08-30  Anthony Blake     1      -12/+36     * SSE working
2012-08-30  Anthony Blake     1      -0/+32      * SSE working in GDB
2012-08-29  Anthony Blake     1      -4/+8       * SSE Leaves working
2012-08-29  Anthony Blake     1      -0/+104     * SSE LEAF EE works