1 files changed, 385 insertions, 25 deletions
diff --git a/docs/ReleaseNotes.rst b/docs/ReleaseNotes.rst
index c0d2ea1..fd149c9 100644
--- a/docs/ReleaseNotes.rst
+++ b/docs/ReleaseNotes.rst
@@ -5,12 +5,6 @@ LLVM 3.7 Release Notes
 .. contents::
     :local:
 
-.. warning::
-   These are in-progress notes for the upcoming LLVM 3.7 release.  You may
-   prefer the `LLVM 3.6 Release Notes <http://llvm.org/releases/3.6.0/docs
-   /ReleaseNotes.html>`_.
-
-
 Introduction
 ============
 
@@ -23,7 +17,7 @@ from the `LLVM releases web site <http://llvm.org/releases/>`_.
 For more information about LLVM, including information about the latest
 release, please check out the `main LLVM web site <http://llvm.org/>`_.  If you
 have questions or comments, the `LLVM Developer's Mailing List
-<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>`_ is a good place to send
+<http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ is a good place to send
 them.
 
 Note that if you are reading this file from a Subversion checkout or the main
@@ -48,46 +42,346 @@ Non-comprehensive list of changes in this release
   collection of tips for frontend authors on how to generate IR which LLVM is
   able to effectively optimize.
 
-* The DataLayout is no longer optional. All the IR level optimizations expects
+* The ``DataLayout`` is no longer optional. All the IR level optimizations expect
   it to be present and the API has been changed to use a reference instead of
   a pointer to make it explicit. The Module owns the datalayout and it has to
   match the one attached to the TargetMachine for generating code.
 
-* ... next change ...
+  In 3.6, a pass was inserted in the pipeline to make the ``DataLayout`` accessible:
+    ``MyPassManager->add(new DataLayoutPass(MyTargetMachine->getDataLayout()));``
+  In 3.7, you don't need a pass; you set the ``DataLayout`` on the ``Module``:
+    ``MyModule->setDataLayout(MyTargetMachine->createDataLayout());``
 
-.. NOTE
-   If you would like to document a larger change, then you can add a
-   subsection about it right here. You can copy the following boilerplate
-   and un-indent it (the indentation causes it to be inside this comment).
+  The LLVM C API ``LLVMGetTargetMachineData`` is deprecated to reflect the fact
+  that it won't be available anymore from ``TargetMachine`` in 3.8.
 
-   Special New Feature
-   -------------------
+* Comdats are now orthogonal to the linkage. LLVM will not create
+  comdats for weak linkage globals and the frontends are responsible
+  for explicitly adding them.
 
-   Makes programs 10x faster by doing Special New Thing.
+* On ELF we now support multiple sections with the same name and
+  comdat. This allows for smaller object files since multiple
+  sections can have a simple name (``.text``, ``.rodata``, etc.).
 
-Changes to the ARM Backend
---------------------------
+* LLVM now lazily loads metadata in some cases. Creating archives
+  with IR files with debug info is now 25X faster.
+
+* llvm-ar can create archives in the BSD format used by OS X.
+
+* LLVM received a backend for the extended Berkeley Packet Filter
+  instruction set that can be dynamically loaded into the Linux kernel via the
+  `bpf(2) <http://man7.org/linux/man-pages/man2/bpf.2.html>`_ syscall.
+
+  Support for BPF has been present in the kernel for some time, but starting
+  from kernel 3.18 it has been extended with features such as 64-bit
+  registers, 8 additional registers, conditional backwards jumps, a call
+  instruction, shift instructions, maps (hash table, array, etc.), 1-8 byte
+  load/store from stack, and more.
 
- During this release ...
+  Up until now, users of BPF had to write bytecode by hand, or use
+  custom generators. This release adds a proper LLVM backend target for the BPF
+  bytecode architecture.
 
+  The BPF target is now available by default, and options exist in both Clang
+  (``-target bpf``) and llc (``-march=bpf``) to pick eBPF as a backend.
+
+* Switch-case lowering was rewritten to avoid generating unbalanced search trees
+  (`PR22262 <http://llvm.org/pr22262>`_) and to exploit profile information
+  when available. Some lowering strategies are now disabled when optimizations
+  are turned off, to save compile time.
+
+* The debug info IR class hierarchy now inherits from ``Metadata`` and has its
+  own bitcode records and assembly syntax
+  (`documented in LangRef <LangRef.html#specialized-metadata-nodes>`_).  The debug
+  info verifier has been merged with the main verifier.
+
+* LLVM IR and APIs are in a period of transition to aid in the removal of
+  pointer types (the end goal being that pointers are typeless/opaque - void*,
+  if you will). Some APIs and IR constructs have been modified to take
+  explicit types that are currently checked to match the target type of their
+  pre-existing pointer type operands. Further changes are still needed, but the
+  more you can avoid using ``PointerType::getPointeeType``, the easier the
+  migration will be.
+
+* Argument-less ``TargetMachine::getSubtarget`` and
+  ``TargetMachine::getSubtargetImpl`` have been removed from the tree. Updating
+  out-of-tree ports is as simple as implementing a non-virtual version in the
+  target, but implementing full ``Function``-based ``TargetSubtargetInfo``
+  support is recommended.
+
+* This is expected to be the last major release of LLVM that supports being
+  run on Windows XP and Windows Vista.  For the next major release the minimum
+  Windows version requirement will be Windows 7.
 
 Changes to the MIPS Target
 --------------------------
 
- During this release ...
+During this release the MIPS target has:
+
+* Added support for MIPS32R3, MIPS32R5, MIPS64R3, MIPS64R5, and microMIPS32.
+
+* Added support for dynamic stack realignment. This is of particular importance
+  to MSA on 32-bit subtargets since vectors always exceed the stack alignment on
+  the O32 ABI.
+
+* Added support for compiler-rt including:
+
+  * Support for the Address and Undefined Behaviour Sanitizers for all MIPS
+    subtargets.
+
+  * Support for the Data Flow and Memory Sanitizers for 64-bit subtargets.
+
+  * Support for the Profiler for all MIPS subtargets.
+
+* Added support for libcxx and libcxxabi.
+
+* Improved inline assembly support such that memory constraints may now make use
+  of the appropriate address offsets available to the instructions. Also, added
+  support for the ``ZC`` constraint.
+
+* Added support for 128-bit integers on 64-bit subtargets and 16-bit floating
+  point conversions on all subtargets.
+
+* Added support for read-only ``.eh_frame`` sections by storing type information
+  indirectly.
+
+* Added support for MCJIT on all 64-bit subtargets as well as MIPS32R6.
+
+* Added support for fast instruction selection on MIPS32 and MIPS32R2 with PIC.
+
+* Various bug fixes, including the following notable fixes:
 
+  * Fixed 'jumpy' debug line info around calls where calculation of the address
+    of the function would inappropriately change the line number.
+
+  * Fixed missing ``__mips_isa_rev`` macro on the MIPS32R6 and MIPS64R6
+    subtargets.
+
+  * Fixed representation of NaN when targeting systems using traditional
+    encodings. Traditionally, MIPS has used NaN encodings that were compatible
+    with IEEE754-1985 but would later be found incompatible with IEEE754-2008.
+
+  * Fixed multiple segfaults and assertions in the disassembler when
+    disassembling instructions that have memory operands.
+
+  * Fixed multiple cases of suboptimal code generation involving ``$zero``.
+
+  * Fixed code generation of 128-bit shifts on 64-bit subtargets.
+
+  * Prevented the delay slot filler from filling call delay slots with
+    instructions that modify or use ``$ra``.
+
+  * Fixed some remaining N32/N64 calling convention bugs when using small
+    structures on big-endian subtargets.
+
+  * Fixed missing sign-extensions that are required by the N32/N64 calling
+    convention when generating calls to library functions with 32-bit
+    parameters.
+
+  * Corrected the ``int64_t`` typedef to be ``long`` for N64.
+
+  * ``-mno-odd-spreg`` is now honoured for vector insertion/extraction
+    operations when using ``-mmsa``.
+
+  * Fixed vector insertion and extraction for MSA on 64-bit subtargets.
+
+  * Corrected the representation of member function pointers. This makes them
+    usable on microMIPS subtargets.
 
 Changes to the PowerPC Target
 -----------------------------
 
- During this release ...
+There are numerous improvements to the PowerPC target in this release:
+
+* LLVM now supports the ISA 2.07B (POWER8) instruction set, including
+  direct moves between general registers and vector registers, and
+  built-in support for hardware transactional memory (HTM).  Some missing
+  instructions from ISA 2.06 (POWER7) were also added.
+
+* Code generation for the local-dynamic and global-dynamic thread-local
+  storage models has been improved.
+
+* Loops may be restructured to leverage pre-increment loads and stores.
+
+* QPX, the vector instruction set used by the IBM Blue Gene/Q supercomputers,
+  is now supported.
+
+* Loads from the TOC area are now correctly treated as invariant.
+
+* PowerPC now has support for i128 and v1i128 types.  The types differ
+  in how they are passed in registers for the ELFv2 ABI.
+
+* Disassembly will now print shorter mnemonic aliases when available.
+
+* Optional register name prefixes for VSX and QPX registers are now
+  supported in the assembly parser.
+
+* The back end now contains a pass to remove unnecessary vector swaps
+  from POWER8 little-endian code generation.  Additional improvements
+  are planned for release 3.8.
+
+* The undefined-behavior sanitizer (UBSan) is now supported for PowerPC.
+
+* Many new vector programming APIs have been added to ``altivec.h``.
+  Additional ones are planned for release 3.8.
+
+* PowerPC now supports ``__builtin_call_with_static_chain``.
+
+* PowerPC now supports the revised ``-mrecip`` option that permits finer
+  control over reciprocal estimates.
 
+* Many bugs have been identified and fixed.
 
-Changes to the OCaml bindings
+Changes to the SystemZ Target
 -----------------------------
 
- During this release ...
+* LLVM no longer attempts to automatically detect the current host CPU when
+  invoked natively.
 
+* Support for all thread-local storage models. (Previous releases would support
+  only the local-exec TLS model.)
+
+* The POPCNT instruction is now used on z196 and above.
+
+* The RISBGN instruction is now used on zEC12 and above.
+
+* Support for the transactional-execution facility on zEC12 and above.
+
+* Support for the z13 processor and its vector facility.
+
+
+Changes to the JIT APIs
+-----------------------
+
+* Added a new C++ JIT API called On Request Compilation, or ORC.
+
+  ORC is a new JIT API inspired by MCJIT but designed to be more testable and
+  easier to extend with new features. A key new feature already in tree is lazy,
+  function-at-a-time compilation for X86. Also included is a reimplementation of
+  MCJIT's API and behavior (OrcMCJITReplacement). MCJIT itself remains in tree,
+  and continues to be the default JIT ExecutionEngine, though new users are
+  encouraged to try ORC out for their projects. (A good place to start is the
+  new ORC tutorials under ``llvm/examples/kaleidoscope/orc``.)
+
+Sub-project Status Update
+=========================
+
+In addition to the core LLVM 3.7 distribution of production-quality compiler
+infrastructure, the LLVM project includes sub-projects that use the LLVM core
+and share the same distribution license. This section provides updates on these
+sub-projects.
+
+Polly - The Polyhedral Loop Optimizer in LLVM
+---------------------------------------------
+
+`Polly <http://polly.llvm.org>`_ is a polyhedral loop optimization
+infrastructure that provides data-locality optimizations to LLVM-based
+compilers. When compiled as part of clang or loaded as a module into clang,
+it can perform loop optimizations such as tiling, loop fusion or outer-loop
+vectorization. As a generic loop optimization infrastructure it allows
+developers to get a per-loop-iteration model of a loop nest on which detailed
+analysis and transformations can be performed.
+
+Changes since the last release:
+
+* isl imported into Polly distribution
+
+  `isl <http://repo.or.cz/w/isl.git>`_, the math library Polly uses, has been
+  imported into the source code repository of Polly and is now distributed as part
+  of Polly. As this was the last external library dependency of Polly, Polly can
+  now be compiled right after checking out the Polly source code without the need
+  for any additional libraries to be pre-installed.
+
+* Small integer optimization of isl
+
+  The MIT licensed imath backend used in `isl <http://repo.or.cz/w/isl.git>`_ for
+  arbitrary width integer computations has been optimized to use native integer
+  operations for the common case where the operands of a computation fit into 32
+  bits and to only fall back to large arbitrary precision integers for the
+  remaining cases. This optimization has greatly improved the compile-time
+  performance of Polly, due both to faster native operations and to a
+  reduction in malloc traffic and pointer indirections. As a result, computations
+  that use arbitrary precision integers heavily have been sped up by almost 6x.
+  Overall, the compile time of Polly on the Polybench test kernels in the LNT
+  suite has been reduced by 20% on average, with compile time reductions between
+  9% and 43%.
+
+* Schedule Trees
+
+  Polly now internally uses so-called schedule trees to model the loop
+  structure it optimizes. Schedule trees are an easy to understand tree structure
+  that describes a loop nest using integer constraint sets to keep track of
+  execution constraints. They allow the developer to use per-tree-node operations
+  to modify the loop tree. Programmatic analyses that work on the schedule tree
+  (e.g., dependence analysis) also show a visible speedup as they can exploit
+  the tree structure of the schedule and need to fall back to ILP based
+  optimization problems less often. Section 6 of `Polyhedral AST generation is
+  more than scanning polyhedra
+  <http://www.grosser.es/#pub-polyhedral-AST-generation>`_ gives a detailed
+  explanation of schedule trees.
+
+* Scalar and PHI node modeling - Polly as an analysis
+
+  Polly now requires almost no preprocessing to analyse LLVM-IR, which makes it
+  easier to use Polly as a pure analysis pass e.g. to provide more precise
+  dependence information to non-polyhedral transformation passes. Originally,
+  Polly required the input LLVM-IR to be preprocessed such that all scalar and
+  PHI-node dependences are translated to in-memory operations. Since this release,
+  Polly has full support for scalar and PHI node dependences and requires no
+  scalar-to-memory translation for such dependences.
+
+* Modeling of modulo and non-affine conditions
+
+  Polly now supports modulo operations such as ``A[t%2][i][j]``, as they appear
+  often in stencil computations, and also allows data-dependent conditional
+  branches, as they result e.g. from ternary conditions such as
+  ``A[i] > 255 ? 255 : A[i]``.
+
+* Delinearization
+
+  Polly now supports the analysis of manually linearized multi-dimensional arrays
+  as they result from macros such as
+  ``#define ARRAY2D(A,i,j) (A.data[(i) * A.size + (j)])``. Similar constructs
+  appear in old C code written before C99, C++ code such as boost::ublas, LLVM
+  IR exported from Julia, Matlab-generated code and many others. Our work titled
+  `Optimistic Delinearization of Parametrically Sized Arrays
+  <http://www.grosser.es/#pub-optimistic-delinerization>`_ gives details.
+
+* Compile time improvements
+
+  Pratik Bahtu worked on compile-time performance tuning of Polly. His work
+  together with the support for schedule trees and the small integer optimization
+  in isl notably reduced the compile time.
+
+* Increased compute timeouts
+
+  As Polly's compile time has been notably improved, we were able to increase
+  the compile-time safeguards in Polly. As a result, the default configuration
+  of Polly can now analyze larger loop nests without running into compile time
+  restrictions.
+
+* Export Debug Locations via JSCoP file
+
+  Polly's JSCoP import/export format gained support for debug locations that show
+  the user the source code locations of detected scops.
+
+* Improved Windows support
+
+  The compilation of Polly on Windows using CMake has been improved and several
+  Visual Studio build issues have been addressed.
+
+* Many bug fixes
+
+libunwind
+---------
+
+The unwind implementation which used to reside in ``libc++abi`` has been moved into
+a separate repository.  This implementation can still be used for ``libc++abi`` by
+specifying ``-DLIBCXXABI_USE_LLVM_UNWINDER=YES`` and
+``-DLIBCXXABI_LIBUNWIND_PATH=<path to libunwind source>`` when configuring
+``libc++abi``; the former option defaults to ``true`` when building on ARM.
+
+The new repository can also be built standalone if just ``libunwind`` is desired.
 
 External Open Source Projects Using LLVM 3.7
 ============================================
@@ -96,7 +390,74 @@ An exciting aspect of LLVM is that it is used as an enabling technology for
 a lot of other language and tools projects. This section lists some of the
 projects that have already been updated to work with LLVM 3.7.
 
-* A project
+
+LDC - the LLVM-based D compiler
+-------------------------------
+
+`D <http://dlang.org>`_ is a language with C-like syntax and static typing. It
+pragmatically combines efficiency, control, and modeling power, with safety and
+programmer productivity. D supports powerful concepts like Compile-Time Function
+Execution (CTFE) and Template Meta-Programming, provides an innovative approach
+to concurrency and offers many classical paradigms.
+
+`LDC <http://wiki.dlang.org/LDC>`_ uses the frontend from the reference compiler
+combined with LLVM as backend to produce efficient native code. LDC targets
+x86/x86_64 systems like Linux, OS X, FreeBSD and Windows and also Linux on
+PowerPC (32/64 bit). Ports to other architectures like ARM, AArch64 and MIPS64
+are underway.
+
+Portable Computing Language (pocl)
+----------------------------------
+
+In addition to producing an easily portable open source OpenCL
+implementation, another major goal of `pocl <http://portablecl.org/>`_
+is improving performance portability of OpenCL programs with
+compiler optimizations, reducing the need for target-dependent manual
+optimizations. An important part of pocl is a set of LLVM passes used to
+statically parallelize multiple work-items with the kernel compiler, even in
+the presence of work-group barriers.
+
+
+TTA-based Co-design Environment (TCE)
+-------------------------------------
+
+`TCE <http://tce.cs.tut.fi/>`_ is a toolset for designing customized
+exposed datapath processors based on the Transport triggered
+architecture (TTA).
+
+The toolset provides a complete co-design flow from C/C++
+programs down to synthesizable VHDL/Verilog and parallel program binaries.
+Processor customization points include the register files, function units,
+supported operations, and the interconnection network.
+
+TCE uses Clang and LLVM for C/C++/OpenCL C language support, target independent
+optimizations and also for parts of code generation. It generates
+new LLVM-based code generators "on the fly" for the designed processors and
+loads them into the compiler backend as runtime libraries to avoid
+per-target recompilation of larger parts of the compiler chain.
+
+BPF Compiler Collection (BCC)
+-----------------------------
+`BCC <https://github.com/iovisor/bcc>`_ is a Python + C framework for tracing and
+networking that uses the Clang rewriter, a second pass of Clang, and the BPF
+backend to generate eBPF and push it into the kernel.
+
+LLVMSharp & ClangSharp
+----------------------
+
+`LLVMSharp <http://www.llvmsharp.org>`_ and
+`ClangSharp <http://www.clangsharp.org>`_ are type-safe C# bindings for
+Microsoft .NET and Mono that Platform Invoke into the native libraries.
+ClangSharp is self-hosted and is used to generate LLVMSharp using the
+LLVM-C API.
+
+`LLVMSharp Kaleidoscope Tutorials <http://www.llvmsharp.org/Kaleidoscope/>`_
+are instructive examples of writing a compiler in C#, with certain improvements
+like using the visitor pattern to generate LLVM IR.
+
+`ClangSharp PInvoke Generator <http://www.clangsharp.org/PInvoke/>`_ is the
+self-hosting mechanism for LLVM/ClangSharp and is demonstrative of using
+LibClang to generate Platform Invoke (PInvoke) signatures for C APIs.
 
 
 Additional Information
@@ -111,4 +472,3 @@ going into the ``llvm/docs/`` directory in the LLVM tree.
 
 If you have any questions or comments about LLVM, please feel free to contact
 us via the `mailing lists <http://llvm.org/docs/#maillist>`_.
-