Intel Math Kernel Library

The fastest and most used math library for Intel and compatible processors

  • Vectorized and threaded for highest performance using de facto standard APIs for simple code integration
  • Compatible with C, C++ and Fortran compilers, with royalty-free licensing for low-cost deployment

Download as part of Intel Parallel Studio XE

Overview

Performance: Ready to Use

Intel® Math Kernel Library (Intel® MKL) includes a wealth of math processing routines to accelerate application performance and reduce development time. Intel® MKL includes highly vectorized and threaded Linear Algebra, Fast Fourier Transforms (FFT), Vector Math and Statistics functions. The easiest way to take advantage of the processing power of modern processors is to use a carefully optimized math library. Even the best compiler can’t match the performance of a hand-optimized library. If your application already relies on BLAS or LAPACK functionality, simply re-link with Intel® MKL to get better performance on Intel and compatible architectures.

Because Intel has done the engineering on these ready-to-use, royalty-free functions, you have more time to add the features your customers request. Using Intel® MKL can save development, debugging and maintenance time in the long run, because today’s code will run optimally on future generations of Intel processors with minimal effort.

Quotes

“I’m a C++ and Fortran developer and have high praise for the Intel® Math Kernel Library. One nice feature I’d like to stress is the bitwise reproducibility of MKL which helps me get the assurance I need that I’m getting the same floating point results from run to run.”
Franz Bernice
CEO and Senior Developer, MSTC Modern Software Technology

“Intel MKL is indispensable for any high-performance computer user on x86 platforms.”
Prof. Jack Dongarra,
Innovative Computing Lab,
University of Tennessee, Knoxville


Features

Comprehensive Math Functionality – Covers Range of Application Needs

Intel® MKL includes highly vectorized and threaded Linear Algebra, Fast Fourier Transforms (FFT), Vector Math and Statistics functions. Through a single C or Fortran API call, these functions automatically scale across previous, current and future processor architectures by selecting the best code path for each. Cluster-based versions of LAPACK, FFT and the sparse solver are also included to support MPI-based distributed memory computing.

Standard APIs – For Immediate Performance Results

Wherever available, Intel® MKL uses de facto industry standard APIs so that minimal code changes are required to switch from another library. This makes it quick and easy to improve your application performance through simple function substitutions or relinking. Simply substituting Intel® MKL’s LAPACK (Linear Algebra PACKage), for example, can yield 500% or higher performance improvement. In addition to the industry-standard BLAS and LAPACK linear algebra APIs, Intel® MKL also supports MIT’s FFTW C interface for Fast Fourier Transforms.

Figure: Intel® MKL 11.1 LAPACK DPOTRF performance

Highest Performance and Scalability across Past, Present & Future Processors – Easily and Automatically

Behind a single C or Fortran API, Intel® MKL includes multiple code paths — each optimized for specific generations of Intel and compatible processors. With no code-branching required by application developers, Intel® MKL utilizes the best code path for maximum performance. New optimized code paths are added under these same APIs before future processors are released so developers just link to the newest version of Intel® MKL and their applications are ready to take full advantage of the latest processor architectures. In the case of the Intel® Many Integrated Core Architecture (Intel® MIC Architecture), Intel® MKL can automatically determine the best load balancing between the host CPU and the Intel® Xeon® Phi™ coprocessor in addition to full native optimization support.

Flexibility to Meet Developer Requirements

Developers have many requirements to meet. Sometimes these requirements conflict and need to be balanced. Need consistent floating point results with the best application performance possible? Want faster vector math performance and don’t need maximum accuracy? Intel® MKL gives you control over the necessary tradeoffs including:

  • Results consistency vs. performance
  • Accuracy vs. performance
  • Compilers, linking and threading models
  • Languages and operating systems

Intel® MKL is also compatible with your choice of compilers, languages, operating systems, linking and threading models. One library solution across multiple environments means only one library to learn and manage.

Linear Algebra

Intel® MKL BLAS provides optimized vector-vector (Level 1), matrix-vector (Level 2) and matrix-matrix (Level 3) operations for single and double precision real and complex types.

  • Level 1 BLAS routines operate on individual vectors, e.g., compute scalar product, norm, or the sum of vectors.
  • Level 2 BLAS routines provide matrix-vector products, rank 1 and 2 updates of a matrix, and triangular system solvers.
  • Level 3 BLAS routines include matrix-matrix products, rank k matrix updates, and triangular solvers with multiple right-hand sides.
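
To make the three levels concrete, here is a minimal reference sketch in plain Python of one operation from each level. It shows only what the standard BLAS routines DDOT, DGEMV and DGEMM compute, not how Intel® MKL implements them; the function names `dot`, `gemv` and `gemm` are illustrative and are not library entry points.

```python
# Reference (unoptimized) semantics of one routine from each BLAS level.
# The function names are illustrative, not Intel MKL entry points.

def dot(x, y):
    """Level 1: scalar product of two vectors (what DDOT computes)."""
    return sum(a * b for a, b in zip(x, y))


def gemv(alpha, A, x, beta, y):
    """Level 2: returns alpha*A*x + beta*y (what DGEMV computes)."""
    return [alpha * dot(row, x) + beta * yi for row, yi in zip(A, y)]


def gemm(alpha, A, B, beta, C):
    """Level 3: returns alpha*A*B + beta*C (what DGEMM computes)."""
    cols_B = list(zip(*B))  # columns of B, so each entry is a row-column dot product
    return [
        [alpha * dot(row, col) + beta * c for col, c in zip(cols_B, c_row)]
        for row, c_row in zip(A, C)
    ]


A = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]

print(dot(x, x))                         # 2.0
print(gemv(1.0, A, x, 0.0, [0.0, 0.0]))  # [3.0, 7.0]
print(gemm(1.0, A, A, 0.0, zeros))       # [[7.0, 10.0], [15.0, 22.0]]
```

The amount of arithmetic per element of data grows from Level 1 to Level 3, which is why matrix-matrix routines benefit most from an optimized library.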

Intel® MKL LAPACK provides extremely well-tuned LU, Cholesky, and QR factorization and driver routines that can be used to solve linear systems of equations. Eigenvalue and least-squares solvers are also included, as are the latest LAPACK 3.4.1 interfaces and enhancements. Intel® MKL also includes Sparse BLAS and sparse solvers such as PARDISO and iterative sparse solvers. New in this release of Intel® MKL is the Parallel Direct Sparse Solver for Clusters, which solves sparse linear systems with millions of rows and columns.
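
As an illustration of how a factorization is used to solve a linear system, the following plain-Python sketch factors a symmetric positive definite matrix as A = L·Lᵀ (the job of LAPACK's DPOTRF) and then solves A·x = b by two triangular solves (the job of DPOTRS). It is a textbook version under those assumptions, not the library's implementation, and the names `cholesky` and `solve_cholesky` are hypothetical.

```python
import math


def cholesky(A):
    """Return lower-triangular L with A = L * L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L * L^T) x = b by forward and then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):  # forward substitution: L z = b
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # backward substitution: L^T x = z
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


A = [[4.0, 2.0], [2.0, 3.0]]
b = [10.0, 8.0]
x = solve_cholesky(cholesky(A), b)
print(x)  # approximately [1.75, 1.5]
```

Once L is available it can be reused for any number of right-hand sides, which is the reason factorization and solve are separate routines.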

Fast Fourier Transforms

Intel® MKL FFTs include many optimizations and should provide significant performance gains over other libraries for medium and large transform sizes. The library supports a broad variety of FFTs, from single and double precision 1D to multi-dimensional, complex-to-complex, real-to-complex, and real-to-real transforms of arbitrary length. Support for both FFTW* interfaces simplifies the porting of your FFTW-based applications.
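
The transform that an FFT computes can be written down directly. The sketch below is a naive O(n²) discrete Fourier transform in plain Python, not a fast algorithm and not the Intel® MKL interface; `dft` and `idft` are illustrative names. It works for arbitrary lengths and shows why real-to-complex transforms need to store only about half of the output: for real input, the second half of the spectrum is the complex conjugate of the first.

```python
import cmath


def dft(x):
    """Forward transform: X[m] = sum_k x[k] * exp(-2*pi*i*m*k/n)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def idft(X):
    """Inverse transform, scaled by 1/n so that idft(dft(x)) recovers x."""
    n = len(X)
    return [
        sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
        for k in range(n)
    ]


signal = [1.0, 2.0, 0.0, -1.0, 3.0]  # real input of length 5 (not a power of two)
spectrum = dft(signal)

# For real input, X[n - m] is the complex conjugate of X[m].
n = len(signal)
for m in range(1, n):
    assert abs(spectrum[n - m] - spectrum[m].conjugate()) < 1e-12

# The round trip recovers the original signal up to rounding error.
recovered = idft(spectrum)
assert all(abs(r - s) < 1e-12 for r, s in zip(recovered, signal))
```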

Vector Math

Intel® MKL provides optimized vector implementations of computationally intensive core mathematical operations and functions for single and double precision real and complex types. The basic vector arithmetic operations include element-by-element summation, subtraction, multiplication, division, and conjugation as well as rounding operations such as floor, ceil, and round to the nearest integer. Additional functions include power, square root, inverse, logarithm, trigonometric, hyperbolic, (inverse) error and cumulative normal distribution, and pack/unpack. Enhanced capabilities include accuracy, denormalized number handling, and error mode controls, allowing users to customize the behavior to meet their individual needs.
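
The accuracy control mentioned above reflects a general tradeoff: a function approximated with fewer terms is cheaper to evaluate but less accurate. The following plain-Python sketch illustrates the idea with an element-wise exponential built from range reduction and a Taylor polynomial. It is an assumption-laden illustration, not the method Intel® MKL uses, and `vexp` and `max_rel_error` are hypothetical names.

```python
import math

LN2 = math.log(2.0)


def vexp(xs, terms):
    """Element-wise exp(x) using x = k*ln2 + r and a Taylor polynomial in r.

    `terms` is the number of Taylor terms: fewer terms means less work per
    element and a larger approximation error.
    """
    out = []
    for x in xs:
        k = round(x / LN2)      # integer multiple of ln 2
        r = x - k * LN2         # reduced argument, |r| <= ln2 / 2
        p = 1.0
        for i in range(terms - 1, 0, -1):  # Horner form of sum r^i / i!
            p = 1.0 + r * p / i
        out.append(math.ldexp(p, k))       # p * 2**k
    return out


def max_rel_error(xs, terms):
    """Largest relative error of vexp against math.exp over xs."""
    return max(abs(a - math.exp(x)) / math.exp(x) for a, x in zip(vexp(xs, terms), xs))


xs = [-3.0, -0.5, 0.0, 1.0, 4.2]
fast = max_rel_error(xs, terms=4)       # cheaper, lower accuracy
accurate = max_rel_error(xs, terms=12)  # more work, close to full double precision
print(fast > accurate)  # True
```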

Statistics

Intel® MKL includes random number generators and probability distributions that can deliver significant application performance gains. The functions let the user pair random number generators such as Mersenne Twister and Niederreiter with a variety of probability distributions, including Uniform, Gaussian and Exponential.
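
The pairing of a generator with a distribution can be sketched in plain Python, whose `random.Random` class is itself a Mersenne Twister. The distribution functions below (`uniform`, `exponential`, `gaussian`) and the `sample` helper are hypothetical names written for this illustration using textbook transforms; they are not the Intel® MKL interface, and a Niederreiter quasi-random generator is not shown.

```python
import math
import random


def uniform(rng, a, b):
    """Uniform on [a, b) from a uniform [0, 1) draw."""
    return a + (b - a) * rng.random()


def exponential(rng, rate):
    """Exponential with the given rate, by inverting the CDF."""
    return -math.log(1.0 - rng.random()) / rate


def gaussian(rng, mean, sigma):
    """Gaussian via the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is defined
    u2 = rng.random()
    return mean + sigma * math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def sample(distribution, seed, n, *params):
    """Pair a seeded Mersenne Twister stream with a distribution function."""
    rng = random.Random(seed)  # basic generator: Mersenne Twister
    return [distribution(rng, *params) for _ in range(n)]


draws = sample(exponential, 2024, 20000, 2.0)
print(sum(draws) / len(draws))  # close to 1 / rate = 0.5
```

Keeping the generator and the distribution separate means any basic generator can feed any distribution, and a fixed seed reproduces the same stream.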

Intel® MKL also provides computationally intensive building blocks for statistical analysis, both in-core and out-of-core. These enable users to compute basic statistics, estimate dependencies, detect data outliers, and replace missing values. These features can be used to speed up applications in computational finance, life sciences, engineering and simulation, databases, and other areas.
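
Out-of-core processing means the data set does not have to fit in memory at once. One common way to compute basic statistics in that setting is Welford's streaming update, sketched below in plain Python. The technique is chosen here for illustration only; the source does not say which algorithm the library uses, and the `RunningStats` class is hypothetical.

```python
class RunningStats:
    """Mean and sample variance accumulated chunk by chunk (Welford's update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, chunk):
        """Fold one block of observations into the running totals."""
        for x in chunk:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Unbiased sample variance of everything seen so far."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for chunk in ([2.0, 4.0, 4.0], [4.0, 5.0, 5.0], [7.0, 9.0]):  # e.g. blocks read from disk
    stats.update(chunk)

print(stats.n, stats.mean)  # 8 5.0
```

Only three numbers are kept between chunks, so memory use does not grow with the size of the data set.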

Data Fitting

Intel® MKL includes a rich set of spline functions for 1-dimensional interpolation. These are useful in a variety of application domains including data analytics (e.g. histograms), geometric modeling and surface approximation. Splines included are linear, quadratic, cubic, look-up, stepwise constant and user-defined.
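
As one example of spline construction and interpolation, the sketch below builds a natural cubic spline (second derivative zero at both ends) in plain Python and evaluates it between the data points. It is a textbook formulation under stated assumptions (strictly increasing knots, at least two points), not the Intel® MKL data fitting interface, and `natural_cubic_spline` is a hypothetical name.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys).

    Assumes xs is strictly increasing and has at least two points.
    """
    n = len(xs) - 1                        # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    M = [0.0] * (n + 1)                    # second derivatives at the knots; ends stay 0

    if n >= 2:
        # Tridiagonal system for the interior second derivatives M[1..n-1].
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [
            6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
            for i in range(1, n)
        ]
        for r in range(1, n - 1):          # forward elimination
            w = h[r] / diag[r - 1]
            diag[r] -= w * h[r]
            rhs[r] -= w * rhs[r - 1]
        for r in range(n - 2, -1, -1):     # back substitution (M[n] is 0)
            M[r + 1] = (rhs[r] - h[r + 1] * M[r + 2]) / diag[r]

    def evaluate(x):
        # Locate the interval; points outside the data use the end intervals.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a = xs[i + 1] - x
        b = x - xs[i]
        return (
            (M[i] * a ** 3 + M[i + 1] * b ** 3) / (6.0 * h[i])
            + (ys[i] / h[i] - M[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * b
        )

    return evaluate


spline = natural_cubic_spline([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0])
print(spline(1.0))  # reproduces the data value at a knot, approximately 1.0
```

Construction solves one small tridiagonal system, after which each evaluation costs only an interval lookup and a cubic polynomial.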

What’s New

  • Direct Sparse Solver for Clusters extends the capabilities of Intel MKL PARDISO, enabling users to solve large distributed sparse systems of equations on clusters up to 2x faster than competing solutions.
  • Small Matrix Multiply performance improved up to 2x for small problem sizes (less than 20×20 matrices)
  • Support for the next generation Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instruction set with optimizations in BLAS, DFT and VML
  • Intel MKL cookbook provides step-by-step recipes to solve common mathematical problems using existing library functions
  • Verbose mode for BLAS and LAPACK functions for improved debugging and profiling
    • Provides detailed Intel MKL versioning information
    • Identifies the library functions called and the parameters passed to them
    • Returns the amount of time spent in each function call

Benchmarks

Intel Processors

Linear Algebra Performance Charts

  • DGEMM
  • Intel® Optimized SMP LINPACK
  • HPL LINPACK
  • LU Factorization
  • Cholesky Factorization

FFT Performance Charts

  • 2D and 3D FFTs on Intel® Xeon and Intel® Core Processors
  • Cluster FFT Performance
  • Cluster FFT Scalability

Sparse BLAS and Sparse Solver Performance Charts

  • DCSRGEMV and DCSRMM
  • PARDISO Sparse Solver

Data Fitting Performance Charts

  • Natural cubic spline construction and interpolation

Random Number Generator Performance Charts

  • MCG31m1

Vector Math Performance Chart

  • VML exp()

Application Benchmark Performance Chart

  • Monte-Carlo option pricing

Intel Xeon Phi Coprocessor

Linear Algebra Performance Charts

  • Intel® Optimized SMP LINPACK
  • HPL LINPACK
  • LU Factorization
  • QR Factorization
  • Cholesky Factorization
  • Matrix Multiply

Application Benchmark Performance Chart

  • Monte Carlo Option Pricing

Batch 1D FFT Performance Chart

  • Batch 1D FFT

Black-Scholes Chart

  • Black-Scholes

Video to help you get started


Using the Intel Math Kernel Library 11.0 and Compiler to Obtain Run-to-run Numerical Reproducibility
