Misplaced Pages

ELKI

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.

ELKI ( Environment for Developing KDD-Applications Supported by Index-Structures ) is a data mining (KDD, knowledge discovery in databases) software framework developed for use in research and teaching. It was originally created by the database systems research unit at the Ludwig Maximilian University of Munich , Germany, led by Professor Hans-Peter Kriegel . The project has continued at the Technical University of Dortmund , Germany. It aims at allowing the development and evaluation of advanced data mining algorithms and their interaction with database index structures .

#869130

64-537: The ELKI framework is written in Java and built around a modular architecture. Most currently included algorithms perform clustering , outlier detection , and database indexes . The object-oriented architecture allows the combination of arbitrary algorithms, data types, distance functions , indexes, and evaluation measures. The Java just-in-time compiler optimizes all combinations to a similar extent, making benchmarking results more comparable if they share large parts of

128-531: A database -inspired core, which uses a vertical data layout that stores data in column groups (similar to column families in NoSQL databases ). This database core provides nearest neighbor search , range/radius search, and distance query functionality with index acceleration for a wide range of dissimilarity measures . Algorithms based on such queries (e.g. k-nearest-neighbor algorithm , local outlier factor and DBSCAN ) can be implemented easily and benefit from

192-499: A programming language ( built-in types ). Data types which are not primitive are referred to as derived or composite . Primitive types are almost always value types , but composite types may also be value types. The most common primitive types are those used and supported by computer hardware, such as integers of various sizes, floating-point numbers, and Boolean logical values. Operations on such types are usually quite efficient. Primitive data types which are native to

256-461: A built-in type and true and false as reserved words. The Java virtual machine's set of primitive data types consists of: The set of basic C data types is similar to Java's. Minimally, there are four types, char , int , float , and double , but the qualifiers short , long , signed , and unsigned mean that C contains numerous target-dependent integer and floating-point primitive types. C99 extended this set by adding

320-459: A format equivalent to scientific notation , typically in binary but sometimes in decimal . Because floating-point numbers have limited precision, only a subset of real or rational numbers are exactly representable; other numbers can be represented only approximately. Many languages have both a single precision (often called float ) and a double precision type (often called double ). A Boolean type , typically denoted bool or boolean ,

384-416: A hindrance to an integration in commercial products; nevertheless it can be used to evaluate algorithms prior to developing an own implementation for a commercial product. Furthermore, the application of the algorithms requires knowledge about their usage, parameters, and study of original literature. The audience is students , researchers , data scientists , and software engineers . ELKI is modeled around

448-717: A lawsuit against Google shortly after that for using Java inside the Android SDK (see the Android section). On April 2, 2010, James Gosling resigned from Oracle . In January 2016, Oracle announced that Java run-time environments based on JDK 9 will discontinue the browser plugin. Java software runs on everything from laptops to data centers , game consoles to scientific supercomputers . Oracle (and others) highly recommend uninstalling outdated and unsupported versions of Java, due to unresolved security issues in older versions. There were five primary goals in creating

512-481: A length of one are normally used to represent single characters. Some languages have character types that are too small to represent all Unicode characters. These are more properly categorized as integer types that have been given a misleading name. For example C includes a char type, but it is defined to be the smallest addressable unit of memory, which several standards (such as POSIX ) require to be 8 bits . Recent versions of these standards refer to char as

576-482: A number of other standard servlet classes available, for example for WebSocket communication. The Java servlet API has to some extent been superseded (but still used under the hood) by two standard Java technologies for web services: Typical implementations of these APIs on Application Servers or Servlet Containers use a standard servlet for handling all interactions with the HTTP requests and responses that delegate to

640-418: A set of basic data types from which all other data types are constructed. Specifically it often refers to the limited set of data representations in use by a particular processor , which all compiled programs must use. Most processors support a similar set of primitive data types, although the specific representations vary. More generally, primitive data types may refer to the standard data types built into

704-440: A small portion of code to which Sun did not hold the copyright. Sun's vice-president Rich Green said that Sun's ideal role with regard to Java was as an evangelist . Following Oracle Corporation 's acquisition of Sun Microsystems in 2009–10, Oracle has described itself as the steward of Java technology with a relentless commitment to fostering a community of participation and transparency. This did not prevent Oracle from filing

SECTION 10

#1732802433870

768-608: A subject of controversy during the 2010s. The class library contains features such as: Javadoc is a comprehensive documentation system, created by Sun Microsystems . It provides developers with an organized system for documenting their code. Javadoc comments have an extra asterisk at the beginning, i.e. the delimiters are /** and */ , whereas the normal multi-line comments in Java are delimited by /* and */ , and single-line comments start with // . Primitive data type In computer science , primitive data types are

832-428: A subsample of the data is visualized by default). Version 0.4, presented at the "Symposium on Spatial and Temporal Databases" 2011, which included various methods for spatial outlier detection, won the conference's "best demonstration paper award". Select included algorithms: Version 0.1 (July 2008) contained several Algorithms from cluster analysis and anomaly detection , as well as some index structures such as

896-491: Is a high-level , class-based , object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let programmers write once, run anywhere ( WORA ), meaning that compiled Java code can run on all platforms that support Java without the need to recompile. Java applications are typically compiled to bytecode that can run on any Java virtual machine (JVM) regardless of

960-458: Is a type that can represent all Unicode characters , hence must be at least 21 bits wide. Some languages such as Julia include a true 32-bit Unicode character type as primitive. Other languages such as JavaScript , Python , Ruby , and many dialects of BASIC do not have a primitive character type but instead add strings as a primitive data type, typically using the UTF-8 encoding. Strings with

1024-576: Is actually two compilers in one; and with GraalVM (included in e.g. Java 11, but removed as of Java 16) allowing tiered compilation . Java itself is platform-independent and is adapted to the particular platform it is to run on by a Java virtual machine (JVM), which translates the Java bytecode into the platform's machine language. Programs written in Java have a reputation for being slower and requiring more memory than those written in C++ . However, Java programs' execution speed improved significantly with

1088-616: Is developed for use in teaching and research . The source code is written with extensibility and reusability in mind, but is also optimized for performance. The experimental evaluation of algorithms depends on many environmental factors and implementation details can have a large impact on the runtime. ELKI aims at providing a shared codebase with comparable implementations of many algorithms. As research project, it currently does not offer integration with business intelligence applications or an interface to common database management systems via SQL . The copyleft ( AGPL ) license may also be

1152-428: Is faster using SIMD operations and data types to operate on an array of floats. An integer data type represents some range of mathematical integers. Integers may be either signed (allowing negative values) or unsigned ( non-negative integers only). Common ranges are: A floating-point number represents a limited-precision rational number that may have a fractional part. These numbers are stored internally in

1216-409: Is implicitly allocated on the stack or explicitly allocated and deallocated from the heap . In the latter case, the responsibility of managing memory resides with the programmer. If the program does not deallocate an object, a memory leak occurs. If the program attempts to access or deallocate memory that has already been deallocated, the result is undefined and difficult to predict, and the program

1280-626: Is insufficient free memory on the heap to allocate a new object; this can cause a program to stall momentarily. Explicit memory management is not possible in Java. Java does not support C/C++ style pointer arithmetic , where object addresses can be arithmetically manipulated (e.g. by adding or subtracting an offset). This allows the garbage collector to relocate referenced objects and ensures type safety and security. As in C++ and some other object-oriented languages, variables of Java's primitive data types are either stored directly in fields (for objects) or on

1344-399: Is likely to become unstable or crash. This can be partially remedied by the use of smart pointers , but these add overhead and complexity. Garbage collection does not prevent logical memory leaks, i.e. those where the memory is still referenced but never used. Garbage collection may happen at any time. Ideally, it will occur when a program is idle. It is guaranteed to be triggered if there

SECTION 20

#1732802433870

1408-420: Is no longer needed, typically when objects that are no longer needed are stored in containers that are still in use. If methods for a non-existent object are called, a null pointer exception is thrown. One of the ideas behind Java's automatic memory management model is that programmers can be spared the burden of having to perform manual memory management. In some languages, memory for the creation of objects

1472-538: Is supported for interfaces . Java uses comments similar to those of C++. There are three different styles of comments: a single line style marked with two slashes ( // ), a multiple line style opened with /* and closed with */ , and the Javadoc commenting style opened with /** and closed with */ . The Javadoc style of commenting allows the user to run the Javadoc executable to create documentation for

1536-451: Is the latest version (Java 22, and 20 are no longer maintained). Java 8, 11, 17, and 21 are previous LTS versions still officially supported. James Gosling , Mike Sheridan, and Patrick Naughton initiated the Java language project in June 1991. Java was originally designed for interactive television, but it was too advanced for the digital cable television industry at the time. The language

1600-594: Is typically a logical type that can have either the value true or the value false . Although only one bit is necessary to accommodate the value set true and false , programming languages typically implement Boolean types as one or more bytes. Many languages (e.g. Java , Pascal and Ada ) implement Booleans adhering to the concept of Boolean as a distinct logical type. Some languages, though, may implicitly convert Booleans to numeric types at times to give extended semantics to Booleans and Boolean expressions or to achieve backwards compatibility with earlier versions of

1664-431: Is written inside classes, and every data item is an object, with the exception of the primitive data types, (i.e. integers, floating-point numbers, boolean values , and characters), which are not objects for performance reasons. Java reuses some popular aspects of C++ (such as the printf method). Unlike C++, Java does not support operator overloading or multiple inheritance for classes, though multiple inheritance

1728-593: The ConcurrentMaps and other multi-core collections, and it was improved further with Java 1.6. Some platforms offer direct hardware support for Java; there are micro controllers that can run Java bytecode in hardware instead of a software Java virtual machine, and some ARM -based processors could have hardware support for executing Java bytecode through their Jazelle option, though support has mostly been dropped in current implementations of ARM. Java uses an automatic garbage collector to manage memory in

1792-571: The R*-tree . The focus of the first release was on subspace clustering and correlation clustering algorithms. Version 0.2 (July 2009) added functionality for time series analysis , in particular distance functions for time series. Version 0.3 (March 2010) extended the choice of anomaly detection algorithms and visualization modules. Version 0.4 (September 2011) added algorithms for geo data mining and support for multi-relational database and index structures. Version 0.5 (April 2012) focuses on

1856-418: The object lifecycle . The programmer determines when objects are created, and the Java runtime is responsible for recovering the memory once objects are no longer in use. Once no references to an object remain, the unreachable memory becomes eligible to be freed automatically by the garbage collector. Something similar to a memory leak may still occur if a programmer's code holds a reference to an object that

1920-516: The stack (for methods) rather than on the heap, as is commonly true for non-primitive data types (but see escape analysis ). This was a conscious decision by Java's designers for performance reasons. Java contains multiple types of garbage collectors. Since Java 9, HotSpot uses the Garbage First Garbage Collector (G1GC) as the default. However, there are also several other garbage collectors that can be used to manage

1984-591: The Boolean type _Bool and allowing the modifier long to be used twice in combination with int (e.g. long long int ). The XML Schema Definition language provides a set of 19 primitive data types: In JavaScript, there are 7 primitive data types: string, number, bigint, boolean, symbol, undefined, and null. Their values are considered immutable . These are not objects and have no methods or properties ; however, all primitives except undefined and null have object wrappers . In Visual Basic .NET ,

ELKI - Misplaced Pages Continue

2048-834: The Java language, as part of J2SE 5.0. Prior to the introduction of generics, each variable declaration had to be of a specific type. For container classes, for example, this is a problem because there is no easy way to create a container that accepts only specific types of objects. Either the container operates on all subtypes of a class or interface, usually Object , or a different container class has to be created for each contained class. Generics allow compile-time type checking without having to create many container classes, each containing almost identical code. In addition to enabling more efficient code, certain runtime exceptions are prevented from occurring, by issuing compile-time errors. If Java prevented all runtime type errors ( ClassCastException s) from occurring, it would be type safe . In 2016,

2112-950: The Java language: As of November 2024 , Java 8, 11, 17, and 21 are supported as long-term support (LTS) versions, with Java 25, releasing in September 2025, as the next scheduled LTS version. Oracle released the last zero-cost public update for the legacy version Java 8 LTS in January 2019 for commercial use, although it will otherwise still support Java 8 with public updates for personal use indefinitely. Other vendors such as Adoptium continue to offer free builds of OpenJDK's long-term support (LTS) versions. These builds may include additional security patches and bug fixes. Major release versions of Java, along with their release dates: Sun has defined and supports four editions of Java targeting different application environments and segmented many of its APIs so that they belong to one of

2176-440: The Java platform must run similarly on any combination of hardware and operating system with adequate run time support. This is achieved by compiling the Java language code to an intermediate representation called Java bytecode , instead of directly to architecture-specific machine code . Java bytecode instructions are analogous to machine code, but they are intended to be executed by a virtual machine (VM) written specifically for

2240-635: The ability to run Java applets within web pages, and Java quickly became popular. The Java 1.0 compiler was re-written in Java by Arthur van Hoff to comply strictly with the Java 1.0 language specification. With the advent of Java 2 (released initially as J2SE 1.2 in December 1998 – 1999), new versions had multiple configurations built for different types of platforms. J2EE included technologies and APIs for enterprise applications typically run in server environments, while J2ME featured APIs optimized for mobile applications. The desktop version

2304-751: The code. When developing new algorithms or index structures, the existing components can be easily reused, and the type safety of Java detects many programming errors at compile time. ELKI is a free tool for analyzing data, mainly focusing on finding patterns and unusual data points without needing labels. It's written in Java and aims to be fast and able to handle big datasets by using special structures. It's made for researchers and students to add their own methods and compare different algorithms easily. ELKI has been used in data science to cluster sperm whale codas, for phoneme clustering, for anomaly detection in spaceflight operations, for bike sharing redistribution, and traffic prediction. The university project

2368-748: The evaluation of cluster analysis results, adding new visualizations and some new algorithms. Version 0.6 (June 2013) introduces a new 3D adaption of parallel coordinates for data visualization, apart from the usual additions of algorithms and index structures. Version 0.7 (August 2015) adds support for uncertain data types, and algorithms for the analysis of uncertain data. Version 0.7.5 (February 2019) adds additional clustering algorithms, anomaly detection algorithms, evaluation measures, and indexing structures. Version 0.8 (October 2022) adds automatic index creation, garbage collection, and incremental priority search, as well as many more algorithms such as BIRCH . Java (programming language) Java

2432-491: The existing code. This includes the possibility of defining a custom distance function and using existing indexes for acceleration. ELKI uses a service loader architecture to allow publishing extensions as separate jar files . ELKI uses optimized collections for performance rather than the standard Java API. For loops for example are written similar to C++ iterators : In contrast to typical Java iterators (which can only iterate over objects), this conserves memory, because

2496-586: The generated servlet creates the response. Swing is a graphical user interface library for the Java SE platform. It is possible to specify a different look and feel through the pluggable look and feel system of Swing. Clones of Windows , GTK+ , and Motif are supplied by Sun. Apple also provides an Aqua look and feel for macOS . Where prior implementations of these looks and feels may have been considered lacking, Swing in Java SE 6 addresses this problem by using more native GUI widget drawing routines of

2560-596: The heap, such as the Z Garbage Collector (ZGC) introduced in Java 11, and Shenandoah GC, introduced in Java 12 but unavailable in Oracle-produced OpenJDK builds. Shenandoah is instead available in third-party builds of OpenJDK, such as Eclipse Temurin . For most applications in Java, G1GC is sufficient. In prior versions of Java, such as Java 8, the Parallel Garbage Collector was used as the default garbage collector. Having solved

2624-658: The host hardware. End-users commonly use a Java Runtime Environment (JRE) installed on their device for standalone Java applications or a web browser for Java applets . Standard libraries provide a generic way to access host-specific features such as graphics, threading , and networking . The use of universal bytecode makes porting simple. However, the overhead of interpreting bytecode into machine instructions made interpreted programs almost always run more slowly than native executables . Just-in-time (JIT) compilers that compile byte-codes to machine code during runtime were introduced from an early stage. Java's Hotspot compiler

ELKI - Misplaced Pages Continue

2688-556: The implementation of floating-point arithmetic, and a history of security vulnerabilities in the primary Java VM implementation HotSpot . Developers have criticized the complexity and verbosity of the Java Persistence API (JPA), a standard part of Java EE. This has led to increased adoption of higher-level abstractions like Spring Data JPA, which aims to simplify database operations and reduce boilerplate code. The growing popularity of such frameworks suggests limitations in

2752-432: The index acceleration. The database core also provides fast and memory efficient collections for object collections and associative structures such as nearest neighbor lists. ELKI makes extensive use of Java interfaces, so that it can be extended easily in many places. For example, custom data types, distance functions, index structures, algorithms, input parsers, and output modules can be added and combined without modifying

2816-537: The introduction of just-in-time compilation in 1997/1998 for Java 1.1 , the addition of language features supporting better code analysis (such as inner classes, the StringBuilder class, optional assertions, etc.), and optimizations in the Java virtual machine, such as HotSpot becoming Sun's default JVM in 2000. With Java 1.5, the performance was improved with the addition of the java.util.concurrent package, including lock-free implementations of

2880-470: The iterator can internally use primitive values for data storage. The reduced garbage collection improves the runtime. Optimized collections libraries such as GNU Trove3 , Koloboke , and fastutil employ similar optimizations. ELKI includes data structures such as object collections and heaps (for, e.g., nearest neighbor search ) using such optimizations. The visualization module uses SVG for scalable graphics output, and Apache Batik for rendering of

2944-467: The language. For example, early versions of the C programming language that followed ANSI C and its former standards did not have a dedicated Boolean type. Instead, numeric values of zero are interpreted as false , and any other value is interpreted as true . The newer C99 added a distinct Boolean type _Bool (the more intuitive name bool as well as the macros true and false can be included with stdbool.h ), and C++ supports bool as

3008-455: The memory management problem does not relieve the programmer of the burden of handling properly other kinds of resources, like network or database connections, file handles, etc., especially in the presence of exceptions. The syntax of Java is largely influenced by C++ and C . Unlike C++, which combines the syntax for structured, generic, and object-oriented programming, Java was built almost exclusively as an object-oriented language. All code

3072-496: The platforms. The platforms are: The classes in the Java APIs are organized into separate groups called packages . Each package contains a set of related interfaces , classes, subpackages and exceptions . Sun also provided an edition called Personal Java that has been superseded by later, standards-based Java ME configuration-profile pairings. One design goal of Java is portability , which means that programs written for

3136-498: The primitive data types consist of 4 integral types, 2 floating-point types, a 16-byte decimal type, a Boolean type, a date/time type, a Unicode character type, and a Unicode string type. Rust has primitive unsigned and signed fixed width integers in the format u or i respectively followed by any bit width that is a power of two between 8 and 128 giving the types u8 , u16 , u32 , u64 , u128 , i8 , i16 , i32 , i64 and i128 . Also available are

3200-420: The processor have a one-to-one correspondence with objects in the computer's memory, and operations on these types are often the fastest possible in most cases. Integer addition, for example, can be performed as a single machine instruction, and some offer specific instructions to process sequences of characters with a single instruction. But the choice of primitive data type may affect performance, for example it

3264-451: The program and can be read by some integrated development environments (IDEs) such as Eclipse to allow developers to access documentation within the IDE. The following is a simple example of a "Hello, World!" program that writes a message to the standard output : Java applets are programs embedded in other applications, mainly in web pages displayed in web browsers. The Java applet API

SECTION 50

#1732802433870

3328-527: The selling of licenses for specialized products such as the Java Enterprise System. On November 13, 2006, Sun released much of its Java virtual machine (JVM) as free and open-source software (FOSS), under the terms of the GPL-2.0-only license. On May 8, 2007, Sun finished the process, making all of its JVM's core code available under free software /open-source distribution terms, aside from

3392-555: The specifications of the Java Community Process , Sun had relicensed most of its Java technologies under the GPL-2.0-only license. Oracle offers its own HotSpot Java Virtual Machine, however the official reference implementation is the OpenJDK JVM which is free open-source software and used by most developers and is the default JVM for almost all Linux distributions. As of September 2024 , Java 23

3456-468: The standard JPA implementation's ease-of-use for modern Java development. The Java Class Library is the standard library , developed to support application development in Java. It is controlled by Oracle in cooperation with others through the Java Community Process program. Companies or individuals participating in this process can influence the design and development of the APIs. This process has been

3520-409: The type system of Java was proven unsound in that it is possible to use generics to construct classes and methods that allow assignment of an instance one class to a variable of another unrelated class. Such code is accepted by the compiler, but fails at run time with a class cast exception. Criticisms directed at Java include the implementation of generics, speed, the handling of unsigned numbers,

3584-545: The types usize and isize which are unsigned and signed integers that are the same bit width as a reference with the usize type being used for indices into arrays and indexable collection types. Rust also has: Built-in types are distinguished from others by having specific support in the compiler or runtime, to the extent that it would not be possible to simply define them in a header file or standard library module. Besides integers, floating-point numbers, and Booleans, other built-in types include: A character type

3648-434: The underlying computer architecture . The syntax of Java is similar to C and C++ , but has fewer low-level facilities than either of them. The Java runtime provides dynamic capabilities (such as reflection and runtime code modification) that are typically not available in traditional compiled languages. Java gained popularity shortly after its release, and has been a very popular programming language since then. Java

3712-571: The underlying platforms. JavaFX is a software platform for creating and delivering desktop applications , as well as rich web applications that can run across a wide variety of devices. JavaFX is intended to replace Swing as the standard GUI library for Java SE , but since JDK 11 JavaFX has not been in the core JDK and instead in a separate module. JavaFX has support for desktop computers and web browsers on Microsoft Windows , Linux , and macOS . JavaFX does not have support for native OS look and feels. In 2004, generics were added to

3776-482: The user interface as well as lossless export into PostScript and PDF for easy inclusion in scientific publications in LaTeX . Exported files can be edited with SVG editors such as Inkscape . Since cascading style sheets are used, the graphics design can be restyled easily. Unfortunately, Batik is rather slow and memory intensive, so the visualizations are not very scalable to large data sets (for larger data sets, only

3840-417: The web service methods for the actual business logic. JavaServer Pages ( JSP ) are server-side Java EE components that generate responses, typically HTML pages, to HTTP requests from clients . JSPs embed Java code in an HTML page by using the special delimiters <% and %> . A JSP is compiled to a Java servlet , a Java application in its own right, the first time it is accessed. After that,

3904-440: Was deprecated with the release of Java 9 in 2017. Java servlet technology provides Web developers with a simple, consistent mechanism for extending the functionality of a Web server and for accessing existing business systems. Servlets are server-side Java EE components that generate responses to requests from clients . Most of the time, this means generating HTML pages in response to HTTP requests, although there are

SECTION 60

#1732802433870

3968-665: Was initially called Oak after an oak tree that stood outside Gosling's office. Later the project went by the name Green and was finally renamed Java , from Java coffee , a type of coffee from Indonesia . Gosling designed Java with a C / C++ -style syntax that system and application programmers would find familiar. Sun Microsystems released the first public implementation as Java 1.0 in 1996. It promised write once, run anywhere (WORA) functionality, providing no-cost run-times on popular platforms . Fairly secure and featuring configurable security, it allowed network- and file-access restrictions. Major web browsers soon incorporated

4032-689: Was renamed J2SE. In 2006, for marketing purposes, Sun renamed new J2 versions as Java EE , Java ME , and Java SE , respectively. In 1997, Sun Microsystems approached the ISO/IEC JTC 1 standards body and later the Ecma International to formalize Java, but it soon withdrew from the process. Java remains a de facto standard , controlled through the Java Community Process . At one time, Sun made most of its Java implementations available without charge, despite their proprietary software status. Sun generated revenue from Java through

4096-625: Was the third most popular programming language in 2022 according to GitHub . Although still widely popular, there has been a gradual decline in use of Java in recent years with other languages using JVM gaining popularity. Java was originally developed by James Gosling at Sun Microsystems . It was released in May 1995 as a core component of Sun's Java platform . The original and reference implementation Java compilers , virtual machines, and class libraries were originally released by Sun under proprietary licenses . As of May 2007, in compliance with

#869130