Misplaced Pages

Whispering

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

Whispering is an unvoiced mode of phonation in which the vocal cords are abducted so that they do not vibrate; air passes between the arytenoid cartilages to create audible turbulence during speech. Supralaryngeal articulation remains the same as in normal speech.

178-439: In normal speech, the vocal cords alternate between states of voice and voicelessness. In whispering, only the voicing segments change, so that the vocal cords alternate between whisper and voicelessness (though the acoustic difference between the two states is minimal). Because of this, implementing speech recognition for whispered speech is more difficult, as the characteristic spectral range needed to detect syllables and words

356-580: A deep learning method called Long short-term memory (LSTM), a recurrent neural network published by Sepp Hochreiter & Jürgen Schmidhuber in 1997. LSTM RNNs avoid the vanishing gradient problem and can learn "Very Deep Learning" tasks that require memories of events that happened thousands of discrete time steps ago, which is important for speech. Around 2007, LSTM trained by Connectionist Temporal Classification (CTC) started to outperform traditional speech recognition in certain applications. In 2015, Google's speech recognition reportedly experienced

534-430: A finite state transducer verifying certain assumptions. Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. Dynamic time warping is an algorithm for measuring similarity between two sequences that may vary in time or speed. For instance, similarities in walking patterns would be detected, even if in one video
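The similarity measure described above can be sketched in a few lines. This is a minimal illustration, assuming one-dimensional numeric sequences and an absolute-difference local cost; the function name `dtw_distance` and the sample data are hypothetical, and real recognizers compare feature vectors rather than single numbers.

```python
def dtw_distance(a, b):
    """Return the dynamic time warping cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same pattern performed at half speed still aligns with zero cost.
slow = [0, 0, 1, 1, 2, 2, 3, 3]
fast = [0, 1, 2, 3]
print(dtw_distance(slow, fast))
```

The table is filled in O(n·m) time, which is one reason practical systems often restrict how far the warping path may stray from the diagonal.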

712-409: A realist perspective considering the observed phenomena as an external and independent reality is often associated with an emphasis on empirical data collection and a more distanced and objective attitude. Idealists, on the other hand, hold that external reality is not fully independent of the mind and tend, therefore, to include more subjective tendencies in the research process as well. For

890-458: A research question, which determines what kind of information one intends to acquire. Some theorists prefer an even wider understanding of methodology that involves not just the description, comparison, and evaluation of methods but includes additionally more general philosophical issues. One reason for this wider approach is that discussions of when to use which method often take various background assumptions for granted, for example, concerning

1068-412: A biologist inserting viral DNA into a bacterium is engaged in a form of experimentation. Pure observation, on the other hand, involves studying independent entities in a passive manner. This is the case, for example, when astronomers observe the orbits of astronomical objects far away. Observation played the main role in ancient science. The scientific revolution in the 16th and 17th centuries affected

1246-519: A chest X-ray vs. a gastrointestinal contrast series for a radiology system. Prolonged use of speech recognition software in conjunction with word processors has shown benefits to short-term-memory restrengthening in brain AVM patients who have been treated with resection. Further research needs to be conducted to determine cognitive benefits for individuals whose AVMs have been treated using radiologic techniques. Substantial efforts have been devoted in

1424-432: A coherent perspective by examining and reevaluating all the relevant beliefs and intuitions. Pragmatists focus on the practical consequences of philosophical theories to assess whether they are true or false. Experimental philosophy is a recently developed approach that uses the methodology of social psychology and the cognitive sciences for gathering empirical evidence and justifying philosophical claims. In

1602-434: A collect call"), domotic appliance control, search key words (e.g. find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g. a radiology report), determining speaker characteristics, speech-to-text processing (e.g., word processors or emails), and aircraft (usually termed direct voice input). Automatic pronunciation assessment

1780-407: A combination hidden Markov model, which includes both the acoustic and language model information and combining it statically beforehand (the finite state transducer, or FST, approach). A possible improvement to decoding is to keep a set of good candidates instead of just keeping the best candidate, and to use a better scoring function (rescoring) to rate these good candidates so that we may pick
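The rescoring idea can be sketched as a two-pass selection. This is only an illustration under stated assumptions: the function `rescore_nbest`, the candidate transcripts, and both score tables are hypothetical, invented to show the control flow rather than taken from any real decoder.

```python
import heapq


def rescore_nbest(candidates, first_pass_score, better_score, n=3):
    """Keep the n best candidates by the cheap score, then pick the winner with the better one."""
    shortlist = heapq.nlargest(n, candidates, key=first_pass_score)
    return max(shortlist, key=better_score)


# Hypothetical log-scores, invented purely for illustration.
first_pass = {
    "wreck a nice beach": -10.0,
    "recognize speech": -10.5,
    "wreck an ice peach": -14.0,
}
second_pass = {
    "wreck a nice beach": -9.0,
    "recognize speech": -3.0,
    "wreck an ice peach": -12.0,
}

best = rescore_nbest(first_pass, first_pass.get, second_pass.get, n=2)
print(best)
```

Keeping a shortlist means the more expensive scoring function only has to evaluate a handful of hypotheses instead of the full search space.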

1958-405: A complex body of rules and postulates guiding research or as the analysis of such rules and procedures. As a body of rules and postulates, a methodology defines the subject of analysis as well as the conceptual tools used by the analysis and the limits of the analysis. Research projects are usually governed by a structured procedure known as the research process. The goal of this process is given by

2136-425: A comprehensive philosophical system based on them. Phenomenology gives particular importance to how things appear to be. It consists in suspending one's judgments about whether these things actually exist in the external world. This technique is known as epoché and can be used to study appearances independent of assumptions about their causes. The method of conceptual analysis came to particular prominence with

2314-402: A continuum and not as a dichotomy. A lot of qualitative research is concerned with some form of human experience or behavior, in which case it tends to focus on a few individuals and their in-depth understanding of the meaning of the studied phenomena. Examples of the qualitative method are a market researcher conducting a focus group in order to learn how people react to a new product or

2492-446: A different speaker and recording conditions; for further speaker normalization, it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition, might use heteroscedastic linear discriminant analysis (HLDA); or might skip
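Delta and delta-delta coefficients are commonly computed with a short regression over neighboring frames. The sketch below assumes that common regression formula, a window of two frames on each side, and edge padding by repetition; none of these choices is stated in the text above, and the function name `delta` is hypothetical.

```python
def delta(track, window=2):
    """Regression-based delta of a 1-D feature track, with edges padded by repetition."""
    if not track:
        return []
    denom = 2 * sum(n * n for n in range(1, window + 1))
    # Repeat the first and last frames so every frame has a full window.
    padded = [track[0]] * window + list(track) + [track[-1]] * window
    out = []
    for t in range(window, window + len(track)):
        num = sum(n * (padded[t + n] - padded[t - n]) for n in range(1, window + 1))
        out.append(num / denom)
    return out


# One cepstral coefficient over six frames (made-up values).
coeff = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
d1 = delta(coeff)  # delta: approximate first derivative over time
d2 = delta(d1)     # delta-delta: delta applied to the deltas
print(d1, d2)
```

In a full front end the same operation would be applied to every coefficient, and the static, delta, and delta-delta values would be stacked into one feature vector per frame.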

2670-648: A dramatic performance jump of 49% through CTC-trained LSTM, which is now available through Google Voice to all smartphone users. Transformers, a type of neural network based solely on "attention", have been widely adopted in computer vision and language modeling, sparking the interest of adapting such models to new domains, including speech recognition. Some recent papers reported superior performance levels using transformer models for speech recognition, but these models usually require large-scale training datasets to reach high performance levels. The use of deep feedforward (non-recurrent) networks for acoustic modeling

2848-461: A few years into the 2000s. But these methods never won over the non-uniform internal-handcrafting Gaussian mixture model / hidden Markov model (GMM-HMM) technology based on generative models of speech trained discriminatively. A number of key difficulties had been methodologically analyzed in the 1990s, including gradient diminishing and weak temporal correlation structure in the neural predictive models. All these difficulties were in addition to

3026-496: A finger control on the steering-wheel, enables the speech recognition system and this is signaled to the driver by an audio prompt. Following the audio prompt, the system has a "listening window" during which it may accept a speech input for recognition. Simple voice commands may be used to initiate phone calls, select radio stations or play music from a compatible smartphone, MP3 player or music-loaded flash drive. Voice recognition capabilities vary between car make and model. Some of

3204-439: A fixed set of questions given to each individual. They contrast with unstructured interviews, which are closer to a free-flow conversation and require more improvisation on the side of the interviewer for finding interesting and relevant questions. Semi-structured interviews constitute a middle ground: they include both predetermined questions and questions not planned in advance. Structured interviews make it easier to compare

3382-403: A good methodology helps researchers arrive at reliable theories in an efficient way. The choice of method often matters since the same factual material can lead to different conclusions depending on one's method. Interest in methodology has risen in the 20th century due to the increased importance of interdisciplinary work and the obstacles hindering efficient cooperation. The term "methodology"

3560-436: A limited and subordinate utility but becomes a diversion or even counterproductive by hindering practice when given too much emphasis. Another line of criticism concerns more the general and abstract nature of methodology. It states that the discussion of methods is only useful in concrete and particular cases but not concerning abstract guidelines governing many or all cases. Some anti-methodologists reject methodology based on

3738-452: A list or a controlled vocabulary) are relatively minimal for people who are sighted and who can operate a keyboard and mouse. A more significant issue is that most EHRs have not been expressly tailored to take advantage of voice-recognition capabilities. A large part of the clinician's interaction with the EHR involves navigation through the user interface using menus, and tab/button clicks, and

3916-404: A medical researcher performing an unstructured in-depth interview with a participant from a new experimental therapy to assess its potential benefits and drawbacks. It is also used to improve quantitative research, such as informing data collection materials and questionnaire design. Qualitative research is frequently employed in fields where the pre-existing knowledge is inadequate. This way, it

4094-404: A negative form based on falsification. In this regard, positive instances do not confirm a hypothesis but negative instances disconfirm it. Positive indications that the hypothesis is true are only given indirectly if many attempts to find counterexamples have failed. A cornerstone of this approach is the null hypothesis, which assumes that there is no connection (see causality) between whatever

4272-455: A paradigm change that gave a much more central role to experimentation in the scientific methodology. This is sometimes expressed by stating that modern science actively "puts questions to nature". While the distinction is usually clear in the paradigmatic cases, there are also many intermediate cases where it is not obvious whether they should be characterized as observation or as experimentation. A central discussion in this field concerns

4450-484: A particular case. According to Aleksandr Georgievich Spirkin, "[a] methodology is a system of principles and general ways of organising and structuring theoretical and practical activity, and also the theory of this system". Helen Kara defines methodology as "a contextual framework for research, a coherent and logical scheme based on views, beliefs, and values, that guides the choices researchers make". Ginny E. Garcia and Dudley L. Poston understand methodology either as

4628-403: A relaxed definition of whispering (i.e., production of short-range, low-amplitude acoustic signals which are significantly different than those produced at high amplitude) cannot be applied to humans without including vocalizations distinct from human whispering (e.g., creaky voice and falsetto). Further research is needed to ascertain the existence of whispering in non-humans as established in

4806-430: A renaissance of applications of deep feedforward neural networks for speech recognition. By the early 2010s, speech recognition, also called voice recognition, was clearly differentiated from speaker recognition, and speaker independence was considered a major breakthrough. Until then, systems required a "training" period. A 1987 ad for a doll had carried the tagline "Finally, the doll that understands you." – despite

4984-502: A research project. In this sense, methodologies include various theoretical commitments about the intended outcomes of the investigation. The term "methodology" is sometimes used as a synonym for the term "method". A method is a way of reaching some predefined goal. It is a planned and structured procedure for solving a theoretical or practical problem. In this regard, methods stand in contrast to free and unstructured approaches to problem-solving. For example, descriptive statistics

5162-609: A research question and helps the researchers decide what methods to use in the process. For example, methodology should assist the researcher in deciding why one method of sampling is preferable to another in a particular case or which form of data analysis is likely to bring the best results. Methodology achieves this by explaining, evaluating and justifying methods. Just as there are different methods, there are also different methodologies. Different methodologies provide different approaches to how methods are evaluated and explained and may thus make different suggestions on what method to use in

5340-568: A security process. From the technology perspective, speech recognition has a long history with several waves of major innovations. Most recently, the field has benefited from advances in deep learning and big data. The advances are evidenced not only by the surge of academic papers published in the field, but more importantly by the worldwide industry adoption of a variety of deep learning methods in designing and deploying speech recognition systems. The key areas of growth were: vocabulary size, speaker independence, and processing speed. Raj Reddy

5518-491: A sequence of repeatable instructions. The goal of following the steps of a method is to bring about the result promised by it. In the context of inquiry, methods may be defined as systems of rules and procedures to discover regularities of nature, society, and thought. In this sense, methodology can refer to procedures used to arrive at new knowledge or to techniques of verifying and falsifying pre-existing knowledge claims. This encompasses various issues pertaining both to

5696-509: A sequence of symbols or quantities. HMMs are used in speech recognition because a speech signal can be viewed as a piecewise stationary signal or a short-time stationary signal. In a short time scale (e.g., 10 milliseconds), speech can be approximated as a stationary process. Speech can be thought of as a Markov model for many stochastic purposes. Another reason why HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use. In speech recognition,

5874-486: A single researcher or a single discipline but are in need of collaborative efforts from many fields. Such interdisciplinary undertakings profit a lot from methodological advances, both concerning the ability to understand the methods of the respective fields and in relation to developing more homogeneous methods equally used by all of them. Most criticism of methodology is directed at one specific form or understanding of it. In such cases, one particular methodological theory

6052-534: A single unit. Although DTW would be superseded by later algorithms, the technique carried on. Achieving speaker independence remained unsolved during this period. During the late 1960s Leonard Baum developed the mathematics of Markov chains at the Institute for Defense Analyses. A decade later, at CMU, Raj Reddy's students James Baker and Janet M. Baker began using the hidden Markov model (HMM) for speech recognition. James Baker had learned about HMMs from

6230-513: A speech interface prototype for the Apple computer known as Casper. Lernout & Hauspie, a Belgium-based speech recognition company, acquired several other companies, including Kurzweil Applied Intelligence in 1997 and Dragon Systems in 2000. The L&H speech technology was used in the Windows XP operating system. L&H was an industry leader until an accounting scandal brought an end to

6408-439: A statistical distribution that is a mixture of diagonal covariance Gaussians, which will give a likelihood for each observed vector. Each word or (for more general speech recognition systems) each phoneme will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes. Described above are
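How such an output distribution assigns a likelihood to one observed vector can be sketched as follows. This is a minimal illustration of a diagonal-covariance Gaussian mixture evaluated in the log domain; the function names and the toy parameters are hypothetical, and a real system would learn the weights, means, and variances from data.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of x under a Gaussian with a diagonal covariance matrix."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of x under a mixture of diagonal-covariance Gaussians."""
    logs = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(logs)  # log-sum-exp trick keeps the sum numerically stable
    return peak + math.log(sum(math.exp(value - peak) for value in logs))


# Toy two-component mixture over 2-D feature vectors (made-up parameters).
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]
print(gmm_log_likelihood([0.1, -0.2], weights, means, variances))
```

Because the covariance is diagonal, each dimension contributes an independent term, which keeps evaluation cheap enough to run for every frame and every HMM state.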

6586-470: A substantial amount of data be maintained by the EMR (now more commonly referred to as an Electronic Health Record or EHR). The use of speech recognition is more naturally suited to the generation of narrative text, as part of a radiology/pathology interpretation, progress note or discharge summary: the ergonomic gains of using speech recognition to enter structured discrete data (e.g., numeric values or codes from

6764-467: A summer job at the Institute for Defense Analyses during his undergraduate education. The use of HMMs allowed researchers to combine different sources of knowledge, such as acoustics, language, and syntax, in a unified probabilistic model. The 1980s also saw the introduction of the n-gram language model. Much of the progress in the field is owed to the rapidly increasing capabilities of computers. At

6942-523: A time, until the solution to the initial problem is found. An important advantage of the synthetic method is its clear and short logical exposition. One disadvantage is that it is usually not obvious in the beginning that the steps taken lead to the intended conclusion. This may then come as a surprise to the reader since it is not explained how the mathematician knew in the beginning which steps to take. The analytic method often reflects better how mathematicians actually make their discoveries. For this reason, it

7120-447: A very similar method: it approaches philosophical questions by looking at how the corresponding terms are used in ordinary language. Many methods in philosophy rely on some form of intuition. They are used, for example, to evaluate thought experiments, which involve imagining situations to assess their possible consequences in order to confirm or refute philosophical theories. The method of reflective equilibrium tries to form

7298-488: A way of mastering it. On the theoretical side, this concerns ways of forming true beliefs and solving problems. On the practical side, this concerns skills of influencing nature and dealing with each other. These different methods are usually passed down from one generation to the next. Spirkin holds that the interest in methodology on a more abstract level arose in attempts to formalize these techniques to improve them as well as to make it easier to use them and pass them on. In

7476-435: Is a method of data analysis, radiocarbon dating is a method of determining the age of organic objects, sautéing is a method of cooking, and project-based learning is an educational method. The term "technique" is often used as a synonym both in the academic and the everyday discourse. Methods usually involve a clearly defined series of decisions and actions to be used under certain circumstances, usually expressible as

7654-958: Is a method that allows a computer to find an optimal match between two given sequences (e.g., time series) with certain restrictions. That is, the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models. Neural networks emerged as an attractive acoustic modeling approach in ASR in the late 1980s. Since then, neural networks have been used in many aspects of speech recognition such as phoneme classification, phoneme classification through multi-objective evolutionary algorithms, isolated word recognition, audiovisual speech recognition, audiovisual speaker recognition and speaker adaptation. Neural networks make fewer explicit assumptions about feature statistical properties than HMMs and have several qualities making them more attractive recognition models for speech recognition. When used to estimate

7832-588: Is an artificial neural network with multiple hidden layers of units between the input and output layers. Similar to shallow neural networks, DNNs can model complex non-linear relationships. DNN architectures generate compositional models, where extra layers enable composition of features from lower layers, giving a huge learning capacity and thus the potential of modeling complex patterns of speech data. A success of DNNs in large vocabulary speech recognition occurred in 2010 by industrial researchers, in collaboration with academic researchers, where large output layers of

8010-501: Is an inborn natural tendency in children to develop in a certain way. For them, pedagogy is about how to help this process happen by ensuring that the required external conditions are set up. Herbartianism identifies five essential components of teaching: preparation, presentation, association, generalization, and application. They correspond to different phases of the educational process: getting ready for it, showing new ideas, bringing these ideas in relation to known ideas, understanding

8188-429: Is associated with a variety of meanings. In its most common usage, it refers either to a method, to the field of inquiry studying methods, or to philosophical discussions of background assumptions involved in these processes. Some researchers distinguish methods from methodologies by holding that methods are modes of data collection while methodologies are more general research strategies that determine how to conduct

8366-405: Is being observed. It is up to the researcher to do all they can to disprove their own hypothesis through relevant methods or techniques, documented in a clear and replicable process. If they fail to do so, it can be concluded that the null hypothesis is false, which provides support for their own hypothesis about the relation between the observed phenomena. Significantly more methodological variety

8544-418: Is called "proceduralism". According to it, the goal of methodology is to boil down the research process to a simple set of rules or a recipe that automatically leads to good research if followed precisely. However, it has been argued that, while this ideal may be acceptable for some forms of quantitative research, it fails for qualitative research. One argument for this position is based on the claim that research

8722-403: Is central to both approaches how the group of individuals used for the data collection is selected. This process is known as sampling. It involves the selection of a subset of individuals or phenomena to be measured. Important in this regard is that the selected samples are representative of the whole population, i.e. that no significant biases were involved when choosing. If this is not the case,

8900-540: Is closely associated with the natural sciences. It is based on precise numerical measurements, which are then used to arrive at exact general laws. This precision is also reflected in the goal of making predictions that can later be verified by other researchers. Examples of quantitative research include physicists at the Large Hadron Collider measuring the mass of newly created particles and positive psychologists conducting an online survey to determine

9078-457: Is difficult to ascertain the existence of whispering in non-humans. This is made more difficult by the specific physiology of human whispering. By sufficiently relaxing the definition of whispering, it can be argued any number of non-human species demonstrate whisper-like behaviors. Often these behaviors function to increase fitness. If whispering is more broadly defined as the "production of short-range, low-amplitude acoustic signals," whispering

9256-467: Is edited and report finalized. Deferred speech recognition is widely used in the industry currently. One of the major issues relating to the use of speech recognition in healthcare is that the American Recovery and Reinvestment Act of 2009 (ARRA) provides for substantial financial benefits to physicians who utilize an EMR according to "Meaningful Use" standards. These standards require that

9434-493: Is essential for avoiding inaccuracies from accent bias, especially in high-stakes assessments; from words with multiple correct pronunciations; and from phoneme coding errors in machine-readable pronunciation dictionaries. In 2022, researchers found that some newer speech to text systems, based on end-to-end reinforcement learning to map audio signals directly into words, produce word and phrase confidence scores very closely correlated with genuine listener intelligibility. In

9612-413: Is found in the social sciences, where both quantitative and qualitative approaches are used. They employ various forms of data collection, such as surveys, interviews, focus groups, and the nominal group technique. Surveys belong to quantitative research and usually involve some form of questionnaire given to a large group of individuals. It is paramount that the questions are easily understandable by

9790-452: Is generally used quietly, to limit the hearing of speech to those closest to the speaker; for example, to convey secret information without being overheard or to avoid disturbing others in a quiet place such as a library or place of worship. Loud whispering, known as a stage whisper, is generally used only for dramatic or emphatic purposes. Whispering can strain the vocal cords more than regular speech in some people, for whom speaking softly

9968-411: Is heavily dependent on keyboard and mouse: voice-based navigation provides only modest ergonomic benefits. By contrast, many highly customized systems for radiology or pathology dictation implement voice "macros", where the use of certain phrases – e.g., "normal report", will automatically fill in a large number of default values and/or generate boilerplate, which will vary with the type of the exam – e.g.,

10146-432: Is important so that other researchers are able to repeat the experiments to confirm or disconfirm the initial study. For this reason, various factors and variables of the situation often have to be controlled to avoid distorting influences and to ensure that subsequent measurements by other researchers yield the same results. The scientific method is a quantitative approach that aims at obtaining numerical data. This data

10324-574: Is incapable of learning the language due to conditional independence assumptions similar to an HMM. Consequently, CTC models can directly learn to map speech acoustics to English characters, but the models make many common spelling mistakes and must rely on a separate language model to clean up the transcripts. Later, Baidu expanded on the work with extremely large datasets and demonstrated some commercial success in Mandarin Chinese and English. In 2016, the University of Oxford presented LipNet,
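To make the mapping from acoustics to characters concrete, the sketch below shows the usual CTC output rule applied to per-frame labels: merge consecutive repeats, then drop the blank symbol. The function name `ctc_collapse`, the `_` blank marker, and the frame sequences are assumptions for illustration; a trained model would supply the per-frame labels.

```python
from itertools import groupby

BLANK = "_"  # assumed placeholder for the CTC blank symbol


def ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then remove blanks."""
    merged = (label for label, _ in groupby(frame_labels))
    return "".join(label for label in merged if label != blank)


# Hypothetical best label per audio frame.
frames = "hh_e_ll_ll_oo"
print(ctc_collapse(frames))
```

Each frame is labeled without looking at the other output labels, so nothing in this step enforces correct spelling, which is consistent with the need for a separate language model noted above.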

10502-412: Is no one single scientific method. In this regard, the expression "scientific method" refers not to one specific procedure but to different general or abstract methodological aspects characteristic of all the aforementioned fields. Important features are that the problem is formulated in a clear manner and that the evidence presented for or against a theory is public, reliable, and replicable. The last point

10680-436: Is not a technique but a craft that cannot be achieved by blindly following a method. In this regard, research depends on forms of creativity and improvisation to amount to good science. Other types include inductive, deductive, and transcendental methods. Inductive methods are common in the empirical sciences and proceed through inductive reasoning from many particular observations to arrive at general conclusions, often in

10858-452: Is not given through the total absence of tone. More advanced techniques such as neural networks may be used, however, as is done by Amazon Alexa. There is no symbol in the IPA for whispered phonation, since it is not used phonemically in any language. However, a sub-dot under phonemically voiced segments is sometimes seen in the literature, as [ʃʊ̣ḍ] for whispered should. Whispering

11036-578: Is not just a waste of time but actually has negative side effects. Such an argument may be defended by analogy to other skills that work best when the agent focuses only on employing them. In this regard, reflection may interfere with the process and lead to avoidable mistakes. According to an example by Gilbert Ryle, "[w]e run, as a rule, worse, not better, if we think a lot about our feet". A less severe version of this criticism does not reject methodology per se but denies its importance and rejects an intense focus on it. In this regard, methodology has still

11214-571: Is observed in myriad animals including non-human mammals, fish, and insects. If whispering is restricted to include only acoustic signals which are significantly different than those produced at high amplitude, whispering is still observed across biological taxa. An unlikely example is the croaking gourami. Croaking gouramis produce a high-amplitude "croak" during agonistic disputes by beating specialized pectoral fins. Female gouramis additionally use these fins to produce an acoustically distinct, low-amplitude "purr" during copulation. If whispering

11392-509: Is often argued that the paradigm of the natural sciences is a one-sided development of reason, which is not equally well suited to all areas of inquiry. The divide between quantitative and qualitative methods in the social sciences is one consequence of this criticism. Which method is more appropriate often depends on the goal of the research. For example, quantitative methods usually excel for evaluating preconceived hypotheses that can be clearly formulated and measured. Qualitative methods, on

11570-407: Is often broken down into several steps. In a typical case, the procedure starts with regular observation and the collection of information. These findings then lead the scientist to formulate a hypothesis describing and explaining the observed phenomena. The next step consists in conducting an experiment designed for this specific hypothesis. The actual results of the experiment are then compared to

11748-441: Is often described using mathematical formulas. The goal is usually to arrive at some universal generalizations that apply not just to the artificial situation of the experiment but to the world at large. Some data can only be acquired using advanced measurement instruments. In cases where the data is very complex, it is often necessary to employ sophisticated statistical techniques to draw conclusions from it. The scientific method

11926-427: Is often seen as the better method for teaching mathematics. It starts with the intended conclusion and tries to find another formula from which it can be deduced. It then goes on to apply the same process to this new formula until it has traced back all the way to already proven theorems. The difference between the two methods concerns primarily how mathematicians think and present their proofs. The two are equivalent in

12104-402: Is possible to get a first impression of the field and potential theories, thus paving the way for investigating the issue in further studies. Quantitative methods dominate in the natural sciences but both methodologies are used in the social sciences. Some social scientists focus mostly on one method while others try to investigate the same phenomenon using a variety of different methods. It

12282-502: Is recommended instead. In 2010, it was discovered that whispering is one of the many triggers of ASMR, a tingling sensation caused by listening to soft, relaxing sounds. The phenomenon made news headlines after YouTube videos of people whispering softly close to the camera were reported to give viewers tingles. People often listen to these videos to help them sleep and relax. The prevalence and function of low-amplitude signaling by non-humans are poorly characterized. As such, it

12460-445: Is rejected but not methodology at large when understood as a field of research comprising many different theories. In this regard, many objections to methodology focus on the quantitative approach, specifically when it is treated as the only viable approach. Nonetheless, there are also more fundamental criticisms of methodology in general. They are often based on the idea that there is little value to abstract discussions of methods and

12638-415: Is rejected by interpretivists . Max Weber , for example, argues that the method of the natural sciences is inadequate for the social sciences. Instead, more importance is placed on meaning and how people create and maintain their social worlds. The critical methodology in social science is associated with Karl Marx and Sigmund Freud . It is based on the assumption that many of the phenomena studied using

12816-490: Is restricted to include only creatures possessing vocal folds (i.e., mammals and some reptiles), whispering has been observed in species including cotton-top tamarins and a variety of bats. In captive cotton-top tamarins, whisper-like behavior is speculated to enable troop communication while not alerting predators. Numerous species of bats (e.g., spotted bats , northern long-eared bats , and western barbastelles ) alter their echolocation calls to avoid detection by prey. Such

12994-480: Is similar but the interaction between the participants is more structured. The goal is to determine how much agreement there is among the experts on the different issues. The initial responses are often given in written form by each participant without a prior conversation between them. In this manner, group effects potentially influencing the expressed opinions are minimized. In later steps, the different responses and comments may be discussed and compared to each other by

13172-430: Is termed a "procedure". A similar but less complex characterization is sometimes found in the field of language teaching , where the teaching process may be described through a three-level conceptualization based on "approach", "method", and "technique". One question concerning the definition of methodology is whether it should be understood as a descriptive or a normative discipline. The key difference in this regard

13350-405: Is that "[m]ethodology is too important to be left to methodologists". Alan Bryman has rejected this negative outlook on methodology. He holds that Becker's criticism can be avoided by understanding methodology as an inclusive inquiry into all kinds of methods and not as a mere doctrine for converting non-believers to one's preferred method. Part of the importance of methodology is reflected in

13528-626: Is that the different paradigms are incommensurable . This means that there is no overarching framework to assess the conflicting theoretical and methodological assumptions. This critique puts into question various presumptions of the quantitative approach associated with scientific progress based on the steady accumulation of data. Other discussions of abstract theoretical issues in the philosophy of science are also sometimes included. This can involve questions like how and whether scientific research differs from fictional writing as well as whether research studies objective facts rather than constructing

13706-480: Is the most general term. It can be defined as "a way or direction used to address a problem based on a set of assumptions". An example is the difference between hierarchical approaches, which consider one task at a time in a hierarchical manner, and concurrent approaches, which consider them all simultaneously. Methodologies are a little more specific. They are general strategies needed to realize an approach and may be understood as guidelines for how to make choices. Often

13884-414: Is the study of research methods. However, the term can also refer to the methods themselves or to the philosophical discussion of associated background assumptions. A method is a structured procedure for bringing about a certain goal, like acquiring knowledge or verifying knowledge claims. This normally involves various steps, like choosing a sample , collecting data from this sample, and interpreting

14062-406: Is then argued that the observed phenomena can only exist if their conditions of possibility are fulfilled. This way, the researcher may draw general psychological or metaphysical conclusions based on the claim that the phenomenon would not be observable otherwise. It has been argued that a proper understanding of methodology is important for various issues in the field of research. They include both

14240-735: Is to do away with hand-crafted feature engineering and to use raw features. This principle was first explored successfully in the architecture of deep autoencoder on the "raw" spectrogram or linear filter-bank features, showing its superiority over the Mel-Cepstral features which contain a few stages of fixed transformation from spectrograms. The true "raw" features of speech, waveforms, have more recently been shown to produce excellent larger-scale speech recognition results. Since 2014, there has been much research interest in "end-to-end" ASR. Traditional phonetic-based (i.e., all HMM -based model) approaches required separate components and training for

14418-401: Is to what extent they can be applied to other fields, like the social sciences and history . The success of the natural sciences was often seen as an indication of the superiority of the quantitative methodology and used as an argument to apply this approach to other fields as well. However, this outlook has been put into question in the more recent methodological discourse. In this regard, it

14596-413: Is used in education such as for spoken language learning. The term voice recognition or speaker identification refers to identifying the speaker, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice or it can be used to authenticate or verify the identity of a speaker as part of

14774-425: Is usually difficult to use these insights to discern more general patterns true for a wider public. One advantage of focus groups is that they can help the researcher identify a wide range of distinct perspectives on the issue in a short time. The group interaction may also help clarify and expand interesting contributions. One disadvantage is due to the moderator's personality and group effects , which may influence

14952-622: Is whether methodology just provides a value-neutral description of methods or what scientists actually do. Many methodologists practice their craft in a normative sense, meaning that they express clear opinions about the advantages and disadvantages of different methods. In this regard, methodology is not just about what researchers actually do but about what they ought to do or how to perform good research. Theorists often distinguish various general types or approaches to methodology. The most influential classification contrasts quantitative and qualitative methodology . Quantitative research

15130-498: The Common European Framework of Reference for Languages (CEFR) assessment criteria for "overall phonological control", intelligibility outweighs formally correct pronunciation at all levels. In the health care sector, speech recognition can be implemented in front-end or back-end of the medical documentation process. Front-end speech recognition is where the provider dictates into a speech-recognition engine,

15308-515: The JAS-39 Gripen cockpit, Englund (2004) found recognition deteriorated with increasing g-loads . The report also concluded that adaptation greatly improved the results in all cases and that the introduction of models for breathing was shown to improve recognition scores significantly. Contrary to what might have been expected, no effects of the broken English of the speakers were found. It was evident that spontaneous speech caused problems for

15486-575: The Sphinx-II system at CMU. The Sphinx-II system was the first to do speaker-independent, large vocabulary, continuous speech recognition and it had the best performance in DARPA's 1992 evaluation. Handling continuous speech with a large vocabulary was a major milestone in the history of speech recognition. Huang went on to found the speech recognition group at Microsoft in 1993. Raj Reddy's student Kai-Fu Lee joined Apple where, in 1992, he helped develop

15664-461: The University of Montreal in 2016. The model named "Listen, Attend and Spell" (LAS), literally "listens" to the acoustic signal, pays "attention" to different parts of the signal and "spells" out the transcript one character at a time. Unlike CTC-based models, attention-based models do not have conditional-independence assumptions and can learn all the components of a speech recognizer including

15842-706: The computer science , linguistics and computer engineering fields. The reverse process is speech synthesis . Some speech recognition systems require "training" (also called "enrollment") where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker-independent" systems. Systems that use training are called "speaker dependent". Speech recognition applications include voice user interfaces such as voice dialing (e.g. "call home"), call routing (e.g. "I would like to make

16020-419: The quantitative approach , philosophical debates in methodology include the distinction between the inductive and the hypothetico-deductive interpretation of the scientific method. For qualitative research , many basic assumptions are tied to philosophical positions such as hermeneutics , pragmatism , Marxism , critical theory , and postmodernism . According to Kuhn, an important factor in such debates

16198-429: The skills , knowledge, and practical guidance needed to conduct scientific research in an efficient manner. It acts as a guideline for various decisions researchers need to take in the scientific process. Methodology can be understood as the middle ground between concrete particular methods and the abstract and general issues discussed by the philosophy of science . In this regard, methodology comes after formulating

16376-517: The DNN based on context dependent HMM states constructed by decision trees were adopted. See comprehensive reviews of this development and of the state of the art as of October 2014 in the recent Springer book from Microsoft Research. See also the related background of automatic speech recognition and the impact of various machine learning paradigms, notably including deep learning , in recent overview articles. One fundamental principle of deep learning

16554-628: The EARS program: IBM , a team led by BBN with LIMSI and Univ. of Pittsburgh , Cambridge University , and a team composed of ICSI , SRI and University of Washington . EARS funded the collection of the Switchboard telephone speech corpus containing 260 hours of recorded conversations from over 500 speakers. The GALE program focused on Arabic and Mandarin broadcast news speech. Google 's first effort at speech recognition came in 2007 after hiring some researchers from Nuance. The first product

16732-442: The advent of analytic philosophy . It studies concepts by breaking them down into their most fundamental constituents to clarify their meaning. Common sense philosophy uses common and widely accepted beliefs as a philosophical tool. They are used to draw interesting conclusions. This is often employed in a negative sense to discredit radical philosophical positions that go against common sense . Ordinary language philosophy has

16910-575: The best one according to this refined score. The set of candidates can be kept either as a list (the N-best list approach) or as a subset of the models (a lattice ). Re scoring is usually done by trying to minimize the Bayes risk (or an approximation thereof) Instead of taking the source sentence with maximal probability, we try to take the sentence that minimizes the expectancy of a given loss function with regards to all possible transcriptions (i.e., we take

17088-415: The capabilities of deep learning models, particularly due to the high costs of training models from scratch, and the small size of available corpus in many languages and/or specific domains. An alternative approach to CTC-based models are attention-based models. Attention-based ASR models were introduced simultaneously by Chan et al. of Carnegie Mellon University and Google Brain and Bahdanau et al. of

17266-475: The choice of methodology may have a severe impact on a research project. The reason is that very different and sometimes even opposite conclusions may follow from the same factual material based on the chosen methodology. Aleksandr Georgievich Spirkin argues that methodology, when understood in a wide sense, is of great importance since the world presents us with innumerable entities and relations between them. Methods are needed to simplify this complexity and find

17444-722: The claim that researchers need freedom to do their work effectively. But this freedom may be constrained and stifled by "inflexible and inappropriate guidelines". For example, according to Kerry Chamberlain , a good interpretation needs creativity to be provocative and insightful, which is prohibited by a strictly codified approach. Chamberlain uses the neologism "methodolatry" to refer to this alleged overemphasis on methodology. Similar arguments are given in Paul Feyerabend 's book " Against Method ". However, these criticisms of methodology in general are not always accepted. Many methodologists defend their craft by pointing out how

17622-507: The cloud and require a network connection as opposed to the device locally. The first attempt at end-to-end ASR was with Connectionist Temporal Classification (CTC)-based systems introduced by Alex Graves of Google DeepMind and Navdeep Jaitly of the University of Toronto in 2014. The model consisted of recurrent neural networks and a CTC layer. Jointly, the RNN-CTC model learns the pronunciation and acoustic model together, however it

17800-421: The collection of data and their analysis. Concerning the collection, it involves the problem of sampling and of how to go about the data collection itself, like surveys, interviews, or observation. There are also numerous methods of how the collected data can be analyzed using statistics or other ways of interpreting it to extract interesting conclusions. However, many theorists emphasize the differences between

17978-438: The company in 2001. The speech technology from L&H was bought by ScanSoft which became Nuance in 2005. Apple originally licensed software from Nuance to provide speech recognition capability to its digital assistant Siri . In the 2000s DARPA sponsored two speech recognition programs: Effective Affordable Reusable Speech-to-Text (EARS) in 2002 and Global Autonomous Language Exploitation (GALE). Four teams participated in

18156-408: The context of regular schools . But in its widest sense, it encompasses all forms of education, both inside and outside schools. In this wide sense, pedagogy is concerned with "any conscious activity by one person designed to enhance learning in another". The teaching happening this way is a process taking place between two parties: teachers and learners. Pedagogy investigates how the teacher can help

18334-488: The core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over the basic approach described above. A typical large-vocabulary system would need context dependency for the phonemes (so that phonemes with different left and right context would have different realizations as HMM states); it would use cepstral normalization to normalize for

18512-519: The correctness of the learner's pronunciation and ideally their intelligibility to listeners, sometimes along with often inconsequential prosody such as intonation , pitch , tempo , rhythm , and stress . Pronunciation assessment is also used in reading tutoring , for example in products such as Microsoft Teams and from Amira Learning. Automatic pronunciation assessment can also be used to help diagnose and treat speech disorders such as apraxia . Assessing authentic listener intelligibility

18690-469: The correlation between income and self-assessed well-being . Qualitative research is characterized in various ways in the academic literature but there are very few precise definitions of the term. It is often used in contrast to quantitative research for forms of study that do not quantify their subject matter numerically. However, the distinction between these two types is not always obvious and various theorists have argued that it should be understood as

18868-447: The data at hand. It tries to summarize the most salient features and present them in insightful ways. This can happen, for example, by visualizing its distribution or by calculating indices such as the mean or the standard deviation . Inferential statistics, on the other hand, uses this data based on a sample to draw inferences about the population at large. That can take the form of making generalizations and predictions or by assessing

19046-500: The data collected does not reflect what the population as a whole is like. This affects generalizations and predictions drawn from the biased data. The number of individuals selected is called the sample size . For qualitative research, the sample size is usually rather small, while quantitative research tends to focus on big groups and collecting a lot of data. After the collection, the data needs to be analyzed and interpreted to arrive at interesting conclusions that pertain directly to

19224-460: The data is misinterpreted to defend conclusions that are not directly supported by the measurements themselves. In recent decades, many researchers in the social sciences have started combining both methodologies. This is known as mixed-methods research . A central motivation for this is that the two approaches can complement each other in various ways: some issues are ignored or too difficult to study with one methodology and are better approached with

19402-486: The data. The study of methods concerns a detailed description and analysis of these processes. It includes evaluative aspects by comparing different methods. This way, it is assessed what advantages and disadvantages they have and for what research goals they may be used. These descriptions and evaluations depend on philosophical background assumptions. Examples are how to conceptualize the studied phenomena and what constitutes evidence for or against them. When understood in

19580-450: The database to find conversations of interest. Some government research programs focused on intelligence applications of speech recognition, e.g. DARPA's EARS's program and IARPA 's Babel program . In the early 2000s, speech recognition was still dominated by traditional approaches such as hidden Markov models combined with feedforward artificial neural networks . Today, however, many aspects of speech recognition have been taken over by

19758-463: The delta and delta-delta coefficients and use splicing and an LDA -based projection followed perhaps by heteroscedastic linear discriminant analysis or a global semi-tied co variance transform (also known as maximum likelihood linear transform , or MLLT). Many systems use so-called discriminative training techniques that dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of

19936-529: The discovery of new methods, like methodological skepticism and the phenomenological method , has had important impacts on the philosophical discourse. A great variety of methods has been employed throughout the history of philosophy. Methodological skepticism gives special importance to the role of systematic doubt. This way, philosophers try to discover absolutely certain first principles that are indubitable. The geometric method starts from such first principles and employs deductive reasoning to construct

20114-417: The distinction between the inductive and the hypothetico-deductive methodology . The core disagreement between these two approaches concerns their understanding of the confirmation of scientific theories. The inductive approach holds that a theory is confirmed or supported by all its positive instances, i.e. by all the observations that exemplify it. For example, the observations of many white swans confirm

20292-426: The efficiency and reliability of research can be improved through a proper understanding of methodology. A criticism of more specific forms of methodology is found in the works of the sociologist Howard S. Becker . He is quite critical of methodologists based on the claim that they usually act as advocates of one particular method usually associated with quantitative research. An often-cited quotation in this regard

20470-452: The end of the DARPA program in 1976, the best computer available to researchers was the PDP-10 with 4 MB ram. It could take up to 100 minutes to decode just 30 seconds of speech. Two practical products were: By this point, the vocabulary of the typical commercial speech recognition system was larger than the average human vocabulary. Raj Reddy's former student, Xuedong Huang , developed

20648-454: The expected results based on one's hypothesis. The findings may then be interpreted and published, either as a confirmation or disconfirmation of the initial hypothesis. Two central aspects of the scientific method are observation and experimentation . This distinction is based on the idea that experimentation involves some form of manipulation or intervention. This way, the studied phenomena are actively created or shaped. For example,

20826-462: The fact that it was described as "which children could train to respond to their voice". In 2017, Microsoft researchers reached a historical human parity milestone of transcribing conversational telephony speech on the widely benchmarked Switchboard task. Multiple deep learning models were used to optimize speech recognition accuracy. The speech recognition word error rate was reported to be as low as 4 professional human transcribers working together on

21004-509: The field of mathematics , various methods can be distinguished, such as synthetic, analytic, deductive, inductive, and heuristic methods. For example, the difference between synthetic and analytic methods is that the former start from the known and proceed to the unknown while the latter seek to find a path from the unknown to the known. Geometry textbooks often proceed using the synthetic method. They start by listing known definitions and axioms and proceed by taking inferential steps , one at

21182-439: The field of research, for example, the goal of this process is to find reliable means to acquire knowledge in contrast to mere opinions acquired by unreliable means. In this regard, "methodology is a way of obtaining and building up ... knowledge". Various theorists have observed that the interest in methodology has risen significantly in the 20th century. This increased interest is reflected not just in academic publications on

21360-587: The first end-to-end sentence-level lipreading model, using spatiotemporal convolutions coupled with an RNN-CTC architecture, surpassing human-level performance in a restricted grammar dataset. A large-scale CNN-RNN-CTC architecture was presented in 2018 by Google DeepMind achieving 6 times better performance than human experts. In 2019, Nvidia launched two CNN-CTC ASR models, Jasper and QuarzNet, with an overall performance WER of 3%. Similar to other deep learning applications, transfer learning and domain adaptation are important strategies for reusing and extending

21538-468: The form of universal laws. Deductive methods, also referred to as axiomatic methods, are often found in formal sciences , such as geometry . They start from a set of self-evident axioms or first principles and use deduction to infer interesting conclusions from these axioms. Transcendental methods are common in Kantian and post-Kantian philosophy. They start with certain particular observations. It

21716-402: The general principle behind their instances, and putting what one has learned into practice. Learning theories focus primarily on how learning takes place and formulate the proper methods of teaching based on these insights. One of them is apperception or association theory , which understands the mind primarily in terms of associations between ideas and experiences. On this view, the mind

21894-436: The goal and nature of research. These assumptions can at times play an important role concerning which method to choose and how to follow it. For example, Thomas Kuhn argues in his The Structure of Scientific Revolutions that sciences operate within a framework or a paradigm that determines which questions are asked and what counts as good science. This concerns philosophical disagreements both about how to conceptualize

22072-411: The group as a whole. Most of these forms of data collection involve some type of observation . Observation can take place either in a natural setting, i.e. the field , or in a controlled setting such as a laboratory. Controlled settings carry with them the risk of distorting the results due to their artificiality. Their advantage lies in precisely controlling the relevant factors, which can help make

22250-487: The hidden Markov model would output a sequence of n -dimensional real-valued vectors (with n being a small integer, such as 10), outputting one of these every 10 milliseconds. The vectors would consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window of speech and decorrelating the spectrum using a cosine transform , then taking the first (most significant) coefficients. The hidden Markov model will tend to have in each state

22428-421: The interactions and responses of the participants. The interview often starts by asking the participants about their opinions on the topic under investigation, which may, in turn, lead to a free exchange in which the group members express and discuss their personal views. An important advantage of focus groups is that they can provide insight into how ideas and understanding operate in a cultural context. However, it

22606-557: The lack of big training data and big computing power in these early days. Most speech recognition researchers who understood such barriers hence subsequently moved away from neural nets to pursue generative modeling approaches until the recent resurgence of deep learning starting around 2009–2010 that had overcome all these difficulties. Hinton et al. and Deng et al. reviewed part of this recent history about how their collaboration with each other and then with colleagues across four groups (University of Toronto, Microsoft, Google, and IBM) ignited

22784-458: The larger article. Speech recognition Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as automatic speech recognition ( ASR ), computer speech recognition or speech-to-text ( STT ). It incorporates knowledge and research in

22962-921: The last decade to the test and evaluation of speech recognition in fighter aircraft . Of particular note have been the US program in speech recognition for the Advanced Fighter Technology Integration (AFTI) / F-16 aircraft ( F-16 VISTA ), the program in France for Mirage aircraft, and other programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft, with applications including setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight display. Working with Swedish pilots flying in

23140-465: The learner undergo experiences that promote their understanding of the subject matter in question. Various influential pedagogical theories have been proposed. Mental-discipline theories were already common in ancient Greek and state that the main goal of teaching is to train intellectual capacities. They are usually based on a certain ideal of the capacities, attitudes, and values possessed by educated people. According to naturalistic theories, there

23318-414: The main application of this technology is computer-aided pronunciation teaching (CAPT) when combined with computer-aided instruction for computer-assisted language learning (CALL), speech remediation , or accent reduction . Pronunciation assessment does not determine unknown speech (as in dictation or automatic transcription ) but instead, knowing the expected word(s) in advance, it attempts to verify

23496-504: The meaning of the studied phenomena and less at universal and predictive laws. Common methods found in the social sciences are surveys , interviews , focus groups , and the nominal group technique . They differ from each other concerning their sample size, the types of questions asked, and the general setting. In recent decades, many social scientists have started using mixed-methods research , which combines quantitative and qualitative methodologies. Many discussions in methodology concern

23674-424: The methods instead of researching them. This ambiguous attitude towards methodology is sometimes even exemplified in the same person. Max Weber , for example, criticized the focus on methodology during his time while making significant contributions to it himself. Spirkin believes that one important reason for this development is that contemporary society faces many global problems. These problems cannot be solved by

23852-548: The most recent car models offer natural-language speech recognition in place of a fixed set of commands, allowing the driver to use full sentences and common phrases. With such systems there is, therefore, no need for the user to memorize a set of fixed command words. Automatic pronunciation assessment is the use of speech recognition to verify the correctness of pronounced speech, as distinguished from manual assessment by an instructor or proctor. Also called speech verification, pronunciation evaluation, and pronunciation scoring,

24030-467: The natural sciences do. Positivists agree with this characterization, in contrast to interpretive and critical perspectives on the social sciences. According to William Neumann, positivism can be defined as "an organized method for combining deductive logic with precise empirical observations of individual behavior in order to discover and confirm a set of probabilistic causal laws that can be used to predict general patterns of human activity". This view

24208-450: The natural sciences is called the scientific method. It includes steps like observation and the formulation of a hypothesis. Further steps are to test the hypothesis using an experiment, to compare the measurements to the expected results, and to publish the findings. Qualitative research is more characteristic of the social sciences and gives less prominence to exact numerical measurements. It aims more at an in-depth understanding of

24386-517: The number of fields to which it is relevant. They include the natural sciences and the social sciences as well as philosophy and mathematics. The dominant methodology in the natural sciences (like astronomy, biology, chemistry, geoscience, and physics) is called the scientific method. Its main cognitive aim is usually seen as the creation of knowledge, but various closely related aims have also been proposed, like understanding, explanation, or predictive success. Strictly speaking, there

24564-450: The observations more reliable and repeatable. Non-participatory observation involves a distanced or external approach. In this case, the researcher focuses on describing and recording the observed phenomena without causing or changing them, in contrast to participatory observation. An important methodological debate in the field of social sciences concerns the question of whether they deal with hard, objective, and value-neutral facts, as

24742-402: The opinions stated by the participants. When applied to cross-cultural settings, cultural and linguistic adaptations and group composition considerations are important to encourage greater participation in the group discussion. The nominal group technique is similar to focus groups with a few important differences. The group often consists of experts in the field in question. The group size

24920-433: The original LAS model. Latent Sequence Decompositions (LSD) was proposed by Carnegie Mellon University, MIT and Google Brain to directly emit sub-word units which are more natural than English characters; University of Oxford and Google DeepMind extended LAS to "Watch, Listen, Attend and Spell" (WLAS) to handle lip reading surpassing human-level performance. Typically a manual control input, for example by means of

25098-586: The other approaches are mere distortions or surface illusions. It seeks to uncover deeper structures of the material world hidden behind these distortions. This approach is often guided by the goal of helping people effect social changes and improvements. Philosophical methodology is the metaphilosophical field of inquiry studying the methods used in philosophy. These methods structure how philosophers conduct their research, acquire knowledge, and select between competing theories. It concerns both descriptive issues of what methods have been used by philosophers in

25276-425: The other hand, are based on a variety of studies and try to arrive at more general principles applying to different fields. They may also give particular prominence to the analysis of the language of science and the formal structure of scientific explanation. A closely related classification distinguishes between philosophical, general scientific, and special scientific methods. One type of methodological outlook

25454-444: The other hand, can be used to study complex individual issues, often with the goal of formulating new hypotheses. This is especially relevant when the existing knowledge of the subject is inadequate. Important advantages of quantitative methods include precision and reliability. However, they often have difficulties in studying very complex phenomena that are commonly of interest to the social sciences. Additional problems can arise when

25632-453: The other. In other cases, both approaches are applied to the same issue to produce more comprehensive and well-rounded results. Qualitative and quantitative research are often associated with different research paradigms and background assumptions. Qualitative researchers often use an interpretive or critical approach while quantitative researchers tend to prefer a positivistic approach. Important disagreements between these approaches concern

25810-419: The participants since the answers might not have much value otherwise. Surveys normally restrict themselves to closed questions in order to avoid various problems that come with the interpretation of answers to open questions. They contrast in this regard with interviews, which put more emphasis on the individual participant and often involve open questions. Structured interviews are planned in advance and have

25988-569: The past and normative issues of which methods should be used. Many philosophers emphasize that these methods differ significantly from the methods found in the natural sciences in that they usually do not rely on experimental data obtained through measuring equipment. Which method one follows can have wide implications for how philosophical theories are constructed, what theses are defended, and what arguments are cited in favor or against. In this regard, many philosophical disagreements have their source in methodological disagreements. Historically,

26166-445: The person was walking slowly and if in another he or she were walking more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics – indeed, any data that can be turned into a linear representation can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, it
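
As an illustration of how DTW copes with differing speeds, here is a minimal sketch in Python. It assumes one-dimensional numeric sequences and an absolute-difference local cost; the function name `dtw_distance` and the sample data are hypothetical, and real recognizers compare multi-dimensional acoustic feature vectors rather than single numbers.

```python
import math


def dtw_distance(a, b):
    """Return the cumulative DTW alignment cost between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: repeat b's element, repeat a's element,
            # or advance both sequences together.
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]


# The same "movement" performed slowly and quickly.
slow = [0, 0, 1, 1, 2, 2, 3, 3]
fast = [0, 1, 2, 3]
print(dtw_distance(slow, fast))
```

Because each element of `fast` can be matched to two consecutive elements of `slow`, the two sequences align at zero cost even though their lengths differ.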

26344-413: The phenomena in a new light. In this regard, a methodology is similar to a paradigm. A similar view is defended by Spirkin, who holds that a central aspect of every methodology is the world view that comes with it. The discussion of background assumptions can include metaphysical and ontological issues in cases where they have important implications for the proper research methodology. For example,

26522-535: The phenomena it claims to study. In the latter sense, some methodologists have even claimed that the goal of science is less to represent a pre-existing reality and more to bring about some kind of social change in favor of repressed groups in society. Viknesh Andiappan and Yoke Kin Wan use the field of process systems engineering to distinguish the term "methodology" from the closely related terms "approach", "method", "procedure", and "technique". On their view, "approach"

26700-399: The phenomena studied, what constitutes evidence for and against them, and what the general goal of researching them is. So in this wider sense, methodology overlaps with philosophy by making these assumptions explicit and presenting arguments for and against them. According to C. S. Herrman, a good methodology clarifies the structure of the data to be analyzed and helps the researchers see

26878-426: The probabilities of a speech feature segment, neural networks allow discriminative training in a natural and efficient manner. However, in spite of their effectiveness in classifying short-time units such as individual phonemes and isolated words, early neural networks were rarely successful for continuous recognition tasks because of their limited ability to model temporal dependencies. One approach to this limitation

27056-424: The probability of a concrete hypothesis. Pedagogy can be defined as the study or science of teaching methods. In this regard, it is the methodology of education: it investigates the methods and practices that can be applied to fulfill the aims of education. These aims include the transmission of knowledge as well as fostering skills and character traits. Its main focus is on teaching methods in

27234-432: The problem of conducting efficient and reliable research as well as being able to validate knowledge claims by others. Method is often seen as one of the main factors of scientific progress. This is especially true for the natural sciences where the developments of experimental methods in the 16th and 17th centuries are often seen as the driving force behind the success and prominence of the natural sciences. In some cases,

27412-457: The pronunciation, acoustic and language model directly. This means, during deployment, there is no need to carry around a language model making it very practical for applications with limited memory. By the end of 2016, the attention-based models have seen considerable success including outperforming the CTC models (with or without an external language model). Various extensions have been proposed since

27590-517: The pronunciation, acoustic, and language model. End-to-end models jointly learn all the components of the speech recognizer. This is valuable since it simplifies the training process and deployment process. For example, an n-gram language model is required for all HMM-based systems, and a typical n-gram language model often takes several gigabytes in memory, making it impractical to deploy on mobile devices. Consequently, modern commercial ASR systems from Google and Apple (as of 2017) are deployed on

27768-430: The purposes of data collection. Some researchers employ the go-along method by conducting interviews while they and the participants navigate through and engage with their environment. Focus groups are a qualitative research method often used in market research. They constitute a form of group interview involving a small number of demographically similar people. Researchers can use this method to collect data based on

27946-452: The question of whether the quantitative approach is superior, especially whether it is adequate when applied to the social domain. A few theorists reject methodology as a discipline in general. For example, some argue that it is useless since methods should be used rather than studied. Others hold that it is harmful because it restricts the freedom and creativity of researchers. Methodologists often respond to these objections by claiming that

28124-485: The reasons cited for and against them. In this regard, it may be argued that what matters is the correct employment of methods and not their meticulous study. Sigmund Freud, for example, compared methodologists to "people who clean their glasses so thoroughly that they never have time to look through them". According to C. Wright Mills, the practice of methodology often degenerates into a "fetishism of method and technique". Some even hold that methodological reflection

28302-404: The recognized words are displayed as they are spoken, and the dictator is responsible for editing and signing off on the document. Back-end or deferred speech recognition is where the provider dictates into a digital dictation system, the voice is routed through a speech-recognition machine and the recognized draft document is routed along with the original voice file to the editor, where the draft

28480-484: The recognizer, as might have been expected. A restricted vocabulary, and above all, a proper syntax, could thus be expected to improve recognition accuracy substantially. The Eurofighter Typhoon, currently in service with the UK RAF, employs a speaker-dependent system, requiring each pilot to create a template. The system is not used for any safety-critical or weapon-critical tasks, such as weapon release or lowering of

28658-443: The research question. This way, the wealth of information obtained is summarized and thus made more accessible to others. Especially in the case of quantitative research, this often involves the application of some form of statistics to make sense of the numerous individual measurements. Many discussions in the history of methodology center around the quantitative methods used by the natural sciences. A central question in this regard

28836-455: The responses of the different participants and to draw general conclusions. However, they also limit what may be discovered and thus constrain the investigation in many ways. Depending on the type and depth of the interview, this method belongs either to quantitative or to qualitative research. The terms research conversation and muddy interview have been used to describe interviews conducted in informal settings which may not occur purely for

29014-481: The role of objectivity and hard empirical data as well as the research goal of predictive success rather than in-depth understanding or social change. Various other classifications have been proposed. One distinguishes between substantive and formal methodologies. Substantive methodologies tend to focus on one specific area of inquiry. The findings are initially restricted to this specific field but may be transferable to other areas of inquiry. Formal methodologies, on

29192-560: The same benchmark, which was funded by IBM Watson speech team on the same task. Both acoustic modeling and language modeling are important parts of modern statistically based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modeling is also used in many other natural language processing applications such as document classification or statistical machine translation. Modern general-purpose speech recognition systems are based on hidden Markov models. These are statistical models that output

29370-584: The sense that the same proof may be presented either way. Statistics investigates the analysis, interpretation, and presentation of data. It plays a central role in many forms of quantitative research that have to deal with the data of many observations and measurements. In such cases, data analysis is used to cleanse, transform, and model the data to arrive at practically useful conclusions. There are numerous methods of data analysis. They are usually divided into descriptive statistics and inferential statistics. Descriptive statistics restricts itself to

29548-535: The sentence that minimizes the average distance to other possible sentences weighted by their estimated probability). The loss function is usually the Levenshtein distance, though it can be different distances for specific tasks; the set of possible transcriptions is, of course, pruned to maintain tractability. Efficient algorithms have been devised to rescore lattices represented as weighted finite state transducers with edit distances represented themselves as
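
The selection rule described here (choose the sentence with the lowest probability-weighted average distance to the other candidates) can be sketched in a few lines of Python. This is a toy illustration over a small explicit hypothesis list, not the lattice-based rescoring the text mentions; the names `levenshtein` and `mbr_select` and the example probabilities are assumptions made for the sketch.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def mbr_select(hypotheses):
    """Pick the hypothesis with the lowest expected edit distance.

    `hypotheses` maps each candidate (a tuple of words) to its estimated
    probability.
    """
    def expected_loss(candidate):
        return sum(p * levenshtein(candidate, other)
                   for other, p in hypotheses.items())
    return min(hypotheses, key=expected_loss)


# Hypothetical pruned candidate set with made-up probabilities.
candidates = {
    ("a", "b", "c"): 0.4,
    ("a", "b", "d"): 0.3,
    ("a", "x", "d"): 0.3,
}
print(mbr_select(candidates))
```

In this example the most probable candidate has an expected loss of 0.9, while `("a", "b", "d")` has 0.7 because it sits one edit away from both alternatives, so the rule prefers it over the single most probable sentence.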

29726-454: The steady incremental improvements of the past few decades, the application of deep learning decreased word error rate by 30%. This innovation was quickly adopted across the field. Researchers have begun to use deep learning techniques for language modeling as well. In the long history of speech recognition, both shallow form and deep form (e.g. recurrent nets) of artificial neural networks had been explored for many years during the 1980s, 1990s and

29904-424: The subject but also in the institutionalized establishment of training programs focusing specifically on methodology. This phenomenon can be interpreted in different ways. Some see it as a positive indication of the topic's theoretical and practical importance. Others interpret this interest in methodology as an excessive preoccupation that draws time and energy away from doing research on concrete subjects by applying

30082-410: The term "framework" is used as a synonym. A method is a still more specific way of practically implementing the approach. Methodologies provide the guidelines that help researchers decide which method to follow. The method itself may be understood as a sequence of techniques. A technique is a step taken that can be observed and measured. Each technique has some immediate result. The whole sequence of steps

30260-677: The terms "method" and "methodology". In this regard, methodology may be defined as "the study or description of methods" or as "the analysis of the principles of methods, rules, and postulates employed by a discipline". This study or analysis involves uncovering assumptions and practices associated with the different methods and a detailed description of research designs and hypothesis testing. It also includes evaluative aspects: forms of data collection, measurement strategies, and ways to analyze data are compared and their advantages and disadvantages relative to different research goals and situations are assessed. In this regard, methodology provides

30438-462: The training data. Examples are maximum mutual information (MMI), minimum classification error (MCE), and minimum phone error (MPE). Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) would probably use the Viterbi algorithm to find the best path, and here there is a choice between dynamically creating
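
For readers unfamiliar with the Viterbi algorithm mentioned above, the following is a minimal Python sketch over a toy two-state hidden Markov model. The state names, observation symbols, and probabilities are invented for illustration; a real decoder works over far larger state spaces, typically in log-probabilities and with pruning.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_state_path, probability) for a non-empty observation list."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][r][0] * trans_p[r][s] * emit_p[s][obs], r)
                for r in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy model: all names and numbers below are illustrative assumptions.
states = ["s1", "s2"]
start_p = {"s1": 0.6, "s2": 0.4}
trans_p = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
emit_p = {
    "s1": {"x": 0.5, "y": 0.4, "z": 0.1},
    "s2": {"x": 0.1, "y": 0.3, "z": 0.6},
}
path, prob = viterbi(["x", "y", "z"], states, start_p, trans_p, emit_p)
print(path, prob)
```

For this toy model the best path is `s1, s1, s2`, with probability of roughly 0.01512.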

30616-425: The undercarriage, but is used for a wide range of other cockpit functions. Voice commands are confirmed by visual and/or aural feedback. The system is seen as a major design feature in the reduction of pilot workload, and even allows the pilot to assign targets to his aircraft with two simple voice commands or to any of his wingmen with only five commands. Methodology In its most common sense, methodology

30794-411: The universal hypothesis that "all swans are white". The hypothetico-deductive approach, on the other hand, focuses not on positive instances but on deductive consequences of the theory. This way, the researcher uses deduction before conducting an experiment to infer what observations they expect. These expectations are then compared to the observations they actually make. This approach often takes

30972-413: The widest sense, methodology also includes the discussion of these more abstract issues. Methodologies are traditionally divided into quantitative and qualitative research. Quantitative research is the main methodology of the natural sciences. It uses precise numerical measurements. Its goal is usually to find universal laws used to make predictions about future events. The dominant methodology in

31150-687: Was GOOG-411, a telephone-based directory service. The recordings from GOOG-411 produced valuable data that helped Google improve their recognition systems. Google Voice Search is now supported in over 30 languages. In the United States, the National Security Agency has made use of a type of speech recognition for keyword spotting since at least 2006. This technology allows analysts to search through large volumes of recorded conversations and isolate mentions of keywords. Recordings can be indexed and analysts can run queries over

31328-578: Was introduced during the later part of 2009 by Geoffrey Hinton and his students at the University of Toronto and by Li Deng and colleagues at Microsoft Research, initially in the collaborative work between Microsoft and the University of Toronto which was subsequently expanded to include IBM and Google (hence "The shared views of four research groups" subtitle in their 2012 review paper). A Microsoft research executive called this innovation "the most dramatic change in accuracy since 1979". In contrast to

31506-521: Was the first person to take on continuous speech recognition as a graduate student at Stanford University in the late 1960s. Previous systems required users to pause after each word. Reddy's system issued spoken commands for playing chess. Around this time Soviet researchers invented the dynamic time warping (DTW) algorithm and used it to create a recognizer capable of operating on a 200-word vocabulary. DTW processed speech by dividing it into short frames, e.g. 10 ms segments, and processing each frame as
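
The framing step can be illustrated with a short Python sketch. It assumes the audio is already available as a list of samples at a known sample rate and uses non-overlapping frames; the function name `split_into_frames` and the 16 kHz rate are assumptions for the example, and practical front ends usually use overlapping, windowed frames instead.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=10):
    """Split a sample list into consecutive, non-overlapping fixed-length frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


# 25 ms of dummy audio at an assumed 16 kHz sample rate (400 samples).
audio = list(range(400))
frames = split_into_frames(audio, sample_rate_hz=16000)
print(len(frames), len(frames[0]))
```

At 16 kHz a 10 ms frame holds 160 samples, so the 400 dummy samples yield two full frames and the remaining 80 samples are discarded.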

31684-449: Was to use neural networks as a pre-processing, feature transformation, or dimensionality reduction step prior to HMM-based recognition. However, more recently, LSTM and related recurrent neural networks (RNNs), time delay neural networks (TDNNs), and transformers have demonstrated improved performance in this area. Deep neural networks and denoising autoencoders are also under investigation. A deep feedforward neural network (DNN)

#306693