PHP Phonetic and Linguistics Tools: Convert words into phonetic representation strings

Last updated: 2021-03-30
Version: linguistics 1.1.0
License: BSD License
PHP version: 7.3
Categories: Algorithms, Audio, PHP 7

This package can convert words into phonetic representation strings.

It takes a string of English words and converts it into an array of strings that represent the words as phonemes.

Currently the package can return the phonetic representation in several formats: IPA (International Phonetic Alphabet), NYSIIS, soundex and metaphone.

It also provides classes to compare words by the way they sound, so it is possible to measure how closely they match and decide whether they are similar.
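As a rough illustration of sound-based comparison, the sketch below uses only PHP's native soundex(), metaphone() and levenshtein() functions. The helper names soundsAlike() and phoneticDistance() are hypothetical and are not the package's comparison classes, whose API is not documented on this page.

```php
<?php
// Hypothetical helpers built on PHP built-ins only; not the package's API.

// Two words "sound alike" here when their soundex keys are identical.
function soundsAlike(string $a, string $b): bool
{
    return soundex($a) === soundex($b);
}

// Edit distance between metaphone keys: 0 means the same key,
// larger values mean the keys differ in more positions.
function phoneticDistance(string $a, string $b): int
{
    return levenshtein(metaphone($a), metaphone($b));
}

var_dump(soundsAlike('Robert', 'Rupert')); // bool(true): both encode to R163
```

Comparing keys rather than raw spellings is what makes the match tolerant of spelling variants; the choice of soundex for equality and metaphone for distance is only one possible combination.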

Author: Carlos Artur Curvelo da Matos (Portugal)
Innovation award: nominee 11x, winner 2x

Linguistics


NEW --> Support for NYSIIS encoding
What's next? --> Support for Caverphone, Arpabet

This package aims to provide a comprehensive set of functions and methods for the linguistics and phonetics algorithms commonly used in software development and information technology. While PHP already offers native functions for the metaphone and soundex algorithms, other useful algorithms are not available as native functions.

The package also ships a dictionary that converts almost any English word from text to IPA phonetic symbols. For now only en_US is available, but there are plans to include other languages or dialects eventually.

Installation

The easiest way to use this package is to require it with Composer. The package can also simply be cloned and used, as long as the namespaces are respected.

composer require carloswph/linguistics

Usage

The package is organized in independent classes. The first class, Phonetics, provides three different methods. The method symbols() converts a string into IPA phonetic symbols. If a longer string is provided, the class splits it into words and returns the symbols for each word, excluding repetitions. Additionally, the class provides a bridge to PHP's existing metaphone() and soundex() functions.

All methods can return their result in three formats: TXT, JSON or PHP array. TXT is the default, so to get a different format, pass it as an additional argument to the method. A few examples will make it clearer:


use Linguistics\Phonetics;

require __DIR__ . '/vendor/autoload.php';

$str = 'To be or not to be, that is the question';

Phonetics::symbols($str);
/*
Returns:

[ to ] => /ˈtu/, /tə/, /tɪ/
[ be ] => /ˈbi/, /bi/
[ or ] => /ˈɔɹ/, /ɝ/
[ not ] => /ˈnɑt/
[ that ] => /ˈðæt/, /ðət/
[ is ] => /ˈɪz/, /ɪz/
[ the ] => /ˈðə/, /ðə/, /ði/
[ question ] => /ˈkwɛstʃən/, /ˈkwɛʃən/
*/

Phonetics::soundex($str);
/*
Returns:

[ to ] => T000
[ be ] => B000
[ or ] => O600
[ not ] => N300
[ that ] => T300
[ is ] => I200
[ the ] => T000
[ question ] => Q235
*/
Phonetics::metaphone($str);
/*
Returns:

[ to ] => T
[ be ] => B
[ or ] => OR
[ not ] => NT
[ that ] => 0T
[ is ] => IS
[ the ] => 0
[ question ] => KSXN
*/

Phonetics::symbols($str, 'array');
/*
Returns:

array(8) {
  ["to"]=> array(3) { [0]=> string(6) "/ˈtu/" [1]=> string(6) " /tə/" [2]=> string(6) " /tɪ/" }
  ["be"]=> array(2) { [0]=> string(6) "/ˈbi/" [1]=> string(5) " /bi/" }
  ["or"]=> array(2) { [0]=> string(8) "/ˈɔɹ/" [1]=> string(5) " /ɝ/" }
  ["not"]=> array(1) { [0]=> string(8) "/ˈnɑt/" }
  ["that"]=> array(2) { [0]=> string(9) "/ˈðæt/" [1]=> string(8) " /ðət/" }
  ["is"]=> array(2) { [0]=> string(7) "/ˈɪz/" [1]=> string(6) " /ɪz/" }
  ["the"]=> array(3) { [0]=> string(8) "/ˈðə/" [1]=> string(7) " /ðə/" [2]=> string(6) " /ði/" }
  ["question"]=> array(2) { [0]=> string(15) "/ˈkwɛstʃən/" [1]=> string(14) " /ˈkwɛʃən/" }
}
*/

Phonetics::symbols($str, 'json');
/*
Returns:

string(410) "{"to":["\/\u02c8tu\/"," \/t\u0259\/"," \/t\u026a\/"],"be":["\/\u02c8bi\/"," \/bi\/"],"or":["\/\u02c8\u0254\u0279\/"," \/\u025d\/"],"not":["\/\u02c8n\u0251t\/"],"that":["\/\u02c8\u00f0\u00e6t\/"," \/\u00f0\u0259t\/"],"is":["\/\u02c8\u026az\/"," \/\u026az\/"],"the":["\/\u02c8\u00f0\u0259\/"," \/\u00f0\u0259\/"," \/\u00f0i\/"],"question":["\/\u02c8kw\u025bst\u0283\u0259n\/"," \/\u02c8kw\u025b\u0283\u0259n\/"]}"
*/
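The per-word behavior and the three output formats shown in the examples above can be approximated with PHP's native functions alone. The following is a minimal sketch under stated assumptions: encodeWords() and render() are hypothetical helper names, the word-splitting rule is a guess, and none of this is taken from the package's source.

```php
<?php
// Hypothetical helpers for illustration only; they are not the package's API.

// Split text into unique lowercase words and encode each one with $encoder
// (for example the native 'soundex' or 'metaphone' functions).
function encodeWords(string $text, callable $encoder): array
{
    // Assumed tokenization: runs of letters and apostrophes.
    preg_match_all("/[a-z']+/", strtolower($text), $matches);
    $codes = [];
    foreach (array_unique($matches[0]) as $word) {
        $codes[$word] = $encoder($word);
    }
    return $codes;
}

// Render the result as plain text (default), JSON, or the raw PHP array.
function render(array $codes, string $format = 'txt')
{
    switch ($format) {
        case 'array':
            return $codes;
        case 'json':
            return json_encode($codes);
        default:
            $lines = [];
            foreach ($codes as $word => $code) {
                $lines[] = "[ $word ] => $code";
            }
            return implode(PHP_EOL, $lines);
    }
}

$codes = encodeWords('To be or not to be, that is the question', 'soundex');
echo render($codes), PHP_EOL;
```

Passing the encoder as a callable keeps the splitting and deduplication logic in one place, so the same helper works with 'metaphone' or any other single-argument encoder.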

NYSIIS encoding

Since v1.1.0, the Phonetics class has an additional method that returns the New York State Identification and Intelligence System phonetic code (NYSIIS) for every word in a sentence, excluding repeated words. It is used in the same way as the other static methods.

Phonetics::nysiis($str);
/*
Returns:

[ to ] => T
[ be ] => B
[ or ] => AR
[ not ] => NAT
[ that ] => THAT
[ is ] => A
[ the ] => TH
[ question ] => GAASTAAN
*/

Underway

The following classes are currently underway:

  • An encoding class for the Caverphone algorithm, versions 1.0 and 2.0
  • An encoding class for Match Rating Approach comparison and string encoding
  • An encoding class for the legacy Arpabet algorithm
  • Possibly a Roger Root encoding class

The next stable version, 1.2.0, should include the Caverphone class, supporting at least version 1.0 of the algorithm.

Files

File: Role, Description
composer.json: Data, Auxiliary data
composer.lock: Data, Auxiliary data
README.md: Doc., Documentation
.github/workflows/codacy-analysis.yml: Data, Auxiliary data
src/Arpabet.php: Class, Class source
src/Caverphone.php: Class, Class source
src/MatchRatingApproach.php: Class, Class source
src/Nysiis.php: Class, Class source
src/Phonetics.php: Class, Class source
src/data/en_us/a.json: Data, Auxiliary data
src/data/en_us/b.json: Data, Auxiliary data
src/data/en_us/c.json: Data, Auxiliary data
src/data/en_us/d.json: Data, Auxiliary data
src/data/en_us/e.json: Data, Auxiliary data
src/data/en_us/f.json: Data, Auxiliary data
src/data/en_us/g.json: Data, Auxiliary data
src/data/en_us/h.json: Data, Auxiliary data
src/data/en_us/i.json: Data, Auxiliary data
src/data/en_us/j.json: Data, Auxiliary data
src/data/en_us/k.json: Data, Auxiliary data
src/data/en_us/l.json: Data, Auxiliary data
src/data/en_us/m.json: Data, Auxiliary data
src/data/en_us/n.json: Data, Auxiliary data
src/data/en_us/o.json: Data, Auxiliary data
src/data/en_us/p.json: Data, Auxiliary data
src/data/en_us/q.json: Data, Auxiliary data
src/data/en_us/r.json: Data, Auxiliary data
src/data/en_us/s.json: Data, Auxiliary data
src/data/en_us/t.json: Data, Auxiliary data
src/data/en_us/u.json: Data, Auxiliary data
src/data/en_us/v.json: Data, Auxiliary data
src/data/en_us/w.json: Data, Auxiliary data
src/data/en_us/x.json: Data, Auxiliary data
src/data/en_us/y.json: Data, Auxiliary data
src/data/en_us/z.json: Data, Auxiliary data
src/data/en_us/_.json: Data, Auxiliary data
