gpt4 book ai didi

java - 如何使用 Java 在印地语和拉丁字符之间音译印地语文本?

转载 作者:塔克拉玛干 更新时间:2023-11-02 20:12:45 26 4
gpt4 key购买 nike

如何使用 Java 将用英文字母表编写的印地语意思转换为印地语?

例如。

输入文本是:anil NE lath marke apko Ganga me hi Fenk diya。

印地语

输出文本是:अनिलनेलातमार्केआपकोगंगामेंहीफेंकदिया

如何使用 Java 或任何 Java API 进行转换?

我喜欢一个叫 Jitter 的 google 以外的 api 但出现错误

Source is: inko
Input is: a2b45xdsfsdf
Output is:
Matches is: 0
Exception in thread "main" org.jtr.transliterate.CharacterParseException: No valid delimiter found for start of expression
at org.jtr.transliterate.Perl5Parser.parsePerlString(Perl5Parser.java:133)

源代码是

/*
* Copyright (c) 2001-2005, Nicholas Cull
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* * Neither the name "jtr" nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR
* CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
* NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

package org.jtr.transliterate;

/**
* A utility class for providing Perl 5 syntactic sugar on top of the
* {@link CharacterParser} class. This parses Perl-style transliteration strings
* into a form suitable for <code>CharacterParser</code>. For instance, the
* string <code>"tr/a-zA-Z/0-9a-zA-Z/cd"</code> is parsed into two strings
* <code>"a-zA-Z"</code>, <code>"0-9a-zA-Z"</code>, and the flags
* <code>COMPLEMENT_MASK | DELETE_UNREPLACEABLES_MASK</code>.
*
* @author <a href="mailto:run2000@users.sourceforge.net">Nicholas Cull</a>
* @version $Id: Perl5Parser.java,v 1.3 2005/03/14 06:40:18 run2000 Exp $
* @since 1.1
*/
public final class Perl5Parser {

/** Initial state of the sequence parser. */
private static final short INITIAL_STATE = 0;
/** Sequence parser encountered an escape character. */
private static final short ESCAPE_STATE = 1;
/** Sequence parser encountered the delimiter. */
private static final short DELIMITER_STATE = 2;

/** Private constructor to indicate this is a static class. */
private Perl5Parser() {
}

/**
* <p>Parses the given string in Perl syntax and returns a populated
* {@link CharacterReplacer} object that can be used to perform the
* specified transliteration.</p>
*
* <p>This is a simple factory method that calls the {@link #parsePerlString
* parsePerlString} method below, creates a new {@link CharacterReplacer}
* object and populates it with the results.</p>
*
* @param source the String to be parsed and compiled
* @return a new CharacterReplacer ready for transliterations
* @throws CharacterParseException something went wrong during parsing
* @throws NullPointerException source is <code>null</code>
*/
public static CharacterReplacer makeReplacer( String source )
throws CharacterParseException {

StringBuffer input = new StringBuffer();
StringBuffer output = new StringBuffer();
int flags = 0;
CharacterReplacer replacer;

flags = parsePerlString( source, input, output );
replacer = new CharacterReplacer( input.toString(), output.toString() );
replacer.setFlags( flags );
return replacer;
}

/**
* <p>Parses the given Perl-style transliteration string into two parts:</p>
* <ol>
* <li>The input character string to be transliterated
* <li>The replacement character string
* </ol>
* <p>These strings can then be fed into the constructors for
* {@link CharacterReplacer}. It also returns any flags encountered at the
* end of the string into a form suitable for CharacterReplacer.</p>
*
* @param source the string to be parsed
* @param input (out) the characters to be transliterated
* @param output (out) the replacement characters
* @return any flags parsed at the end of the character sequence
* @throws CharacterParseException there was a problem parsing the source
* String
* @throws NullPointerException source is <code>null</code>
*/
public static int parsePerlString( String source, StringBuffer input,
StringBuffer output ) throws CharacterParseException {

int length = source.length();
int pos = 0;
int flags = 0;
char delimiter;

if( length < 3 ) {
throw new CharacterParseException(
"Source is too small to be parsed", pos );
}

if( input == null || output == null ) {
throw new CharacterParseException(
"String buffers have not been initialized", pos );
}

if( source.startsWith( "tr" )) {
pos = 2;
} else if( source.startsWith( "y" )) {
pos = 1;
}

delimiter = source.charAt( pos );
if( delimiter == '-' || delimiter == '\\' ||
Character.isLetterOrDigit( delimiter )) {
throw new CharacterParseException(
"No valid delimiter found for start of expression", pos );
}

pos++;
pos = parseSequence( source, pos, delimiter, input );

if( pos == length ) {
throw new CharacterParseException(
"Cannot parse replacement sequence, no character sequence found",
pos );
}

pos = parseSequence( source, pos, delimiter, output );

// Parse any flags at the end
while( pos < length ) {
char flag = source.charAt( pos );
switch( flag ) {
case 'c':
flags = flags | CharacterParser.COMPLEMENT_MASK;
break;

case 'd':
flags = flags | CharacterParser.DELETE_UNREPLACEABLES_MASK;
break;

case 's':
flags = flags | CharacterParser.SQUASH_DUPLICATES_MASK;
break;

default:
throw new CharacterParseException(
"Unknown flag passed into character parser", pos );
}
pos++;
}
return flags;
}

/**
* Parse the first or second character sequence and place the parsed result
* into the given StringBuffer. The source string is scanned from initial
* position pos until an unescaped delimiter character is found. We use a
* simple finite state machine to determine when we encounter an escape
* character or a delimiter.
*
* @param source the source String to be scanned
* @param pos the starting position for the scan
* @param delimiter the delimiter character to indicate the end of the
* sequence
* @param buffer the character buffer to store the parsed result
* @return the new position of the parser
* @throws NullPointerException source or buffer are <code>null</code>
*/
private static int parseSequence( String source, int pos, char delimiter,
StringBuffer buffer ) {
int length = source.length();
short state = INITIAL_STATE;
int startPos = pos;
char curr = '\0';

while(( pos < length ) && ( state != DELIMITER_STATE )) {
curr = source.charAt( pos );
switch( state ) {
case INITIAL_STATE:
if( curr == '\\' ) {
state = ESCAPE_STATE;
} else if( curr == delimiter ) {
// Copy the current source to the buffer
buffer.append( source.substring( startPos, pos ));
state = DELIMITER_STATE;
}
break;

case ESCAPE_STATE:
if( curr == delimiter ) {
// Previous character was to escape the delimiter.
// Have to add the previous characters to the buffer.
buffer.append( source.substring( startPos, pos - 1 ));
startPos = pos;
}
state = INITIAL_STATE;
break;
}
pos++;
}

if( state != DELIMITER_STATE ) {
buffer.append( source.substring( startPos ));
}
return pos;
}

/**
* A simple test case for this class.
*
* @param args ignored
* @throws Exception if an exception is encountered, throw it to the caller
*/
public static void main( String args[] ) throws Exception {
String source = "inko";
String input = "a2b45xdsfsdf";
String output = "";
int matches = 0;

try {
CharacterReplacer replacer = makeReplacer( source );
output = replacer.doReplacement( input );
matches = replacer.getMatches();
} finally {
System.out.println( "Source is: " + source );
System.out.println( "Input is: " + input );
System.out.println( "Output is: " + output );
System.out.println( "Matches is: " + matches );
}
}
}

at org.jtr.transliterate.Perl5Parser.makeReplacer(Perl5Parser.java:82)
at org.jtr.transliterate.Perl5Parser.main(Perl5Parser.java:240)

最佳答案

如果我的理解正确,那么您希望能够将印地语文本音译成/从印地语文本音译出来。 (即从一个书写系统转换为另一个)。这通常比翻译(即从一种语言转换为另一种语言)容易得多。

我不知道有图书馆可以做到这一点,但是 this Wiktionary page on Hindi transliteration可能会让您入门。

关于java - 如何使用 Java 在印地语和拉丁字符之间音译印地语文本?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/11643810/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com