Parsing Text

by Gilbert Keith

Today’s ProjectEuler problem was not too bad:

The nth term of the sequence of triangle numbers is given by, tn = ½n(n+1); so the first ten triangle numbers are:

1, 3, 6, 10, 15, 21, 28, 36, 45, 55, …

By converting each letter in a word to a number corresponding to its alphabetical position and adding these values we form a word value. For example, the word value for SKY is 19 + 11 + 25 = 55 = t10. If the word value is a triangle number then we shall call the word a triangle word.

Using words.txt (right click and ‘Save Link/Target As…’), a 16K text file containing nearly two-thousand common English words, how many are triangle words?

This one runs in about 22 ms. Again, tried to modularize as much as possible, and didn’t hardcode anything (eg. the upper bound on the number of indices for the boolean array of triangular numbers). If I know I  have to run some calculation repeatedly, I like to calculate it once and then cache it. This is why I create a boolean array of triangular numbers – when I calculate the score for each word, I can simply check whether the value for that index is true or false.

I think if I obtained a large enough array of words, I could hadoop-ify this problem too (merely for the sake of learning.) 🙂

The next “similar” problem to #42 seems a bit complicated. I’m not quite sure I understand it:

By replacing each of the letters in the word CARE with 1, 2, 9, and 6 respectively, we form a square number: 1296 = 362. What is remarkable is that, by using the same digital substitutions, the anagram, RACE, also forms a square number: 9216 = 962. We shall call CARE (and RACE) a square anagram word pair and specify further that leading zeroes are not permitted, neither may a different letter have the same digital value as another letter.

Using words.txt (right click and ‘Save Link/Target As…’), a 16K text file containing nearly two-thousand common English words, find all the square anagram word pairs (a palindromic word is NOT considered to be an anagram of itself).

What is the largest square number formed by any member of such a pair?

NOTE: All anagrams formed must be contained in the given text file.

I think I’m assigning each letter in a word some arbitrary digit, checking if that sequence of digits gives me a square, and also finding an anagram of that word (also in the file) to see whether that gives me a square. Do I automatically exclude any word greater than 10 characters long?

Only 5 more left ’till 50 problems!

import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;

public class Problem42 {

	/**
	 * @param args
	 */

	int longestWordLength;
	ArrayList<String> words;
	int[] arrayOfTriangleNumbers;
	boolean[] isNumberTriangular;
	int countOfTriangularWords;

	public static void main(String[] args) {
		// TODO Auto-generated method stub

		long begin = System.currentTimeMillis();

		Problem42 input;
		input = new Problem42();
		String delimOne="\"",delimTwo=",";
		readFile(input,delimOne,delimTwo);
		buildArrayOfTriangleNums(input);

		System.out.println("Longest Word Length: " + input.longestWordLength);

		for (String word: input.words){
			char[] charArray = word.toCharArray();
			int wordScore=0;
			for (char letter:charArray) {
				wordScore += getCharValue(letter);
			}
			if (input.isNumberTriangular[wordScore]) {
				input.countOfTriangularWords++;
			}
		}

		System.out.println("The number of triangular words: " + input.countOfTriangularWords);
		long end = System.currentTimeMillis();
		System.out.println("Time: " + (end - begin));
	}

	private static void buildArrayOfTriangleNums(Problem42 input) {
		int i=0, nextTriangularNum=0;
		Boolean done=false;
		while (!done) {
			i++;
			nextTriangularNum += i;
			if (nextTriangularNum > input.longestWordLength*26) {
				done = true;
			}
		}

		input.isNumberTriangular = new boolean[nextTriangularNum+1];
		nextTriangularNum=0;
		Arrays.fill(input.isNumberTriangular,false);
		for (int j=1; j<=i; j++) {
			nextTriangularNum=nextTriangularNum+j;
			input.isNumberTriangular[nextTriangularNum]=true;
		}
	}

	private static int getCharValue(char character) {
		return ((int) character - 64);
	}

	private static void readFile(Problem42 inputObj, String delimOne,String delimTwo) {
		inputObj.words = new ArrayList<String>();
		String line="";

		try {
			FileInputStream fileStream = new FileInputStream("/Users/gk/Desktop/words.txt");
			DataInputStream dataStream = new DataInputStream(fileStream);

			BufferedReader bReader = new BufferedReader(new InputStreamReader(dataStream));

			int delimOnePos=0,delimTwoPos=0,commaPos=0;
			String wordString;
			int longWordLength=0,wordLength=0;
			while ((line=bReader.readLine()) != null) {
				do {
					delimOnePos=line.indexOf(delimOne,commaPos);
					delimTwoPos=line.indexOf(delimOne,delimOnePos+1);
					wordString = line.substring(delimOnePos+1,delimTwoPos).toUpperCase();
					inputObj.words.add(wordString);
					commaPos = line.indexOf(delimTwo,delimTwoPos);
					wordLength=wordString.length();
					if (wordLength > longWordLength) {
						longWordLength=wordLength;
					}

				} while(commaPos > 0);
				inputObj.longestWordLength=longWordLength;
			}
			dataStream.close();
		}catch (Exception e){//Catch exception if any
			  System.err.println("Error: " + e.getMessage());
			  }
	}

}

Advertisements