Problem 41 (Text Analysis)


(Text Analysis) The availability of computers with string-manipulation capabilities has resulted in some rather interesting approaches to analyzing the writings of great authors. Much attention has been focused on whether William Shakespeare ever lived. Some scholars believe there is substantial evidence indicating that Christopher Marlowe or other authors actually penned the masterpieces attributed to Shakespeare. Researchers have used computers to find similarities in the writings of these two authors. This exercise examines three methods for analyzing texts with a computer. Note that thousands of texts, including Shakespeare, are available online at www.gutenberg.org.

a. Write a program that reads several lines of text from the keyboard and prints a table indicating the number of occurrences of each letter of the alphabet in the text. For example, the phrase To be, or not to be: that is the question: contains one "a," two "b's," no "c's," etc.

b. Write a program that reads several lines of text and prints a table indicating the number of one-letter words, two-letter words, three-letter words, etc., appearing in the text. For example, the phrase Whether 'tis nobler in the mind to suffer contains the following word lengths and occurrences:

c. Write a program that reads several lines of text and prints a table indicating the number of occurrences of each different word in the text. The first version of your program should include the words in the table in the same order in which they appear in the text. For example, the lines To be, or not to be: that is the question: Whether 'tis nobler in the mind to suffer contain the words "to" three times, the word "be" two times, the word "or" once, etc. A more interesting (and useful) printout should then be attempted in which the words are sorted alphabetically.

Short Answer

Implement separate programs to count letter occurrences, word lengths, and unique words, then sort words alphabetically.

Step by step solution

01

Count Occurrences of Each Letter

Write a program that reads text input from the user and iterates over it character by character. Use a dictionary whose keys are the letters of the alphabet and whose values are the number of times each letter occurs. Convert each character to lowercase first so that "T" and "t" are counted together, and if the character is a letter, increment its count in the dictionary. Finally, display the count for each letter, including letters that never occur.
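A minimal sketch of this step in Python, assuming case-insensitive counting over the 26 ASCII letters only. The function name `count_letters` is illustrative and not part of the original exercise, and the sample phrase stands in for keyboard input.

```python
import string


def count_letters(text):
    """Return a dict mapping each lowercase ASCII letter to its count in text."""
    # Start every letter at zero so letters that never appear are still reported.
    counts = {letter: 0 for letter in string.ascii_lowercase}
    for ch in text.lower():
        if ch in counts:  # skips digits, spaces, punctuation, non-ASCII characters
            counts[ch] += 1
    return counts


if __name__ == "__main__":
    phrase = "To be, or not to be: that is the question:"
    for letter, n in count_letters(phrase).items():
        print(f"{letter}: {n}")
```

To read several lines from the keyboard instead, the phrase could be replaced with `sys.stdin.read()` or a loop over `input()`.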
02

Count Word Lengths

Create a program that reads the input text and splits it into words. For each word, count its letters, excluding attached punctuation such as commas and colons. Use a dictionary whose keys are word lengths (1, 2, 3, and so on) and whose values are the number of words of that length, incrementing the matching entry for each word. Print the result in order of increasing length, showing how many words there are for each length.
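One possible sketch of this step, under the assumption that a word is any run of ASCII letters. With that definition an apostrophe acts as a separator, so "'tis" counts as a three-letter word; other conventions are equally defensible. The name `count_word_lengths` is illustrative.

```python
import re
from collections import defaultdict


def count_word_lengths(text):
    """Return a dict mapping word length -> number of words of that length."""
    # Assumption: a word is a run of ASCII letters; everything else separates words.
    words = re.findall(r"[A-Za-z]+", text)
    lengths = defaultdict(int)
    for word in words:
        lengths[len(word)] += 1
    return dict(lengths)


if __name__ == "__main__":
    table = count_word_lengths("Whether 'tis nobler in the mind to suffer")
    print("Length  Occurrences")
    for length in sorted(table):
        print(f"{length:<8}{table[length]}")
```

Using integer keys rather than labels such as "two letters" keeps the table easy to sort by length.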
03

Count Unique Words

Develop a program that extracts each word from the input text. Split the text on spaces and punctuation, convert each word to lowercase so that "To" and "to" count as the same word, and use a dictionary whose keys are the distinct words and whose values are their occurrence counts. For the initial printout, list the words in order of first appearance rather than alphabetically; Python dictionaries (version 3.7 and later) preserve insertion order, so iterating over the dictionary yields this order directly.
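A short sketch of this step, again assuming that words are runs of ASCII letters and relying on the insertion-order guarantee of Python 3.7+ dictionaries. The function name `count_words_in_order` is hypothetical.

```python
import re


def count_words_in_order(text):
    """Return a dict of word -> count, keyed in order of first appearance."""
    counts = {}
    # Lowercase first so that "To" and "to" are treated as the same word.
    for word in re.findall(r"[a-z]+", text.lower()):
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    lines = (
        "To be, or not to be: that is the question:\n"
        "Whether 'tis nobler in the mind to suffer"
    )
    for word, n in count_words_in_order(lines).items():
        print(f"{word:<10}{n}")
```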
04

Sort Words Alphabetically

Take the dictionary from Step 3 and print its entries in alphabetical order. Python's `sorted()` function returns the dictionary keys (words) in sorted order; print each word along with its occurrence count. The sorted table keeps the frequency counts while making individual words easy to find.
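As a sketch of the sorting step, the helper below takes any word-count dictionary and returns its entries in alphabetical order. The name `sorted_word_table` and the sample dictionary are illustrative only.

```python
def sorted_word_table(counts):
    """Return (word, count) pairs ordered alphabetically by word."""
    # sorted() over the dict iterates its keys; the dict itself is not modified.
    return [(word, counts[word]) for word in sorted(counts)]


if __name__ == "__main__":
    sample = {"to": 3, "be": 2, "or": 1, "not": 1}
    for word, n in sorted_word_table(sample):
        print(f"{word:<10}{n}")
```

Because the original dictionary is left untouched, the same data can still be printed in order of appearance.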


Key Concepts

These are the key concepts you need to understand to accurately answer the question.

String Manipulation
String manipulation refers to the process of changing, analyzing, or inspecting strings – sequences of characters. In text analysis programming, we often deal with tasks like transforming strings to lowercase to ensure uniformity, removing unwanted characters such as punctuation, or splitting strings into smaller parts, such as words or letters. These operations are foundational for analyzing textual data because they prepare the data for more complex processing tasks.

Common string manipulation techniques include:
  • **Changing Cases**: Conversion of text to upper or lower case to ensure consistency during comparison.
  • **Trimming**: Removal of white spaces from the beginning and end, which can help avoid errors in text analysis.
  • **Splitting**: Division of a string into parts using delimiters like spaces or commas, which is especially useful for counting words or evaluating text structure.
  • **Replacement**: Substituting certain text segments with others, such as removing punctuation or correcting spelling errors.
Using these techniques ensures accurate processing of text data, essential for tasks like letter frequency analysis or word counting.
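The four techniques above can be chained into a small preprocessing helper. This is a sketch under the assumption that only ASCII punctuation needs to be removed; the name `normalize_and_split` is hypothetical.

```python
import string


def normalize_and_split(raw):
    """Lowercase, trim, strip ASCII punctuation, and split into words."""
    lowered = raw.lower()        # changing case
    trimmed = lowered.strip()    # trimming leading and trailing whitespace
    # Replacement: delete every ASCII punctuation character.
    cleaned = trimmed.translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()       # splitting on runs of whitespace


if __name__ == "__main__":
    print(normalize_and_split("  To be, or not to be: that is the question:  "))
```

Note that deleting punctuation outright joins contractions ("don't" becomes "dont"), which may or may not be the desired behavior for a given analysis.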
Letter Frequency Analysis
Letter Frequency Analysis involves counting how often each letter appears within a given text, which can provide critical insights into the text's characteristics. For example, certain authors might have unique linguistic signatures that can be identified through this type of analysis.

The process usually follows these steps:
  • **Initialization**: Create a dictionary with the letters of the alphabet as keys and every count initialized to zero.
  • **Normalization**: Convert all text to the same case (usually lowercase) before counting, so that the frequency count is case-insensitive.
  • **Iteration**: Loop through each character in the normalized text. If the character is a letter, increase its count in the dictionary.
  • **Result Display**: Print the frequency of each letter, which can be used for further statistical or pattern analysis.
This type of analysis is foundational in cryptography and historical research, aiding in tasks like deciphering codes or investigating authorship questions.
Word Count Programming
Word count programming automates the process of counting words in text, which is a frequently required data point in many text analysis scenarios. By counting the occurrences of word lengths, we can better understand the complexity of the language used.

The general process involves:
  • **Reading and Splitting**: Start by reading the input text and splitting it into a list of words.
  • **Length Counting**: For each word, calculate the number of letters it contains. Use a dictionary where keys represent the word lengths, and their corresponding values indicate how many times those lengths occur.
  • **Outputting Results**: Display the word length counts, which provides insight into the writing style, such as whether an author favors longer, more complex words or shorter, simpler ones.
Word count programming is used in numerous applications, from simple content length verification in writing software to complex linguistic research.
Unique Word Counting
Unique Word Counting focuses on identifying and tallying distinct words in text, which helps examine text complexity and diversity. This form of analysis considers both the variety of words and their frequency to offer insights into linguistic richness.

Here's a typical approach to counting unique words:
  • **Data Cleansing**: Initially, process the text to remove punctuation and convert each word to lowercase, ensuring uniformity.
  • **Extraction and Counting**: Split the text into words and, as you iterate through them, add each unique word to a dictionary with its occurrence count.
  • **Initial and Sorted Output**: First, print words in the order they appear to offer an unaltered overview of word usage. Additionally, sort these words alphabetically to provide an organized and easy-to-reference list of word frequencies.
Unique word counting is essential in both computational linguistics and text analytics, facilitating applications like keyword extraction, author profiling, and sentiment analysis.


Most popular questions from this chapter

(Check Protection) Computers are frequently employed in check-writing systems such as payroll and accounts-payable applications. Many strange stories circulate regarding weekly paychecks being printed (by mistake) for amounts in excess of $1 million. Weird amounts are printed by computerized check-writing systems, because of human error or machine failure. Systems designers build controls into their systems to prevent such erroneous checks from being issued. Another serious problem is the intentional alteration of a check amount by someone who intends to cash a check fraudulently. To prevent a dollar amount from being altered, most computerized check-writing systems employ a technique called check protection. Checks designed for imprinting by computer contain a fixed number of spaces in which the computer may print an amount. Suppose that a paycheck contains eight blank spaces in which the computer is supposed to print the amount of a weekly paycheck. If the amount is large, then all eight of those spaces will be filled, for example, 12345678 (position numbers). On the other hand, if the amount is less than $1000, then several of the spaces would ordinarily be left blank. For example, 99.87 -------- 12345678 contains three blank spaces. If a check is printed with blank spaces, it is easier for someone to alter the amount of the check. To prevent a check from being altered, many check-writing systems insert leading asterisks to protect the amount as follows: ***99.87 -------- 12345678. Write a program that inputs a dollar amount to be printed on a check and then prints the amount in check-protected format with leading asterisks if necessary. Assume that nine spaces are available for printing an amount.

(Printing Dates in Various Formats) Dates are commonly printed in several different formats in business correspondence. Two of the more common formats are 07/21/1955 and July 21, 1955. Write a program that reads a date in the first format and prints that date in the second format.

Write a program that encodes English language phrases into pig Latin. Pig Latin is a form of coded language often used for amusement. Many variations exist in the methods used to form pig Latin phrases. For simplicity, use the following algorithm: To form a pig-Latin phrase from an English-language phrase, tokenize the phrase into words with function strtok. To translate each English word into a pig-Latin word, place the first letter of the English word at the end of the English word and add the letters "ay." Thus, the word "jump" becomes "umpjay," the word "the" becomes "hetay" and the word "computer" becomes "omputercay." Blanks between words remain as blanks. Assume that the English phrase consists of words separated by blanks, there are no punctuation marks and all words have two or more letters. Function printLatinword should display each word. [Hint: Each time a token is found in a call to strtok, pass the token pointer to function printLatinword and print the pig-Latin word.]

Perform the task specified by each of the following statements: a. Write the function header for a function called exchange that takes two pointers to double-precision, floating-point numbers x and y as parameters and does not return a value. b. Write the function prototype for the function in part (a). c. Write the function header for a function called evaluate that returns an integer and that takes as parameters integer x and a pointer to function poly. Function poly takes an integer parameter and returns an integer. d. Write the function prototype for the function in part (c). e. Write two statements that each initialize character array vowel with the string of vowels, "AEIOU".

The arrays should be filled as follows: The article array should contain the articles "the", "a", "one", "some" and "any"; the noun array should contain the nouns "boy", "girl", "dog", "town" and "car"; the verb array should contain the verbs "drove", "jumped", "ran", "walked" and "skipped"; the preposition array should contain the prepositions "to", "from", "over", "under" and "on".
