This "Word Count" program reads from a words.txt file and displays every unique word in the text, how many times it appears and what line(s) it appears on. I'm going to show you how it's done using only C++.
/* THIS PROGRAM LISTS THE INDIVIDUAL WORDS IN THE WORDS.TXT FILE
 * AND STATES HOW MANY TIMES THEY OCCUR AND WHAT LINES THEY APPEAR ON
 */
#include <iostream>
#include <fstream>
#include <sstream>
#include <map>
#include <set>
#include <string>
#include <iomanip>

// DISPLAYS THE LINES WHERE EACH WORD APPEARS
void display_line(const std::set<int> &a) {
    std::cout << "[ ";
    for (auto b : a)
        std::cout << b << " ";
    std::cout << "]\n";
}

// DISPLAYS THE WORD, HOW MANY TIMES IT APPEARS AND THE LINES IT APPEARS ON
void display_words(const std::map<std::string, int> &words, const std::map<std::string, std::set<int>> &occur) {
    // The header uses the same column widths (18 and 13) as the rows below
    std::cout << "\n" << std::setw(18) << std::left << "Word"
              << std::setw(13) << std::right << "Count" << std::setw(26) << "Occurrences(lines)" << std::endl;
    std::cout << std::setw(96) << std::setfill('=') << "" << std::setfill(' ') << std::endl;
    for (const auto &pair : words) {
        std::cout << std::setw(18) << std::left << pair.first
                  << std::setw(13) << std::right << pair.second << std::setw(10);
        display_line(occur.at(pair.first));
    }
}

// This function removes periods, commas, semicolons and colons from
// a string and returns the clean version
std::string clean_string(const std::string &s) {
    std::string result;
    for (char c : s) {
        if (c == '.' || c == ',' || c == ';' || c == ':')
            continue;
        else
            result += c;
    }
    return result;
}

// word_count processes the file and builds maps of the words, the
// number of times they occur in the file and the lines they appear on
void word_count() {
    std::map<std::string, int> words;
    std::map<std::string, std::set<int>> occur;
    std::string line;
    std::string word;
    int num{0};
    std::ifstream in_file{"words.txt"};
    if (in_file) {
        // Looping on getline itself stops cleanly at the end of the file;
        // testing eof() before reading would process one extra, empty line
        while (std::getline(in_file, line)) {
            num += 1;
            std::stringstream ss{line};
            // Looping on the extraction skips blank lines and trailing spaces
            // instead of counting the previous word a second time
            while (ss >> word) {
                word = clean_string(word);
                if (word.empty()) // THE TOKEN WAS NOTHING BUT PUNCTUATION
                    continue;
                if (words.count(word)) { // CHECKS TO SEE IF THE WORD IS ALREADY IN THE MAP
                    words[word] += 1;
                    occur.at(word).insert(num);
                } else {
                    words.insert(std::make_pair(word, 1));
                    occur.insert(std::make_pair(word, std::set<int>{num}));
                }
            }
        }
        in_file.close();
        display_words(words, occur);
    } else {
        std::cerr << "Error opening input file" << std::endl;
    }
}

int main() {
    word_count();
    return 0;
}
The words.txt file contains the following text:
Dorothy lived in the midst of the great Kansas prairies, with Uncle
Henry, who was a farmer, and Aunt Em, who was the farmer's wife. Their
house was small, for the lumber to build it had to be carried by wagon
many miles. There were four walls, a floor and a roof, which made one
room; and this room contained a rusty looking cookstove, a cupboard for
the dishes, a table, three or four chairs, and the beds. Uncle Henry
and Aunt Em had a big bed in one corner, and Dorothy a little bed in
another corner. There was no garret at all, and no cellar except a
small hole dug in the ground, called a cyclone cellar, where the family
could go in case one of those great whirlwinds arose, mighty enough to
crush any building in its path. It was reached by a trap door in the
middle of the floor, from which a ladder led down into the small, dark
hole.
When Dorothy stood in the doorway and looked around, she could see
nothing but the great gray prairie on every side. Not a tree nor a
house broke the broad sweep of flat country that reached to the edge of
the sky in all directions. The sun had baked the plowed land into a
gray mass, with little cracks running through it. Even the grass was
not green, for the sun had burned the tops of the long blades until
they were the same gray color to be seen everywhere. Once the house
had been painted, but the sun blistered the paint and the rains washed
it away, and now the house was as dull and gray as everything else.
When Aunt Em came there to live she was a young, pretty wife. The sun
and wind had changed her, too. They had taken the sparkle from her
eyes and left them a sober gray; they had taken the red from her cheeks
and lips, and they were gray also. She was thin and gaunt, and never
smiled now. When Dorothy, who was an orphan, first came to her, Aunt
Em had been so startled by the child's laughter that she would scream
and press her hand upon her heart whenever Dorothy's merry voice
reached her ears; and she still looked at the little girl with wonder
that she could find anything to laugh at.
Uncle Henry never laughed. He worked hard from morning till night and
did not know what joy was. He was gray also, from his long beard to
his rough boots, and he looked stern and solemn, and rarely spoke.
It was Toto that made Dorothy laugh, and saved her from growing as gray
as her other surroundings. Toto was not gray; he was a little black
dog, with long silky hair and small black eyes that twinkled merrily on
either side of his funny, wee nose. Toto played all day long, and
Dorothy played with him, and loved him dearly.
Today, however, they were not playing. Uncle Henry sat upon the
doorstep and looked anxiously at the sky, which was even grayer than
usual. Dorothy stood in the door with Toto in her arms, and looked at
the sky too. Aunt Em was washing the dishes.
From the far north they heard a low wail of the wind, and Uncle Henry
and Dorothy could see where the long grass bowed in waves before the
coming storm. There now came a sharp whistling in the air from the
south, and as they turned their eyes that way they saw ripples in the
grass coming from that direction also.
Alright, now I will walk through what's happening in the program. First I created the display_line function, which is responsible for outputting the lines a word appears on: it loops through a set of integers and outputs each integer.
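To see why a std::set<int> suits the line numbers, here is a minimal sketch. format_lines is a hypothetical helper, not part of the program above; it builds the same bracketed list that display_line prints but returns it as a string so it can be inspected. Because a set stores each value once and iterates in ascending order, a word seen twice on the same line still lists that line once.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper (not in the original program): builds the same
// "[ 3 7 ]" style text that display_line prints, but returns it as a string.
std::string format_lines(const std::set<int> &lines) {
    std::ostringstream out;
    out << "[ ";
    for (int n : lines)   // a std::set iterates in ascending order
        out << n << " ";
    out << "]";
    return out.str();
}
```

For example, format_lines({7, 3, 3}) gives "[ 3 7 ]": the duplicate 3 is dropped and the numbers come out sorted.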
The display_words function is the main function responsible for output. It expects a map of string, int pairs (the words and the number of times they appear) and a map of string, set-of-integers pairs (the words and the lines they appear on). I format the output using components from the <iomanip> header file, then loop through the words map, which pairs each individual word with its number of occurrences. For each pair I output the word and its count, then call the display_line function with the set stored for that word in the occur map, which outputs the lines the word appears on.
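The column layout comes from std::setw, std::left and std::right. Below is a minimal sketch of one row, assuming the same widths as display_words (18 for the word, 13 for the count); format_row is a hypothetical name used only for illustration. std::setw applies to the next output only, while std::left and std::right stay in effect until they are changed.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats the word and count columns of one row,
// using the same widths as display_words.
std::string format_row(const std::string &word, int count) {
    std::ostringstream out;
    out << std::setw(18) << std::left << word      // word, padded on the right
        << std::setw(13) << std::right << count;   // count, padded on the left
    return out.str();
}
```

std::setw sets a minimum width, so a word longer than 18 characters is not cut off; it simply pushes the rest of the row to the right.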
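The clean_string function removes periods, commas, semicolons and colons from a word. Every other character is kept, so the apostrophe in "farmer's" stays part of the word.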
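The same cleaning can also be written with the standard erase/remove idiom. This is only a sketch of an alternative, not the program's code: strip_chars and its unwanted parameter are hypothetical names, and the default list is the same four characters that clean_string removes.

```cpp
#include <algorithm>
#include <string>

// Sketch of an alternative to clean_string: removes every character of
// `s` that appears in `unwanted` (a hypothetical parameter).
std::string strip_chars(std::string s, const std::string &unwanted = ".,;:") {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [&unwanted](char c) {
                               return unwanted.find(c) != std::string::npos;
                           }),
            s.end());
    return s;
}
```

Taking the list of characters as a parameter makes it easy to strip more punctuation later, for example strip_chars(word, ".,;:!?"), without adding more conditions to the loop.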
Now I will explain the main part of the program, which is contained in the word_count function.
Important Variables Used:
map objects: 'words' contains the individual words and the number of times they appear; 'occur' contains the individual words and the lines they appear on.
string objects: 'line' holds each line read from the text file; 'word' holds each word in the line.
int: 'num' holds the number of the current line, which is recorded as the line the words appear on.
ifstream object: 'in_file' is initialized to the text file located in the same directory as the source code of this program.
I create two string objects (line and word), two map objects (words and occur), an int and an ifstream object initialized to the text file, which is a part of the story "The Wonderful Wizard of Oz". Then the main code starts. The if(in_file) statement basically means "if the initialization of the ifstream object was successful (i.e. the words.txt file was found), run the code in the braces". If the initialization was unsuccessful, I output "Error opening input file".
The outer loop goes through the text file one line at a time. std::getline(in_file, line) passes the next line of the text into the string variable line, I increment num by 1, and then I create a stringstream object ss initialized to line.
The inner loop goes through the line that ss holds one word at a time. ss >> word passes the next word of the line into word, and I clean the word using the clean_string function. Then I check whether the word is already contained in words. If true, I increment the number of times it appears and insert the current line into the set of lines it appears on. If false, I create a pair using make_pair with the values of word and 1 (i.e. this is the first time the word is appearing) and insert it into words, then I make another pair from word and a set holding num (the current line) and insert it into occur.
After this is done for all the lines in the text, I close the file and call the display_words function with words and occur as arguments. All I do in main() is call the word_count function, which does all the work. When we run the code, the output is a table with one row per unique word, showing the word, its count and the lines it appears on. The rows come out in sorted order because std::map keeps its keys sorted, with capitalized words ahead of lowercase ones.
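The check-then-insert branch described above can be shortened, because operator[] on a std::map creates a missing entry with a default value (0 for an int, an empty set for a std::set). The sketch below is one possible variant, not the program's code. count_words is a hypothetical function that reads from any std::istream so it can be tried on a string, and for brevity it strips the same four punctuation characters inline instead of calling clean_string. Looping directly on std::getline and on the >> extraction also means blank lines and trailing spaces are skipped rather than counted.

```cpp
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical variant of the counting loop; fills the two maps from any stream.
void count_words(std::istream &in,
                 std::map<std::string, int> &words,
                 std::map<std::string, std::set<int>> &occur) {
    std::string line;
    int num{0};
    while (std::getline(in, line)) {   // stops when no line is left
        ++num;
        std::istringstream ss{line};
        std::string token;
        while (ss >> token) {          // stops when no word is left
            std::string word;
            for (char c : token)       // same four characters as clean_string
                if (c != '.' && c != ',' && c != ';' && c != ':')
                    word += c;
            if (word.empty())          // the token was only punctuation
                continue;
            ++words[word];             // creates the entry at 0 if missing
            occur[word].insert(num);   // creates an empty set if missing
        }
    }
}
```

Passing a std::ifstream opened on words.txt as the first argument would process the file the same way, since std::ifstream is a std::istream.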
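NOTE: How the text file is found can depend on the IDE you're using. A relative name like words.txt is looked up in the directory the program is run from, and the IDE decides which directory that is. I use CodeLite, and the text file is in the same directory as the source code of the program.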
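One way to avoid depending on the IDE's settings is to let the caller pass the path on the command line. This is only a sketch under that assumption: pick_input_path is a hypothetical helper, and the program above would need word_count to accept a path before it could use it.

```cpp
#include <string>

// Hypothetical helper: returns the first command-line argument as the
// input path, or "words.txt" when none is given.
std::string pick_input_path(int argc, char *argv[]) {
    if (argc > 1)
        return argv[1];
    return "words.txt";
}
```

With main declared as int main(int argc, char *argv[]), the result could be passed to the std::ifstream constructor, so the program could be started with the full path to the text file no matter which directory it runs from.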
You can also check my GitHub repo, which contains this entire program.