| <!-- | 
 | Copyright 2011 The Go Authors. All rights reserved. | 
 | Use of this source code is governed by a BSD-style | 
 | license that can be found in the LICENSE file. | 
 | --> | 
 |  | 
 | <codewalk title="Generating arbitrary text: a Markov chain algorithm"> | 
 |  | 
 | <step title="Introduction" src="doc/codewalk/markov.go:/Generating/,/line\./"> | 
 | 	This codewalk describes a program that generates random text using | 
 | 	a Markov chain algorithm. The package comment describes the algorithm | 
 | 	and the operation of the program. Please read it before continuing. | 
 | </step> | 
 |  | 
 | <step title="Modeling Markov chains" src="doc/codewalk/markov.go:/	chain/"> | 
 | 	A chain consists of a prefix and a suffix. Each prefix is a set | 
 | 	number of words, while a suffix is a single word. | 
 | 	A prefix can have an arbitrary number of suffixes. | 
 | 	To model this data, we use a <code>map[string][]string</code>. | 
 | 	Each map key is a prefix (a <code>string</code>) and its value is a | 
 | 	list of suffixes (a slice of strings, <code>[]string</code>). | 
 | 	<br/><br/> | 
 | 	Here is the example table from the package comment | 
 | 	as modeled by this data structure: | 
 | 	<pre> | 
 | map[string][]string{ | 
 | 	" ":          {"I"}, | 
 | 	" I":         {"am"}, | 
 | 	"I am":       {"a", "not"}, | 
 | 	"a free":     {"man!"}, | 
 | 	"am a":       {"free"}, | 
 | 	"am not":     {"a"}, | 
 | 	"a number!":  {"I"}, | 
 | 	"number! I":  {"am"}, | 
 | 	"not a":      {"number!"}, | 
 | }</pre> | 
 | 	While each prefix consists of multiple words, we | 
 | 	store prefixes in the map as a single <code>string</code>. | 
 | 	It would seem more natural to store the prefix as a | 
 | 	<code>[]string</code>, but we can't do this with a map because the | 
 | 	key type of a map must be comparable with <code>==</code> (and slices are not). | 
 | 	<br/><br/> | 
 | 	Therefore, in most of our code we will model prefixes as a | 
 | 	<code>[]string</code> and join the strings together with a space | 
 | 	to generate the map key: | 
 | 	<pre> | 
 | Prefix               Map key | 
 |  | 
 | []string{"", ""}     " " | 
 | []string{"", "I"}    " I" | 
 | []string{"I", "am"}  "I am" | 
 | </pre> | 
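	The mapping from prefix to map key can be sketched in a few lines.
	This is a minimal illustration, not the code from the program itself:
	the helper name <code>key</code> is hypothetical and is used here only
	to show the join.

```go
package main

import (
	"fmt"
	"strings"
)

// key joins the words of a prefix with single spaces to form a map key.
// The name key is hypothetical; it exists only for this illustration.
func key(prefix []string) string {
	return strings.Join(prefix, " ")
}

func main() {
	chain := map[string][]string{}

	// A []string cannot be a map key, so we index by the joined string.
	prefix := []string{"I", "am"}
	chain[key(prefix)] = append(chain[key(prefix)], "a", "not")

	fmt.Printf("%q\n", key([]string{"", ""}))  // " "
	fmt.Printf("%q\n", key([]string{"", "I"})) // " I"
	fmt.Println(chain["I am"])                 // [a not]
}
```

	Note that an all-empty prefix of two words yields a key made of a
	single space, which is why the first row of the table above is
	<code>" "</code>.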
 | </step> | 
 |  | 
 | <step title="The Chain struct" src="doc/codewalk/markov.go:/type Chain/,/}/"> | 
 | 	The complete state of the chain table consists of the table itself and | 
 | 	the word length of the prefixes. The <code>Chain</code> struct stores | 
 | 	this data. | 
 | </step> | 
 |  | 
 | <step title="The NewChain constructor function" src="doc/codewalk/markov.go:/func New/,/\n}/"> | 
 | 	The <code>Chain</code> struct has two unexported fields (those that | 
 | 	do not begin with an upper case character), and so we write a | 
 | 	<code>NewChain</code> constructor function that initializes the | 
 | 	<code>chain</code> map with <code>make</code> and sets the | 
 | 	<code>prefixLen</code> field. | 
 | 	<br/><br/> | 
 | 	This constructor function is not strictly necessary as this entire | 
 | 	program is within a single package (<code>main</code>) and therefore | 
 | 	there is little practical difference between exported and unexported | 
 | 	fields. We could just as easily write out the contents of this function | 
 | 	when we want to construct a new Chain. | 
 | 	But using these unexported fields is good practice; it clearly denotes | 
 | 	that only methods of Chain and its constructor function should access | 
 | 	those fields. Also, structuring <code>Chain</code> like this means we | 
 | 	could easily move it into its own package at some later date. | 
 | </step> | 
 |  | 
 | <step title="The Prefix type" src="doc/codewalk/markov.go:/type Prefix/"> | 
 | 	Since we'll be working with prefixes often, we define a | 
 | 	<code>Prefix</code> type with the concrete type <code>[]string</code>. | 
 | 	Defining a named type allows us to be explicit about when we are | 
 | 	working with a prefix rather than just a <code>[]string</code>. | 
 | 	Also, in Go we can define methods on any named type (not just structs), | 
 | 	so we can add methods that operate on <code>Prefix</code> if we need to. | 
 | </step> | 
 |  | 
 | <step title="The String method" src="doc/codewalk/markov.go:/func[^\n]+String/,/}/"> | 
 | 	The first method we define on <code>Prefix</code> is | 
 | 	<code>String</code>. It returns a <code>string</code> representation | 
 | 	of a <code>Prefix</code> by joining the slice elements together with | 
 | 	spaces. We will use this method to generate keys when working with | 
 | 	the chain map. | 
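	<br/><br/>
	As a small, self-contained sketch of a named slice type with a
	<code>String</code> method (written from the description above rather
	than copied from the source file), consider the following. It also
	shows a side effect worth knowing: any type with a <code>String</code>
	method satisfies <code>fmt.Stringer</code>, so the <code>fmt</code>
	print functions use that method when formatting the value.

```go
package main

import (
	"fmt"
	"strings"
)

// Prefix is a named type whose underlying type is []string.
type Prefix []string

// String joins the words of the prefix with single spaces.
func (p Prefix) String() string {
	return strings.Join(p, " ")
}

func main() {
	p := Prefix{"I", "am"}

	// Ordinary slice operations still work on the named type.
	fmt.Println(len(p), p[0])

	// Prefix satisfies fmt.Stringer, so Println formats it via String.
	fmt.Println(p)

	// Calling the method directly gives the map key.
	fmt.Printf("%q\n", p.String())
}
```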
 | </step> | 
 |  | 
 | <step title="Building the chain" src="doc/codewalk/markov.go:/func[^\n]+Build/,/\n}/"> | 
 | 	The <code>Build</code> method reads text from an <code>io.Reader</code> | 
 | 	and parses it into prefixes and suffixes that are stored in the | 
 | 	<code>Chain</code>. | 
 | 	<br/><br/> | 
 | 	The <code><a href="/pkg/io/#Reader">io.Reader</a></code> is an | 
 | 	interface type that is widely used by the standard library and | 
 | 	other Go code. Our code uses the | 
 | 	<code><a href="/pkg/fmt/#Fscan">fmt.Fscan</a></code> function, which | 
 | 	reads space-separated values from an <code>io.Reader</code>. | 
 | 	<br/><br/> | 
 | 	The <code>Build</code> method returns once the <code>Reader</code>'s | 
 | 	<code>Read</code> method returns <code>io.EOF</code> (end of file) | 
 | 	or some other read error occurs. | 
 | </step> | 
 |  | 
 | <step title="Buffering the input" src="doc/codewalk/markov.go:/bufio\.NewReader/"> | 
 | 	This function does many small reads, which can be inefficient for some | 
 | 	<code>Readers</code>. For efficiency we wrap the provided | 
 | 	<code>io.Reader</code> with | 
 | 	<code><a href="/pkg/bufio/">bufio.NewReader</a></code> to create a | 
 | 	new <code>io.Reader</code> that provides buffering. | 
 | </step> | 
 |  | 
 | <step title="The Prefix variable" src="doc/codewalk/markov.go:/make\(Prefix/"> | 
 | 	At the top of the function we make a <code>Prefix</code> slice | 
 | 	<code>p</code> using the <code>Chain</code>'s <code>prefixLen</code> | 
 | 	field as its length. | 
 | 	We'll use this variable to hold the current prefix and mutate it with | 
 | 	each new word we encounter. | 
 | </step> | 
 |  | 
 | <step title="Scanning words" src="doc/codewalk/markov.go:/var s string/,/\n		}/"> | 
 | 	In our loop we read words from the <code>Reader</code> into a | 
 | 	<code>string</code> variable <code>s</code> using | 
 | 	<code>fmt.Fscan</code>. Since <code>Fscan</code> uses space to | 
 | 	separate each input value, each call will yield just one word | 
 | 	(including punctuation), which is exactly what we need. | 
 | 	<br/><br/> | 
 | 	<code>Fscan</code> returns an error if it encounters a read error | 
 | 	(<code>io.EOF</code>, for example) or if it can't scan the requested | 
 | 	value (in our case, a single string). In either case we just want to | 
 | 	stop scanning, so we <code>break</code> out of the loop. | 
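	<br/><br/>
	To see the scanning behavior in isolation, here is a minimal sketch
	that collects words from any <code>io.Reader</code>. The helper names
	<code>scanWords</code> and <code>wordsIn</code> are hypothetical and
	not part of the program; the sketch only assumes the documented
	behavior of <code>fmt.Fscan</code>, which treats newlines as space.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// scanWords is a hypothetical helper that collects every
// whitespace-separated word that fmt.Fscan can read from r.
func scanWords(r io.Reader) []string {
	br := bufio.NewReader(r) // buffer the many small reads
	var words []string
	for {
		var s string
		if _, err := fmt.Fscan(br, &s); err != nil {
			break // io.EOF or another error: stop scanning
		}
		words = append(words, s)
	}
	return words
}

// wordsIn is a convenience wrapper (also hypothetical) for string input.
func wordsIn(text string) []string {
	return scanWords(strings.NewReader(text))
}

func main() {
	words := wordsIn("I am not a number!\nI am a free man!")
	// Punctuation stays attached to the word it follows.
	fmt.Printf("%d words: %q\n", len(words), words)
}
```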
 | </step> | 
 |  | 
 | <step title="Adding a prefix and suffix to the chain" src="doc/codewalk/markov.go:/	key/,/key\], s\)"> | 
 | 	The word stored in <code>s</code> is a new suffix. We add the new | 
 | 	prefix/suffix combination to the <code>chain</code> map by computing | 
 | 	the map key with <code>p.String</code> and appending the suffix | 
 | 	to the slice stored under that key. | 
 | 	<br/><br/> | 
 | 	The built-in <code>append</code> function appends elements to a slice | 
 | 	and allocates new storage when necessary. When the provided slice is | 
 | 	<code>nil</code>, <code>append</code> allocates a new slice. | 
 | 	This behavior conveniently ties in with the semantics of our map: | 
 | 	retrieving an unset key returns the zero value of the value type and | 
 | 	the zero value of <code>[]string</code> is <code>nil</code>. | 
 | 	When our program encounters a new prefix (yielding a <code>nil</code> | 
 | 	value in the map) <code>append</code> will allocate a new slice. | 
 | 	<br/><br/> | 
 | 	For more information about the <code>append</code> function and slices | 
 | 	in general see the | 
 | 	<a href="/doc/articles/slices_usage_and_internals.html">Slices: usage and internals</a> article. | 
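	<br/><br/>
	The interaction between a missing map key and <code>append</code> can
	be sketched as follows. The helper <code>addSuffix</code> is a
	hypothetical name introduced for this illustration; it is not a
	function in the program.

```go
package main

import "fmt"

// addSuffix is a hypothetical helper that records suffix under key.
// If key is absent, chain[key] is a nil slice and append allocates one.
func addSuffix(chain map[string][]string, key, suffix string) {
	chain[key] = append(chain[key], suffix)
}

func main() {
	chain := make(map[string][]string)

	// The key is not present yet, so the lookup yields a nil slice.
	fmt.Println(chain["I am"] == nil) // true

	addSuffix(chain, "I am", "a")
	addSuffix(chain, "I am", "not")

	fmt.Println(chain["I am"]) // [a not]
}
```

	No explicit check for a missing key is needed, which keeps the loop
	body to a single statement.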
 | </step> | 
 |  | 
 | <step title="Pushing the suffix onto the prefix" src="doc/codewalk/markov.go:/p\.Shift/"> | 
 | 	Before reading the next word our algorithm requires us to drop the | 
 | 	first word from the prefix and push the current suffix onto the prefix. | 
 | 	<br/><br/> | 
 | 	When in this state | 
 | 	<pre> | 
 | p == Prefix{"I", "am"} | 
 | s == "not" </pre> | 
 | 	the new value for <code>p</code> would be | 
 | 	<pre> | 
 | p == Prefix{"am", "not"}</pre> | 
 | 	This operation is also required during text generation so we put | 
 | 	the code to perform this mutation of the slice inside a method on | 
 | 	<code>Prefix</code> named <code>Shift</code>. | 
 | </step> | 
 |  | 
 | <step title="The Shift method" src="doc/codewalk/markov.go:/func[^\n]+Shift/,/\n}/"> | 
 | 	The <code>Shift</code> method uses the built-in <code>copy</code> | 
 | 	function to copy the last <code>len(p)-1</code> elements of <code>p</code> to | 
 | 	the start of the slice, effectively moving the elements | 
 | 	one index to the left (if you consider zero as the leftmost index). | 
 | 	<pre> | 
 | p := Prefix{"I", "am"} | 
 | copy(p, p[1:]) | 
 | // p == Prefix{"am", "am"}</pre> | 
 | 	We then assign the provided <code>word</code> (the suffix, in this | 
 | 	example) to the last index of the slice: | 
 | 	<pre> | 
 | // suffix == "not" | 
 | p[len(p)-1] = suffix | 
 | // p == Prefix{"am", "not"}</pre> | 
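	Putting the two statements together gives a complete, runnable sketch
	of the method. It is reconstructed from the description above and
	assumes a prefix of at least one word; with a zero-length prefix the
	slice expression <code>p[1:]</code> would panic.

```go
package main

import "fmt"

// Prefix is a sequence of words, as in the walkthrough above.
type Prefix []string

// Shift drops the first word of p and places word in the last position.
// It modifies the underlying array in place, so no new slice is returned.
// Assumption: len(p) >= 1.
func (p Prefix) Shift(word string) {
	copy(p, p[1:])     // move every element one index to the left
	p[len(p)-1] = word // overwrite the duplicated last element
}

func main() {
	p := Prefix{"I", "am"}
	p.Shift("not")
	fmt.Println(p) // [am not]
}
```

	Because a slice value refers to its underlying array, a value receiver
	is enough here: the caller sees the shifted words without
	<code>Shift</code> needing a pointer receiver.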
 | </step> | 
 |  | 
 | <step title="Generating text" src="doc/codewalk/markov.go:/func[^\n]+Generate/,/\n}/"> | 
 | 	The <code>Generate</code> method is similar to <code>Build</code> | 
 | 	except that instead of reading words from a <code>Reader</code> | 
 | 	and storing them in a map, it reads words from the map and | 
 | 	appends them to a slice (<code>words</code>). | 
 | 	<br/><br/> | 
 | 	<code>Generate</code> uses a conditional for loop to generate | 
 | 	up to <code>n</code> words. | 
 | </step> | 
 |  | 
 | <step title="Getting potential suffixes" src="doc/codewalk/markov.go:/choices/,/}\n/"> | 
 | 	At each iteration of the loop we retrieve a list of potential suffixes | 
 | 	for the current prefix. We access the <code>chain</code> map at key | 
 | 	<code>p.String()</code> and assign its contents to <code>choices</code>. | 
 | 	<br/><br/> | 
 | 	If <code>len(choices)</code> is zero we break out of the loop as there | 
 | 	are no potential suffixes for that prefix. | 
 | 	This test also works if the key isn't present in the map at all: | 
 | 	in that case, <code>choices</code> will be <code>nil</code> and the | 
 | 	length of a <code>nil</code> slice is zero. | 
 | </step> | 
 |  | 
 | <step title="Choosing a suffix at random" src="doc/codewalk/markov.go:/next := choices/,/Shift/"> | 
 | 	To choose a suffix we use the | 
 | 	<code><a href="/pkg/math/rand/#Intn">rand.Intn</a></code> function. | 
 | 	It returns a non-negative pseudo-random integer less than the provided | 
 | 	value. Passing in <code>len(choices)</code> gives us a random, valid | 
 | 	index into the list. | 
 | 	<br/><br/> | 
 | 	We use that index to pick our new suffix, assign it to | 
 | 	<code>next</code> and append it to the <code>words</code> slice. | 
 | 	<br/><br/> | 
 | 	Next, we <code>Shift</code> the new suffix onto the prefix just as | 
 | 	we did in the <code>Build</code> method. | 
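	<br/><br/>
	The lookup and the random choice can be sketched together as a single
	helper. The name <code>nextWord</code> and its boolean result are
	assumptions made for this illustration; the program itself performs
	these steps inline in its loop. The length check matters because
	<code>rand.Intn</code> panics when its argument is not positive.

```go
package main

import (
	"fmt"
	"math/rand"
)

// nextWord is a hypothetical helper: it looks up the suffixes stored
// under key and returns one of them at random. The boolean result is
// false when the key has no suffixes or is absent from the map.
func nextWord(chain map[string][]string, key string) (string, bool) {
	choices := chain[key] // nil if key is missing
	if len(choices) == 0 {
		return "", false // rand.Intn would panic if passed zero
	}
	return choices[rand.Intn(len(choices))], true
}

func main() {
	chain := map[string][]string{"I am": {"a", "not"}}

	word, ok := nextWord(chain, "I am")
	fmt.Println(word, ok) // "a" or "not", chosen at random

	_, ok = nextWord(chain, "no such prefix")
	fmt.Println(ok) // false
}
```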
 | </step> | 
 |  | 
 | <step title="Returning the generated text" src="doc/codewalk/markov.go:/Join\(words/"> | 
 | 	Before returning the generated text as a string, we use the | 
 | 	<code>strings.Join</code> function to join the elements of | 
 | 	the <code>words</code> slice together, separated by spaces. | 
 | </step> | 
 |  | 
 | <step title="Command-line flags" src="doc/codewalk/markov.go:/Register command-line flags/,/prefixLen/"> | 
 | 	To make it easy to tweak the prefix and generated text lengths we | 
 | 	use the <code><a href="/pkg/flag/">flag</a></code> package to parse | 
 | 	command-line flags. | 
 | 	<br/><br/> | 
 | 	These calls to <code>flag.Int</code> register new flags with the | 
 | 	<code>flag</code> package. The arguments to <code>Int</code> are the | 
 | 	flag name, its default value, and a description. The <code>Int</code> | 
 | 	function returns a pointer to an integer that will contain the | 
 | 	user-supplied value (or the default value if the flag was omitted on | 
 | 	the command-line). | 
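	<br/><br/>
	As a hedged sketch of the same pattern, the example below registers
	two integer flags on a private <code>flag.FlagSet</code> so that it can
	parse an explicit argument list instead of the real command line.
	The helper name <code>parseFlags</code>, the default values, and the
	usage strings are assumptions for illustration only; the program
	itself calls the package-level <code>flag.Int</code>.

```go
package main

import (
	"flag"
	"fmt"
)

// parseFlags is a hypothetical helper that registers the two flags on a
// private FlagSet and parses args. Defaults and usage text are assumed.
func parseFlags(args []string) (numWords, prefixLen int, err error) {
	fs := flag.NewFlagSet("markov", flag.ContinueOnError)
	w := fs.Int("words", 100, "maximum number of words to print")
	p := fs.Int("prefix", 2, "prefix length in words")
	if err := fs.Parse(args); err != nil {
		return 0, 0, err
	}
	// The pointers are only meaningful after Parse has run.
	return *w, *p, nil
}

func main() {
	numWords, prefixLen, err := parseFlags([]string{"-prefix=1"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// -words was omitted, so it keeps its default value.
	fmt.Println(numWords, prefixLen)
}
```

	The key point is that <code>Int</code> returns a pointer: the value it
	points to is filled in by <code>Parse</code>, so it must be
	dereferenced only after parsing.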
 | </step> | 
 |  | 
 | <step title="Program set up" src="doc/codewalk/markov.go:/flag.Parse/,/rand.Seed/"> | 
 | 	The <code>main</code> function begins by parsing the command-line | 
 | 	flags with <code>flag.Parse</code> and seeding the <code>rand</code> | 
 | 	package's random number generator with the current time. | 
 | 	<br/><br/> | 
 | 	If the command-line flags provided by the user are invalid the | 
 | 	<code>flag.Parse</code> function will print an informative usage | 
 | 	message and terminate the program. | 
 | </step> | 
 |  | 
 | <step title="Creating and building a new Chain" src="doc/codewalk/markov.go:/c := NewChain/,/c\.Build/"> | 
 | 	To create the new <code>Chain</code> we call <code>NewChain</code> | 
 | 	with the value of the <code>prefix</code> flag. | 
 | 	<br/><br/> | 
 | 	To build the chain we call <code>Build</code> with | 
 | 	<code>os.Stdin</code> (which implements <code>io.Reader</code>) so | 
 | 	that it will read its input from standard input. | 
 | </step> | 
 |  | 
 | <step title="Generating and printing text" src="doc/codewalk/markov.go:/c\.Generate/,/fmt.Println/"> | 
 | 	Finally, to generate text we call <code>Generate</code> with | 
 | 	the value of the <code>words</code> flag and assign the result | 
 | 	to the variable <code>text</code>. | 
 | 	<br/><br/> | 
 | 	Then we call <code>fmt.Println</code> to write the text to standard | 
 | 	output, followed by a newline. | 
 | </step> | 
 |  | 
 | <step title="Using this program" src="doc/codewalk/markov.go"> | 
 | 	To use this program, first build it with the | 
 | 	<a href="/cmd/go/">go</a> command: | 
 | 	<pre> | 
 | $ go build markov.go</pre> | 
 | 	And then execute it while piping in some input text: | 
 | 	<pre> | 
 | $ echo "a man a plan a canal panama" \ | 
 | 	| ./markov -prefix=1 | 
 | a plan a man a plan a canal panama</pre> | 
 | 	Here's a transcript of generating some text using the Go distribution's | 
 | 	README file as source material: | 
 | 	<pre> | 
 | $ ./markov -words=10 < $GOROOT/README | 
 | This is the source code repository for the Go source | 
 | $ ./markov -prefix=1 -words=10 < $GOROOT/README | 
 | This is the go directory (the one containing this README). | 
 | $ ./markov -prefix=1 -words=10 < $GOROOT/README | 
 | This is the variable if you have just untarred a</pre> | 
 | </step> | 
 |  | 
 | <step title="An exercise for the reader" src="doc/codewalk/markov.go"> | 
 | 	The <code>Generate</code> function does a lot of allocations when it | 
 | 	builds the <code>words</code> slice. As an exercise, modify it to | 
 | 	take an <code>io.Writer</code> to which it incrementally writes the | 
 | 	generated text with <code>Fprint</code>. | 
 | 	Aside from being more efficient, this makes <code>Generate</code> | 
 | 	more symmetrical to <code>Build</code>. | 
 | </step> | 
 |  | 
 | </codewalk> |