Some teachers have posted videos on using StringNet

We have been impressed by some videos in which teachers introduce ways they use StringNet. Here is one created by Nail McMahon as an interactive conference presentation (delivered, we think, via Skype). He illustrates and demonstrates creative ways of using SN with students that we would certainly never have thought of ourselves.

In the future we’ll post other sources that introduce SN and ways of using it. If you know of any other such sources, feel free to post them as a reply here.



Posted in Uncategorized | Leave a comment

New function: Find similar words

A few months ago we changed the buttons beside our query box. Where there was once a single button labeled “Search,” there are now two: “Find patterns” and “Find similar words.” The first, “Find patterns,” does exactly what the old “Search” button did. The second, “Find similar words,” is a new function we are trialing in beta. Let me introduce it.

To try it, enter a single content word (for now, a noun, verb, or adjective) in the query box and click “Find similar words.” The result is something like a thesaurus search, showing words with meanings similar to the query word. More accurately, though, we show words whose ‘behaviors’ are similar to the query word’s: the words we list are those that occupy the same paradigmatic slot as the query word in multiword patterns (that is, in our hybrid n-grams). Our inspiration comes from Zellig Harris (1954; 1968), specifically his notion that words derive their meaning(s) from the contexts in which they are used, so that words with similar distributions (sets of contexts) will tend to have similar meanings. The recent computational literature pursuing Harris’s ideas is referenced in the paper we link to below. When the list of similar words appears, it includes a column on the far right labeled “shared patterns,” with a number for each listed word. That number tells how many patterns the query word shares with that word, that is, how many patterns in which the two words are attested in the same paradigmatic slot. Crucially, that number is also a link, and an important one. Please read on.
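The idea of ranking words by shared paradigmatic slots can be sketched in a few lines of Python. This is a minimal illustration, not StringNet’s implementation: the toy patterns are made up, a “frame” here is simply a pattern with one word slot blanked out, and POS units (in square brackets) are skipped. The names `frames_by_word` and `similar_words` are hypothetical.

```python
from collections import defaultdict

# Toy hybrid n-grams (made up for illustration). Bracketed units are POS labels.
PATTERNS = [
    ("in", "the", "light", "of", "[noun]"),
    ("in", "the", "context", "of", "[noun]"),
    ("in", "this", "light"),
    ("in", "this", "context"),
    ("in", "this", "case"),
    ("in", "any", "case"),
    ("shed", "light", "on"),
]


def frames_by_word(patterns):
    """Map each word to the set of frames it fills, where a frame is the
    pattern with that word's slot replaced by a blank."""
    frames = defaultdict(set)
    for pattern in patterns:
        for i, unit in enumerate(pattern):
            if unit.startswith("["):
                continue  # POS slots are not words, so they are not compared
            frames[unit].add(pattern[:i] + ("___",) + pattern[i + 1:])
    return frames


def similar_words(query, patterns):
    """Rank other words by the number of frames they share with the query."""
    frames = frames_by_word(patterns)
    query_frames = frames.get(query, set())
    scores = {}
    for word, word_frames in frames.items():
        shared = len(query_frames & word_frames)
        if word != query and shared:
            scores[word] = shared
    # Most shared patterns first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


print(similar_words("light", PATTERNS))  # [('context', 2), ('case', 1)]
```

The number paired with each word plays the role of the “shared patterns” count described above.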

Noteworthy but easily missed: what makes our ‘similar word’ function different from any thesaurus function we know of is that we can show all of the contexts the query word shares with any of the listed similar words. To see the contexts a listed word shares with the query word, click on its number in the “shared patterns” column. That yields a list of all the patterns the two words share. Most computational approaches to word similarity produce a quantitative measure (a score) of the similarity between two words. Our approach, however, rests on the hunch that for language learners the key question about two words is not “How similar are they?” (answered by a score) but “How are they similar?” (which we answer simply by showing all the contexts they share). We make this point and elaborate on it in the first two sections of this paper. We think the answer to the “How are they similar?” question can take the form of a ‘feel’ for the similarity, one that comes from encountering the two words used in similar ways often enough.

For example, the top noun we list as similar to the noun ‘light’ is the noun ‘context’. Simply informing someone of this (or giving a similarity score as evidence) may not strike a chord. But encountering the two words side by side in the same slot is a different matter: “in the [context/light] of the report” (the report serving as a context that illuminates), or any of the 110 other contexts they share.
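Answering “How are they similar?” amounts to listing the frames two words have in common rather than just counting them. The sketch below, again on made-up toy patterns and with hypothetical function names, renders each shared frame with both words in the slot, in the same bracket notation as the ‘light’/‘context’ example.

```python
# Toy hybrid n-grams (made up for illustration).
PATTERNS = [
    ("in", "the", "light", "of", "[noun]"),
    ("in", "the", "context", "of", "[noun]"),
    ("in", "this", "light"),
    ("in", "this", "context"),
    ("shed", "light", "on"),
]


def frames(word, patterns):
    """Frames (patterns with the word's slot blanked) in which `word` occurs."""
    found = set()
    for pattern in patterns:
        for i, unit in enumerate(pattern):
            if unit == word:
                found.add(pattern[:i] + ("___",) + pattern[i + 1:])
    return found


def shared_patterns(word_a, word_b, patterns):
    """List the frames the two words share, showing both words in the slot."""
    shared = frames(word_a, patterns) & frames(word_b, patterns)
    slot = f"[{word_a}/{word_b}]"
    return sorted(
        " ".join(slot if unit == "___" else unit for unit in frame)
        for frame in shared
    )


for line in shared_patterns("context", "light", PATTERNS):
    print(line)
# in the [context/light] of [noun]
# in this [context/light]
```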

This is a beta version, and there is plenty of noise in the results, but hopefully also plenty of goodies: chances to encounter the shared contexts of words directly and so get a feel for how the words may resemble each other.

We trained this on nouns, so it may give weaker results for other parts of speech.

We presented our approach and initial results last November at the Joint Symposium on Semantic Processing (JSSP) 2013 (Trento, Italy). The published paper is here.




How spell: Using StringNet to fill in gaps

The data structure of StringNet (SN) makes it useful in ways we didn’t have in mind when we designed it. One of those ways is letting SN provide accurate, fuller versions of expressions that we remember only partially or imperfectly. For example, submitting just two words of a longer multiword expression can yield the fleshed-out expression with the details filled in. So a learner who recalls being corrected for saying “Excuse me, how to spell?” but not the corrected version can submit a query of just the two words she feels sure of: ‘how spell’. The first pattern listed in the query results is ‘how do you spell [noun sg]’. Ah, there it is. The [noun sg] complement in the pattern also hints that we don’t drop the complement when using the expression. Clicking through to the example sentences for any of the patterns containing ‘how do you spell’ confirms that a complement is needed.
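One way to picture the ‘how spell’ query: keep every pattern that contains the query words in order, with anything allowed in between, and rank by frequency. This is an assumption about how such a lookup could work, not a description of StringNet’s actual ranking; the pattern inventory and its frequencies are invented for illustration, and `fill_gaps` is a hypothetical name.

```python
# Toy pattern inventory with invented token frequencies.
PATTERNS = {
    ("how", "do", "you", "spell", "[noun sg]"): 120,
    ("how", "to", "spell"): 45,
    ("how", "do", "you", "do"): 300,
    ("spell", "out", "[det]", "[noun]"): 30,
}


def contains_in_order(pattern, words):
    """True if all `words` occur in `pattern` in the given order,
    possibly with other units in between."""
    position = 0
    for word in words:
        try:
            position = pattern.index(word, position) + 1
        except ValueError:
            return False
    return True


def fill_gaps(query, patterns):
    """Patterns containing the query words in order, most frequent first."""
    words = query.split()
    hits = [(p, freq) for p, freq in patterns.items() if contains_in_order(p, words)]
    hits.sort(key=lambda item: -item[1])
    return [" ".join(p) for p, _ in hits]


print(fill_gaps("how spell", PATTERNS))
# ['how do you spell [noun sg]', 'how to spell']
```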

We could stop there. The learner may now have discovered the need to use the inflected ‘how do you spell’ rather than the subjectless infinitive ‘how to spell’, and to include a complement (how do you spell that?).

Or we could go on to explore some more, sniffing around the query results for whatever else we might dig up. Looking further down the same list of patterns yielded by the query ‘how spell’, we find the original ‘error’ that led the learner here in the first place: ‘how to spell’. But if it’s an error, what’s it doing on the list of patterns attested in BNC? Because, come to think of it, the string ‘how to spell’ isn’t an error in and of itself. ‘How to spell’ can be used in English, just not conventionally in the context where the learner was corrected for it; not, say, as a stand-alone interrogative. But switch the frame and there are plenty of acceptable contexts for ‘how to spell’, basically in complement positions: Could you tell me ___; I wonder ___; I don’t know ___. So it is less a matter of correct vs. incorrect and more a matter of the two expressions having distinct distributions. The distribution of ‘how to spell’ is different from that of ‘how do you spell’.

We know this. But how is a learner to come to know it? And what is it specifically they need to come to know?

Here is where linking to example sentences can be worth the click. Comparing examples of these two closely related patterns side by side can throw into relief the different usage (distribution) of ‘how do you spell’ and ‘how to spell’. Teachers can design discovery activities that guide precisely this sort of comparative inquiry. What SN contributes here is threefold: it yields up the patterns; it teases apart slight variations among closely related patterns and lists them separately; and it links each of these distinct but related patterns to its own set of example sentences. It is worth emphasizing that by discovering, distinguishing, and separately listing the two patterns, StringNet also indirectly separates their example sentences from each other. Linking to the example sentences for ‘how do you spell [noun sg]’ lists sentences containing exactly that pattern, and linking to those for ‘how to spell’ lists sentences containing exactly that one. This in turn makes it possible to examine the two sets side by side and reckon the differences and similarities in the distributions of the two patterns, that is, the differences in the contexts of their use.
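The separation of example sentences can be sketched as matching each pattern against POS-tagged sentences: a bracketed unit such as `[noun sg]` matches a token’s tag, and any other unit must match the word itself. The tag names and sentences below are made up for illustration (real BNC tagging differs), and the function names are hypothetical.

```python
# Toy POS-tagged sentences: lists of (word, tag). Tag names are made up.
SENTENCES = [
    [("How", "adv"), ("do", "verb"), ("you", "pron"), ("spell", "verb"),
     ("rhythm", "noun sg"), ("?", "punct")],
    [("I", "pron"), ("forget", "verb"), ("how", "adv"), ("to", "inf"),
     ("spell", "verb"), ("it", "pron")],
    [("Could", "verb"), ("you", "pron"), ("tell", "verb"), ("me", "pron"),
     ("how", "adv"), ("to", "inf"), ("spell", "verb"), ("rhythm", "noun sg"),
     ("?", "punct")],
]

HOW_DO_YOU_SPELL = ("how", "do", "you", "spell", "[noun sg]")
HOW_TO_SPELL = ("how", "to", "spell")


def matches_at(tokens, pattern, start):
    """Does `pattern` match `tokens` beginning at index `start`?"""
    if start + len(pattern) > len(tokens):
        return False
    for unit, (word, tag) in zip(pattern, tokens[start:]):
        if unit.startswith("["):
            if unit[1:-1] != tag:  # a POS unit must match the tag
                return False
        elif unit != word.lower():  # a word unit must match the word
            return False
    return True


def examples_for(pattern, sentences):
    """Sentences containing at least one match of `pattern`."""
    return [s for s in sentences
            if any(matches_at(s, pattern, i) for i in range(len(s)))]


print(len(examples_for(HOW_DO_YOU_SPELL, SENTENCES)))  # 1
print(len(examples_for(HOW_TO_SPELL, SENTENCES)))      # 2
```

Because each pattern selects only its own sentences, the two result lists can be laid side by side for the kind of comparison described above.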

This all, of course, needs the deft touch of the seasoned teacher in contributing that all-important dimension of language pedagogy: task design.

–D. Wible


How StringNet differs from a corpus and why

The most widely known resources for accessing language data for analysis are corpora, and the most common way of accessing corpus content is through concordancing software, whose most common function is the keyword-in-context (KWIC) search. StringNet isn’t a corpus, and here I will try to suggest the basic differences and some of the reasons for them.

StringNet is not a corpus but a massive archive of multiword patterns that have been statistically derived from a corpus (the BNC). Further, StringNet massively cross-indexes these patterns of English (2.2 billion of them) to each other, making it easy to navigate from one pattern to other related patterns (say, from ‘count yourself lucky’ to ‘consider yourself lucky’ to ‘consider yourself fortunate’ to ‘[verb] yourself [adj]’ and on to many others). Here I try to clarify how this makes StringNet different from corpora and concordancing software. I hope to show how StringNet relies on corpora but tries to help bridge the gap between what users might want to get from corpora on the one hand and what corpora and concordancing tools offer on the other.
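The cross-indexing can be pictured as linking each pattern to its one-step generalizations (one word replaced by its part of speech); two patterns that share such a parent are one hop apart. The tiny lexicon and patterns below are hypothetical, and this is only a sketch of the idea, not how StringNet actually stores its index. Applying the generalization twice to ‘count yourself lucky’ reaches ‘[verb] yourself [adj]’.

```python
from collections import defaultdict

# Hypothetical toy lexicon mapping words to POS labels.
POS = {"count": "[verb]", "consider": "[verb]", "yourself": "[pron]",
       "lucky": "[adj]", "fortunate": "[adj]"}

PATTERNS = [
    ("count", "yourself", "lucky"),
    ("consider", "yourself", "lucky"),
    ("consider", "yourself", "fortunate"),
]


def parents(pattern):
    """One-step generalizations: replace exactly one word with its POS label."""
    return [pattern[:i] + (POS[unit],) + pattern[i + 1:]
            for i, unit in enumerate(pattern) if unit in POS]


def build_index(patterns):
    """Cross-index each parent to the attested patterns beneath it."""
    index = defaultdict(set)
    for pattern in patterns:
        for parent in parents(pattern):
            index[parent].add(pattern)
    return index


def related(pattern, index):
    """Attested patterns one hop away through a shared parent."""
    neighbours = set()
    for parent in parents(pattern):
        neighbours |= index.get(parent, set())
    neighbours.discard(pattern)
    return sorted(neighbours)


INDEX = build_index(PATTERNS)
print(related(("consider", "yourself", "lucky"), INDEX))
# [('consider', 'yourself', 'fortunate'), ('count', 'yourself', 'lucky')]
```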

Some basic distinctions influence what we can get out of corpora, yet they seem to fly under the radar. Those we’ve tried to keep in mind in designing StringNet include: (1) token vs. type; (2) syntagmatic vs. paradigmatic patterns of word behavior; and (3) finding what I am looking for in a corpus vs. discovering what I wouldn’t have thought to look for. We’ve published some work elaborating on these (see the reference below), but here’s a thumbnail sketch of just one of them: tokens vs. types.

Corpora are collections of tokens, that is, of instances: Saussure’s ‘parole’. What puts many learners off using corpora is that KWIC searches yield tokens, and tokens are of little interest in and of themselves. Tokens interest corpus users mainly as windows onto what they betoken, and what they betoken is types, that is, patterns (in this case, patterns of word use). The path from tokens in concordance lines to patterns of recurrent word behavior can be a long and winding one. StringNet has tried to distill the recurrent patterns of word use from the British National Corpus. So rather than the list of tokens that a concordancer provides, the response to a StringNet query is a list of types: the patterns in which the query word participates. Clicking on the ‘example’ icon beside one of those listed patterns in turn yields tokens of that pattern used in sentences.
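Collapsing tokens into types can be shown with a simple grouping step. The sketch assumes each token has already been matched to a candidate pattern (the statistical derivation of patterns is not shown) and keeps, for each type, the ids of the sentences that exemplify it, roughly what an ‘example’ link would lead to. The data and the name `tokens_to_types` are hypothetical.

```python
from collections import defaultdict

# Hypothetical token records: (sentence_id, pattern units) for each instance.
TOKENS = [
    (101, ("from", "time", "to", "time")),
    (102, ("time", "after", "time")),
    (103, ("from", "time", "to", "time")),
    (104, ("from", "time", "to", "time")),
]


def tokens_to_types(tokens):
    """Group tokens by pattern type; most frequent type first.
    Each type keeps the ids of the sentences that exemplify it."""
    examples = defaultdict(list)
    for sentence_id, units in tokens:
        examples[tuple(units)].append(sentence_id)
    return sorted(examples.items(), key=lambda item: (-len(item[1]), item[0]))


for pattern, ids in tokens_to_types(TOKENS):
    print(" ".join(pattern), len(ids), ids)
# from time to time 3 [101, 103, 104]
# time after time 1 [102]
```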

By the way, click on everything you see in the StringNet query results. The clickability is intended to encourage exploration through the net that is StringNet.

Here’s a chapter where we say more on all this:

David Wible and Nai-Lung Tsao (2011). “Towards a New Generation of Corpus-derived Lexical Resources for Language Learning,” in Meunier F., De Cock S., Gilquin G. and Paquot M. (eds.), A Taste for Corpora. Amsterdam: John Benjamins.


What the “expand” function can do

Clicking on “Expand”: One reason we call StringNet a net is that each pattern a word appears in is indexed to other related patterns, and these relations can be navigated and explored. From any pattern listed in a search result, one direction you can go is horizontally by clicking on “expand”. This shows patterns that are identical to the one you are looking at but one unit longer.

For example, a search for the word time yields the string from time to time as the first pattern in the search results. Starting from this pattern, I might be curious about how such an expression is introduced, that is, what word(s) might precede it. I can find out by clicking “expand”, which lists patterns containing from time to time that are one unit longer. Among others, it lists:

[noun] from time to time

[verb] from time to time

By clicking on the [verb] slot in [verb] from time to time, I get a pop-up list of the verbs in BNC that introduce from time to time. Specifically: vary/occur/used/…change/appear…from time to time.
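“Expand” can be read as a simple filter over the pattern inventory: keep patterns exactly one unit longer that contain the current pattern as a contiguous run, extended on the left or the right. The inventory below is a made-up toy, and the function is a sketch, not StringNet’s code.

```python
# Toy pattern inventory (made up for illustration).
INVENTORY = [
    ("from", "time", "to", "time"),
    ("[noun]", "from", "time", "to", "time"),
    ("[verb]", "from", "time", "to", "time"),
    ("from", "time", "to", "time", "and"),
    ("[verb]", "[adv]", "from", "time", "to", "time"),
    ("time", "to", "time"),
]


def expand(pattern, inventory):
    """Patterns exactly one unit longer that contain `pattern` as a
    contiguous run, extended by one unit on the left or the right."""
    n = len(pattern)
    return [p for p in inventory
            if len(p) == n + 1 and (p[1:] == pattern or p[:-1] == pattern)]


for p in expand(("from", "time", "to", "time"), INVENTORY):
    print(" ".join(p))
# [noun] from time to time
# [verb] from time to time
# from time to time and
```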

Clicking “Parents”: By the way, clicking on “parents” rather than “expand” from the same pattern sends you upward (toward more abstract patterns) rather than outward. So from the same pattern, from time to time, by clicking on “parents” a few times I work my way up to the more general pattern from [noun] to [noun]. Then, coming back down by clicking on “children”, I get lots of specific sub-patterns of this proto-pattern: from year to year, from side to side, from head to toe/foot/tail, from left to right, and others.
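Moving up and down between parents and children can be sketched with a toy part-of-speech lexicon: a parent replaces one word with its POS label, and a child of a general pattern is any attested pattern it generalizes unit by unit. The lexicon, inventory, and function names are hypothetical; in this toy setup two steps up are enough to reach from [noun] to [noun].

```python
# Hypothetical toy lexicon: only nouns are generalized here.
POS = {w: "[noun]" for w in ["time", "year", "side", "head", "toe", "left", "right"]}

INVENTORY = [
    ("from", "time", "to", "time"),
    ("from", "year", "to", "year"),
    ("from", "side", "to", "side"),
    ("from", "head", "to", "toe"),
    ("from", "left", "to", "right"),
    ("time", "after", "time"),
]


def parents(pattern):
    """One step up: replace a single word with its POS label."""
    return [pattern[:i] + (POS[unit],) + pattern[i + 1:]
            for i, unit in enumerate(pattern) if unit in POS]


def generalizes(general, specific):
    """True if `general` matches `specific` unit by unit, where each unit is
    either identical or the POS label of the specific word."""
    return len(general) == len(specific) and all(
        g == s or g == POS.get(s) for g, s in zip(general, specific))


def children(general, inventory):
    """Attested patterns that instantiate the more general pattern."""
    return [p for p in inventory if p != general and generalizes(general, p)]


# Two steps up from "from time to time":
step1 = parents(("from", "time", "to", "time"))
step2 = {q for p in step1 for q in parents(p)}
print(step2)  # {('from', '[noun]', 'to', '[noun]')}

# Back down: prints the five "from ... to ..." patterns, one per line.
for p in children(("from", "[noun]", "to", "[noun]"), INVENTORY):
    print(" ".join(p))
```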



Adjust search results using “Search options”

You can adjust a few of the features that determine how StringNet Navigator selects the patterns shown in query results. Click on “Search options” just next to the query box. It shows three search features that can be adjusted:

(1) Chunk length: The length of the chunks (i.e., patterns) shown in the search results. The default range is “from 2 to 6” grams in length. Changing that to, for example, “from 3 to 3” for a query means the search results will include only patterns exactly 3 grams long (e.g., time after time; bide [pos pn] time; change over time; second time around; repeat [num] times…).

(2) Minimum frequency of chunks shown: The minimum number of tokens of a chunk (i.e., pattern) attested in BNC. The default is a frequency of 5. By raising this number, say to 20 or 50 or 250, a user asks that patterns with fewer tokens than that (fewer than 20 or 50 or 250) be excluded from the query results.

(3) Hits per page: The number of patterns listed on a single page (default 30).
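The three options can be thought of as filters applied to an already ranked result list. The sketch below assumes that length is counted in units (words or POS labels), that frequency means token count in the corpus, and that paging is a simple slice; the defaults mirror the ones described above, but the frequencies and the names `SearchOptions` and `apply_options` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchOptions:
    min_len: int = 2         # shortest chunk shown, in units
    max_len: int = 6         # longest chunk shown, in units
    min_freq: int = 5        # minimum number of tokens attested in the corpus
    hits_per_page: int = 30  # patterns listed per results page


def apply_options(results, options, page=1):
    """Filter (pattern, frequency) pairs, kept in their ranked order,
    and return one page of them."""
    kept = [(p, f) for p, f in results
            if options.min_len <= len(p) <= options.max_len
            and f >= options.min_freq]
    start = (page - 1) * options.hits_per_page
    return kept[start:start + options.hits_per_page]


# Invented frequencies for illustration only.
RESULTS = [
    (("from", "time", "to", "time"), 800),
    (("time", "after", "time"), 120),
    (("bide", "[pos pn]", "time"), 40),
    (("second", "time", "around"), 25),
    (("time", "[verb]"), 3),
]

three_grams = apply_options(RESULTS, SearchOptions(min_len=3, max_len=3))
print([" ".join(p) for p, _ in three_grams])
# ['time after time', 'bide [pos pn] time', 'second time around']
```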

Feel free to play with these (especially the first two) to see how different settings might help bring you closer to the sorts of patterning you are interested in.



Fixed vs variable word forms in context

Some of the patterns that are listed in StringNet search results have bold-faced words. Here’s why.

StringNet searches yield a list of patterns (hybrid n-grams). Patterns are represented as strings of type labels, and there are roughly two kinds of type labels: (1) words and (2) parts of speech (POSs). Here are two patterns that show up in a search for the target word ‘go’, one with words only and one with both kinds of label:

A pattern with words only:              go to sleep

A pattern with words and POSs:   go so far as [to vb]

Notice, though, that words are further distinguished as bold vs. not bold, which is worth a bit of explanation. A bold word represents a slot where there is attested variation in the form of that word. So in go to sleep, bold go stands not just for go but for forms such as goes and went (the bare form go may not even be attested there at all). Clicking on the bold word shows the exact variation attested.

The bold vs non-bold distinction comes into relief in another pattern that shows up for the search term ‘go’:

what the hell be going on

There are 60 tokens of this pattern in BNC. The bold be indicates that various forms of the lexeme be occur here in these 60 instances (not necessarily the exact form ‘be’ itself). Here be is just the placeholder for this lexeme in its various word forms (is, ’s, was…). Again, clicking on be shows the exact distribution of its attested forms.

In contrast, going in that same pattern, what the hell be going on, is not in bold. The non-bold going indicates that going is the exact word form in all 60 tokens of this pattern in BNC.

(Incidentally, the bold vs. non-bold distinction in be going in what the hell be going on reflects a bit of grammar: the -ing form of go is fixed because of the preceding be (the auxiliary for progressive aspect), and be shows variation because, as the leftmost verb in the verb string, it varies with tense and agreement.)
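The bold vs. non-bold decision can be sketched from the tokens of a pattern: a slot is variable (bold, marked here with asterisks) if more than one word form is attested in it, and fixed otherwise, in which case the single form is shown. The three tokens below are made-up stand-ins for the 60 in BNC, and the function names are hypothetical.

```python
from collections import Counter

# Made-up stand-ins for tokens of "what the hell be going on", slot by slot.
TOKENS = [
    ("what", "the", "hell", "is", "going", "on"),
    ("what", "the", "hell", "'s", "going", "on"),
    ("what", "the", "hell", "was", "going", "on"),
]
LEMMAS = ("what", "the", "hell", "be", "go", "on")


def slot_variation(tokens):
    """For each slot, count the word forms attested across all tokens."""
    return [Counter(forms) for forms in zip(*tokens)]


def render(lemmas, tokens):
    """Show a variable slot as its lemma in *bold* (asterisks stand in for
    bold face) and a fixed slot as the one form attested in every token."""
    units = []
    for lemma, counts in zip(lemmas, slot_variation(tokens)):
        if len(counts) > 1:
            units.append(f"*{lemma}*")
        else:
            (form,) = counts  # exactly one attested form
            units.append(form)
    return " ".join(units)


print(render(LEMMAS, TOKENS))  # what the hell *be* going on
```

Clicking a bold word in StringNet corresponds, in this sketch, to inspecting that slot’s `Counter`.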




StringNet tip: Everything is ‘clickable’

In May of this year we released StringNet 3.0 (SN 3.0), our latest version. To use it, click here. The new version includes a number of features not in earlier versions, and not all of them are as intuitive to use as we’d like. So we’ll be posting information, suggestions, and tips here about those features and about ways of using SN 3.0 for a variety of purposes.

We’ll announce new postings by email to those on our mailing list. If you aren’t on our mailing list and want to be, go to the bottom of this page.

Our first tip: “CLICK ON ANYTHING”

SN takes a word or words as a query and returns a list of patterns (what we call ‘hybrid n-grams’) in which the submitted word is conventionally used or the submitted words are conventionally used together.

An important new feature of SN 3.0 is that we’ve tried to make almost everything in the SN search results clickable. This clickability gets you from the ‘flat’ list of patterns given in the search results into StringNet as a net. It’s our way of linking from patterns, or from any word or slot in a pattern, to other related patterns, words, and slots. Again, this is what makes StringNet a navigable ‘net’. Here we show one sort of clicking. (We’ll describe others in future posts.)

Example: Click on any word in a pattern. The screenshot below is taken from results for the query word “factor.” The red box marks one of those patterns: “a complicating factor.” Notice that the word “complicating” is highlighted in green. This is what happens when a user mouses over that word (or any word in any pattern). It turns green (and the cursor turns into the ‘pointing finger’ icon) as our invitation to click on that word to see the competition for that slot, that is, to see what other words can appear in that exact same slot (as attested in BNC). Clicking on the highlighted word triggers the pop-up shown below. In this case, the pop-up lists “Words that can replace ‘complicating’ here (i.e., in ‘a ____ factor’).”

Clicking around can uncover some noteworthy phenomena. For example, the list of words that can replace ‘complicating’ in ‘a complicating factor’ (shown partially in the pop-up) is rather long, with a roughly Zipf-like distribution up front and a long, flat tail (903 tokens of 147 different words on this list of competition for that slot). Only 13 of the 903 tokens are instances of ‘complicating’. That’s the picture for the slot modifying ‘factor’. But ‘click on anything’ lets us click on ‘factor’ as well in this same pattern, ‘a complicating factor’, so we can see its competition: what else the modifier ‘complicating’ can modify here. And it’s quite a different sort of list. It’s a list of one: “factor” is the only noun that “complicating” modifies in this context. (This relative frozenness of factor in this slot happens to account for why ‘a complicating factor’ is ranked rather high on the list of patterns for the query ‘factor’.)
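The ‘competition for a slot’ can be sketched as counting the fillers of one position among instances that match the pattern everywhere else. The counts below are toy numbers, not the BNC figures quoted above, and the `share` ratio is only an illustrative measure of frozenness, not the score StringNet uses to rank patterns. Both function names are hypothetical.

```python
from collections import Counter

# Toy instances of "a ___ factor" (counts are made up).
INSTANCES = (
    [("a", "major", "factor")] * 5
    + [("a", "key", "factor")] * 3
    + [("a", "complicating", "factor")] * 2
)


def competition(pattern, slot, instances):
    """Count fillers of position `slot` among instances that match
    `pattern` at every other position."""
    fillers = Counter()
    for inst in instances:
        if len(inst) == len(pattern) and all(
                a == b for i, (a, b) in enumerate(zip(inst, pattern)) if i != slot):
            fillers[inst[slot]] += 1
    return fillers


def share(pattern, slot, instances):
    """Fraction of the slot's tokens taken by the pattern's own word.
    Assumes the pattern itself is attested, so the total is nonzero."""
    fillers = competition(pattern, slot, instances)
    return fillers[pattern[slot]] / sum(fillers.values())


PATTERN = ("a", "complicating", "factor")
print(competition(PATTERN, 1, INSTANCES))  # Counter({'major': 5, 'key': 3, 'complicating': 2})
print(share(PATTERN, 1, INSTANCES))        # 0.2
print(share(PATTERN, 2, INSTANCES))        # 1.0
```

As in the ‘factor’ example, the modifier slot is widely contested while the noun slot, in this toy data, has no competition at all.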

Hope this gives a taste of how clicking opens up some access to StringNet as a net. In future posts I’ll describe more entry points to StringNet that clicking creates.
