
The Presuppositions and Semantics of Concept Tracer

Brian Kobylarz

Updated: May 4, 2024



This blog entry is meant to be a gentle prerequisite to multiple projects I am working on, with particular focus on Concept Tracer, a methodology and code pipeline for studying concepts via big textual data. While concept analysis is an important research aim in itself, I design new forms of concept analysis with conceptual engineering in mind. As this series unfolds, I will elaborate on a couple of options for studying concepts via a union of natural language processing, corpus linguistics, and complexity science. I take conceptual engineering to be the intentional design of concepts. This can come in the form of creating a new concept, ameliorating an existing concept, or abandoning a concept one deems defective.


This first post will address: 


  1. The philosophical and linguistic bedrock of my projects. 

  2. How to talk about concepts in terms of networks and complexity. 

  3. How concept analysis as complexity science can inform conceptual engineering. 



1.0 Philosophy 


I understand the history of philosophy as the history of concept analysis and creation, often with regard to (though not limited to) more abstract concepts. To analyze a concept is to inquire into what semantics, logical entailments, inferences, and behaviors generally follow from a given concept and/or combination of concepts. Traditionally, the philosopher reads tome after tome, taking inventory of diverse concept usages in an attempt to pin down an understanding of concepts like “justice,” “truth,” “structure,” or “democracy.” Attempts are then made to refine these concepts through debate and dialogue. This approach, when done successfully, cleans up our thinking and speaking, thereby expanding and constraining our agency.


In more everyday situations, concepts are also of immense importance. I may experience the same redwood that the park ranger does, but I am limited in my capacity to recognize invasive species or potential environmental benefits the tree provides. While we both observe the same redwood, we come to wildly different conclusions, thereby suggesting that there is something different about the two of us rather than the redwood itself. This difference is that the park ranger has a more specialized and refined collection of concepts than I do as a mere tourist. In all aspects of life, we are as capable as our concepts, and what I mean by this is that concepts precede intentional action: to conceptualize is to have some way of recognizing and then intervening into a matter; the limits of which are the limits of our concepts.


“The paradigm observer is not the man who sees and reports what all normal observers see and report, but the man who sees in familiar objects what no one else has seen before.”

 - Norwood Hanson, Philosopher of Science


In the age of internet pipelines and echo chambers, our concepts are under constant revision by moneyed and ideological interests. Our minds are bombarded with words, sounds, and visuals, all of which are masterfully orchestrated to sway us in one direction or another. The gerrymandering of a mind is often drawn by the media and content a given mind consumes. To understand this is to have some semblance of our contemporary situation; to formally model this is to discover patterns in concept use that may be able to break us free from conceptual narrowness. If you have found yourself observing a worldly event and are able to predict what different groups will say about it, before they say anything, then this is a series that might interest you. If you are, like me, horrified by this predictability and its consequences, then this is a blog series that can show how you may engineer your concepts, and I invite you to expand on the material I present. The idea is to offer new tools to explore the space of both existing concepts and latent concepts, delivering us better ways of thinking, speaking, and ultimately intervening. I must stress that I do not mean to suggest that we replace traditional forms of concept analysis (such as the philosopher’s hermeneutical approach or the psychologist’s prototype approach) but rather that what follows should be used in tandem: the subsequent methods should be thought of as supplemental to traditional forms of concept analysis, with unique opportunities for conceptual engineering.



2.0 Our First Task is to Determine how Words Hang Together


2.1 The Pivotal Presupposition 


Why words? First, we have an abundance of them in text format and, as I will show, this means we have something tangible that can be measured by natural language processing and then studied as a complex system. Second, there is a rich philosophical tradition of pairing words with concepts (if not outright conflating the two). I take words to be instances of concept use, a philosophical move not unlike Wittgenstein’s meaning as use and Wilfrid Sellars’s “the new way of words.” So, where we see a word, let that be understood as an instance of a concept being used.



2.2 The Operationalization Presupposition 


For decades, psychologists, linguists, and machine learning engineers have refined algorithms for studying words mathematically. There are a number of options including, but not limited to, Latent Semantic Analysis, Distributional Semantic Models, and Word Embeddings. As you can tell from the links provided, there are significant overlaps between these methods (in particular the vectorization of words), but there is something more fundamental throughout them all: they all rest on the same foundation, word co-occurrence.









This quote is often read in a semantic register, and it runs something like this:

If we wish to understand a word’s intended meaning, then we ought to look at those words surrounding it. To solidify the point, let’s consider this heavily censored sentence:


1) The [censored] bank [censored].


We can glean a possible meaning of “bank” here as being the place where monetary exchange happens. This relies heavily on our knowledge of banks. We cannot help but inject meaning, as we are not given the option of reading how the author of this sentence is using “bank.”


Let's continue to remove censors.


2) The [censored] bank needs to be expanded 12 feet in both directions [censored].


As I remove censors, we can infer that this bank needs to be expanded 12 feet in both directions, for whatever reason(s).


3) The [censored] bank needs to be expanded 12 feet in both directions to account for the excess water runoff.


With more information, we get a fuller conception of how the author is using the word “bank”: they are claiming that the bank needs to be expanded by 12 feet in both directions because there is a flooding issue. But what if I remove the remaining censors?


4) The river bank needs to be expanded 12 feet in both directions to account for the excess water runoff.


It is only after the final censors are removed that we notice we were mistaken: what is being referred to is a river bank and not a bank in the sense of a financial institution. The example of a river bank is innocuous (unless flooding is a real issue for you) and is only used to demonstrate the operationalization presupposition, but consider what words tend to accompany more divisive words such as “migrant,” “enemy combatant,” “illegal alien,” or “refugee,” and how their co-occurrences can differ across speaker communities. Likewise, imagine the co-occurrences for words like “men,” “women,” “left wing,” and “disabled”; I’m sure you can think of some dark accompaniments of words from some questionable people. The combinations of words from different communities can vary substantially, and different word co-occurrences mean different thoughts and behaviors, that is, different inferences. Word co-occurrence comes into view as an important phenomenon to study as we become locked into habits of combining words (combining concepts) and using them for different ends.
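To make the idea of "the words surrounding a word" concrete, here is a minimal sketch in Python that collects the neighbors of “bank” in the example sentence. The function names and the window size are my own illustrative choices, not part of Concept Tracer.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def context_words(text, target, window):
    """Return the tokens within `window` positions of each occurrence of `target`."""
    tokens = tokenize(text)
    neighbours = []
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            neighbours.extend(left + right)
    return neighbours

sentence = ("The river bank needs to be expanded 12 feet in both directions "
            "to account for the excess water runoff.")

# A narrow window already exposes "river"; a wide one takes in the whole sentence.
narrow = context_words(sentence, "bank", window=2)
wide = context_words(sentence, "bank", window=20)
print(narrow)
print("river" in narrow, "runoff" in wide)
```

With a window of two, “river” is already among the neighbors of “bank,” which is the word that settles the intended sense. The size of the window is a modeling choice: widening it plays the same role as removing censors did above.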


2.3 The Structure of Word Co-Occurrence 


While word co-occurrence can be used to understand semantics (such as determining what sense of “bank” was intended), it can also be used to determine how a word is used in conjunction with other words structurally (in terms of their frequency of co-occurrence) and these structures can be visualized as networks. This is a subtle distinction that can make all the difference for the concept engineer who finds themselves tinkering with concepts that are used in myriad delicate ways. Let's return to our imaginary author, who is concerned with river maintenance, but this time as semanticists armed with complexity science. 


Visualizing Word Co-Occurrence Networks or How to Think of Words as Parts that Comprise Complex Systems. 


Network models are often used in complexity science to represent multivariable systems and how they’re interconnected. This allows the complexity scientist to visualize relationships that might otherwise not be transparent to them. It is the combined complex nature of our conceptual resources and the imperative to engineer with caution that demands this kind of investigation.


Here we will address two topics as they pertain to semantics or concept analysis:


  1. Network modeling of word co-occurrence.

  2. Generative entrenchment (a complex property of concept use).


Let’s generate a few more sentences from our imaginary author and represent their word co-occurrences as networks.


  1. “The continued management of a river is necessary relative to urban development.” 

  2. “The corrosion of river banks, over time, can be disastrous for local ecology.” 

  3. “The river is hospitable to diverse species including insect, duck, and fish populations.”

  4. “An agent-based-model can be used to simulate insect, duck, and fish populations over time.” 


Let the nodes of our network be words and the edges (links) be the co-occurrences our imaginary author uses to link together words to build coherent sentences.
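As a rough sketch of this construction, the Python below treats two words as linked whenever they appear in the same sentence. That sentence-level linking rule is an assumption on my part (a sliding window would be another option), and `build_network` is an illustrative name rather than Concept Tracer code.

```python
import re
from itertools import combinations

SENTENCES = [
    "The continued management of a river is necessary relative to urban development.",
    "The corrosion of river banks, over time, can be disastrous for local ecology.",
    "The river is hospitable to diverse species including insect, duck, "
    "and fish populations.",
    "An agent-based-model can be used to simulate insect, duck, "
    "and fish populations over time.",
]

def tokenize(sentence):
    """Lowercase the sentence and split it into words, keeping hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())

def build_network(sentences):
    """Map each word (node) to the set of words it shares a sentence with (its links)."""
    network = {}
    for sentence in sentences:
        words = set(tokenize(sentence))
        for word in words:
            network.setdefault(word, set())
        for a, b in combinations(sorted(words), 2):
            network[a].add(b)
            network[b].add(a)
    return network

network = build_network(SENTENCES)
edge_count = sum(len(neighbours) for neighbours in network.values()) // 2
print(len(network), "nodes,", edge_count, "edges")
print(sorted(network["river"]))
```

Every word, including “the” and “of,” becomes a node here, which is why even four short sentences produce a crowded picture.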



This is a complex network, and it only concerns four sentences from our imaginary author. You may be wondering what can be done with such a network, and the answer is perhaps not much without the aid of algorithms (we aren’t there yet). When we build a network model of word co-occurrence, we want to focus on the words that are most relevant to our investigation. In natural language processing, one step toward this is known as “stop word removal,” which amounts to removing very common words like “the,” “a,” “and,” “if,” and “not.” Often with models there are trade-offs between accuracy and usability, so let's remove some stop words and design a more usable word co-occurrence network containing only those words that are relevant to our investigation (how the author links ecological, computational, and cautionary words with regard to river management).
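Stop word removal can be sketched in a few lines of Python. The stop-word list below is an assumption: I picked it by hand for these four sentences (it includes “used” and “including,” which standard lists may not), so treat it as illustrative rather than as the list Concept Tracer uses.

```python
import re

# Assumed, hand-picked stop words for this toy example (not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "and", "can", "be",
              "for", "over", "used", "including"}

SENTENCES = [
    "The continued management of a river is necessary relative to urban development.",
    "The corrosion of river banks, over time, can be disastrous for local ecology.",
    "The river is hospitable to diverse species including insect, duck, "
    "and fish populations.",
    "An agent-based-model can be used to simulate insect, duck, "
    "and fish populations over time.",
]

def tokenize(sentence):
    """Lowercase the sentence and split it into words, keeping hyphenated words whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())

def content_words(sentence):
    """Tokenize a sentence and drop the stop words, keeping word order."""
    return [token for token in tokenize(sentence) if token not in STOP_WORDS]

all_words = {word for s in SENTENCES for word in tokenize(s)}
kept_words = {word for s in SENTENCES for word in content_words(s)}

print(content_words(SENTENCES[3]))
print(len(all_words), "distinct words before,", len(kept_words), "after")
```

The trade-off mentioned above shows up directly: the network loses some information about how the author builds sentences, but the remaining nodes are the ones we actually want to reason about.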


After removing stop words, it is easier to see a complex property in this network: generative entrenchment. We see that the word “river” is entrenched, as it connects three clusters (or topics) that our author tends towards when talking about this “river.” Moving clockwise, at about 12 o’clock, we may say that this cluster (“banks,” “disastrous,” “ecology,” “local,” “corrosion”) regards the concern the author has with river management: when our author speaks or writes about this “river,” one possible move they tend to make is to express a sense of urgency and worry. At about 3 o’clock, we find another cluster of words: “agent-based-model,” “simulate,” “hospitable,” “populations,” “species,” “diverse,” “insect,” “duck,” and “fish.” From this cluster we can infer that the author suggests using “agent-based-models” to “simulate” the inhabitants of the “river.” Notice that “time,” like “river,” is also entrenched, as it connects the cluster at 12 and the cluster at 3. Finally, at around 8 o’clock we see a cluster containing the words “necessary,” “management,” “urban,” etc. This cluster may be said to denote the causal actors the imaginary author is worried will harm the animal and insect populations of the river. To further expand on the nature of generative entrenchment, I will defer to William Wimsatt (the creator of the concept):


“GENERATIVE ENTRENCHMENT (GE). A measure of how many things depend upon an element and thus likely to change if it changes. (In an abstract network, if means of access [robustness] are paths to a node, then GE is the reach of paths from a node. Thus robustness and GE are complementary measures of local order in a complex system. What is derived from what depends upon the manipulations or inferences involved, so whether a given relation involves one or the other may depend on the specified operations.) Things with higher GE are more evolutionarily conservative because the chance that random changes in them will be adaptive declines exponentially with increasing GE. They also generate more massive changes when they do change. Things that stay around long enough get entrenched and more resistant to change because they have more things depending on them and depending on them to greater degrees.” (Wimsatt, 2004)


We can determine which nodes (or words, or concepts) possess this property of generative entrenchment simply by counting the number of links present in this network:


Continued: 6 links

Management: 6 links

River: 19 links

Necessary: 6 links

Relative: 6 links

Urban: 6 links

Development: 6 links

Corrosion: 6 links

Banks: 6 links

Time: 12 links

Disastrous: 6 links

Local: 6 links

Ecology: 6 links

Hospitable: 7 links

Diverse: 7 links

Species: 7 links

Insect: 10 links

Duck: 10 links

Fish: 10 links

Populations: 10 links

Agent-based-model: 6 links

Simulate: 6 links
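The counts above can be reproduced with a short Python sketch, under two assumptions of mine: words are linked when they share a sentence, and the stop-word list is the hand-picked one below. Counting a node's links in this way is what network science calls its degree.

```python
import re
from itertools import combinations

# Assumed, hand-picked stop words for this toy example (not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "and", "can", "be",
              "for", "over", "used", "including"}

SENTENCES = [
    "The continued management of a river is necessary relative to urban development.",
    "The corrosion of river banks, over time, can be disastrous for local ecology.",
    "The river is hospitable to diverse species including insect, duck, "
    "and fish populations.",
    "An agent-based-model can be used to simulate insect, duck, "
    "and fish populations over time.",
]

def content_words(sentence):
    """Return the set of non-stop words in a sentence (hyphenated words stay whole)."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    return {token for token in tokens if token not in STOP_WORDS}

def build_network(sentences):
    """Map each word to the set of words it shares at least one sentence with."""
    network = {}
    for sentence in sentences:
        words = content_words(sentence)
        for word in words:
            network.setdefault(word, set())
        for a, b in combinations(words, 2):
            network[a].add(b)
            network[b].add(a)
    return network

network = build_network(SENTENCES)

# Degree = number of distinct words a word is linked to.
degree = {word: len(neighbours) for word, neighbours in network.items()}

for word, links in sorted(degree.items(), key=lambda item: (-item[1], item[0])):
    print(f"{word}: {links} links")
```

Note that degree only counts direct links. Wimsatt's definition speaks of the reach of paths from a node, so degree is best read as a simple first approximation of generative entrenchment.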


We find that there are six words that feature the property of generative entrenchment (“river,” “time,” “insect,” “duck,” “fish,” and “populations,” each with ten or more links). These six words (concepts) make up the core of our imaginary author’s concerns with regard to the river and its inhabitants; all other words are auxiliary, emerging when our author considers the “river,” its inhabitants, and what can be done about urban development. We can thus read this network as suggesting that changes made to “river” and “time” would likely be disastrous for the imaginary author’s conceptualizations, as changes made to these entrenched nodes would fundamentally change the way our imaginary author speaks about these topics. It would additionally be extremely hard to change these concepts, if at all possible. Now imagine the urban developers pushing a narrative that talks about the importance of urban development, the usual kind of “profits over ecology” sort of thing. We see that the entrenchment would make it difficult for our imaginary author to buy into this, but if the urban developers hit hard enough (perhaps through propaganda campaigns), the same entrenchments that preserve a conceptual network can also become its weak points, as any changes made to them can drastically change the topology of the network. If the imaginary author comes to see the river, local flooding, and the inhabitants of the river as “unfortunate externalities” (the “couple of eggs that must be broken to make an omelet”) in the “local business revitalization project,” then the imaginary author could be swayed.
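One way to probe the claim that entrenched nodes are also weak points is to delete them and see whether the network falls apart. The Python sketch below does this by counting connected components. The cut-off of ten links for "entrenched" and the hand-picked stop-word list are my assumptions, chosen so that the toy example matches the counts listed above.

```python
import re
from itertools import combinations

# Assumed, hand-picked stop words for this toy example (not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "and", "can", "be",
              "for", "over", "used", "including"}

SENTENCES = [
    "The continued management of a river is necessary relative to urban development.",
    "The corrosion of river banks, over time, can be disastrous for local ecology.",
    "The river is hospitable to diverse species including insect, duck, "
    "and fish populations.",
    "An agent-based-model can be used to simulate insect, duck, "
    "and fish populations over time.",
]

def build_network(sentences):
    """Link every pair of non-stop words that share a sentence."""
    network = {}
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
        words = {token for token in tokens if token not in STOP_WORDS}
        for word in words:
            network.setdefault(word, set())
        for a, b in combinations(words, 2):
            network[a].add(b)
            network[b].add(a)
    return network

def components_without(network, removed):
    """Count connected components after deleting the `removed` words."""
    seen, count = set(removed), 0
    for start in network:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:  # iterative depth-first search from `start`
            word = stack.pop()
            if word not in seen:
                seen.add(word)
                stack.extend(network[word] - seen)
    return count

network = build_network(SENTENCES)

# Assumed threshold: a word counts as entrenched if it has ten or more links.
entrenched = sorted(w for w, links in network.items() if len(links) >= 10)
print(entrenched)

print(components_without(network, []))                 # intact network
print(components_without(network, ["banks"]))          # remove an auxiliary word
print(components_without(network, ["river", "time"]))  # remove entrenched words
```

Removing an auxiliary word such as “banks” leaves the network in one piece, whereas removing “river” and “time” splits it into the three clusters described earlier. That is the sense in which entrenched concepts both hold a conceptual network together and mark where it is most vulnerable.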


3.0 What then is Actionable for the Conceptual Engineer? 


After having determined the structure of our imaginary author’s use of words, we can read-off possible conceptual engineering projects from this network in order to help our imaginary author make their case. I will paste the network here again such that it is easier to reference:




One option is to address the concept: “agent-based-models.” From this word co-occurrence network we can see that our imaginary author has a very limited understanding of agent-based-models, as the author seems strictly focused on modeling the inhabitants of the river and not the environment of those inhabitants. We know that the imaginary author understands the causal impact urban development has on river bank corrosion, but they do not directly link together agent-based-models with this–thereby missing out on further empirical support for their case. We also see that “agent-based-models” is not entrenched–for our author–and as such, would likely be easier to engineer. It could be that our imaginary author is not sufficiently familiar with “agent-based-models” and what they do.

An agent-based model can be used to simulate not only agents but also environments; and more often than not they are. While this example is less a serious critique of the concept ‘agent based models’ and more of a ‘proof of concept,’ one wonders if there is opportunity to capture causal and correlative relationships between urban development, river bank erosion, and the biodiversity of the river. Perhaps the concept “agent-based-model” has misled our imaginary author and could use refining.


Question: Can you think of a better way to conceptualize “agent-based-models” such that it evades this issue? 


One option could be: “Agent-Environment-Models” which makes explicit the ability to simulate environments along with agents. 


If successful: this could very well inform “the continued management of a river,” that is “necessary relative to urban development,” thereby linking together all three clusters for a stronger (more robust) concept network. Let's plug a new sentence in and see how this changes the word co-occurrence network for our imaginary author.


The sentence: An agent-environment-model can be used to simulate urban development, river corrosion along with insect, duck, and fish populations over time. 




Our network has reached greater connectivity between its nodes, with the new concept “agent-environment-model” having brought together those nodes that concern urban development, corrosion, and insect, fish, and duck populations. Likewise, we can imagine a new inference our imaginary author could make if they utilize agent-environment-models: if urban development continues to displace x amount of rain water, then the river banks will erode at y rate, which will have z impact on biodiversity.


This would make for a more complete representation of those dynamics that affect the river and a better legal case against the urban developers. We additionally see that “corrosion” (of the river banks) has become entrenched, as it is now directly linked to those nodes representing the biodiversity (fish, insects, and ducks).
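The effect of the new sentence can be checked with a small before-and-after comparison in Python. As before, the sentence-level linking rule and the stop-word list (here extended with “along” and “with”) are assumptions of mine for this toy example.

```python
import re
from itertools import combinations

# Assumed, hand-picked stop words for this toy example (not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "and", "can", "be", "for",
              "over", "used", "including", "along", "with"}

ORIGINAL = [
    "The continued management of a river is necessary relative to urban development.",
    "The corrosion of river banks, over time, can be disastrous for local ecology.",
    "The river is hospitable to diverse species including insect, duck, "
    "and fish populations.",
    "An agent-based-model can be used to simulate insect, duck, "
    "and fish populations over time.",
]
NEW_SENTENCE = ("An agent-environment-model can be used to simulate urban development, "
                "river corrosion along with insect, duck, and fish populations over time.")

def build_network(sentences):
    """Link every pair of non-stop words that share a sentence."""
    network = {}
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
        words = {token for token in tokens if token not in STOP_WORDS}
        for word in words:
            network.setdefault(word, set())
        for a, b in combinations(words, 2):
            network[a].add(b)
            network[b].add(a)
    return network

def components_without(network, removed):
    """Count connected components after deleting the `removed` words."""
    seen, count = set(removed), 0
    for start in network:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            word = stack.pop()
            if word not in seen:
                seen.add(word)
                stack.extend(network[word] - seen)
    return count

before = build_network(ORIGINAL)
after = build_network(ORIGINAL + [NEW_SENTENCE])

print("corrosion links:", len(before["corrosion"]), "->", len(after["corrosion"]))
print("urban linked to insect:", "insect" in before["urban"], "->", "insect" in after["urban"])
print("pieces without river and time:",
      components_without(before, ["river", "time"]), "->",
      components_without(after, ["river", "time"]))
```

In this toy model “corrosion” gains direct links to the biodiversity nodes, and the network no longer splits apart when “river” and “time” are removed, which is one way of cashing out the claim that the concept network has become more robust.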


Now, let’s try to ameliorate the concept of “agent-environment-models” in order to bring closer those “management” and “agent-environment-model” co-occurrences. We do not, after all, want those urban developers to push the responsibility for river management off on local municipalities.


The additional sentence: An agent-environment-model can simulate different urban management plans to include drainage, river-dredging, and regular debris clean up; thereby informing urban developers how they can continue to build without contributing to flooding and harming local insect, duck, and fish populations.




We have effectively created a new concept for our imaginary author (“agent-environment-models”) and then ameliorated it such that it includes a more robust sense of functionality (expanding the possibility of simulating urban river management). In the top left we see many entrenched nodes, linking together to form a stronger word co-occurrence network with regard to what can be done and who is at fault. As we incorporate new words (new concepts), we help our imaginary author immunize themselves against external control over their conceptual resources, such as when the urban developers try to push a narrative.
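"Bringing closer" two concepts can be given a rough numerical reading: the number of links on the shortest path between them. The Python sketch below measures this for “management” and “agent-environment-model” before and after the additional sentence. The stop-word list and the sentence-level linking rule are again my own assumptions for illustration.

```python
import re
from collections import deque
from itertools import combinations

# Assumed, hand-picked stop words for this toy example (not a standard list).
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "and", "can", "be", "for", "over",
              "used", "including", "along", "with", "how", "they", "without",
              "thereby", "up"}

BEFORE = [
    "The continued management of a river is necessary relative to urban development.",
    "The corrosion of river banks, over time, can be disastrous for local ecology.",
    "The river is hospitable to diverse species including insect, duck, "
    "and fish populations.",
    "An agent-based-model can be used to simulate insect, duck, "
    "and fish populations over time.",
    "An agent-environment-model can be used to simulate urban development, "
    "river corrosion along with insect, duck, and fish populations over time.",
]
ADDITIONAL = ("An agent-environment-model can simulate different urban management plans "
              "to include drainage, river-dredging, and regular debris clean up; thereby "
              "informing urban developers how they can continue to build without "
              "contributing to flooding and harming local insect, duck, and fish "
              "populations.")

def build_network(sentences):
    """Link every pair of non-stop words that share a sentence."""
    network = {}
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
        words = {token for token in tokens if token not in STOP_WORDS}
        for word in words:
            network.setdefault(word, set())
        for a, b in combinations(words, 2):
            network[a].add(b)
            network[b].add(a)
    return network

def distance(network, source, target):
    """Breadth-first search: links on a shortest path, or None if unreachable."""
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        word, steps = queue.popleft()
        if word == target:
            return steps
        for neighbour in network[word]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + 1))
    return None

before = build_network(BEFORE)
after = build_network(BEFORE + [ADDITIONAL])

print(distance(before, "management", "agent-environment-model"))
print(distance(after, "management", "agent-environment-model"))
```

Before the additional sentence, the two words are only connected through an intermediary such as “urban” or “river.” Afterwards they co-occur directly, so the path between them shrinks to a single link.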



4.0 What About Scaling up? 


We only looked at one imaginary author and a handful of sentences. A question remains as to what networks can be built when scaling up to include large corpora such as internet message boards and forums. To accomplish this, we must be careful, as including too much textual data may afford us networks that don’t correspond to any actually existing speakers or speaker communities. Conversely, if we use too little textual data, we may not accurately assess how concepts are being used in the wild. It is here that corpus linguistics and internet databases can be a significant aid, especially those that are already refined, such as Reddit being decomposable into subreddits, each representing a different speaker community. We also only looked at one complex property of networks (generative entrenchment), and there are many other ways to study networks well beyond this, some of which can additionally inform conceptual engineering projects. In the next entry I will cover these themes, exploring how corpus linguistics and the design of corpora can further inform conceptual engineering and do so on a larger scale. I will then introduce code for Concept Tracer, which will automate much of this process of building large word co-occurrence networks.


5.0 Bullet Point Break Down 


  • Words are instances of concept use.

  • Word co-occurrence can give us a semantics as well as the structure of word use. 

  • The structure of word co-occurrence can be visualized as a network. 

  • We can read off complex properties from network visualizations.

  • We can determine how changes made to a given concept will affect other concepts in a network—putting the ‘engineering’ in ‘conceptual engineering.’ 
