Bazarbayeva G.A., Karbozova G.K.
MARKERS OF LINGUISTIC ARTIFICIALITY IN INTERNET COMMUNICATION (BASED ON BOT MATERIALS)
Abstract:
This article examines linguistic markers of artificiality in AI-generated online texts. Based on a comparative corpus of human- and machine-authored responses, the study identifies five key indicators: lexical repetition, syntactic rigidity, pragmatic deviation, emotional flatness, and cultural detachment. The findings highlight the limitations of AI language and provide a methodological framework for detecting artificiality in digital discourse.
Keywords:
artificial intelligence, linguistic artificiality, lexical repetition, pragmatic deviation, emotional neutrality, cultural context
The integration of artificial intelligence into digital communication has dramatically reshaped how language is produced, distributed, and perceived online. Once the exclusive domain of human authors, textual creation has now become a shared space between human minds and machine algorithms. Chatbots, virtual assistants, and automated reply systems are increasingly present in platforms ranging from education to business, from healthcare to entertainment. These systems generate text that appears fluent and coherent—often indistinguishable from human language at first glance. Yet, as this paper argues, surface fluency should not be mistaken for genuine linguistic authenticity.

While AI models such as GPT and other language generators are trained on vast corpora of natural language, their outputs are fundamentally different from human speech in purpose, process, and structure. Human language is shaped not only by grammar and vocabulary but by experience, intent, emotion, and culture. It is relational, adaptive, and context-aware. AI-generated language, in contrast, is the result of predictive algorithms operating within statistical parameters. It does not stem from personal experience or emotional understanding. This results in texts that may be grammatically flawless but semantically shallow or pragmatically inconsistent.

This divergence has given rise to the concept of linguistic artificiality—a term used to describe the specific textual features that suggest non-human authorship. These features are not necessarily errors; on the contrary, many AI-generated texts appear too perfect, too structured, or too neutral. The challenge, therefore, is not to catch mistakes but to notice the absence of natural variation, personal voice, or cultural context. Linguistic artificiality can manifest in the form of repetitive phrasing, overuse of formal connectors, lack of emotional modulation, or the absence of idiomatic and culturally anchored expressions.

As artificial intelligence becomes more embedded in daily life, the ability to recognize these subtle signs of artificiality becomes a matter of digital literacy. It affects not only academic honesty and authorship attribution but also interpersonal trust, decision-making, and the integrity of information exchange. In settings where emotional intelligence, cultural sensitivity, or ethical nuance is required, the limitations of AI language become more visible and consequential.

The purpose of this paper is to explore and analyze the most common markers of linguistic artificiality in internet communication, with a specific focus on text generated by conversational bots. The analysis is based on a comparative corpus of human- and AI-generated responses, categorized according to five major criteria: lexical repetition, syntactic rigidity, pragmatic deviation, emotional neutrality, and cultural absence.

The paper is structured in three analytical sections. The first focuses on surface-level linguistic features—word choice and sentence structure. The second investigates deeper discourse patterns, particularly how bots manage context, inference, and conversational logic. The third explores the emotional and cultural dimensions of language, examining where AI fails to replicate human tone, empathy, or cultural reference points. Each section includes tables and diagrams built on original data.
The final conclusion synthesizes these observations and offers recommendations for improving awareness, detection, and critical engagement with AI-generated content.

One of the most observable differences between human and AI-generated texts lies in the lexical and syntactic dimensions of language. These aspects form the surface structure of discourse, making them an ideal entry point for analyzing linguistic artificiality. While AI systems are trained to replicate grammatical forms and sentence logic, their language often lacks the richness, variation, and spontaneity typical of human communication. This section examines two of the most prominent surface-level indicators of artificiality: lexical repetition and syntactic rigidity.

Lexical repetition refers to the frequent reuse of the same words or phrases across a single passage. While repetition may occasionally serve rhetorical functions in human writing—such as emphasis or stylistic rhythm—in AI texts it often arises from probability-based predictions. Language models prioritize high-likelihood word sequences, which leads to the recurrence of general-purpose terms like “important,” “necessary,” “therefore,” or “it is known that.” This overuse reduces lexical diversity and contributes to a mechanical tone. In contrast, human-authored texts demonstrate broader vocabulary use, including idioms, metaphors, and spontaneous lexical substitutions that reflect thought processes and individuality [1].

Syntactic rigidity complements lexical repetition as another dominant trait of AI-generated discourse. Bots tend to use sentence templates that follow predictable grammatical patterns, such as parallel clauses, linear logical connectors, and symmetrical sentence forms. For example: “This topic is important because it affects many people. In addition, it has social implications. Therefore, it should be studied.” While grammatically correct, such constructions lack dynamic pacing and stylistic modulation. Human syntax, on the other hand, includes sentence fragments, questions, abrupt shifts, and rhythm changes, especially in informal settings. This variation is often influenced by audience awareness, emotional tone, or communicative intent [2].

To assess these differences empirically, a comparative analysis was conducted using two corpora: 100 AI-generated responses and 100 human-authored responses, drawn from internet communication platforms. The texts were evaluated based on the frequency of repeated vocabulary, variety of sentence structures, and use of idiomatic or irregular forms. The results are summarized below.

Table 1. Frequency of Key Linguistic Markers (Author’s Data). Note: Compiled by the author.

As shown in Table 1, bot-generated texts significantly exceed human texts in both lexical repetition and syntactic uniformity. The gap is particularly notable in syntactic structure, where AI texts display nearly double the regularity of human texts. This suggests that AI systems, though effective at producing grammatically valid outputs, still rely on repetitive and standardized structural frames that lack creativity or rhetorical flair.

These findings have important implications for content assessment and AI detection. In educational settings, for instance, overuse of structured transitions and narrow vocabulary can indicate the presence of machine-generated work. Similarly, in customer service or journalism, identifying such markers may help verify content authenticity and authorship.
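To make these surface-level measures concrete, a minimal sketch of how such counts could be computed is given below. It is an illustration only, not the authors' instrument: the tokenization, the tiny stop-word list, and the use of a type-token ratio and sentence-length variation as proxies for repetition and rigidity are assumptions introduced here.

```python
import re
from collections import Counter
from statistics import mean, pstdev

def type_token_ratio(text: str) -> float:
    """Lexical diversity: share of unique words among all words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def top_repeated_words(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Most frequent content words, a rough proxy for lexical repetition."""
    stop = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "that"}
    tokens = [t for t in re.findall(r"[a-z']+", text.lower()) if t not in stop]
    return Counter(tokens).most_common(n)

def sentence_length_variation(text: str) -> float:
    """Coefficient of variation of sentence lengths; uniformly sized,
    templated sentences yield values close to zero."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)
```

On templated bot output with evenly sized sentences, sentence_length_variation tends toward zero, while human prose that mixes fragments with long periodic sentences scores higher.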
In sum, lexical and syntactic indicators are strong, surface-level signs of artificiality. They provide a measurable and interpretable way to distinguish between human and AI-generated texts, especially when examined in combination. However, to gain a deeper understanding of machine language limitations, one must look beyond structure to the pragmatic layer, where intent, relevance, and context are tested. The next section will explore this in greater depth.

While lexical and syntactic structures can be analyzed visibly on the surface of a text, a more profound layer of language emerges when we examine how meaning operates in context. This layer is known as pragmatics—the study of how speakers use language in real-life situations to achieve communicative goals. Pragmatic awareness allows humans to infer, imply, suggest, and adapt based on shared assumptions and social norms. In contrast, AI-generated texts, although grammatically sound, often demonstrate clear signs of pragmatic failure.

One of the most widely accepted models of pragmatics is the Gricean framework, which defines four conversational maxims that guide cooperative communication: Quantity (do not say too much or too little), Quality (say what you believe to be true), Relevance (be on topic), and Manner (be clear and unambiguous) [3]. In human dialogue, these principles are often followed intuitively. However, AI systems—lacking true understanding—frequently violate them, especially in terms of quantity and relevance.

For example, when a user asks a chatbot a simple question, the response might include unnecessary definitions, redundant restatements, or overly general background information. Although this verbosity is generated with the intent to be helpful, it leads to inefficient or awkward communication. Bots cannot gauge what the user already knows, nor can they interpret indirect meaning or emotional cues. They operate through completion prediction, not context-aware reasoning.

To better understand these pragmatic deviations, the study compared 100 AI-generated responses to 100 human-generated ones, analyzing specific violations of Gricean maxims. The criteria included over-explaining, straying off-topic, failing to answer implicitly asked questions, and using formal language in informal contexts. The results are summarized below.

Table 2. Frequency of Pragmatic Violations (Author’s Data). Note: Compiled by the author.

As Table 2 shows, bots violate the maxim of quantity nearly four times more often than humans and the maxim of relevance nearly three times more often. These figures suggest that even when AI responses are informative, they often fail to strike the correct balance between precision and brevity or context and detail. Human discourse, by contrast, tends to be more selective, often relying on shared knowledge, intonation, and subtext to complete meaning efficiently [4].
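Two of these criteria, over-explaining (Quantity) and straying off-topic (Relevance), lend themselves to simple quantitative proxies. The sketch below shows one hedged way such flags might be operationalized; the length ratio, the vocabulary-overlap measure, and both cut-off values are illustrative assumptions rather than the study's actual coding scheme.

```python
import re

def content_words(text: str) -> set[str]:
    """Lower-cased word set used for the overlap measure."""
    return set(re.findall(r"[a-z']+", text.lower()))

def violates_quantity(question: str, answer: str, max_ratio: float = 8.0) -> bool:
    """Flag answers disproportionately longer than the question (over-explaining)."""
    q_len = len(question.split()) or 1
    return len(answer.split()) / q_len > max_ratio

def violates_relevance(question: str, answer: str, min_overlap: float = 0.2) -> bool:
    """Flag answers whose vocabulary barely overlaps with the question's."""
    q, a = content_words(question), content_words(answer)
    if not q:
        return False
    return len(q & a) / len(q) < min_overlap
```

Such heuristics can only approximate the maxims: genuine relevance depends on inference and shared context, which no word-overlap measure captures.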
Another frequent issue is the bot’s struggle with discourse cohesion. While bots may use connectors like “Furthermore,” “In addition,” or “Therefore,” these transitions are often linear and generic, lacking the logical depth or emotional texture found in human reasoning. Discourse is not merely a string of sentences—it is an evolving structure that depends on memory, purpose, and audience awareness. Bots typically do not track long-range coherence or modify tone mid-conversation.

The result is that AI-generated texts can feel disconnected or sterile at the discourse level. Responses might technically answer a question but do so in a way that ignores nuance, leaves implications unexplored, or fails to engage the reader on an interpersonal level. In casual conversations, for instance, bots often miss irony, sarcasm, or hesitation—elements that humans interpret naturally through context.

In summary, pragmatic and discourse-level deviations are reliable and diagnostically rich indicators of linguistic artificiality. They reveal the underlying gap between text generation and communicative intention. Unlike lexical or syntactic features, which may be polished with training, pragmatic sensitivity requires a level of social cognition that current AI does not possess. The next section will examine another layer of this gap: the inability of AI to express emotion or integrate cultural specificity into its language.

One of the most subtle yet powerful indicators of linguistic artificiality is the absence of emotion and cultural grounding in AI-generated texts. While language models have achieved impressive fluency, they continue to fall short in expressing human-like affect and socially embedded references. This phenomenon is referred to as affective flattening and cultural absence, and it fundamentally limits the authenticity of machine-generated communication.

Human language is not only a tool for conveying information—it is also a vehicle for emotion. Whether in joy, frustration, irony, or empathy, emotional expression adds depth to communication and strengthens interpersonal bonds. According to Pennebaker, emotional tone in human texts is dynamic and context-sensitive: it reflects internal states and changes naturally across topics, audiences, and emotional situations [5]. In contrast, AI-generated language tends to maintain a consistently neutral tone, regardless of subject matter. This neutrality can be interpreted as professional or polite, but over time it reveals an underlying artificiality.

For instance, when responding to emotionally charged prompts—such as personal loss, celebration, or social injustice—AI-generated messages often use generalized statements like “It is important to stay positive” or “People experience different emotions.” While factually correct, these responses lack emotional resonance and fail to demonstrate empathy. They feel flat because they are detached from real human experience. This affective monotony contrasts sharply with human speech, where even the choice of adjectives, punctuation, or sentence rhythm can convey mood.

A second, equally important element is the lack of cultural anchoring in AI language. Humans naturally embed references to culture, history, pop media, and local context in their speech. These references not only enrich meaning but also signal social identity and community belonging. For example, phrases like “during lockdown,” “Taylor Swift era,” or “after the World Cup” carry emotional and temporal weight. Bots, unless explicitly trained on these references, avoid them or reproduce them inaccurately.

To illustrate these trends, the study analyzed 100 bot texts and 100 human texts based on their use of emotionally expressive vocabulary and culturally specific references. The findings are visualized below.

Diagram 1. Statistical Distribution of Linguistic Markers (Author’s Data). Note: Compiled by the author.

(Explanation: In Diagram 1, bots score higher on lexical and syntactic regularity but significantly lower on emotional variation and cultural referencing. Human responses display broader emotional range and richer cultural context.)
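For readers who wish to approximate such counts, the toy sketch below illustrates the kind of lexicon-based measurement that could underlie this comparison. The word lists are deliberately small stand-ins, not the resources used in the study; a real analysis would draw on a validated affect lexicon and a curated inventory of cultural cues.

```python
import re

# Stand-in word lists; a real study would use a validated affect lexicon
# (e.g., LIWC or the NRC emotion lexicon) and a curated set of cultural cues.
EMOTION_WORDS = {"happy", "sad", "angry", "thrilled", "heartbroken",
                 "furious", "delighted", "anxious", "proud", "ashamed"}
CULTURAL_CUES = {"lockdown", "world cup", "taylor swift"}

def emotion_density(text: str) -> float:
    """Share of tokens drawn from the emotion word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for t in tokens if t in EMOTION_WORDS)
    return hits / len(tokens) if tokens else 0.0

def cultural_reference_count(text: str) -> int:
    """Number of culturally anchored phrases appearing in the text."""
    lowered = text.lower()
    return sum(1 for cue in CULTURAL_CUES if cue in lowered)
```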
The limitations shown in Diagram 1 highlight a core truth: AI does not feel. It does not participate in lived culture, share collective memory, or experience emotional fluctuation. As a result, its language is based on simulation, not sensation. Jurafsky and Martin point out that emotional and cultural dimensions of communication are acquired through experience and interaction—not through data processing alone [6]. Even the most advanced models, while capable of mimicking style or tone, cannot access the affective or symbolic functions of language in full.

This matters not just linguistically, but socially and ethically. In settings such as education, therapy, media, or diplomacy, language must do more than inform—it must connect. If users cannot distinguish between artificial and authentic emotional expression, they may misinterpret intent or rely on AI-generated text in inappropriate contexts. Affective flattening and cultural absence are not only limitations—they are markers. They indicate the border between machine output and human voice, and they offer one of the most reliable ways to detect linguistic artificiality. While future AI may improve in simulating affect and referencing culture, these features remain—at present—deeply human.

Conclusion.

The growing presence of artificial intelligence in online communication has created a new linguistic reality—one in which human and machine-generated texts co-exist, often indistinguishably. While language models have made remarkable progress in grammar, fluency, and stylistic imitation, this study shows that they continue to exhibit clear signs of linguistic artificiality. These signs are not always visible at the surface level; rather, they emerge through careful analysis of how language is structured, used in context, and shaped by human experience.

Across the three analytical domains explored in this paper—lexical-syntactic structure, pragmatic-discursive logic, and affective-cultural depth—AI-generated texts consistently diverge from human-authored language. The tendency of bots to repeat words, rely on templated sentence structures, and overuse formal connectors reveals a mechanistic logic. Their frequent violations of conversational norms show a lack of social awareness and communicative intention. Most notably, their failure to express emotion and reference culture exposes the absence of lived experience.

These findings carry implications far beyond linguistics. As AI becomes increasingly embedded in journalism, education, customer service, and public discourse, it is critical to develop awareness around what machines can and cannot authentically replicate. Recognizing linguistic artificiality is not about blaming the technology—it is about responsibly engaging with its output.

This study offers a framework for doing just that. By identifying five key indicators—lexical repetition, syntactic rigidity, pragmatic violation, emotional flatness, and cultural detachment—it becomes possible to critically evaluate the origin and intent behind digital texts. These markers provide both educators and readers with tools for promoting AI literacy, encouraging transparency in authorship, and safeguarding meaningful human expression.