Finding Sentences Using Regular Expressions with .NET
In linguistics, a sentence (an expression in natural language) is often defined as a grammatical and lexical unit consisting of one or more words that represent distinct concepts. A sentence can include words grouped meaningfully to express a statement, question, exclamation, request or command.
Sentences typically end with a terminating mark such as . (period), ? (question mark), ! (exclamation mark) or : (colon). The regular expression below treats a sentence as a run of text that ends with one of these four marks followed by a whitespace character.
Regular Expression Pattern
(?sx-m)[^\r\n].*?(?:(?:\.|\?|!|\:)\s)
A description of the regular expression:
Change options within the enclosing group [sx-m]
  Turn OFF Multiline option
  Turn ON Single Line option
  Turn ON Ignore Pattern Whitespace option
Any character that is NOT in this class: [\r\n]
Any character, any number of repetitions, as few as possible
Match expression but don’t capture it. [(?:\.|\?|!|\:)\s]
  (?:\.|\?|!|\:)\s
    Match expression but don’t capture it. [\.|\?|!|\:]
      Select from 4 alternatives
        Literal .
        Literal ?
        Literal !
        Literal :
    Whitespace
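To see how the pattern behaves, here is a minimal C# sketch that runs it over a short sample string with Regex.Matches. The class name SentenceFinderDemo and the sample text are illustrative assumptions, not part of the original article.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only the pattern itself comes from the article.
class SentenceFinderDemo
{
    static void Main()
    {
        // The pattern exactly as given above, in a verbatim string
        // so the backslashes need no extra escaping.
        const string pattern = @"(?sx-m)[^\r\n].*?(?:(?:\.|\?|!|\:)\s)";

        // Sample input (an assumption). Note the trailing space: the
        // pattern requires whitespace after the final terminator.
        const string text = "Regex is fun. Is it hard? Not really! ";

        foreach (Match m in Regex.Matches(text, pattern))
        {
            // Each match includes the whitespace after the terminator,
            // so trim it before printing.
            Console.WriteLine("[" + m.Value.Trim() + "]");
        }
        // Prints:
        // [Regex is fun.]
        // [Is it hard?]
        // [Not really!]
    }
}
```

Two things follow from the pattern's structure. Because the terminator must be followed by whitespace, a final sentence that ends exactly at the end of the input is not matched. Any period followed by a space also ends a match, so an abbreviation such as "Dr. Smith" would be split in two.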