reglibcpp  1.7.0
(Naïve) C++ implementation of models for regular languages
Classes | Public Types | Public Member Functions | Static Public Member Functions | Static Public Attributes | List of all members
reg::expression Class Reference

Represents formal regular expressions. More...

#include <expression.h>

Classes

struct  literals
 
struct  parser
 Parses regular expressions. More...
 

Public Types

enum  operation {
  empty, symbol, kleene, concatenation,
  alternation
}
 The different purposes an RE may fulfill. More...
 
typedef std::shared_ptr< expression const > exptr
 This is the type used to handle regular expressions. More...
 

Public Member Functions

size_t size () const
 Reports the size of this RE's tree representation. More...
 
operation getOperation () const
 Reports this RE's function. More...
 
 operator nfa const & () const
 Returns an NFA accepting the language that this RE describes. More...
 
bool operator== (nfa const &other) const
 Checks whether this RE describes the same regular language as another object. More...
 
bool operator!= (nfa const &other) const
 Checks whether this RE describes a different regular language than another object. More...
 
bool operator== (expression const &r) const
 
bool operator!= (expression const &r) const
 
char32_t extractSymbol () const
 Reports this symbol expression's UTF-32-encoded symbol. More...
 
std::string extractUtf8Symbol () const
 Reports this symbol expression's UTF-8-encoded symbol. More...
 
std::u32string to_u32string () const
 Describes this RE in UTF-32-encoded human-readable form. More...
 
std::string to_string () const
 Describes this RE in UTF-8-encoded human-readable form. More...
 
std::vector< exptr >::const_iterator begin () const
 Returns an iterator pointing to this RE's first subexpression. More...
 
std::vector< exptr >::const_iterator end () const
 Returns an iterator pointing behind this RE's last subexpression. More...
 

Static Public Member Functions

static void reset ()
 Resets the symbols used for RE operators to their defaults. More...
 
static exptr const & spawnEmptySet ()
 Gives an RE representing the empty set ∅. More...
 
static exptr const & spawnEmptyString ()
 Gives an RE representing the empty string ε. More...
 
static exptr const & spawnSymbol (char32_t symbol)
 Gives an RE representing the given UTF-32-encoded symbol. More...
 
static exptr const & spawnSymbol (std::string const &utf8Symbol)
 Same as above for a UTF-8-encoded symbol. More...
 
static exptr spawnKleene (exptr const &b, bool optimized=true, bool aggressive=false)
 Gives an RE representing the Kleene closure of a given RE. More...
 
static exptr spawnConcatenation (exptr const &l, exptr const &r, bool optimized=true, bool aggressive=false)
 Gives an RE representing the concatenation of two given REs. More...
 
static exptr spawnAlternation (exptr const &l, exptr const &r, bool optimized=true, bool aggressive=false)
 Gives an RE representing the alternation of two given REs. More...
 
static exptr spawnFromString (std::u32string const &re, literals lits, bool optimized=false, bool aggressive=false)
 
static exptr spawnFromString (std::string const &utf8Re, literals lits, bool optimized=false, bool aggressive=false)
 
static exptr spawnFromString (std::u32string const &re, bool optimized=false, bool aggressive=false)
 Gives an RE encoded in a given string. More...
 
static exptr spawnFromString (std::string const &utf8Re, bool optimized=false, bool aggressive=false)
 Same as above for a UTF-8-encoded string. More...
 

Static Public Attributes

static char32_t L = U'('
 The symbol used to represent the Left parenthesis in a regular expression. More...
 
static char32_t R = U')'
 The symbol used to represent the Right parenthesis in a regular expression. More...
 
static char32_t K = U'*'
 The symbol used to represent the Kleene star in a regular expression. More...
 
static char32_t A = U'+'
 The symbol used to represent the Alternation in a regular expression. More...
 
static char32_t E = U'ε'
 The symbol used to represent the Empty string in a regular expression. More...
 
static char32_t N = U'∅'
 The symbol used to represent the Null/empty set in a regular expression. More...
 

Detailed Description

Represents formal regular expressions.

One should never need to handle such an object directly, however, much less copy or move it and therefore copy and move constructors are deleted.

To work with regular expressions, one should use expression::exptr, which aliases a shared_ptr to an actual object and can be copied and moved to one's heart's content. To access member functions, one might dereference exptrs temporarily or, better yet, use the arrow -> operator.

See also
expression::exptr

Definition at line 28 of file expression.h.

Member Typedef Documentation

◆ exptr

This is the type used to handle regular expressions.

Every method works on shared_ptrs to the actual regular expressions, to help with basic comparisons and to save memory.

For example, every symbol's (and the empty string's and the empty set's) regular expression is only instantiated once and then pointed to by as many exptrs as one likes.

Definition at line 40 of file expression.h.

Member Enumeration Documentation

◆ operation

The different purposes an RE may fulfill.

See also
spawnEmptySet
spawnEmptyString
spawnSymbol
spawnKleene
spawnConcatenation
spawnAlternation

Definition at line 84 of file expression.h.

84 { empty, symbol, kleene, concatenation, alternation };

Member Function Documentation

◆ begin()

vector< expression::exptr >::const_iterator reg::expression::begin ( ) const

Returns an iterator pointing to this RE's first subexpression.

Definition at line 315 of file expression.cpp.

315  {
316  return subExpressions.cbegin();
317 }

◆ end()

vector< expression::exptr >::const_iterator reg::expression::end ( ) const

Returns an iterator pointing behind this RE's last subexpression.

Definition at line 320 of file expression.cpp.

320  {
321  return subExpressions.cend();
322 }

◆ extractSymbol()

char32_t reg::expression::extractSymbol ( ) const

Reports this symbol expression's UTF-32-encoded symbol.

Returns
the char32_t encoded within this symbol expression, U'\0' for an empty string
Exceptions
std::logic_errorif this expression's purpose is not that of a symbol

Definition at line 249 of file expression.cpp.

249  {
250  auto it = std::find_if(
251  expression::symbols.begin(),
252  expression::symbols.end(),
253  [&](pair<char32_t,exptr> const& entry)->bool{return entry.second.get() == this;}
254  );
255  if (it == expression::symbols.end()) {
256  throw std::logic_error("This RE does not seem to be a valid symbol expression.");
257  }
258  return it->first;
259 }
std::vector< exptr >::const_iterator begin() const
Returns an iterator pointing to this RE's first subexpression.
Definition: expression.cpp:315
std::vector< exptr >::const_iterator end() const
Returns an iterator pointing behind this RE's last subexpression.
Definition: expression.cpp:320

◆ extractUtf8Symbol()

string reg::expression::extractUtf8Symbol ( ) const

Reports this symbol expression's UTF-8-encoded symbol.

Returns
the character encoded within this symbol expression, "" for an empty string
Exceptions
std::logic_errorif this expression's purpose is not that of a symbol

Definition at line 266 of file expression.cpp.

266  {
267  char32_t symbol(extractSymbol());
268  if (symbol == U'\0') {
269  return string();
270  } else {
271  return converter.to_bytes(extractSymbol());
272  }
273 }
std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: utils.cpp:4
char32_t extractSymbol() const
Reports this symbol expression's UTF-32-encoded symbol.
Definition: expression.cpp:249

◆ getOperation()

expression::operation reg::expression::getOperation ( ) const

Reports this RE's function.

Note that the empty string's function is technically that of a symbol.

Returns
the expression::operation best describing this RE's purpose

Definition at line 205 of file expression.cpp.

205  {
206  return op;
207 }

◆ operator nfa const &()

reg::expression::operator nfa const & ( ) const

Returns an NFA accepting the language that this RE describes.

◆ operator!=() [1/2]

bool reg::expression::operator!= ( nfa const &  other) const

Checks whether this RE describes a different regular language than another object.

Returns
false if this RE's language is exactly the same as the other object's, true else

Definition at line 230 of file expression.cpp.

230  {
231  return !operator==(other);
232 }
bool operator==(nfa const &other) const
Checks whether this RE describes the same regular language as another object.
Definition: expression.cpp:222

◆ operator!=() [2/2]

bool reg::expression::operator!= ( expression const &  other) const
Deprecated:
Was generalized to compare to any class in reg.

Definition at line 240 of file expression.cpp.

240  {
241  return *this != static_cast<nfa const&>(other);
242 }

◆ operator==() [1/2]

bool reg::expression::operator== ( nfa const &  other) const

Checks whether this RE describes the same regular language as another object.

Returns
true if this RE's language is exactly the same as the other object's, false else

Definition at line 222 of file expression.cpp.

222  {
223  return other.operator==(*this);
224 }

◆ operator==() [2/2]

bool reg::expression::operator== ( expression const &  other) const
Deprecated:
Was generalized to compare to any class in reg.

Definition at line 235 of file expression.cpp.

235  {
236  return *this == static_cast<nfa const&>(other);
237 }

◆ reset()

void reg::expression::reset ( )
static

Resets the symbols used for RE operators to their defaults.

Definition at line 36 of file expression.cpp.

36  {
37  L = U'(';
38  R = U')';
39  K = U'*';
40  A = U'+';
41  E = U'ε';
42  N = U'∅';
43 }
static char32_t N
The symbol used to represent the Null/empty set in a regular expression.
Definition: expression.h:41
static char32_t K
The symbol used to represent the Kleene star in a regular expression.
Definition: expression.h:41
static char32_t L
The symbol used to represent the Left parenthesis in a regular expression.
Definition: expression.h:41
static char32_t E
The symbol used to represent the Empty string in a regular expression.
Definition: expression.h:41
static char32_t R
The symbol used to represent the Right parenthesis in a regular expression.
Definition: expression.h:41
static char32_t A
The symbol used to represent the Alternation in a regular expression.
Definition: expression.h:41

◆ size()

size_t reg::expression::size ( ) const

Reports the size of this RE's tree representation.

In this context, an RE's size will be defined recursively as follows:

  • .size() = 1
  • ε.size() = 1
  • <symbol>.size() = 1
  • (l+r).size() = 1 + l.size() + r.size()
  • (lr).size() = 1 + l.size() + r.size()
  • (b*).size() = 1 + b.size()
    Returns
    a measure of how many subexpressions this RE consists of

Definition at line 192 of file expression.cpp.

192  {
193  size_t s(1);
194  for (exptr const& re : *this) {
195  s += re->size();
196  }
197  return s;
198 }
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:40

◆ spawnAlternation()

expression::exptr reg::expression::spawnAlternation ( expression::exptr const &  l,
expression::exptr const &  r,
bool  optimized = true,
bool  aggressive = false 
)
static

Gives an RE representing the alternation of two given REs.

More formally, the RE's language will be L(l+r) = L(l) ∪ L(r).

Parameters
lexptr to one of the REs
rexptr to the other RE
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE representing the alternation of l and r

Definition at line 123 of file expression.cpp.

123  {
124  if (optimized) {
125  if (l == expression::spawnEmptySet()) {
126  return r;
127  }
128  if (r == expression::spawnEmptySet()) {
129  return l;
130  }
131  if (l == r) {
132  return l;
133  }
134  if (aggressive) {
135  if (*l == *expression::spawnEmptySet()) {
136  return r;
137  }
138  if (*r == *expression::spawnEmptySet()) {
139  return l;
140  }
141  if (*l == *r) {
142  return l->size() > r->size() ? r : l;
143  }
144  dfa empty;
145  if (empty == nfa::subtract(*l, *r).build()) {
146  return r;
147  }
148  if (empty == nfa::subtract(*r, *l).build()) {
149  return l;
150  }
151  }
152  }
153  return expression::exptr(new expression(l, r, operation::alternation));
154 }
static nfa::builder subtract(nfa const &n1, nfa const &n2)
Creates a builder for an NFA accepting the set difference of the languages of two NFAs...
Definition: nfa.cpp:604
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:40
static exptr const & spawnEmptySet()
Gives an RE representing the empty set ∅.
Definition: expression.cpp:50

◆ spawnConcatenation()

expression::exptr reg::expression::spawnConcatenation ( expression::exptr const &  l,
expression::exptr const &  r,
bool  optimized = true,
bool  aggressive = false 
)
static

Gives an RE representing the concatenation of two given REs.

More formally, the RE's language will be L(lr) = L(l) • L(r).

Parameters
lexptr to the first RE
rexptr to the second RE
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE representing the concatenation of l and r

Definition at line 88 of file expression.cpp.

88  {
89  if (optimized) {
90  if (l == expression::spawnEmptyString()) {
91  return r;
92  }
93  if (r == expression::spawnEmptyString()) {
94  return l;
95  }
98  }
99  if (aggressive) {
100  if (*l == *expression::spawnEmptyString()) {
101  return r;
102  }
103  if (*r == *expression::spawnEmptyString()) {
104  return l;
105  }
106  if (*r == *expression::spawnEmptySet() || *l == *expression::spawnEmptySet()) {
107  return expression::spawnEmptySet();
108  }
109  }
110  }
111  return expression::exptr(new expression(l, r, operation::concatenation));
112 }
static exptr const & spawnEmptyString()
Gives an RE representing the empty string ε.
Definition: expression.cpp:59
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:40
static exptr const & spawnEmptySet()
Gives an RE representing the empty set ∅.
Definition: expression.cpp:50

◆ spawnEmptySet()

expression::exptr const & reg::expression::spawnEmptySet ( )
static

Gives an RE representing the empty set ∅.

More formally, the RE's language will be {}.

Returns
exptr to the RE representing the empty set ∅

Definition at line 50 of file expression.cpp.

50  {
51  return expression::empty;
52 }

◆ spawnEmptyString()

expression::exptr const & reg::expression::spawnEmptyString ( )
static

Gives an RE representing the empty string ε.

More formally, the RE's language will be {ε}.

Returns
exptr to the RE representing the empty string ε

Definition at line 59 of file expression.cpp.

59  {
60  return expression::spawnSymbol(U'\0');
61 }
static exptr const & spawnSymbol(char32_t symbol)
Gives an RE representing the given UTF-32-encoded symbol.
Definition: expression.cpp:69

◆ spawnFromString() [1/4]

expression::exptr reg::expression::spawnFromString ( std::u32string const &  re,
literals  lits,
bool  optimized = false,
bool  aggressive = false 
)
static
Deprecated:
Use the static members of expression for providing your own literals.

Definition at line 609 of file expression.cpp.

609  {
610  char32_t l = L;
611  char32_t r = R;
612  char32_t k = K;
613  char32_t a = A;
614  char32_t e = E;
615  char32_t n = N;
616  L = lits.L;
617  R = lits.R;
618  K = lits.S;
619  A = lits.P;
620  E = lits.EPSILON;
621  N = lits.EMPTY;
622  parser stringParser(re);
623  exptr result;
624  try {
625  result = stringParser(optimized, aggressive);
626  L = l;
627  R = r;
628  K = k;
629  A = a;
630  E = e;
631  N = n;
632  } catch (std::bad_function_call ex) {
633  L = l;
634  R = r;
635  K = k;
636  A = a;
637  E = e;
638  N = n;
639  throw std::invalid_argument("Malformed regular expression.");
640  }
641  return result;
642 }
static char32_t N
The symbol used to represent the Null/empty set in a regular expression.
Definition: expression.h:41
static char32_t K
The symbol used to represent the Kleene star in a regular expression.
Definition: expression.h:41
static char32_t L
The symbol used to represent the Left parenthesis in a regular expression.
Definition: expression.h:41
static char32_t E
The symbol used to represent the Empty string in a regular expression.
Definition: expression.h:41
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:40
static char32_t R
The symbol used to represent the Right parenthesis in a regular expression.
Definition: expression.h:41
static char32_t A
The symbol used to represent the Alternation in a regular expression.
Definition: expression.h:41

◆ spawnFromString() [2/4]

expression::exptr reg::expression::spawnFromString ( std::string const &  utf8Re,
literals  lits,
bool  optimized = false,
bool  aggressive = false 
)
static
Deprecated:
Use the static members of expression for providing your own literals.

Definition at line 645 of file expression.cpp.

645  {
646  return spawnFromString(converter.from_bytes(utf8Re), lits, optimized, aggressive);
647 }
std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: utils.cpp:4
static exptr spawnFromString(std::u32string const &re, literals lits, bool optimized=false, bool aggressive=false)
Definition: expression.cpp:609

◆ spawnFromString() [3/4]

expression::exptr reg::expression::spawnFromString ( std::u32string const &  re,
bool  optimized = false,
bool  aggressive = false 
)
static

Gives an RE encoded in a given string.

Parameters
rethe RE in text form
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE represented by the given string
Exceptions
std::invalid_argumentif the re string is malformed

Definition at line 657 of file expression.cpp.

657  {
658  parser stringParser(re);
659  try {
660  return stringParser(optimized, aggressive);
661  } catch (std::bad_function_call e) {
662  throw std::invalid_argument("Malformed regular expression.");
663  }
664 }

◆ spawnFromString() [4/4]

expression::exptr reg::expression::spawnFromString ( std::string const &  utf8Re,
bool  optimized = false,
bool  aggressive = false 
)
static

Same as above for a UTF-8-encoded string.

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Definition at line 667 of file expression.cpp.

667  {
668  return spawnFromString(converter.from_bytes(utf8Re), optimized, aggressive);
669 }
std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: utils.cpp:4
static exptr spawnFromString(std::u32string const &re, literals lits, bool optimized=false, bool aggressive=false)
Definition: expression.cpp:609

◆ spawnKleene()

expression::exptr reg::expression::spawnKleene ( expression::exptr const &  b,
bool  optimized = true,
bool  aggressive = false 
)
static

Gives an RE representing the Kleene closure of a given RE.

More formally, the RE's language will be L(b*) = L(b)*.

Parameters
bexptr to the RE
optimizedwhether simplifications on the syntax level should be applied
aggressivewhether the simplifications should check the semantic level
Returns
exptr to the RE representing the Kleene closure of l

Definition at line 164 of file expression.cpp.

164  {
165  if (optimized) {
168  }
169  if (aggressive) {
170  if (*b == *expression::spawnEmptySet() || *b == *expression::spawnEmptyString()) {
172  }
173  if (*b == expression(b)) {
174  return b;
175  }
176  }
177  }
178  return expression::exptr(new expression(b));
179 }
static exptr const & spawnEmptyString()
Gives an RE representing the empty string ε.
Definition: expression.cpp:59
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:40
static exptr const & spawnEmptySet()
Gives an RE representing the empty set ∅.
Definition: expression.cpp:50

◆ spawnSymbol() [1/2]

expression::exptr const & reg::expression::spawnSymbol ( char32_t  symbol)
static

Gives an RE representing the given UTF-32-encoded symbol.

More formally, the RE's language will be {<symbol>}.

Parameters
symbolthe symbol the RE should represent or "" for the empty string ε
Returns
exptr to the RE representing the symbol

Definition at line 69 of file expression.cpp.

69  {
70  return expression::symbols.insert(make_pair(symbol, exptr(new expression(symbol)))).first->second;
71 }
T make_pair(T... args)
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:40

◆ spawnSymbol() [2/2]

expression::exptr const & reg::expression::spawnSymbol ( std::string const &  utf8Symbol)
static

Same as above for a UTF-8-encoded symbol.

This is an overloaded member function, provided for convenience. It differs from the above function only in what argument(s) it accepts.

Definition at line 74 of file expression.cpp.

74  {
75  char32_t u32Symbol(converter.from_bytes(utf8Symbol)[0]);
76  return expression::symbols.insert(make_pair(u32Symbol, exptr(new expression(u32Symbol)))).first->second;
77 }
std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: utils.cpp:4
T make_pair(T... args)
std::shared_ptr< expression const > exptr
This is the type used to handle regular expressions.
Definition: expression.h:40

◆ to_string()

string reg::expression::to_string ( ) const

Describes this RE in UTF-8-encoded human-readable form.

Definition at line 310 of file expression.cpp.

310  {
311  return converter.to_bytes(to_u32string());
312 }
std::wstring_convert< std::codecvt_utf8< char32_t >, char32_t > converter
Converts between UTF-8-encoded and UTF-32-encoded strings.
Definition: utils.cpp:4
std::u32string to_u32string() const
Describes this RE in UTF-32-encoded human-readable form.
Definition: expression.cpp:276

◆ to_u32string()

u32string reg::expression::to_u32string ( ) const

Describes this RE in UTF-32-encoded human-readable form.

Definition at line 276 of file expression.cpp.

276  {
277  switch (op) {
278  case operation::alternation : return subExpressions[0]->to_u32string() + A + subExpressions[1]->to_u32string();
279  case operation::concatenation : {
280  u32string concat;
281  if (subExpressions[0]->op >= operation::alternation) {
282  concat.append(L + subExpressions[0]->to_u32string() + R);
283  } else {
284  concat.append(subExpressions[0]->to_u32string());
285  }
286  if (subExpressions[1]->op >= operation::alternation) {
287  concat.append(L + subExpressions[1]->to_u32string() + R);
288  } else {
289  concat.append(subExpressions[1]->to_u32string());
290  }
291  return concat;
292  }
293  case operation::kleene : {
294  if (subExpressions[0]->op >= operation::concatenation) {
295  return (L + subExpressions[0]->to_u32string() + R) + K;
296  } else {
297  return subExpressions[0]->to_u32string() + K;
298  }
299  }
300  case operation::symbol : {
301  char32_t symbol = extractSymbol();
302  return u32string(1, symbol + (!symbol * E));
303  }
304  case operation::empty : return u32string(1, N);
305  default : return u32string();
306  }
307 }
static char32_t N
The symbol used to represent the Null/empty set in a regular expression.
Definition: expression.h:41
std::u32string to_u32string() const
Describes this RE in UTF-32-encoded human-readable form.
Definition: expression.cpp:276
char32_t extractSymbol() const
Reports this symbol expression's UTF-32-encoded symbol.
Definition: expression.cpp:249
static char32_t K
The symbol used to represent the Kleene star in a regular expression.
Definition: expression.h:41
static char32_t L
The symbol used to represent the Left parenthesis in a regular expression.
Definition: expression.h:41
static char32_t E
The symbol used to represent the Empty string in a regular expression.
Definition: expression.h:41
static char32_t R
The symbol used to represent the Right parenthesis in a regular expression.
Definition: expression.h:41
static char32_t A
The symbol used to represent the Alternation in a regular expression.
Definition: expression.h:41

Member Data Documentation

◆ A

char32_t reg::expression::A = U'+'
static

The symbol used to represent the Alternation in a regular expression.

Definition at line 41 of file expression.h.

◆ E

char32_t reg::expression::E = U'ε'
static

The symbol used to represent the Empty string in a regular expression.

Definition at line 41 of file expression.h.

◆ K

char32_t reg::expression::K = U'*'
static

The symbol used to represent the Kleene star in a regular expression.

Definition at line 41 of file expression.h.

◆ L

char32_t reg::expression::L = U'('
static

The symbol used to represent the Left parenthesis in a regular expression.

Definition at line 41 of file expression.h.

◆ N

char32_t reg::expression::N = U'∅'
static

The symbol used to represent the Null/empty set in a regular expression.

Definition at line 41 of file expression.h.

◆ R

char32_t reg::expression::R = U')'
static

The symbol used to represent the Right parenthesis in a regular expression.

Definition at line 41 of file expression.h.


The documentation for this class was generated from the following files: