使用Earley库进行带有特征和统一性的解析。

huangapple go评论59阅读模式
英文:

Using the Earley library to parse with features and unification

问题

The Earley parsing library is great for writing linguistic parsers in Haskell. CFGs can be specified in an intuitive way, and there is excellent support for backtracking and ambiguity. A simple example:

{-# LANGUAGE OverloadedStrings #-}

import Text.Earley

np = rule ("John" <|> "Mary")
vp = rule ("runs" <|> "walks")

sentence = do
  subj <- np
  pred <- vp
  return $ (++) <$> subj <*> pred

sentence can be used to parse ["John", "runs"] or ["Mary", "walks"], among other inputs.

It would be nice to be able to use Earley to write parsers for FCFGs, where nonterminals are complexes of a label and a feature bundle, and feature matching can happen via unification (for example, the Earley parser in NLTK parses FCFGs). However, it is not clear how to do this using Earley, or whether it can even be done. An example of something we might want in something like BNF:

np[sg] ::= "John" | "Mary"

np[?x] ::= det n[?x]
n[pl]  ::= "boys" | "girls"

det    ::= "the"

vp[sg] ::= "runs" | "walks"
vp[pl] ::= "run"  | "walk"

s ::= np[?x] vp[?x]

Under this FCFG, ["John", "runs"] is an s (since their number features match, as required by the s rule), and ["the", "boys", "walks"] isn't an s (since ["the", "boys"] parses to np[pl] and ["walks"] parses to vp[sg]).

One can in general rewrite an FCFG into an equivalent CFG, but this can be highly inconvenient, and result in a blowup of the grammar, especially when we have many possible features ranging over many possible values.

英文:

The Earley parsing library is great for writing linguistic parsers in Haskell. CFGs can be specified in an intuitive way, and there is excellent support for backtracking and ambiguity. A simple example:

{-# LANGUAGE OverloadedStrings #-}

import Text.Earley

np = rule (&quot;John&quot; &lt;|&gt; &quot;Mary&quot;)
vp = rule (&quot;runs&quot; &lt;|&gt; &quot;walks&quot;)

sentence = do
  subj &lt;- np
  pred &lt;- vp
  return $ (++) &lt;$&gt; subj &lt;*&gt; pred

sentence can be used to parse [&quot;John&quot;, &quot;runs&quot;] or [&quot;Mary&quot;, &quot;walks&quot;], among other inputs.

It would be nice to be able to use Earley to write parsers for FCFGs, where nonterminals are complexes of a label and a feature bundle, and feature matching can happen via unification (for example, the Earley parser in NLTK parses FCFGs). However, it is not clear how to do this using Earley, or whether it can even be done. An example of something we might want in something like BNF:

np[sg] ::= &quot;John&quot; | &quot;Mary&quot;

np[?x] ::= det n[?x]
n[pl]  ::= &quot;boys&quot; | &quot;girls&quot;

det    ::= &quot;the&quot;

vp[sg] ::= &quot;runs&quot; | &quot;walks&quot;
vp[pl] ::= &quot;run&quot;  | &quot;walk&quot;

s ::= np[?x] vp[?x]

Under this FCFG, [&quot;John&quot;, &quot;runs&quot;] is an s (since their number features match, as required by the s rule), and [&quot;the&quot;, &quot;boys&quot;, &quot;walks&quot;] isn't an s (since [&quot;the&quot;, &quot;boys&quot;] parses to np[pl] and [&quot;walks&quot;] parses to vp[sg]).

One can in general rewrite an FCFG into an equivalent CFG, but this can be highly inconvenient, and result in a blowup of the grammar, especially when we have many possible features ranging over many possible values.

答案1

得分: 2

以下是您提供的代码的翻译部分:

你实际上并没有进行任何特别有趣的统一操作,所以也许将自己的简单非确定性应用程序混合其中就足够了。标准的应用程序是 `[]`,但对于这种情况,甚至 `Maybe` 似乎足够了。就像这样:

```haskell
{-# Language OverloadedStrings #-}
{-# Language TypeApplications #-}

import Control.Applicative
import Control.Monad
import Data.Foldable
import Text.Earley

data Feature = SG | PL deriving (Eq, Ord, Read, Show)

(=:=) :: (Feature, a) -> (Feature, b) -> Maybe (a, b)
(fa, a) =:= (fb, b) = (a, b) <$ guard (fa == fb)

data NP = Name String | Determined String String deriving (Eq, Ord, Read, Show)

np :: Grammar r (Prod r e String (Feature, NP))
np = rule . asum $
	[ fmap (\name -> (SG, Name name)) ("John" <|> "Mary")
	, liftA2 (\det n -> (PL, Determined det n)) "the" ("boys" <|> "girls")
	]

vp :: Grammar r (Prod r e String (Feature, String))
vp = rule . asum $
	[ (,) SG <$> ("runs" <|> "walks")
	, (,) PL <$> ("run" <|> "walk")
	]

s :: Grammar r (Prod r e String (Maybe (NP, String)))
s = liftA2 (liftA2 (=:=)) np vp

test :: [String] -> IO ()
test = print . allParses @() (parser s)

请注意,我保留了原始的代码和注释,只翻译了代码本身。希望这对您有所帮助。

英文:

You're not actually doing any particularly interesting unification here, so perhaps it's enough to toss a very simple nondeterminism applicative of your own into the mix. The standard one is [], but for this case, even Maybe looks like enough. Like this:

{-# Language OverloadedStrings #-}
{-# Language TypeApplications #-}

import Control.Applicative
import Control.Monad
import Data.Foldable
import Text.Earley

data Feature = SG | PL deriving (Eq, Ord, Read, Show)

(=:=) :: (Feature, a) -&gt; (Feature, b) -&gt; Maybe (a, b)
(fa, a) =:= (fb, b) = (a, b) &lt;$ guard (fa == fb)

data NP = Name String | Determined String String deriving (Eq, Ord, Read, Show)

np :: Grammar r (Prod r e String (Feature, NP))
np = rule . asum $
	[ fmap (\name -&gt; (SG, Name name)) (&quot;John&quot; &lt;|&gt; &quot;Mary&quot;)
	, liftA2 (\det n -&gt; (PL, Determined det n)) &quot;the&quot; (&quot;boys&quot; &lt;|&gt; &quot;girls&quot;)
	]

vp :: Grammar r (Prod r e String (Feature, String))
vp = rule . asum $
	[ (,) SG &lt;$&gt; (&quot;runs&quot; &lt;|&gt; &quot;walks&quot;)
	, (,) PL &lt;$&gt; (&quot;run&quot; &lt;|&gt; &quot;walk&quot;)
	]

s :: Grammar r (Prod r e String (Maybe (NP, String)))
s = liftA2 (liftA2 (=:=)) np vp

test :: [String] -&gt; IO ()
test = print . allParses @() (parser s)

Try it out in ghci:

&gt; sequence_ [test (words n ++ [v]) | n &lt;- [&quot;John&quot;, &quot;the boys&quot;], v &lt;- [&quot;walks&quot;, &quot;walk&quot;]]
([(Just (Name &quot;John&quot;,&quot;walks&quot;),2)],Report {position = 2, expected = [], unconsumed = []})
([(Nothing,2)],Report {position = 2, expected = [], unconsumed = []})
([(Nothing,3)],Report {position = 3, expected = [], unconsumed = []})
([(Just (Determined &quot;the&quot; &quot;boys&quot;,&quot;walk&quot;),3)],Report {position = 3, expected = [], unconsumed = []})

So, the result needs a bit of interpretation -- a successful parse of Nothing really counts as a failed parse -- but perhaps that's not so bad? Not sure. Certainly it's unfortunate that you don't get to reuse Earley's error-reporting and nondeterminism machinery. Probably to get either thing, you'd have to fork Earley.

If you need to do real unification you could look into returning a IntBindingT t Identity instead of a Maybe, but at least until your features are themselves recursive this is probably enough and much, much simpler.

huangapple
  • 本文由 发表于 2023年2月18日 03:05:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/75488355.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定