英文:
Using the Earley library to parse with features and unification
问题
The Earley
parsing library is great for writing linguistic parsers in Haskell. CFGs can be specified in an intuitive way, and there is excellent support for backtracking and ambiguity. A simple example:
{-# LANGUAGE OverloadedStrings #-}
import Text.Earley
np = rule ("John" <|> "Mary")
vp = rule ("runs" <|> "walks")
sentence = do
subj <- np
pred <- vp
return $ (++) <$> subj <*> pred
sentence
can be used to parse ["John", "runs"] or ["Mary", "walks"], among other inputs.
It would be nice to be able to use Earley
to write parsers for FCFGs, where nonterminals are complexes of a label and a feature bundle, and feature matching can happen via unification (for example, the Earley parser in NLTK parses FCFGs). However, it is not clear how to do this using Earley
, or whether it can even be done. An example of something we might want in something like BNF:
np[sg] ::= "John" | "Mary"
np[?x] ::= det n[?x]
n[pl] ::= "boys" | "girls"
det ::= "the"
vp[sg] ::= "runs" | "walks"
vp[pl] ::= "run" | "walk"
s ::= np[?x] vp[?x]
Under this FCFG, ["John", "runs"] is an s
(since their number features match, as required by the s
rule), and ["the", "boys", "walks"] isn't an s
(since ["the", "boys"] parses to np[pl]
and ["walks"] parses to vp[sg]
).
One can in general rewrite an FCFG into an equivalent CFG, but this can be highly inconvenient, and result in a blowup of the grammar, especially when we have many possible features ranging over many possible values.
英文:
The Earley
parsing library is great for writing linguistic parsers in Haskell. CFGs can be specified in an intuitive way, and there is excellent support for backtracking and ambiguity. A simple example:
{-# LANGUAGE OverloadedStrings #-}
import Text.Earley
np = rule ("John" <|> "Mary")
vp = rule ("runs" <|> "walks")
sentence = do
subj <- np
pred <- vp
return $ (++) <$> subj <*> pred
sentence
can be used to parse ["John", "runs"]
or ["Mary", "walks"]
, among other inputs.
It would be nice to be able to use Earley
to write parsers for FCFGs, where nonterminals are complexes of a label and a feature bundle, and feature matching can happen via unification (for example, the Earley parser in NLTK parses FCFGs). However, it is not clear how to do this using Earley
, or whether it can even be done. An example of something we might want in something like BNF:
np[sg] ::= "John" | "Mary"
np[?x] ::= det n[?x]
n[pl] ::= "boys" | "girls"
det ::= "the"
vp[sg] ::= "runs" | "walks"
vp[pl] ::= "run" | "walk"
s ::= np[?x] vp[?x]
Under this FCFG, ["John", "runs"]
is an s
(since their number features match, as required by the s
rule), and ["the", "boys", "walks"]
isn't an s
(since ["the", "boys"]
parses to np[pl]
and ["walks"]
parses to vp[sg]
).
One can in general rewrite an FCFG into an equivalent CFG, but this can be highly inconvenient, and result in a blowup of the grammar, especially when we have many possible features ranging over many possible values.
答案1
得分: 2
以下是您提供的代码的翻译部分:
你实际上并没有进行任何特别有趣的统一操作,所以也许将自己的简单非确定性应用程序混合其中就足够了。标准的应用程序是 `[]`,但对于这种情况,甚至 `Maybe` 似乎足够了。就像这样:
```haskell
{-# Language OverloadedStrings #-}
{-# Language TypeApplications #-}
import Control.Applicative
import Control.Monad
import Data.Foldable
import Text.Earley
data Feature = SG | PL deriving (Eq, Ord, Read, Show)
(=:=) :: (Feature, a) -> (Feature, b) -> Maybe (a, b)
(fa, a) =:= (fb, b) = (a, b) <$ guard (fa == fb)
data NP = Name String | Determined String String deriving (Eq, Ord, Read, Show)
np :: Grammar r (Prod r e String (Feature, NP))
np = rule . asum $
[ fmap (\name -> (SG, Name name)) ("John" <|> "Mary")
, liftA2 (\det n -> (PL, Determined det n)) "the" ("boys" <|> "girls")
]
vp :: Grammar r (Prod r e String (Feature, String))
vp = rule . asum $
[ (,) SG <$> ("runs" <|> "walks")
, (,) PL <$> ("run" <|> "walk")
]
s :: Grammar r (Prod r e String (Maybe (NP, String)))
s = liftA2 (liftA2 (=:=)) np vp
test :: [String] -> IO ()
test = print . allParses @() (parser s)
请注意,我保留了原始的代码和注释,只翻译了代码本身。希望这对您有所帮助。
英文:
You're not actually doing any particularly interesting unification here, so perhaps it's enough to toss a very simple nondeterminism applicative of your own into the mix. The standard one is []
, but for this case, even Maybe
looks like enough. Like this:
{-# Language OverloadedStrings #-}
{-# Language TypeApplications #-}
import Control.Applicative
import Control.Monad
import Data.Foldable
import Text.Earley
data Feature = SG | PL deriving (Eq, Ord, Read, Show)
(=:=) :: (Feature, a) -> (Feature, b) -> Maybe (a, b)
(fa, a) =:= (fb, b) = (a, b) <$ guard (fa == fb)
data NP = Name String | Determined String String deriving (Eq, Ord, Read, Show)
np :: Grammar r (Prod r e String (Feature, NP))
np = rule . asum $
[ fmap (\name -> (SG, Name name)) ("John" <|> "Mary")
, liftA2 (\det n -> (PL, Determined det n)) "the" ("boys" <|> "girls")
]
vp :: Grammar r (Prod r e String (Feature, String))
vp = rule . asum $
[ (,) SG <$> ("runs" <|> "walks")
, (,) PL <$> ("run" <|> "walk")
]
s :: Grammar r (Prod r e String (Maybe (NP, String)))
s = liftA2 (liftA2 (=:=)) np vp
test :: [String] -> IO ()
test = print . allParses @() (parser s)
Try it out in ghci:
> sequence_ [test (words n ++ [v]) | n <- ["John", "the boys"], v <- ["walks", "walk"]]
([(Just (Name "John","walks"),2)],Report {position = 2, expected = [], unconsumed = []})
([(Nothing,2)],Report {position = 2, expected = [], unconsumed = []})
([(Nothing,3)],Report {position = 3, expected = [], unconsumed = []})
([(Just (Determined "the" "boys","walk"),3)],Report {position = 3, expected = [], unconsumed = []})
So, the result needs a bit of interpretation -- a successful parse of Nothing
really counts as a failed parse -- but perhaps that's not so bad? Not sure. Certainly it's unfortunate that you don't get to reuse Earley
's error-reporting and nondeterminism machinery. Probably to get either thing, you'd have to fork Earley
.
If you need to do real unification you could look into returning a IntBindingT t Identity
instead of a Maybe
, but at least until your features are themselves recursive this is probably enough and much, much simpler.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论