
Finding volume and item count using regular expressions



I am currently building a JavaScript web scraper for a grocery store. It processes each product title and returns the item count, volume and price per litre of that product. Most of the product titles look something like this:

Coca cola (vanilla flavour) 12 x 330 mL

In order to obtain metadata about this product, I have written a regular expression. It looks for a word boundary followed by a 1 or 2 digit number, whitespace, the string 'x', another whitespace and finally a 1, 2 or 3 digit number:

    const filter = new RegExp(/\b\d{1,2}\sx\s\d{1,3}/);
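As a quick check of what this expression accepts, here is a minimal sketch. The third sample title and the helper name `multipackPart` are made up for illustration; they are not part of the scraper.

```javascript
// Same expression as above, written as a regex literal.
const filter = /\b\d{1,2}\sx\s\d{1,3}/;

// Sample titles; the last one is hypothetical.
const sampleTitles = [
  "Coca cola (vanilla flavour) 12 x 330 mL",
  "Coca Cola (4-Way) 12 x 330 mL",
  "Orange juice 750 mL",
];

// Returns the matched substring, or null when the title has no "N x M" part.
function multipackPart(title) {
  const match = title.match(filter);
  return match ? match[0] : null;
}

const parts = sampleTitles.map(multipackPart);
console.log(parts); // [ '12 x 330', '12 x 330', null ]
```

So the filter itself copes with the extra "4" in "(4-Way)", because that digit is followed by a hyphen rather than whitespace.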

I then test each result for a match with the Regular Expression and then calculate the item count, item volume, volume in litres and then the price per litre.

    if (result.title.match(filter)) {
      result.itemCount = parseInt(result.title.match(/\d{1}\s/));
      result.itemVolume = parseInt(result.title.match(/\d{2,3}\s/));
      result.litreVolume = (result.itemCount * result.itemVolume) / 1000;
      result.pricePerLitre = +(result.price / result.litreVolume).toFixed(2);
    } else {
      result.itemCount = 1;
      result.itemVolume = parseInt(result.title.match(/\d{2,3}\s/));
      result.litreVolume = result.itemVolume / 1000;
      result.pricePerLitre = +(result.price / result.litreVolume).toFixed(2);
    }

90% of the results look good, but sometimes I get unexpected results. For example:

  • an item count of NaN, which may have to do with the fact that some titles contain several more numbers (for example, Coca Cola (4-Way) 12 x 330 mL)
  • a volume of Infinity
  • a price per litre that is way too high

Clearly I am doing something wrong in my approach to calculating the desired metadata. What would be a better way of doing calculations with RegEx? Am I missing something that would make my calculations less prone to errors?


Score: 1




If I understand correctly, the filter \b\d{1,2}\sx\s\d{1,3} works, but your sub-filters (such as \d{1}\s) do not.
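To see why the sub-filters go wrong, here is a small JavaScript sketch (the language of the question). Each `match` call searches the whole title again from the start, so it does not necessarily pick up the numbers the main filter found. The helper name `naiveParse` and the second and third titles are made up for illustration.

```javascript
// The two sub-patterns from the question.
const countPattern = /\d{1}\s/;    // one digit followed by whitespace
const volumePattern = /\d{2,3}\s/; // two or three digits followed by whitespace

function naiveParse(title) {
  // match() returns an array or null; parseInt converts it to a string first.
  return {
    itemCount: parseInt(title.match(countPattern)),
    itemVolume: parseInt(title.match(volumePattern)),
  };
}

// countPattern finds "2 " (the tail of "12"); volumePattern finds "12 ".
const twelvePack = naiveParse("Coca cola (vanilla flavour) 12 x 330 mL");
console.log(twelvePack); // { itemCount: 2, itemVolume: 12 }

// countPattern finds "0 ", so the litre volume is 0 and price / 0 is Infinity.
const tenPack = naiveParse("Water 10 x 500 mL");
console.log(tenPack); // { itemCount: 0, itemVolume: 10 }

// No digits at all: match() returns null and parseInt(null) is NaN.
const noVolume = naiveParse("Chocolate bar");
console.log(noVolume); // { itemCount: NaN, itemVolume: NaN }
```

A count of 2 with a volume of 12 mL would also explain a price per litre that is far too high.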

I am only used to using regex in C#, but I saw you can use groups in Java as well.
Change your pattern to (\b\d{1,2})\sx\s(\d{1,3}). When you put parentheses in your regex, that part becomes a group that you can access afterwards.
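Since the question is about JavaScript, the same grouping idea can be sketched there with `String.prototype.match`, which returns the whole match at index 0 and the groups at the following indexes. The variable names here are only illustrative.

```javascript
// Pattern with two capture groups around the count and the volume.
const packPattern = /(\b\d{1,2})\sx\s(\d{1,3})/;

// Sample title taken from the question.
const title = "Coca Cola (4-Way) 12 x 330 mL";
const match = title.match(packPattern);

if (match) {
  // match[0] is the whole match, match[1] and match[2] are the groups.
  const itemCount = parseInt(match[1], 10);
  const itemVolume = parseInt(match[2], 10);
  console.log(match[0], itemCount, itemVolume); // 12 x 330 12 330
}
```

Because both numbers come from the same match, the stray "4" in "(4-Way)" can no longer end up as the item count.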

As I said, I haven't used Java in a few years, but I picked this code snippet from the web. It shows how to use groups in Java. As the pattern you should use (\b\d{1,2})\sx\s(\d{1,3}). If it works the same as in C#, group(0) is the whole match, group(1) is your first actual group and group(2) is the second.

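    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String pattern = "(\\b\\d{1,2})\\sx\\s(\\d{1,3})";
    String line = "Coca cola (vanilla flavour) 12 x 330 mL";

    // Create a Pattern object
    Pattern r = Pattern.compile(pattern);
    // Now create a matcher object.
    Matcher m = r.matcher(line);
    if (m.find()) {
        System.out.println("Whole match: " + m.group(0));
        System.out.println("Item count: " + m.group(1));
        System.out.println("Item volume: " + m.group(2));
    }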

I think you can write it with less code than shown above, but you get the picture.
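Putting the pieces together for the JavaScript scraper in the question, a compact version might look like the sketch below. The function name `addVolumeData`, the single-item pattern (which assumes the unit is always written as "mL"), the sample prices and the choice to store `null` when no volume is found are all assumptions, not something taken from the question.

```javascript
// Multipack, e.g. "12 x 330 mL": group 1 is the count, group 2 the volume in mL.
const multiPattern = /(\b\d{1,2})\sx\s(\d{1,3})/;
// Single item, e.g. "750 mL" (assumption: the unit is always written as "mL").
const singlePattern = /\b(\d{1,4})\s?mL\b/i;

function addVolumeData(result) {
  const multi = result.title.match(multiPattern);
  const single = multi ? null : result.title.match(singlePattern);

  if (multi) {
    result.itemCount = parseInt(multi[1], 10);
    result.itemVolume = parseInt(multi[2], 10);
  } else if (single) {
    result.itemCount = 1;
    result.itemVolume = parseInt(single[1], 10);
  } else {
    // No volume in the title: store null instead of computing NaN.
    result.itemCount = null;
    result.itemVolume = null;
    result.litreVolume = null;
    result.pricePerLitre = null;
    return result;
  }

  result.litreVolume = (result.itemCount * result.itemVolume) / 1000;
  // Guard against a zero volume so the price never becomes Infinity.
  result.pricePerLitre = result.litreVolume > 0
    ? +(result.price / result.litreVolume).toFixed(2)
    : null;
  return result;
}

const multipack = addVolumeData({ title: "Coca Cola (4-Way) 12 x 330 mL", price: 7.92 });
const bottle = addVolumeData({ title: "Orange juice 750 mL", price: 1.5 });
const unknown = addVolumeData({ title: "Chocolate bar", price: 0.99 });
console.log(multipack.pricePerLitre, bottle.pricePerLitre, unknown.pricePerLitre); // 2 2 null
```

Reading both numbers from one match and handling the "no match" case explicitly avoids the NaN and Infinity results described in the question.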

  • Published on January 6, 2020 at 17:48:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59609861.html



